GB2506593A

GB2506593A - Adaptive post-filtering of reconstructed image data in a video encoder

Info

Publication number: GB2506593A
Application number: GB201217456A
Authority: GB
Inventors: Sebastien Lasserre; Fabrice Le Leannec; Christophe Gisquet
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-09-28
Filing date: 2012-09-28
Publication date: 2014-04-09
Anticipated expiration: 2032-09-28
Also published as: GB2506593B; GB201217456D0

Abstract

Disclosed is a method of encoding image data in which adaptive post-filtering is applied to a reconstructed image. The reconstructed (or rough) image is obtained by first encoding pixel values in an original image into coefficients (such as in a Discrete Cosine Transform, DCT encoder) and then subjecting the coded image data to an inverse transform to obtain the rough decoded image. This rough decoded image is then processed through an adaptive post filter that is adjustable depending on an input parameter derived based on values of the original image. In one embodiment the post filtering is adjustable as a function of an input parameter that depends on a frame merit. The post filter may be a de-blocking filter (DBF), a sample adaptive offset filter (SAO) or an adaptive loop filter (ALF) and the input parameter may be a rate-distortion slope or a quantisation parameter. The filter parameters may be added to the encoded bit-stream to be used by an adaptive post-filter in a HEVC (High Efficiency Video Coding) decoder.

Description

METHODS FOR ENCODING IMAGE DATA, METHODS FOR DECODING IMAGE DATA AND METHODS FOR POST-FILTERING RECONSTRUCTED IMAGE DATA, AND CORRESPONDING DEVICES

FIELD OF THE INVENTION

The present invention concerns methods for segmenting and encoding an image comprising blocks of pixels, and associated devices.

The invention is particularly useful for the encoding of digital video sequences made of images or "frames".

BACKGROUND OF THE INVENTION

Video compression algorithms, such as those standardized by the standardization organizations ITU, ISO, and SMPTE, exploit the spatial and temporal redundancies of images in order to generate bit-streams of data of smaller size than original video sequences. These powerful video compression tools, known as spatial (or intra) and temporal (or inter) predictions, make the transmission and/or the storage of video sequences more efficient.

Video encoders and/or decoders (codecs) are often embedded in portable devices with limited resources, such as cameras or camcorders. Conventional embedded codecs can process at best high definition (HD) digital videos, i.e. 1080x1920 pixel frames.

Real time encoding is however limited by the limited resources of the portable devices, especially regarding slow access to the working memory (e.g. random access memory, or RAM) and regarding the central processing unit (CPU).

This is particularly striking for the encoding of ultra-high definition (UHD) digital videos that are about to be handled by the latest cameras. This is because the amount of pixel data to encode or to consider for spatial or temporal prediction is huge.

UHD is typically four times (4k2k pixels) the definition of an HD video, which is the current standard definition video. Furthermore, very ultra high definition, which is sixteen times that definition (i.e. 8k4k pixels), is even being considered for the longer term.

SUMMARY OF THE INVENTION

Faced with these encoding constraints in terms of limited power and memory access bandwidth, the inventors provide a UHD codec with low complexity based on scalable encoding.

Basically, the UHD video is encoded into a base layer and one or more enhancement layers.

The base layer results from the encoding of a reduced version of the UHD images, in particular having a HD resolution, with a standard existing codec (e.g. H.264 or HEVC -High Efficiency Video Coding). As stated above, the compression efficiency of such a codec relies on spatial and temporal predictions.

Further to the encoding of the base layer, an enhancement image is obtained from subtracting an interpolated (or up-scaled) decoded image of the base layer from the corresponding original UHD image. The enhancement images, which are residuals or pixel differences with UHD resolution, are then encoded into an enhancement layer.

Figure 1 illustrates such an approach at the encoder 10.

An input raw video 11, in particular a UHD video, is down-sampled 12 to obtain a so-called base layer, for example with HD resolution, which is encoded by a standard base video coder 13, for instance H.264/AVC or HEVC. This results in a base layer bit stream 14.

To generate the enhancement layer, the encoded base layer is decoded 15 and up-sampled 16 into the initial resolution (UHD in the example) to obtain the up-sampled decoded base layer.

The latter is then subtracted 17, in the pixel domain, from the original raw video to get the residual enhancement layer X. The information contained in X is the error or pixel difference due to the base layer encoding and the up-sampling. It is also known as a "residual".

A conventional block division is then applied, for instance a homogenous 8x8 block division (but other divisions with non-constant block size are also possible).

Next, a DCT transform 18 is applied to each block to generate DCT blocks forming the DCT image XDCT having the initial UHD resolution.

This DCT image XDCT is encoded by an enhancement video encoding module 19 into an encoded DCT image, which forms an enhancement layer bit stream 20.

As visible in Figure 1, the encoded DCT image is also decoded and inverse transformed 25 to obtain the decoded residual image in the pixel domain (also computed at the decoder). This decoded residual image is summed 26 with the up-sampled decoded base layer image in order to obtain the rough enhanced version of the image.
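The layered structure described above can be sketched on a toy grayscale image. This is only an illustration under stated assumptions: the `toy_base_codec` function is a hypothetical stand-in (coarse rounding) for a real base codec such as H.264 or HEVC, down-sampling uses 2x2 averaging, up-sampling uses pixel repetition, and the residual is kept lossless so that the reconstruction can be checked exactly. None of these choices is prescribed by the text.

```python
# Toy sketch of the two-layer scheme of Figure 1 on a tiny grayscale image.
# The "base codec" is a stand-in (coarse rounding), NOT H.264/HEVC.

def downsample(img):
    """Halve the resolution by averaging each 2x2 neighbourhood (step 12)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def upsample(img):
    """Double the resolution by pixel repetition (step 16)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def toy_base_codec(img, step=16):
    """Hypothetical lossy base layer: encode then decode by rounding to a step."""
    return [[step * round(v / step) for v in row] for row in img]

def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

raw = [[10, 12, 200, 202],
       [11, 13, 201, 203],
       [50, 52, 90, 92],
       [51, 53, 91, 93]]

decoded_base = toy_base_codec(downsample(raw))  # steps 12, 13 and 15
prediction = upsample(decoded_base)             # step 16
residual = subtract(raw, prediction)            # step 17: enhancement image X
rough = add(prediction, residual)               # step 26, with a lossless residual
```

In the actual scheme the residual is transformed, quantized and entropy coded, so the rough image only approximates the raw image; that gap is what the adaptive post-filtering then tries to reduce.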

Adaptive post filtering is then applied to this rough decoded image such that the post-filtered decoded image is as close as possible to the original image (raw video). In practice, the filters are for instance selected to minimize a rate-distortion cost.

Parameters of the applied post-filters (for instance a deblocking filter, a sample adaptive offset filter and an adaptive loop filter, as described in more detail below with reference to Figure 17) are thus adjusted to obtain a post-filtered decoded image as close as possible to the raw video and the post-filter parameters thus determined are sent to the decoder in a dedicated bit stream 22.

It may be noted that the resulting image (post-filtered decoded image) is the reference image to be used in the encoding loop of systems using temporal prediction as it is the representation eventually used at the decoder as explained below.

The encoded bit-stream EBS resulting from the encoding of the raw video 11 is made of: -the base layer bit-stream 14 produced by the base video encoder 13; -the enhancement layer bit-stream 20 encoded by the enhancement video encoder 19; and -parameters 21, 22 determined and used by the enhancement video encoder.

Examples of those parameters are given here below.

Figure 2 illustrates the associated processing at the decoder 30 receiving the encoded bit-stream EBS.

Part of the processing consists in decoding the base layer bit-stream 14 by the standard base video decoder 31 to produce a decoded base layer. This decoded base layer is up-sampled 32 into the initial resolution, i.e. UHD resolution.

In another part of the processing, both the enhancement layer bit-stream 20 and the parameters 21 are used by the enhancement video decoding module 33 to generate a dequantized DCT image. This dequantized DCT image is the result of the quantization and then the inverse quantization of the image XDCT.

An inverse DCT transform 34 is then applied to each block of the dequantized DCT image to obtain the decoded residual (of UHD resolution) in the pixel domain.

This decoded residual is added 35 to the up-sampled decoded base layer to obtain decoded images of the video.

Filter post-processing 36, for instance with a deblocking filter, a sample adaptive offset filter and an adaptive loop filter as described below, is finally applied to obtain the decoded video 37 which is output by the decoder 30.

Reducing UHD encoding complexity relies on simplifying the encoding of the enhancement images at the enhancement video encoding module 19 compared to the conventional encoding scheme.

To that end, the inventors dispense with the temporal prediction and possibly the spatial prediction when encoding the UHD enhancement images. This is because the temporal prediction is very expensive in terms of memory bandwidth consumption, since it often requires accessing other enhancement images.

While this simplification reduces by 80% the slow memory random access bandwidth consumption during the encoding process, not using those powerful video compression tools may deteriorate the compression efficiency, compared to the conventional standards.

In this respect, the inventors have developed several additional tools for increasing the efficiency of the encoding of those enhancement images.

Figure 3 illustrates an embodiment of the enhancement video encoding module 19 (or "enhancement layer encoder") that is provided by the inventors.

In this embodiment, the enhancement layer encoder models 190 the statistical distribution of the DCT coefficients within the DCT blocks of a current enhancement image by fitting a parametric probabilistic model.

This fitted model becomes the channel model of DCT coefficients and the fitted parameters are output in the parameter bit-stream 21 coded by the enhancement layer encoder. As will become more clearly apparent below, a channel model may be obtained for each DCT coefficient position within a DCT block, i.e. each type of coefficient or each DCT channel, based on fitting the parametric probabilistic model onto the corresponding collocated DCT coefficients throughout all the DCT blocks of the image XDCT or of part of it.

Based on the channel models, quantizers may be chosen 191 from a pool of pre-computed quantizers dedicated to each DCT channel as further explained below.

The chosen quantizers are used to perform the quantization 192 of the DCT image to obtain the quantized DCT image XDCT,Q.

Lastly, an entropy encoder 193 is applied to the quantized DCT image XDCT,Q to compress data and generate the encoded DCT image which constitutes the enhancement layer bit-stream 20.

The associated enhancement video decoder 33 is shown in Figure 4.

From the received parameters 21, the channel models are reconstructed and quantizers are chosen 330 from the pool of quantizers. As further explained below, quantizers used for dequantization may be selected at the decoder side using a process similar to the selection process used at the encoder side, based on parameters defining the channel models (which parameters are received in the data stream). Alternatively, the parameters transmitted in the data stream could directly identify the quantizers to be used for the various DCT channels.

An entropy decoder 331 is applied to the received enhancement layer bit-stream 20 to obtain the quantized DCT image. A dequantization 332 is then performed by using the chosen quantizers, to obtain a dequantized version of the DCT image.
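The symmetry between quantization 192 at the encoder and dequantization 332 at the decoder can be illustrated with a deliberately simplified sketch. It assumes a uniform scalar quantizer per DCT channel, described by a hypothetical step size; the quantizers of the described scheme are instead chosen from a pool of pre-computed quantizers, so this only shows why both sides must select the same quantizer for each channel.

```python
# Simplified per-channel quantization / dequantization.
# Assumption: one uniform step size per DCT channel (not the pooled quantizers).

def quantize(coefficients, steps):
    """coefficients[i] and steps[i] belong to the same DCT channel."""
    return [int(round(c / s)) for c, s in zip(coefficients, steps)]

def dequantize(symbols, steps):
    """Inverse mapping, usable only with the same steps as the encoder."""
    return [q * s for q, s in zip(symbols, steps)]

coefficients = [10.2, -3.9, 0.4]
steps = [4, 2, 1]  # hypothetical step sizes, coarser for the first channel

symbols = quantize(coefficients, steps)
reconstructed = dequantize(symbols, steps)
```

With a uniform quantizer of step `s`, the reconstruction error of a coefficient is at most `s / 2`, which is why a coarser quantizer costs fewer bits but more distortion.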

The channel modeling and the selection of quantizers are some of the additional tools as introduced above.

As will become apparent from the explanation below, those additional tools may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.

As briefly introduced above, the invention is particularly advantageous when encoding images without prediction.

The invention provides a method for encoding image data representing at least one original image, comprising the steps of: -encoding pixel values into coefficients forming at least a layer; -decoding the coefficients to obtain a rough decoded image corresponding to the original image; -processing the rough decoded image through at least one adaptive post-filter adjustable depending on a parameter, wherein said parameter is derived based on pixel values and input to the adaptive post-filter.

The parameter used when adjusting the post-filter is thus adapted to the content of the image.

According to a possible embodiment, the adaptive post-filter may be defined (or, in some embodiments, selected) to minimise a rate-distortion cost computed based on said parameter. The adaptive post-filter is for instance a sample adaptive offset filter, in which case the parameter may be a rate-distortion slope.

According to another possibility, the adaptive post-filter may be an adaptive loop filter; in this case also, the parameter may be a rate-distortion slope.

The post-filter may also be a deblocking filter, in which case the parameter may be a quantization parameter.
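As a rough illustration of how a rate-distortion slope can drive the choice of a post-filter, the sketch below picks, among a few candidates, the one minimising the cost D + λR. The candidate filters (constant offsets, loosely inspired by a sample adaptive offset), their bit costs and the function names are all hypothetical; the distortion is taken as a sum of squared errors, which is an assumption.

```python
# Choosing a post-filter by minimising a rate-distortion cost D + lam * R.
# Candidates, rates and names are illustrative assumptions.

def sse(a, b):
    """Sum of squared errors between two sample lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def make_offset_filter(offset):
    """Return a filter adding a constant offset to every sample."""
    return lambda samples: [s + offset for s in samples]

def select_post_filter(original, rough, candidates, lam):
    """Return (cost, name, filtered) for the candidate with the lowest cost.

    candidates: list of (name, filter_function, rate_in_bits) tuples.
    lam: rate-distortion slope weighting the signalling cost.
    """
    best = None
    for name, fn, rate_bits in candidates:
        filtered = fn(rough)
        cost = sse(original, filtered) + lam * rate_bits
        if best is None or cost < best[0]:
            best = (cost, name, filtered)
    return best

original = [100, 102, 104, 106]
rough = [98, 100, 102, 104]
candidates = [("off", make_offset_filter(0), 1),
              ("+1", make_offset_filter(1), 4),
              ("+2", make_offset_filter(2), 4)]

low_slope_choice = select_post_filter(original, rough, candidates, 1.0)[1]
high_slope_choice = select_post_filter(original, rough, candidates, 10.0)[1]
```

With a small slope the exact correction is worth its signalling cost; with a large slope the cheaper "off" candidate wins, which shows how the input parameter adapts the filter decision.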

According to the embodiment described below, the step of encoding pixel values may include encoding a base layer version of the image into base layer coefficients and encoding a residual image, obtained by difference between the original image and the base layer version, into enhancement layer coefficients.

In this context, the rough decoded image may be obtained by summing a first version obtained by decoding the base layer coefficients and a second version obtained by decoding the enhancement layer coefficients.

As explained further below, the method may comprise some of the following steps: -for each block of pixels of a plurality of blocks of pixels comprised in the image, transforming samples representative of pixel values of the residual image into a set of coefficients each having a coefficient type; -for each coefficient type, determining statistic parameters representative of a statistical distribution of the coefficients having the concerned coefficient type; -based on the determined statistics parameters, determining quantizers associated with coefficient types and a corresponding frame merit such that a distortion and/or rate at the image level fulfils a given criterion; -determining the parameter input to the adaptive post-filter based on the frame merit.

The quantizers and corresponding frame merit are for instance determined such that a video merit computed based on the frame merit and on distortions respectively resulting from the use of the quantizers, corresponds to a target video merit.

Encoding the pixel values may include quantizing coefficients having a given type with the quantizer determined for this coefficient type.

The invention also provides a method for post-filtering reconstructed image data, wherein said image data are representative of at least one original image comprising a plurality of blocks of pixels encoded according to an encoding method, the method comprising the following steps: -for each block of pixels, transforming samples representative of pixel values into a set of coefficients each having a coefficient type; -for each coefficient type, determining statistic parameters representative of a statistical distribution of the coefficients having the concerned coefficient type; -based on the determined statistics parameters, determining quantizers associated with coefficient types and a corresponding frame merit such that a distortion and/or rate at the image level fulfils a given criterion; wherein the post filtering is adjustable as a function of an input parameter depending on the frame merit.

The post filtering is for instance performed on an image encoder side. In this respect, the post filtering is for instance performed in an image encoder, for instance of the type described above. According to a possible variation, the post filtering is performed outside the encoder, for instance in a module external to the encoder; it applies in this case on the image resulting from data produced by the encoder.

The post filtering may be performed on an INTRA image in such a video encoder, and the resulting post filtered image may be introduced in a temporal prediction loop to be used as a reference image for a next image.

The post filtering may also be performed on an image decoder side. In this respect, the post filtering is for instance performed in an image decoder, for instance of the type described above. According to a possible variation, the post filtering is performed outside the decoder, for instance in a module external to the decoder; it applies in this case on the image produced by the decoder.

The post filtering may be performed on an INTRA image in such a video decoder, and the resulting post filtered image may be introduced in a temporal prediction loop to be used as a reference image for a next image.

In a corresponding manner, the invention provides a device for encoding image data representing at least one original image, comprising: -an encoding module for encoding pixel values into coefficients forming at least a layer; -a decoding module for decoding the coefficients to obtain a rough decoded image corresponding to the original image; -a processing module for processing the rough decoded image through at least one adaptive post-filter adjustable depending on a parameter, wherein the processing module is configured to derive said parameter based on pixel values and to input the derived parameter to the adaptive post-filter.

The invention also provides a device for post-filtering reconstructed image data, wherein said image data are representative of at least one original image comprising a plurality of blocks of pixels encoded according to an encoding method, the device comprising: -a transforming module for transforming, for each block of pixels, samples representative of pixel values into a set of coefficients each having a coefficient type; -a statistic determining module for determining, for each coefficient type, statistic parameters representative of a statistical distribution of the coefficients having the concerned coefficient type; -a quantizer determining module for determining, based on the determined statistic parameters, quantizers associated with coefficient types and a corresponding frame merit such that a distortion and/or rate at the image level fulfils a given criterion; -an adjusting module for adjusting the post filtering as a function of an input parameter depending on the frame merit.

Optional features proposed above in connection with the methods may also apply to the devices just mentioned.

The invention also provides information storage means, possibly totally or partially removable, able to be read by a computer system, comprising instructions for a computer program adapted to implement a method as mentioned above, when this program is loaded into and executed by the computer system.

The invention also provides a computer program product able to be read by a microprocessor, comprising portions of software code adapted to implement a method as mentioned above, when it is loaded into and executed by the microprocessor.

The invention also provides an encoding device for encoding an image substantially as herein described with reference to, and as shown in, Figures 1 and 3 of the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Other particularities and advantages of the invention will also emerge from the following description, illustrated by the accompanying drawings, in which:
-Figure 1 schematically shows an encoder for a scalable codec;
-Figure 2 schematically shows the corresponding decoder;
-Figure 3 schematically illustrates the enhancement video encoding module of the encoder of Figure 1;
-Figure 4 schematically illustrates the enhancement video decoding module of the decoder of Figure 2;
-Figure 5 illustrates an example of a quantizer based on Voronoi cells;
-Figure 6 shows the correspondence between data in the spatial domain (pixels) and data in the frequency domain;
-Figure 7 illustrates an exemplary distribution over two quanta;
-Figure 8 shows exemplary rate-distortion curves, each curve corresponding to a specific number of quanta;
-Figure 9 shows the rate-distortion curve obtained by taking the upper envelope of the curves of Figure 8;
-Figure 10 depicts several rate-distortion curves obtained for various possible parameters of the DCT coefficient distribution;
-Figure 11 shows an exemplary embodiment of a process for determining optimal quantizers according to the teachings of the invention at the block level;
-Figure 12 shows an exemplary embodiment of a process for determining optimal quantizers according to the teachings of the invention at the frame level;
-Figure 13 shows a first possible embodiment of a process for determining optimal quantizers according to the teachings of the invention at the level of a video sequence;
-Figure 14 shows a second possible embodiment of a process for determining optimal quantizers according to the teachings of the invention at the level of a video sequence;
-Figure 15 shows an exemplary embodiment of an encoding process according to the teachings of the invention;
-Figure 16 illustrates a bottom-to-top algorithm used in the frame of the encoding process of Figure 15;
-Figure 17 shows the adaptive post-filtering applied at the encoder;
-Figure 18 shows the post-filtering applied at the decoder;
-Figure 19 shows a particular hardware configuration of a device able to implement methods according to the invention.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

For the detailed description below, focus is made on the encoding of a UHD video as introduced above with reference to Figures 1 to 4. It is however to be recalled that the invention applies to the encoding of any image from which a probabilistic distribution of transformed block coefficients can be obtained (e.g. statistically). In particular, it applies to the encoding of an image without temporal prediction and possibly without spatial prediction.

Referring again to Figure 3, a low resolution version of the initial image has been encoded into an encoded low resolution image, referred above as the base layer; and a residual enhancement image has been obtained by subtracting an interpolated decoded version of the encoded low resolution image from said initial image.

The encoding of the residual enhancement image is now described. As explained in more detail below, it is proposed to determine an initial segmentation of the image to be encoded, then to change this segmentation in order to optimize an encoding cost and to use the optimizing segmentation for encoding.

The main steps of this optimized encoding process are now described one by one, before a presentation of the whole process is given with reference to Figure 14.

Conventionally, the residual enhancement image is to be transformed, using for example a DCT transform, to obtain an image of transformed block coefficients. In the Figure, that image is referenced XDCT and comprises a plurality of DCT blocks, each comprising DCT coefficients.

As an example, the residual enhancement image may be divided by the initial segmentation just mentioned into blocks Bk, each having a particular block type.

Several block types may be considered, owing in particular to various possible sizes for the block. Other parameters than size may be used to distinguish between block types.

In particular, as there may be a big disparity of activity (or energy) between blocks with the same size, a segmentation of a frame by using only block size is not fine enough to obtain an optimal performance of classification of parts of the frame.

This is why it is proposed to add a label to the block size in order to distinguish various levels and/or characteristics of a block activity.

It is proposed for instance to use only square blocks, here blocks of dimensions 32x32, 16x16 and 8x8, and the following block types for luminance residual frames, each block type being defined by a size and a label (corresponding to an index of energy for instance, but possibly also to other parameters as explained below): -32x32 label 1; -32x32 label 2; -etc.; -32x32 label N32; -16x16 label 1 (e.g. bottom); -16x16 label 2 (e.g. low); -etc.; -16x16 label N16; -8x8 label 1 (e.g. low); -8x8 label 2; -etc.; -8x8 label N8 (e.g. high).

In addition, a further block type may be introduced for each block size, with a label "skip" meaning that the corresponding block of data is not encoded and that corresponding residual pixels, or equivalently DCT coefficients, are considered to have a null value (value zero). It is however proposed here not to use these types with skip-label in the initial segmentation, but to introduce them during the segmentation optimisation process, as described below.

There are thus N32+1 block types of size 32x32, N16+1 block types of size 16x16 and N8+1 block types of size 8x8. The choice of the parameters N32, N16, N8 depends on the residual frame content and, as a general rule, high quality coding requires more block types than low quality coding.

For the initial segmentation, the choice of the block size is performed here by computing the L2 integral I of a morphological gradient (measuring residual activity, e.g. residual morphological activity) on each 32x32 block, before applying the DCT transform. (Such a morphological gradient corresponds to the difference between a dilatation and an erosion of the luminance residual frame, as explained for instance in "Image Analysis and Mathematical Morphology", Vol. 1, by Jean Serra, Academic Press, February 11, 1984.) If the integral computed for a block is higher than a predetermined threshold, the concerned block is divided into four smaller, here 16x16, blocks; this process is applied on each obtained 16x16 block to decide whether or not it is divided into 8x8 blocks (top-down algorithm).
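The top-down decision just described can be sketched as follows. Several details are assumptions made for the illustration only: the dilatation and erosion use a 3x3 window clamped at the image borders, the L2 integral is taken as the sum of squared gradient values over the block, and one threshold is given per block size through a hypothetical `thresholds` mapping.

```python
# Top-down segmentation driven by the L2 integral of a morphological gradient.
# Window size, integral definition and thresholds are illustrative assumptions.

def morphological_gradient(img):
    """3x3 dilatation minus 3x3 erosion, with the window clamped at borders."""
    h, w = len(img), len(img[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            grad[y][x] = max(window) - min(window)
    return grad

def l2_integral(grad, y0, x0, size):
    """Sum of squared gradient values over one square block."""
    return sum(grad[y][x] ** 2
               for y in range(y0, y0 + size)
               for x in range(x0, x0 + size))

def segment(grad, y0, x0, size, thresholds, min_size=8):
    """Split a block into four while its integral exceeds the threshold.

    Returns a list of (y, x, size) leaf blocks.
    """
    if size > min_size and l2_integral(grad, y0, x0, size) > thresholds[size]:
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += segment(grad, y0 + dy, x0 + dx, half, thresholds, min_size)
        return blocks
    return [(y0, x0, size)]

# A flat 32x32 residual stays a single 32x32 block.
flat = [[0] * 32 for _ in range(32)]
flat_blocks = segment(morphological_gradient(flat), 0, 0, 32, {32: 1000, 16: 1000})
```

A residual with activity concentrated in one corner would instead be split there down to 8x8 blocks while the quiet areas keep larger blocks.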

Once the block size of a given block is decided, the block type of this block is determined based on the morphological integral computed for this block, for instance here by comparing the morphological integral I with thresholds defining three bands of residual activity (i.e. three indices of energy) for each possible size (as exemplified above: bottom, low or normal residual activity for 16x16-blocks and low, normal, high residual activity for 8x8-blocks).

It may be noted that the morphological gradient is used in the present example to measure the residual activity but that other measures of the residual activity may be used, instead or in combination, such as local energy or Laplace's operator.

In a possible embodiment, the decision to attribute a given label to a particular block (once its size is determined as above) may be based not only on the magnitude of the integral I, but also on the ratio of vertical activity vs. horizontal activity, e.g. thanks to the ratio Ih/Iv, where Ih is the L2 integral of the horizontal morphological gradient and Iv is the L2 integral of the vertical morphological gradient.

For instance, the concerned block will be attributed a label (i.e. a block type) depending on whether the ratio Ih/Iv is below 0.5 (corresponding to a block with residual activity oriented in the vertical direction), between 0.5 and 2 (corresponding to a block with non-oriented residual activity) or above 2 (corresponding to a block with residual activity oriented in the horizontal direction).
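The orientation test can be written as a small helper. The function name and the returned strings are hypothetical, and the handling of the exact boundary values 0.5 and 2 (treated here as non-oriented) is an assumption, since the text does not say which side they belong to. Comparisons are written with multiplications so that a zero vertical integral does not cause a division by zero.

```python
# Orientation label from the horizontal and vertical L2 integrals.
# Boundary handling and label names are illustrative assumptions.

def orientation_label(i_h, i_v):
    """Classify a block from the ratio Ih/Iv using the 0.5 and 2 thresholds."""
    if i_h < 0.5 * i_v:
        return "vertical"        # ratio below 0.5
    if i_h > 2.0 * i_v:
        return "horizontal"      # ratio above 2
    return "non-oriented"        # ratio between 0.5 and 2
```

For example, `orientation_label(1.0, 4.0)` gives a ratio of 0.25 and so falls in the vertical class.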

It is proposed here that chrominance blocks each have a block type inferred from the block type of the corresponding luminance block in the frame. For instance chrominance block types can be inferred by dividing in each direction the size of luminance block types by a factor depending on the resolution ratio between the luminance and the chrominance.

In the present case where use is made of 4:2:0 videos, where chrominance (U and V) frames are down-sampled by a factor two both vertically and horizontally compared to the corresponding luminance frame, blocks in chrominance frames have a size (among 16x16, 8x8 and 4x4) and a label both inferred from the size and label of the corresponding block in the luminance frame.

In addition, it is proposed here, as just explained, to define the block type as a function of its size and an index of the energy, also possibly considering the orientation of the residual activity. Other characteristics can also be considered, such as for example the encoding mode used for the collocated block of the base layer, referred to below as the "base coding mode". Typically, Intra blocks of the base layer do not behave the same way as Inter blocks, and blocks with a coded residual in the base layer do not behave the same way as blocks without such a residual (i.e. Skipped blocks).

Figure 11 shows an exemplary process for determining optimal quantizers (based on a given segmentation, e.g. the initial segmentation or a modified segmentation during the optimising process) focusing on steps performed at the block level.

Once a segmentation is determined, including the definition of a block type associated to each block (step S2), a DCT transform is then applied to each of the concerned blocks (step S4) in order to obtain a corresponding block of DCT coefficients.

Within a block, the DCT coefficients are associated with an index i (e.g. i = 1 to 64), following an ordering used for successive handling when encoding, for example.
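For illustration, a block transform and its indexing can be sketched as below. The sketch assumes an orthonormal two-dimensional DCT-II and a raster-scan ordering for the index i; the text only says that an ordering used for successive handling is followed, so the raster order here is a placeholder, and the direct quadruple loop favours readability over speed.

```python
import math

# Orthonormal 2D DCT-II of a square block, plus an indexing of coefficients.
# The raster-scan index order is an assumption made for this sketch.

def dct_2d(block):
    """Return the 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

def indexed_coefficients(coeffs):
    """Map index i = 1..n*n to a coefficient, in raster order."""
    flat = [value for row in coeffs for value in row]
    return {i + 1: value for i, value in enumerate(flat)}

constant_block = [[3] * 8 for _ in range(8)]
indexed = indexed_coefficients(dct_2d(constant_block))
```

A constant 8x8 block concentrates all its energy in the coefficient of index 1 (the DC coefficient), the 63 others being zero up to rounding.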

Blocks are grouped into macroblocks MBk. A very common case for so-called 4:2:0 YUV video streams is a macroblock made of 4 blocks of luminance Y, 1 block of chrominance U and 1 block of chrominance V. Here too, other configurations may be considered.

To simplify the explanations, only the coding of the luminance component is described here with reference to Figure 11. However, the same approach can be used for coding the chrominance components. In addition, it will be further explained with reference to Figures 13 and 14 how to process luminance and chrominance in relation with each other.

Starting from the image XDCT, a probabilistic distribution P of each DCT coefficient is determined using a parametric probabilistic model at step SB. This is referenced 190 in Figure 3.

Since, in the present example, the image XDCT is a residual image, i.e. information is about a noise residual, it is efficiently modelled by Generalized Gaussian Distributions (GGD) having a zero mean:

$$\mathrm{DCT}(X) \approx \mathrm{GGD}(\alpha, \beta),$$

where $\alpha$, $\beta$ are two parameters to be determined and the GGD follows the following two-parameter distribution:

$$\mathrm{GGD}(\alpha, \beta, x) := \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\left(-\left|x/\alpha\right|^{\beta}\right),$$

and where $\Gamma$ is the well-known Gamma function:

$$\Gamma(z) = \int_0^{+\infty} t^{z-1} e^{-t}\,dt.$$
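The zero-mean Generalized Gaussian density, with scale parameter alpha and shape parameter beta, can be evaluated directly with the standard library Gamma function. The short sketch below is only a numerical illustration; the helper `integrate` (a plain trapezoidal rule) is added to check that the density integrates to one and is not part of the described method.

```python
import math

# Zero-mean Generalized Gaussian density with scale alpha and shape beta.

def ggd_pdf(alpha, beta, x):
    """beta / (2 * alpha * Gamma(1/beta)) * exp(-|x/alpha|**beta)."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-(abs(x / alpha) ** beta))

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule, used here only as a sanity check."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h

# beta = 1 gives a Laplacian, beta = 2 a Gaussian of variance alpha**2 / 2.
laplace_peak = ggd_pdf(1.0, 1.0, 0.0)
gauss_peak = ggd_pdf(math.sqrt(2.0), 2.0, 0.0)
```

The two special cases show why the family suits residual data: a single shape parameter moves continuously between peaky, heavy-tailed distributions and Gaussian-like ones.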

The DCT coefficients cannot all be modelled by the same parameters and, practically, the two parameters $\alpha$, $\beta$ depend on: -the video content. This means that the parameters must be computed for each image or for every group of n images for instance; -the index i of the DCT coefficient within a DCT block Bk. Indeed, each DCT coefficient has its own behaviour. A DCT channel is thus defined for the DCT coefficients collocated (i.e. having the same index) within a plurality of DCT blocks (possibly all the blocks of the image). A DCT channel can therefore be identified by the corresponding coefficient index i. For illustrative purposes, if the residual enhancement image XDCT is divided into 8x8 pixel blocks, the modelling 190 has to determine the parameters of 64 DCT channels for each base coding mode;

-the block type defined above. The content of the image, and then the statistics of the DCT coefficients, may be strongly related to the block type because, as explained above, the block type is selected as a function of the image content, for instance to use large blocks for parts of the image containing little information.
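The notion of a DCT channel, i.e. the set of collocated coefficients for a given block type, can be made concrete with a small grouping routine. The data layout (a list of `(block_type, coefficients)` pairs) and the function name are assumptions chosen for the sketch.

```python
from collections import defaultdict

# Grouping collocated DCT coefficients into channels, per block type.
# The input layout is an illustrative assumption.

def build_channels(blocks):
    """Return {(block_type, i): [coefficients of index i]} with i starting at 1.

    blocks: iterable of (block_type, coefficients) pairs, where coefficients
    lists the DCT coefficients of one block in index order.
    """
    channels = defaultdict(list)
    for block_type, coefficients in blocks:
        for i, value in enumerate(coefficients, start=1):
            channels[(block_type, i)].append(value)
    return dict(channels)
```

Each resulting list is the sample set onto which one pair of model parameters would be fitted, which is why the number of parameter pairs grows with both the block size and the number of block types.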

In addition, since the luminance component Y and the chrominance components U and V have dramatically different source contents, they must be encoded in different DCT channels. For example, if it is decided to encode the luminance component Y on one channel and to encode jointly the chrominance components UV on another channel, 64 channels are needed for the luminance of a block type of size 8x8 and 16 channels are needed for the joint UV chrominance (made of 4x4 blocks) in a case of a 4:2:0 video where the chrominance is down-sampled by a factor two in each direction compared to the luminance. Alternatively, one may choose to encode U and V separately, in which case 64 channels are needed for Y, 16 for U and 16 for V. At least 64 pairs of parameters for each block type may appear as a substantial amount of data to transmit to the decoder (see parameter bit-stream 21).
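The channel counts quoted above follow from simple arithmetic, which the sketch below spells out for a 4:2:0 video. The function names are hypothetical, and the count assumes one pair of model parameters per DCT channel of a block type.

```python
# Counting DCT channels (and parameter pairs) for one block type in 4:2:0.
# Function names are illustrative assumptions.

def chroma_block_size(luma_size, subsampling=2):
    """Chrominance block size inferred from the luminance block size."""
    return luma_size // subsampling

def channel_count(block_size):
    """One DCT channel per coefficient position in a square block."""
    return block_size * block_size

def parameter_pairs(luma_size, joint_uv=True):
    """Number of parameter pairs to transmit for one block type."""
    luma = channel_count(luma_size)
    chroma = channel_count(chroma_block_size(luma_size))
    return luma + (chroma if joint_uv else 2 * chroma)
```

For an 8x8 luminance block type this gives 64 + 16 = 80 pairs with joint UV coding and 64 + 16 + 16 = 96 pairs when U and V are coded separately.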

However, experience shows that this is quite negligible compared to the volume of data needed to encode the residuals of Ultra High Definition (4k2k or more) videos. As a consequence, such a technique is preferably implemented on large videos rather than on very small videos, for which the parametric data would take up too much volume in the encoded bitstream.

For the sake of simplicity of explanation, a set of DCT blocks corresponding to the same block type is now considered.

To obtain the two parameters $\alpha_i$, $\beta_i$ defining the probabilistic distribution $P_i$ for a DCT channel i, the Generalized Gaussian Distribution model is fitted onto the DCT block coefficients of the DCT channel, i.e. the DCT coefficients collocated within the DCT blocks of the same block type. Since this fitting is based on the values of the DCT coefficients, the probabilistic distribution is a statistical distribution of the DCT coefficients within a considered channel i.

For example, the fitting may be simply and robustly obtained using the moment of order k of the absolute value of a GGD:

$$E_{GGD(\alpha_i,\beta_i)}\left(|x|^k\right) = \int_{-\infty}^{+\infty} |x|^k\,GGD(\alpha_i,\beta_i,x)\,dx = \alpha_i^k\,\frac{\Gamma((1+k)/\beta_i)}{\Gamma(1/\beta_i)}.$$

Determining the moments of order 1 and of order 2 from the DCT coefficients of channel i makes it possible to directly obtain the value of parameter $\beta_i$:

$$\frac{M_2}{(M_1)^2} = \frac{\Gamma(1/\beta_i)\,\Gamma(3/\beta_i)}{\Gamma(2/\beta_i)^2}.$$

The value of the parameter $\beta_i$ can thus be estimated by computing the above ratio of the first and second moments, and then applying the inverse of the above function of $\beta_i$. In practice, this inverse function may be tabulated in memory of the encoder instead of computing Gamma functions in real time, which is costly.

The second parameter $\alpha_i$ may then be determined from the first parameter $\beta_i$ and the second moment, using the equation: $M_2 = \sigma_i^2 = \alpha_i^2\,\Gamma(3/\beta_i)/\Gamma(1/\beta_i)$.
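The moment-matching fit described above can be sketched as follows. This is only an illustrative sketch: the function names `ggd_ratio` and `fit_ggd` are hypothetical, and the inverse of the Gamma ratio is obtained here by bisection over the range 0.2 to 2.5 rather than by the tabulation the text suggests for a real encoder.

```python
import math

def ggd_ratio(beta):
    """Theoretical M2 / M1^2 of a GGD with shape parameter beta."""
    return (math.gamma(1.0 / beta) * math.gamma(3.0 / beta)
            / math.gamma(2.0 / beta) ** 2)

def fit_ggd(coeffs, beta_lo=0.2, beta_hi=2.5, iters=60):
    """Estimate (alpha, beta) for one DCT channel by moment matching."""
    n = len(coeffs)
    m1 = sum(abs(c) for c in coeffs) / n      # moment of order 1 of |x|
    m2 = sum(c * c for c in coeffs) / n       # moment of order 2
    target = m2 / (m1 * m1)
    # ggd_ratio decreases when beta grows, so bisection inverts it.
    lo, hi = beta_lo, beta_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_ratio(mid) > target:
            lo = mid                          # ratio too large: beta is larger
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # M2 = alpha^2 * Gamma(3/beta) / Gamma(1/beta)
    alpha = math.sqrt(m2 * math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return alpha, beta
```

For coefficients drawn from a Laplacian distribution of scale b, the sketch is expected to return a shape close to 1 and an alpha close to b.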

The two parameters $\alpha_i$, $\beta_i$ being determined for the DCT coefficients i, the probabilistic distribution $P_i$ of each DCT coefficient i is defined by

$$P_i(x) = GGD(\alpha_i,\beta_i,x) = \frac{\beta_i}{2\alpha_i\,\Gamma(1/\beta_i)}\exp\left(-|x/\alpha_i|^{\beta_i}\right).$$

Referring to Figure 3, a quantization 193 of the DCT coefficients is to be performed in order to obtain quantized symbols or values. As explained below, it is proposed here to first determine a quantizer per DCT channel so as to optimize a rate-distortion criterion.

Figure 5 illustrates an exemplary Voronoi cell based quantizer.

A quantizer is made of M Voronoi cells distributed along the values of the DCT coefficients. Each cell corresponds to an interval $[t_m, t_{m+1}]$, called quantum $Q_m$.

Each cell has a centroid $c_m$, as shown in the Figure.

The intervals are used for quantization: a DCT coefficient comprised in the interval $[t_m, t_{m+1}]$ is quantized to a symbol $a_m$ associated with that interval.

For their part, the centroids are used for de-quantization: a symbol $a_m$ associated with an interval is de-quantized into the centroid value $c_m$ of that interval.
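As a minimal sketch of this interval/centroid mechanism, the following uses the cell index m as the symbol $a_m$. The helper names and the example limits and centroids are hypothetical; only the interior limits are stored, the two outer cells being treated as unbounded.

```python
from bisect import bisect_right

def quantize(x, limits):
    """Index m of the quantum [t_m, t_{m+1}) that contains x.

    `limits` holds the interior limits in ascending order, so a
    quantizer with M cells has M - 1 entries.
    """
    return bisect_right(limits, x)

def dequantize(m, centroids):
    """De-quantize the symbol m into the centroid of its cell."""
    return centroids[m]

# Hypothetical 5-cell quantizer, for illustration only.
limits = [-1.5, -0.5, 0.5, 1.5]
centroids = [-2.0, -1.0, 0.0, 1.0, 2.0]

symbol = quantize(0.2, limits)
reconstructed = dequantize(symbol, centroids)
```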

The quality of a video or still image may be measured by the so-called Peak-Signal-to-Noise-Ratio or PSNR, which is dependent upon a measure of the L2-norm of the error of encoding in the pixel domain, i.e. the sum over the pixels of the squared difference between the original pixel value and the decoded pixel value. It may be recalled in this respect that the PSNR may be expressed in dB as:

$$PSNR = 10\cdot\log_{10}\left(\frac{MAX^2}{MSE}\right)$$

where MAX is the maximal pixel value (in the spatial domain) and MSE is the mean squared error (i.e. the above sum divided by the number of pixels concerned).

However, as noted above, most video codecs compress the data in the DCT-transformed domain, in which the energy of the signal is much better compacted.

The direct link between the PSNR and the error on DCT coefficients is now explained.

For a residual block, we denote by $\psi_n$ its inverse DCT (or IDCT) pixel base in the pixel domain, as shown in Figure 6. If one uses the so-called DCT-III for the inverse transform, this base is orthonormal: $\|\psi_n\| = 1$.

On the other hand, in the DCT domain, the unity coefficient values form a base $\varphi_n$ which is orthogonal. One writes the DCT transform of the pixel block X as follows:

$$X_{DCT} = \sum_n d^n\,\varphi_n$$

where $d^n$ is the value of the n-th DCT coefficient. A simple base change leads to the expression of the pixel block as a function of the DCT coefficient values:

$$X = IDCT(X_{DCT}) = IDCT\left(\sum_n d^n\,\varphi_n\right) = \sum_n d^n\,IDCT(\varphi_n) = \sum_n d^n\,\psi_n.$$

If the value of the de-quantized coefficient $d^n$ after decoding is denoted $d_Q^n$, one sees that (by linearity) the pixel error block is given by

$$\varepsilon_X = \sum_n \left(d_Q^n - d^n\right)\psi_n.$$

The mean $L^2$-norm error on all blocks is thus:

$$E\left(\|\varepsilon_X\|_2^2\right) = E\left(\sum_n |d^n - d_Q^n|^2\right) = \sum_n E\left(|d^n - d_Q^n|^2\right) = \sum_n D_n^2$$

where $D_n^2$ is the mean quadratic error of quantization on the n-th DCT coefficient, or squared distortion for this type of coefficient. The distortion is thus a measure of the distance between the original coefficient (here the coefficient before quantization) and the decoded coefficient (here the dequantized coefficient).
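The equality between pixel-domain error and the sum of coefficient errors can be checked numerically. The sketch below is an illustration under simplifying assumptions: it uses a one-dimensional orthonormal DCT-II of length 8 (whose transpose is the orthonormal DCT-III inverse) instead of two-dimensional blocks, a crude uniform quantizer of step 2, and hypothetical pixel values.

```python
import math

def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2.0 * n))
                      for j in range(n)])
    return basis

def forward_dct(basis, pixels):
    """Project the pixel vector onto each basis vector."""
    return [sum(b * p for b, p in zip(row, pixels)) for row in basis]

def inverse_dct(basis, coeffs):
    """Rebuild pixels as the sum of coefficient times basis vector."""
    n = len(coeffs)
    return [sum(coeffs[k] * basis[k][j] for k in range(n)) for j in range(n)]

pixels = [12.0, -3.0, 7.5, 0.0, -9.0, 4.0, 1.0, -2.5]   # hypothetical residual
basis = dct_basis(len(pixels))
coeffs = forward_dct(basis, pixels)
decoded_coeffs = [2.0 * round(c / 2.0) for c in coeffs]  # step-2 quantizer
decoded_pixels = inverse_dct(basis, decoded_coeffs)

pixel_error = sum((p - q) ** 2 for p, q in zip(pixels, decoded_pixels))
coeff_error = sum((c - d) ** 2 for c, d in zip(coeffs, decoded_coeffs))
# With an orthonormal base, pixel_error and coeff_error agree up to rounding.
```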

It is thus proposed below to control the video quality by controlling the sum of the quadratic errors on the DCT coefficients. In particular, this control is preferable compared to the individual control of each DCT coefficient, which is a priori a sub-optimal control.

In the embodiment described here, it is proposed to determine (i.e. to select in step 191 of Figure 3) a set of quantizers (to be used each for a corresponding DCT channel), the use of which results in a mean quadratic error having a target value $D_t^2$ while minimising the rate obtained. This corresponds to step S16 in Figure 11.

In view of the above correspondence between PSNR and the mean quadratic error on DCT coefficients, these constraints can be written as follows:

$$\text{minimize } R = \sum_n R_n(D_n) \quad \text{s.t. } \sum_n D_n^2 = D_t^2 \qquad (A)$$

where R is the total rate made of the sum of individual rates $R_n$ for each DCT coefficient. In case the quantization is made independently for each DCT coefficient, the rate $R_n$ depends only on the distortion $D_n$ of the associated n-th DCT coefficient.

It may be noted that the above minimization problem (A) may only be fulfilled by optimal quantizers which are solutions of the problem

$$\text{minimize } R_n(D_n) \quad \text{s.t. } E\left(|d^n - d_Q^n|^2\right) = D_n^2 \qquad (B).$$

This statement is simply proven by the fact that, assuming a first quantizer would not be optimal following (B) but would fulfil (A), then a second quantizer with less rate but the same distortion can be constructed (or obtained). So, if one uses this second quantizer, the total rate R has been diminished without changing the total distortion; this is in contradiction with the first quantizer being a minimal solution of the problem (A).

As a consequence, the rate-distortion minimization problem (A) can be split into two consecutive sub-problems without losing the optimality of the solution:

-first, determining optimal quantizers and their associated rate-distortion curves $R_n(D_n)$ following the problem (B), which will be done in the present case for GGD channels as explained below;

-second, by using optimal quantizers, the problem (A) is changed into the problem (A_opt):

$$\text{minimize } R = \sum_n R_n(D_n) \quad \text{s.t. } \sum_n D_n^2 = D_t^2 \text{ and } R_n(D_n) \text{ is optimal} \qquad (A\_opt).$$

Based on this analysis, it is proposed as further explained below:

-to compute off-line (step S8 in Figure 11) optimal quantizers adapted to possible probabilistic distributions of each DCT channel (thus resulting in the pool of quantizers of Figure 3);

-to select (step S16) one of these pre-computed optimal quantizers for each DCT channel (i.e. each type of DCT coefficient) such that using the set of selected quantizers results in a global distortion corresponding to the target distortion $D_t^2$ with a minimal rate (i.e. a set of quantizers which solves the problem A_opt).

A possible embodiment is now described for the first step S8 of computing optimal quantizers for possible probabilistic distributions, here Generalised Gaussian Distributions.

It is proposed to change the previous complex formulation of problem (B) into the so-called Lagrange formulation of the problem: for a given parameter $\lambda > 0$, we determine the quantization in order to minimize a cost function such as $D^2 + \lambda R$.

We thus get an optimal rate-distortion couple $(D_\lambda, R_\lambda)$. In case of a rate control (i.e. rate minimisation) for a given target distortion $\Delta_t$, the optimal parameter $\lambda > 0$ is determined by

$$\lambda_{\Delta_t} = \underset{\lambda,\ D_\lambda \leq \Delta_t}{\arg\min}\ R_\lambda$$

(i.e. the value of $\lambda$ for which the rate is minimum while fulfilling the constraint on distortion) and the associated minimum rate is $R_{\Delta_t} = R_{\lambda_{\Delta_t}}$.

As a consequence, by solving the problem in its Lagrange formulation, for instance following the method proposed below, it is possible to plot a rate-distortion curve associating a resulting minimum rate to each distortion value ($\Delta_t \mapsto R_{\Delta_t}$), which may be computed off-line as well as the associated quantization, i.e. quantizer, making it possible to obtain this rate-distortion pair.

It is precisely proposed here to formulate problem (B) into a continuum of problems (B_lambda) having the following Lagrange formulation

$$\text{minimize } D_n^2 + \lambda R_n(D_n) \quad \text{s.t. } E\left(|d^n - d_Q^n|^2\right) = D_n^2 \qquad (B\_lambda).$$

The well-known Chou-Lookabaugh-Gray algorithm is a good practical way to perform the required minimisation. It may be used with any distortion distance d; we describe here a simplified version of the algorithm for the L2-distance. This is an iterative process starting from any given guessed quantization.

As noted above, this algorithm is performed here for each of a plurality of possible probabilistic distributions (in order to obtain the pre-computed optimal quantizers for the possible distributions to be encountered in practice), and for a plurality of possible numbers M of quanta. It is described below when applied for a given probabilistic distribution P and a given number M of quanta.

In this respect, as the parameter $\alpha$ (or equivalently the standard deviation $\sigma$ of the Generalized Gaussian Distribution) can be moved out of the distortion parameter $D^2$ because it is a homothetic parameter, only optimal quantizers with unity standard deviation $\sigma = 1$ need to be determined in the pool of quantizers.

Taking advantage of this remark, in the proposed embodiment, the GGD representing a given DCT channel will be normalized before quantization (i.e. homothetically transformed into a unity standard deviation GGD), and will be de-normalized after de-quantization. Of course, this is possible because the parameters (in particular here the parameter $\alpha$ or equivalently the standard deviation $\sigma$) of the concerned GGD model are sent to the decoder in the video bit-stream.

Before describing the algorithm itself, the following should be noted.

The position of the centroids $c_m$ is such that they minimize the distortion $\delta_m^2$ inside a quantum; in particular one must verify that $\partial_{c_m}\delta_m^2 = 0$ (as the derivative is zero at a minimum).

As the distortion $\delta_m^2$ of the quantization, on the quantum $Q_m$, is the mean error $E(d(x;c_m))$ for a given distortion function or distance d, the distortion on one quantum when using the L2-distance is given by

$$\delta_m^2 = \int_{Q_m} |x - c_m|^2\,P(x)\,dx$$

and the nullification of the derivative thus gives:

$$c_m = \frac{\int_{Q_m} x\,P(x)\,dx}{P_m},$$

where $P_m$ is the probability of x to be in the quantum $Q_m$ and is simply the following integral: $P_m = \int_{Q_m} P(x)\,dx$.

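Turning now to minimisation of the cost function $C = D^2 + \lambda R$, and considering that the rate reaches the entropy of the quantized data:

$$R = -\sum_{m=1}^{M} P_m \log_2 P_m,$$

the nullification of the derivatives of the cost function for an optimal solution can be written as:

$$0 = \partial_{t_{m+1}} C = \partial_{t_{m+1}}\left[\delta_m^2 - \lambda P_m \ln P_m + \delta_{m+1}^2 - \lambda P_{m+1} \ln P_{m+1}\right].$$

Let us set $\bar{P} = P(t_{m+1})$, the value of the probability distribution at the point $t_{m+1}$. From simple variational considerations, see Figure 7, we get

$$\partial_{t_{m+1}} P_m = \bar{P} \quad\text{and}\quad \partial_{t_{m+1}} P_{m+1} = -\bar{P}.$$

Then, a bit of calculation leads to

$$\partial_{t_{m+1}}\delta_m^2 = \partial_{t_{m+1}}\int_{t_m}^{t_{m+1}} |x - c_m|^2 P(x)\,dx = \bar{P}\,|t_{m+1} - c_m|^2 + \int_{t_m}^{t_{m+1}} \partial_{t_{m+1}}|x - c_m|^2\,P(x)\,dx$$

$$= \bar{P}\,|t_{m+1} - c_m|^2 - 2\,\partial_{t_{m+1}}c_m \int_{t_m}^{t_{m+1}} (x - c_m)\,P(x)\,dx = \bar{P}\,|t_{m+1} - c_m|^2$$

as well as

$$\partial_{t_{m+1}}\delta_{m+1}^2 = -\bar{P}\,|t_{m+1} - c_{m+1}|^2.$$

As the derivative of the cost is now explicitly calculated, its cancellation gives:

$$0 = \bar{P}\,|t_{m+1} - c_m|^2 - \lambda\bar{P}\ln P_m - \lambda\bar{P} - \bar{P}\,|t_{m+1} - c_{m+1}|^2 + \lambda\bar{P}\ln P_{m+1} + \lambda\bar{P}$$

which leads to a useful relation between the quantum boundaries $t_m$, $t_{m+1}$ and the centroids $c_m$:

$$t_{m+1} = \frac{c_m + c_{m+1}}{2} - \lambda\,\frac{\ln P_{m+1} - \ln P_m}{2\,(c_{m+1} - c_m)}.$$

Thanks to these formulae, the Chou-Lookabaugh-Gray algorithm can be implemented by the following iterative process:

1. Start with arbitrary quanta $Q_m$ defined by a plurality of limits $t_m$

2. Compute the probabilities $P_m$ by the formula $P_m = \int_{Q_m} P(x)\,dx$

3. Compute the centroids $c_m$ by the formula $c_m = \int_{Q_m} x\,P(x)\,dx / P_m$

4. Compute the limits $t_m$ of new quanta by the formula $t_{m+1} = \frac{c_m + c_{m+1}}{2} - \lambda\,\frac{\ln P_{m+1} - \ln P_m}{2\,(c_{m+1} - c_m)}$

5. Compute the cost $C = D^2 + \lambda R$ by the formula $C = \sum_{m=1}^{M}\left(\delta_m^2 - \lambda P_m \ln P_m\right)$

6. Loop to 2. until convergence of the cost C

When the cost C has converged, the current values of limits $t_m$ and centroids $c_m$ define a quantization, i.e. a quantizer, with M quanta, which solves the problem (B_lambda), i.e. minimises the cost function for a given value $\lambda$, and has an associated rate value $R_\lambda$ and a distortion value $D_\lambda$.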
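The six steps above can be sketched numerically as follows. This is an illustration under stated assumptions, not the encoder's implementation: the function names are hypothetical, the integrals are approximated by a midpoint rule on a grid truncated to [-8, 8] (adequate for moderate tails only), the rate is measured in bits throughout so the limit update uses base-2 logarithms, and a quantum whose probability vanishes is simply dropped.

```python
import math

def ggd_pdf(x, beta):
    """Density of a GGD with unit standard deviation and shape beta."""
    alpha = math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-abs(x / alpha) ** beta)

def design_quantizer(beta, num_quanta, lam, span=8.0, steps=4000, max_iter=200):
    """Return (limits, centroids, squared distortion, rate in bits)."""
    h = 2.0 * span / steps
    xs = [-span + (i + 0.5) * h for i in range(steps)]
    ws = [ggd_pdf(x, beta) * h for x in xs]        # probability mass per sample
    # Step 1: arbitrary starting limits (uniform over [-2, 2]).
    limits = [-2.0 + 4.0 * (m + 1) / num_quanta for m in range(num_quanta - 1)]
    prev_cost = float("inf")
    for _ in range(max_iter):
        cells = len(limits) + 1
        p = [0.0] * cells
        s1 = [0.0] * cells
        s2 = [0.0] * cells
        m = 0
        for x, w in zip(xs, ws):                   # xs is ascending
            while m < cells - 1 and x >= limits[m]:
                m += 1
            p[m] += w
            s1[m] += w * x
            s2[m] += w * x * x
        kept = [m for m in range(cells) if p[m] > 1e-12]
        probs = [p[m] for m in kept]               # Step 2: probabilities
        cents = [s1[m] / p[m] for m in kept]       # Step 3: centroids
        dist2 = sum(s2[m] - s1[m] ** 2 / p[m] for m in kept)
        rate = -sum(q * math.log2(q) for q in probs)
        cost = dist2 + lam * rate                  # Step 5: cost
        if abs(prev_cost - cost) < 1e-12:          # Step 6: convergence
            break
        prev_cost = cost
        # Step 4: new limits from centroids and probabilities.
        limits = [0.5 * (cents[i] + cents[i + 1])
                  - lam * (math.log2(probs[i + 1]) - math.log2(probs[i]))
                  / (2.0 * (cents[i + 1] - cents[i]))
                  for i in range(len(cents) - 1)]
    return limits, cents, dist2, rate
```

With `lam = 0` the rate term vanishes and the iteration reduces to the Lloyd quantizer; for a unit Gaussian (`beta = 2`) and three quanta it is expected to approach the classical Lloyd-Max values (centroids near 0 and ±1.224, squared distortion near 0.190).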

Such a process is implemented for many values of the Lagrange parameter $\lambda$ (for instance 100 values comprised between 0 and 50). It may be noted that for $\lambda$ equal to 0, there is no rate constraint, which corresponds to the so-called Lloyd quantizer.

In order to obtain optimal quantizers for a given parameter $\beta$ of the corresponding GGD, the problems (B_lambda) are to be solved for various odd (by symmetry) values of the number M of quanta and for the many values of the parameter $\lambda$. A rate-distortion diagram for the optimal quantizers with varying M is thus obtained, as shown in Figure 8.

It turns out that, for a given distortion, there is an optimal number M of needed quanta for the quantization associated to an optimal parameter $\lambda$. In brief, one may say that optimal quantizers of the general problem (B) are those associated to a point of the upper envelope of the rate-distortion curves making this diagram, each point being associated with a number of quanta (i.e. the number of quanta of the quantizer leading to this point of the rate-distortion curve). This upper envelope is illustrated in Figure 9. At this stage, we have now lost the dependency on $\lambda$ of the optimal quantizers: to a given rate (or a given distortion) there corresponds only one optimal quantizer, whose number of quanta M is fixed.

Based on observations that the GGD modelling provides a value of $\beta$ almost always between 0.5 and 2 in practice, and that only a few discrete values are enough for the precision of encoding, it is proposed here to tabulate $\beta$ every 0.1 in the interval between 0.2 and 2.5. Considering these values of $\beta$ (i.e. here for each of the 24 values of $\beta$ taken into consideration between 0.2 and 2.5), rate-distortion curves, depending on $\beta$, are obtained (step S10) as shown in Figure 10. It is of course possible to obtain according to the same process rate-distortion curves for a larger number of possible values of $\beta$.
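The tabulation of the shape parameter can be illustrated with a short sketch. The names `BETA_TABLE` and `nearest_tabulated_beta` are hypothetical, and picking the nearest tabulated value for a fitted shape is an assumption made here for illustration; the text does not specify how a fitted value is mapped onto the table.

```python
# The 24 tabulated shapes: 0.2, 0.3, ..., 2.5 (rounded to avoid float drift).
BETA_TABLE = [round(0.2 + 0.1 * i, 1) for i in range(24)]

def nearest_tabulated_beta(beta):
    """Tabulated shape closest to a fitted beta (clamped to the table range)."""
    return min(BETA_TABLE, key=lambda b: abs(b - beta))
```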

Each curve may in practice be stored in the encoder in a table containing, for a plurality of points on the curve, the rate and distortion (coordinates) of the point concerned, as well as features defining the associated quantizer (here the number of quanta and the values of limits $t_m$ and centroids $c_m$ for the various quanta). For instance, a few hundred quantizers may be stored for each $\beta$ up to a maximum rate, e.g. of 5 bits per DCT coefficient, thus forming the pool of quantizers mentioned in Figure 3. It may be noted that a maximum rate of 5 bits per coefficient in the enhancement layer makes it possible to obtain good quality in the decoded image.

Generally speaking, it is proposed to use a maximum rate per DCT coefficient equal to or less than 10 bits, a value for which near-lossless coding is provided.

Before turning to the selection of quantizers (step S16), for the various DCT channels and among these optimal quantizers stored in association with their corresponding rate and distortion when applied to the concerned distribution (GGD with a specific parameter $\beta$), it is proposed here to select which part of the DCT channels are to be encoded.

Based on the observation that the rate decreases monotonically as a function of the distortion induced by the quantizer, precisely in each case in the manner shown by the curves just mentioned, it is possible to write the relationship between rate and distortion as follows:

$$R_n = f_n\left(-\ln(D_n/\sigma_n)\right),$$

where $\sigma_n$ is the normalization factor of the DCT coefficient, i.e. the GGD model associated to the DCT coefficient has $\sigma_n$ for standard deviation, and where $f_n' \geq 0$ in view of the monotonicity just mentioned.

In particular, no encoding (equivalently zero rate) leads to a quadratic distortion of value $\sigma_n^2$ and we deduce that $0 = f_n(0)$.

Finally, one observes that the curves are convex for parameters $\beta$ lower than two: $\beta \leq 2 \Rightarrow f_n'' \geq 0$.

It is proposed here to consider the merit of encoding a DCT coefficient.

More encoding basically results in more rate R (in other words, the corresponding cost) and less distortion D (in other words the resulting gain or advantage).

Thus, when dedicating a further bit to the encoding of the video (rate increase), it should be determined on which DCT coefficient this extra rate is the most efficient. In view of the analysis above, an estimation of the merit M of encoding may be obtained by computing the ratio of the benefit on distortion to the cost of encoding:

$$M_n := \frac{\Delta D_n^2}{\Delta R_n}.$$

Considering that the distortion decreases by an amount $\varepsilon$, a first order development of distortion and rate gives

$$(D - \varepsilon)^2 = D^2 - 2\varepsilon D + o(\varepsilon)$$

and

$$R(D - \varepsilon) = f_n\left(-\ln((D - \varepsilon)/\sigma)\right) = f_n\left(-\ln(D/\sigma) - \ln(1 - \varepsilon/D)\right) = f_n\left(-\ln(D/\sigma) + \varepsilon/D + o(\varepsilon)\right) = f_n\left(-\ln(D/\sigma)\right) + \frac{\varepsilon}{D}\,f_n'\left(-\ln(D/\sigma)\right) + o(\varepsilon).$$

As a consequence, the ratio of the first order variations provides an explicit formula for the merit of encoding:

$$M_n(D_n) = \frac{2D_n^2}{f_n'\left(-\ln(D_n/\sigma_n)\right)}.$$

If the initial merit $M_n^0$ is defined as the merit of encoding at zero rate, i.e. before any encoding, this initial merit $M_n^0$ can thus be expressed as follows using the preceding formula:

$$M_n^0 := M_n(\sigma_n) = \frac{2\sigma_n^2}{f_n'(0)}$$

(because, as noted above, no encoding leads to a quadratic distortion of value $\sigma_n^2$).

It is thus possible, starting from the pre-computed and stored rate-distortion curves, to determine the function $f_n$ associated with a given DCT channel and to compute the initial merit $M_n^0$ of encoding the corresponding DCT coefficient (the value $f_n'(0)$ being determined by approximation thanks to the stored coordinates of rate-distortion curves).
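A minimal sketch of this merit computation follows, with $f_n'$ approximated by a finite difference as the text suggests. All names are hypothetical, and `toy_curve` is only a stand-in for a stored rate-distortion curve: it uses the familiar high-resolution form R = log2(σ/D), not a curve taken from the pool of quantizers.

```python
import math

def f_prime(curve, u, du=1e-3):
    """Finite-difference slope of the curve; one-sided at u = 0."""
    lo = max(u - du, 0.0)          # the curve is only defined for u >= 0
    hi = u + du
    return (curve(hi) - curve(lo)) / (hi - lo)

def merit(curve, dist, sigma):
    """M(D) = 2 D^2 / f'(-ln(D / sigma))."""
    u = -math.log(dist / sigma)
    return 2.0 * dist * dist / f_prime(curve, u)

def initial_merit(curve, sigma):
    """Merit at zero rate, where the distortion equals sigma."""
    return merit(curve, sigma, sigma)

def toy_curve(u):
    """Hypothetical stand-in for f_n: rate in bits, f(0) = 0, f' >= 0."""
    return u / math.log(2.0)

m0 = initial_merit(toy_curve, 3.0)     # equals 2 * 9 * ln 2 for this stand-in
m_half = merit(toy_curve, 1.5, 3.0)    # merit after the distortion is halved
```

For this stand-in the merit after encoding is lower than the initial merit, in line with the merit being an increasing function of the distortion.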

It may further be noted that, for $\beta$ lower than two (which is in practice almost always true), the convexity of the rate-distortion curves teaches us that the merit is an increasing function of the distortion.

In particular, the initial merit is thus an upper bound of the merit: $M_n(D_n) \leq M_n^0$.

It will now be shown that, when satisfying the optimisation criteria defined above, all encoded DCT coefficients in the block have the same merit after encoding.

Furthermore, this does not apply to one block only, but holds as long as the various functions $f_n$ used in each DCT channel are unchanged, i.e. in particular for all blocks in a given block type. Hence the common merit value for encoded DCT coefficients will now be referred to as the merit of the block type.

The above property of equal merit after encoding may be shown for instance using the Karush-Kuhn-Tucker (KKT) necessary conditions of optimality. To this end, the quality constraint $\sum_n D_n^2 = D_t^2$ can be rewritten as $h = 0$ with $h(D_1, D_2, \ldots) := \sum_n D_n^2 - D_t^2$.

The distortion of each DCT coefficient is upper bounded by the distortion without coding: $D_n \leq \sigma_n$, and the domain of definition of the problem is thus a multi-dimensional box

$$\Omega = \{(D_1, D_2, \ldots);\ D_n \leq \sigma_n\} = \{(D_1, D_2, \ldots);\ g_n \leq 0\},$$

defined by the functions $g_n(D_n) := D_n - \sigma_n$.

Thus, the problem can be restated as follows:

$$\text{minimize } R(D_1, D_2, \ldots) \quad \text{s.t. } h = 0,\ g_n \leq 0 \qquad (A\_opt').$$

Such an optimization problem under inequality constraints can effectively be solved using the so-called Karush-Kuhn-Tucker (KKT) necessary conditions of optimality.

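To this end, the relevant KKT function $\Lambda$ is defined as follows:

$$\Lambda(D_1, D_2, \ldots, \lambda, \mu_1, \mu_2, \ldots) := R - \lambda h - \sum_n \mu_n g_n.$$

The KKT necessary conditions of minimization are

-stationarity: $d\Lambda = 0$,

-equality: $h = 0$,

-inequality: $g_n \leq 0$,

-dual feasibility: $\mu_n \geq 0$,

-saturation: $\mu_n g_n = 0$.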

It may be noted that the parameter $\lambda$ in the KKT function above is unrelated to the parameter $\lambda$ used above in the Lagrange formulation of the optimization problem meant to determine optimal quantizers.

If $g_n = 0$, the n-th condition is said to be saturated. In the present case, it indicates that the n-th DCT coefficient is not encoded.

By using the specific formulation $R_n = f_n(-\ln(D_n/\sigma_n))$ of the rate depending on the distortion discussed above, the stationarity condition gives:

$$0 = \partial_{D_n}\Lambda = \partial_{D_n}R_n - \lambda\,\partial_{D_n}h - \mu_n\,\partial_{D_n}g_n = -\frac{f_n'}{D_n} - 2\lambda D_n - \mu_n,$$

i.e. $2\lambda D_n^2 = -\mu_n D_n - f_n'$.

By summing on n and taking benefit of the equality condition, this leads to

$$2\lambda D_t^2 = -\sum_n \mu_n D_n - \sum_n f_n' \qquad (*).$$

In order to take into account the possible encoding of part of the coefficients only as proposed above, the various possible indices n are distributed into two subsets:

-the set $I^0 = \{n;\ \mu_n = 0\}$ of non-saturated DCT coefficients (i.e. of encoded DCT coefficients) for which we have $\mu_n D_n = 0$ and $D_n^2 = -f_n'/2\lambda$, and

-the set $I^+ = \{n;\ \mu_n > 0\}$ of saturated DCT coefficients (i.e. of DCT coefficients not encoded) for which we have $\mu_n D_n = -f_n' - 2\lambda\sigma_n^2$.

From (*), we deduce

$$2\lambda D_t^2 = -\sum_{n \in I^+} \mu_n D_n - \sum_n f_n' = \sum_{n \in I^+} f_n' + 2\lambda\sum_{n \in I^+}\sigma_n^2 - \sum_n f_n'$$

and, by gathering the $\lambda$'s,

$$2\lambda\left(D_t^2 - \sum_{n \in I^+}\sigma_n^2\right) = -\sum_{m \in I^0} f_m'.$$

As a consequence, for a non-saturated coefficient ($n \in I^0$), i.e. a coefficient to be encoded, we obtain:

$$D_n^2 = \left(D_t^2 - \sum_{m \in I^+}\sigma_m^2\right)\frac{f_n'\left(-\ln(D_n/\sigma_n)\right)}{\sum_{m \in I^0} f_m'\left(-\ln(D_m/\sigma_m)\right)}.$$

This formula for the distortion makes it possible to rewrite the above formula giving the merit $M_n(D_n)$ as follows for non-saturated coefficients:

$$M_n(D_n) = \frac{2\left(D_t^2 - \sum_{m \in I^+}\sigma_m^2\right)}{\sum_{m \in I^0} f_m'\left(-\ln(D_m/\sigma_m)\right)}.$$

Clearly, the right side of the equality does not depend on the DCT channel concerned. Thus, for a block type k, for any DCT channel for which coefficients are encoded, the merit associated with said channel after encoding is the same: $M_n = m_k$.

Another proof of the property of common merit after encoding is the following: supposing that there are two encoded DCT coefficients with two different merits M1 < M2, if an infinitesimal amount of rate from coefficient 1 is put on coefficient 2 (which is possible because coefficient 1 is one of the encoded coefficients, and this does not change the total rate), the distortion gain on coefficient 2 would then be strictly bigger than the distortion loss on coefficient 1 (because M1 < M2). This would thus provide a better distortion with the same rate, which is in contradiction with the optimality of the initial condition with two different merits.

As a conclusion, if the two coefficients 1 and 2 are encoded and if their respective merits M1 and M2 are such that M1 < M2, then the solution is not optimal.

Furthermore, all non-coded coefficients have a merit smaller than the merit of the block type (i.e. the merit of coded coefficients after encoding).

In view of the property of equal merits of encoded coefficients when optimisation is satisfied, it is proposed here to encode only coefficients for which the initial encoding merit $M_n^0 = 2\sigma_n^2/f_n'(0)$ is greater than a predetermined target block merit $m_k$.

For each coefficient to be encoded, a quantizer is selected to obtain the target block merit as the merit of the coefficient after encoding: first, the corresponding distortion, which is thus such that $M_n(D_n) = 2D_n^2/f_n'\left(-\ln(D_n/\sigma_n)\right) = m_k$, can be found by dichotomy using stored rate-distortion curves (step S14); the quantizer associated (see steps S8 and S10 above) with the distortion found is then selected (step S16).
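The selection rule and the dichotomy can be sketched as below. The function names are hypothetical, and `toy_f_prime` is a made-up convex stand-in for the derivative that would in practice be approximated from the stored rate-distortion curves; the bisection relies on the merit being an increasing function of the distortion, as stated above for convex curves.

```python
import math

def find_distortion(f_prime, sigma, target_merit, iters=60):
    """Distortion D in (0, sigma) whose merit equals target_merit.

    Returns None when the initial merit does not exceed the target,
    i.e. when the channel is left un-encoded.
    """
    def merit(d):
        return 2.0 * d * d / f_prime(-math.log(d / sigma))

    if merit(sigma) <= target_merit:
        return None
    lo, hi = 0.0, sigma            # merit(lo) < target <= merit(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if merit(mid) < target_merit:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_f_prime(u):
    """Hypothetical derivative of a convex rate curve, in bits."""
    return (1.0 + u) / math.log(2.0)

sigmas = [4.0, 1.0, 0.2]           # one standard deviation per DCT channel
target = 1.0                       # target block merit m_k
distortions = [find_distortion(toy_f_prime, s, target) for s in sigmas]
# The third channel has an initial merit below the target and stays un-encoded.
```

The quantizer stored for the distortion found would then be picked from the pool; that lookup is outside this sketch.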

Figure 12 shows the process for determining optimal quantizers implemented in the present example at the level of the frame, which includes in particular determining the target block merit for the various block types.

First, the frame is segmented at step S30 into a plurality of blocks each having a given block type k, for instance in accordance with the process described above based on residual activity, or as a result of a change in the segmentation as explained below.

A parameter k designating the block type currently considered is then initialised at step S32.

The target block merit $m_k$ for the block type k currently considered is then computed at step S34 based on a predetermined frame merit $m^F$ and on a number of blocks $v_k$ of the given block type per area unit, here according to the formula:

$$m_k = v_k \cdot m^F.$$

For instance, one may choose the area unit as being the area of a 16x16 block, i.e. 256 pixels. In this case, $v_k = 1$ for block types of size 16x16, $v_k = 4$ for block types of size 8x8, etc. One also understands that the method is not limited to square blocks; for instance $v_k = 2$ for block types of size 16x8.

This type of computation makes it possible to obtain a balanced encoding between block types, i.e. here a common merit of encoding per pixel (equal to the frame merit $m^F$) for all block types.

This is because the variation of the pixel distortion $\Delta\delta^2_{P,k}$ for the block type k is the sum of the distortion variations provided by the various encoded DCT coefficients, and can thus be rewritten as follows thanks to the (common) block merit:

$$\Delta\delta^2_{P,k} = m_k \cdot \sum_{\text{coded } n} \Delta R_{n,k} = m_k \cdot \Delta R_k$$

(where $\Delta R_k$ is the rate variation for a block of type k). Thus, the merit of encoding per pixel is:

$$\frac{\Delta\delta^2_{P,k}}{\Delta U_k} = \frac{m_k \cdot \Delta R_k}{v_k \cdot \Delta R_k} = m^F$$

(where $U_k$ is the rate per area unit for the block type concerned) and has a common value over the various block types.
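The block merit formula can be illustrated with a few lines. The helper names and the frame merit value are hypothetical; the area unit is the 16x16 block used in the example above.

```python
def blocks_per_area_unit(width, height, unit_area=16 * 16):
    """Number v_k of blocks of the given size per area unit."""
    return unit_area / (width * height)

def block_merit(frame_merit, width, height):
    """Target block merit m_k = v_k * m^F."""
    return blocks_per_area_unit(width, height) * frame_merit

frame_merit = 10.0                              # hypothetical frame merit
block_types = [(16, 16), (16, 8), (8, 8)]
merits = {size: block_merit(frame_merit, *size) for size in block_types}
# Dividing each block merit by its v_k gives back the same per-pixel merit.
per_pixel = {size: merits[size] / blocks_per_area_unit(*size)
             for size in block_types}
```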

Optimal quantizers are then determined for the block type k currently considered by the process described above with reference to Figure 11, using the data in blocks having the current block type k when computing parameters of the probabilistic distribution (GGD statistics) and using the block merit $m_k$ just determined as the target block merit in step S14 of Figure 11.

The next block type is then considered by incrementing k (step S38), checking whether all block types have been considered (step S40) and looping to step S34 if all block types have not been considered.

If all block types have been considered, the whole frame has been processed (step S42), which ends the encoding process at the frame level presented here.

Figure 13 shows a process for determining optimal quantizers according to a first possible embodiment, which includes in particular determining the frame merit for luminance frames Y as well as for chrominance frames U,V of the video sequence.

The process shown in Figure 13 applies to a specific frame and is to be applied at each iteration of the optimisation process described below with reference to Figure 15.

The frame is segmented into blocks each having a block type at step S50; in a similar manner as was explained above for step S30, this can result from the initial segmentation or from a segmentation obtained during the optimization process. As mentioned above, the initial segmentation is determined based on the residual activity of the luminance frame Y and is also applied to the chrominance frames U,V.

A DCT transform is then applied (step S52) to each block thus defined. The DCT transform is adapted to the type of the block concerned, in particular to its size.

Parameters representative of the statistical distribution of coefficients (here $\alpha_i$, $\beta_i$ as explained above) are then computed (step S54) both for luminance frames and for chrominance frames, in each case for each block type, each time for the various coefficient types.

A loop is then entered (at step S58 described below) to determine by dichotomy a luminance frame merit $m_Y^F$ and a chrominance frame merit $m_{UV}^F$ linked by the following relationship:

$$\frac{1}{\mu^{VIDEO}\cdot D_Y^2} = \frac{1}{m_Y^F} + \frac{2}{m_{UV}^F},$$

where $\mu^{VIDEO}$ is a selectable video merit obtained for instance based on user selection of a quality level at step S56 and $D_Y^2$ is the frame distortion for the luminance frame after encoding and decoding.

Each of the determined luminance frame merit $m_Y^F$ and chrominance frame merit $m_{UV}^F$ may then be used as the frame merit $m^F$ in a process similar to the process described above with reference to Figure 12, as further explained below.

The relationship given above makes it possible to adjust (to the value $\mu^{VIDEO}$) the local video merit defined as the ratio between the variation of the PSNR (already defined above) of the luminance $\Delta PSNR_Y$ and the corresponding variation of the total rate $\Delta R_{YUV}$ (including not only luminance but also chrominance frames). This ratio is generally considered when measuring the efficiency of a coding method.

This relationship is also based on the following choices made in the present embodiment:

-the quality of luminance frames is the same as the quality of chrominance frames: $D_Y^2 = (D_U^2 + D_V^2)/2$;

-the merit of U chrominance frames is the same as the merit of V chrominance frames: $m_U^F = m_V^F = m_{UV}^F$.

As explained above, the merit $m^F$ of encoding per pixel is the same whatever the block in a frame and the relationship between distortion and rate thus remains valid at the frame level (by summing over the frame the distortions on the one hand and the rates on the other hand, each corresponding distortion and rate defining a constant ratio $m^F$): $\Delta D_Y^2 = m_Y^F\cdot\Delta R_Y$, $\Delta D_U^2 = m_{UV}^F\cdot\Delta R_U$ and $\Delta D_V^2 = m_{UV}^F\cdot\Delta R_V$, where $\Delta R_Y$, $\Delta R_U$ and $\Delta R_V$ are the rate variations respectively for the luminance frame, the U chrominance frame and the V chrominance frame.

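Thus, $\Delta D_U^2 + \Delta D_V^2 = m_{UV}^F\cdot(\Delta R_U + \Delta R_V)$.

As the PSNR is the logarithm of the distortion $D^2$, its variation $\Delta PSNR$ can be written as follows at the first order: $\Delta PSNR_Y = \Delta D_Y^2 / D_Y^2$, and the video merit can thus be restated as follows based on the above assumptions and remarks (the equal quality of luminance and chrominance frames giving $\Delta D_U^2 + \Delta D_V^2 = 2\,\Delta D_Y^2$):

$$\frac{\Delta PSNR_Y}{\Delta R_{YUV}} = \frac{\Delta D_Y^2 / D_Y^2}{\Delta R_Y + \Delta R_U + \Delta R_V} = \frac{\Delta D_Y^2 / D_Y^2}{\Delta D_Y^2\left(\frac{1}{m_Y^F} + \frac{2}{m_{UV}^F}\right)} = \frac{1}{D_Y^2\left(\frac{1}{m_Y^F} + \frac{2}{m_{UV}^F}\right)}.$$

This ratio is equal to the chosen value $\mu^{VIDEO}$ when the above relationship $\frac{1}{\mu^{VIDEO}\cdot D_Y^2} = \frac{1}{m_Y^F} + \frac{2}{m_{UV}^F}$ is satisfied.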

Going back to the loop process implemented to determine the luminance frame merit $m_Y^F$ and the chrominance frame merit $m_{UV}^F$ as mentioned above, a lower bound $m_{low}^Y$ and an upper bound $m_{up}^Y$ for the luminance frame merit are initialized at step S58 at predetermined values. The lower bound $m_{low}^Y$ and the upper bound $m_{up}^Y$ define an interval, which includes the luminance frame merit and which will be reduced in size (divided by two) at each step of the dichotomy process. At initialization step S58, the lower bound $m_{low}^Y$ may be chosen as strictly positive but small, corresponding to a nearly lossless encoding, while the upper bound $m_{up}^Y$ is chosen for instance greater than all initial encoding merits (over all DCT channels and all block types).

A temporary luminance frame merit $m_Y^F$ is computed (step S60) as equal to $(m_{low}^Y + m_{up}^Y)/2$ (i.e. in the middle of the interval).

A block merit is then computed at step S62 for each of the various block types, as explained above with reference to Figure 12 (see in particular step S34) according to the formula: $m_k = v_k \cdot m_Y^F$.

Block merits are computed based on the temporary luminance frame merit defined above. The next steps are thus based on this temporary value, which is thus a tentative value for the luminance frame merit.

For each block type k in the luminance frame, the distortions $D^2_{n,k,Y}$ after encoding of the various DCT channels n are then determined at step S64 in accordance with what was described with reference to Figure 11, in particular step S14, based on the block merit $m_k$ just computed and on optimal rate-distortion curves determined beforehand at step S67, in the same manner as in step S10 of Figure 11.

The frame distortion for the luminance frame $D_Y^2$ can then be determined at step S66 by summing over the block types thanks to the formula:

$$D_Y^2 = \sum_k \rho_k\,\delta^2_{P,k,Y} = \sum_k \rho_k\left(\sum_n D^2_{n,k,Y}\right)$$

where $\rho_k$ is the density of a block type in the frame, i.e. the ratio between the total area for blocks having the concerned block type k and the total area of the frame.

A temporary chrominance frame merit $m_{UV}^F$ is then sought, for instance by dichotomy at step S68 and also based on optimal rate-distortion curves predetermined at step S67, such that the distortions after encoding $D^2_{n,k,U}$, $D^2_{n,k,V}$, obtained by implementing a process according to Figure 12 using $m_{UV}^F$ as the frame merit, result in chrominance frame distortions $D_U^2$, $D_V^2$ satisfying $D_Y^2 = (D_U^2 + D_V^2)/2$.

It may be noted in this respect that the relationship between distortions of the DCT channels and the frame distortion, given above for the luminance frame, is also valid for each of the chrominance frames U,V.

It is then checked at step S70 whether the interval defined by the lower bound $m_{low}^Y$ and the upper bound $m_{up}^Y$ of the luminance frame merit has reached a predetermined required accuracy $\alpha$, i.e. whether $m_{up}^Y - m_{low}^Y < \alpha$.

If this is not the case, the dichotomy process will be continued by selecting one of the first half of the interval and the second half of the interval as the new interval to be considered, depending on the sign of

$$\frac{1}{m_Y^F} - \frac{1}{\mu^{VIDEO}\cdot D_Y^2} + \frac{2}{m_{UV}^F},$$

which will thus converge towards zero such that the relationship defined above is satisfied. The lower bound $m_{low}^Y$ and the upper bound $m_{up}^Y$ are adapted consistently with the selected interval (step S72) and the process loops to step S60.

If the required accuracy is reached, the process continues at step S74 where quantizers are selected in a pool of quantizers predetermined at step S65 and associated with points of the optimal rate-distortion curves already used (see explanations relating to step S8 in Figure 11), based on the distortion values $D^2_{n,k,U}$, $D^2_{n,k,V}$ obtained during the last iteration of the dichotomy process (steps S64 and S68 described above).

Figure 14 shows a process for determining optimal quantizers according to a second possible embodiment, which includes in particular determining the frame merit for luminance component Y as well as for each of chrominance components U,V for each frame of the video sequence.

It is proposed in the present embodiment to consider the following video quality function: $Q(R_Y, R_U, R_V) = PSNR_Y + \theta_U \cdot PSNR_U + \theta_V \cdot PSNR_V$, where $R_*$ is the rate for the component * of a frame, $PSNR_*$ is the PSNR for the component * of a frame, and $\theta_U$, $\theta_V$ are balancing parameters provided by the user in order to select the acceptable degree of distortion in the concerned chrominance component (U or V) relative to the degree of distortion in the luminance component.

In order to unify the explanations in the various components, use is made below of $\theta_Y = 1$ and the video quality function considered here can thus be rewritten as: $Q(R_Y, R_U, R_V) = \theta_Y \cdot PSNR_Y + \theta_U \cdot PSNR_U + \theta_V \cdot PSNR_V$.

As already noted, the PSNR is the logarithm of the frame distortion: $PSNR_* = \ln(D_*^2)$ ($D_*^2$ being the frame distortion for the frame of the component *) and it can thus be written at the first order that $\Delta PSNR_* = \Delta D_*^2 / D_*^2$.

As the merit $m^*$ of encoding per pixel is the same whatever the block in a frame, the relationship between distortion and rate thus remains valid at the frame level (by summing over the frame the distortions on the one hand and the rates on the other hand, each corresponding distortion and rate defining a constant ratio $m^*$) and it can be written that: $\Delta D_*^2 = m^* \cdot \Delta R_*$.

The variation of the video quality Q defined above depending on the attribution of the rate R to a given component * can thus be estimated to: $\frac{\Delta Q}{\Delta R_*} = \theta_* \cdot \frac{m^*}{D_*^2}$.

It is proposed in the process below to encode the residual data such that no component is favoured compared to another one (taking into account the video quality function Q), i.e. such that $\frac{\Delta Q}{\Delta R_Y} = \frac{\Delta Q}{\Delta R_U} = \frac{\Delta Q}{\Delta R_V}$. As described below, the encoding process will thus be designed to obtain a value $\mu^{VIDEO}$ for this common merit, which value defines the video merit and is selectable by the user.

In view of the above formulation for $\Delta Q / \Delta R_*$, the process below is thus designed such that: $\mu^{VIDEO} = \frac{m^Y}{D_Y^2} = \theta_U \cdot \frac{m^U}{D_U^2} = \theta_V \cdot \frac{m^V}{D_V^2}$, i.e. to obtain, for each of the three components, a frame merit $m^*$ such that the function $e(m^*) = \mu^{VIDEO} \cdot D_*^2(m^*) - \theta_* \cdot m^*$ is null (the distortion at the frame level $D_*^2$ being here noted $D_*^2(m^*)$ in order to make explicit the fact that it depends on the frame merit $m^*$).

The process shown in Figure 14 applies to a particular component, denoted * below, of a specific frame and is to be applied to each of the three components Y, U, V at each iteration of the optimisation process described below with reference to Figure 15.

The process of Figure 14 applies to a frame which is segmented into blocks according to a current segmentation (which can be either an initial segmentation as defined above or a segmentation produced at any step by the optimization process described below with reference to Figure 15).

A DCT transform is applied (step S80) to each block thus defined in the concerned frame.

Parameters representative of the statistical distribution of coefficients (here $\alpha$, $\beta$ as explained above) are then computed (step S82) for each block type, each time for the various coefficient types. As noted above, this applies to a given component * only.

Before entering a loop implemented to determine the frame merit $m^*$, a lower bound $m_L^*$ and an upper bound $m_U^*$ for the frame merit are initialized at step S84 at predetermined values. The lower bound $m_L^*$ and the upper bound $m_U^*$ define an interval, which includes the sought frame merit and which will be reduced in size (divided by two) at each step of the dichotomy process. At initialization step S84, the lower bound $m_L^*$ may be chosen as strictly positive but small, corresponding to a nearly lossless encoding, while the upper bound $m_U^*$ is chosen for instance greater than all initial encoding merits (over all DCT channels and all block types).

A temporary frame merit $m^*$ is computed (step S86) as equal to $(m_L^* + m_U^*)/2$ (i.e. in the middle of the interval).

A block merit is then computed at step S88 for each of the various block types, as explained above with reference to Figure 12 (see in particular step S34), according to the formula: $m_k = v_k \cdot m^*$, where $v_k$ is the number of blocks of the concerned block type per area unit. Block merits are computed based on the temporary frame merit defined above. The next steps are thus based on this temporary value, which is a tentative value for the frame merit for the concerned component.

For each block type k in the frame, the distortions $D^2_{n,k,*}$ after encoding of the various DCT channels n are then determined at step S90 in accordance with what was described with reference to Figure 11, in particular step S14, based on the block merit $m_k$ just computed and on optimal rate-distortion curves determined beforehand at step S89, in the same manner as in step S10 of Figure 11.

The frame distortion $D_*^2$ for the concerned frame can then be determined at step S92 by summing over the block types thanks to the formula: $D_*^2 = \sum_k \rho_k \cdot \delta^2_{P,k,*} = \sum_k \rho_k \cdot \left( \sum_n D^2_{n,k,*} \right)$, where $\rho_k$ is the density of a block type in the frame, i.e. the ratio between the total area for blocks having the concerned block type k and the total area of the frame.
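By way of illustration only, the summation over block types described above can be sketched in Python as follows. The function name, the dictionary layout and the numerical values are assumptions introduced for this sketch and are not taken from the described process.

```python
def frame_distortion(densities, channel_distortions):
    """Frame distortion as the density-weighted sum, over block types k,
    of the per-channel distortions summed over the DCT channels n."""
    total = 0.0
    for block_type, rho in densities.items():
        # Inner sum over the DCT channels of this block type.
        total += rho * sum(channel_distortions[block_type])
    return total


# Hypothetical frame: two block types covering 75% and 25% of its area.
densities = {"16x16": 0.75, "8x8": 0.25}
channel_distortions = {
    "16x16": [1.0, 0.5, 0.5],  # assumed per-channel distortions
    "8x8": [2.0, 1.0, 1.0],
}
d2 = frame_distortion(densities, channel_distortions)
```

With these assumed values the result is 0.75 x 2.0 + 0.25 x 4.0 = 2.5.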

It is then checked at step S94 whether the interval defined by the lower bound $m_L^*$ and the upper bound $m_U^*$ has reached a predetermined required accuracy $a$, i.e. whether $m_U^* - m_L^* < a$.

If this is not the case, the dichotomy process will be continued by selecting one of the first half of the interval and the second half of the interval as the new interval to be considered, depending on the sign of $e(m^*)$, i.e. here the sign of $\mu^{VIDEO} \cdot D_*^2(m^*) - \theta_* \cdot m^*$, which will thus converge towards zero as required to fulfill the criterion defined above. It may be noted that the selected video merit $\mu^{VIDEO}$ (see selection step S81) and, in the case of chrominance frames U, V, the selected balancing parameter $\theta_*$ (i.e. $\theta_U$ or $\theta_V$) are introduced at this stage in the process for determining the frame merit $m^*$.

The lower bound $m_L^*$ and the upper bound $m_U^*$ are adapted consistently with the selected interval (step S98) and the process loops at step S86.

If the required accuracy is reached, the process continues at step S96 where quantizers are selected in a pool of quantizers predetermined at step S81 and associated with points of the optimal rate-distortion curves already used (see explanations relating to step S8 in Figure 11), based on the distortion values $D^2_{n,k,*}$ obtained during the last iteration of the dichotomy process (step S90 described above).

These selected quantizers may be used for encoding coefficients in an encoding process or in the frame of a segmentation optimization method as described below (see step S104 in particular).

The process just described for determining optimal quantizers uses a function $e(m^*)$ resulting in an encoded frame having a given video merit (denoted $\mu^{VIDEO}$ above), with the possible influence of balancing parameters $\theta_*$. As a possible variation, it is possible to use a different function $e(m^*)$, which will result in the encoded frame fulfilling a different criterion. For instance, if it is sought to obtain a target distortion $D_t^2$, the function $e(m^*) = D_*^2(m^*) - D_t^2$ could be used instead.

In a similar manner, if it is sought to control the rate of a frame (for a given component) to a target rate $R_t$, the function $e(m^*) = R_*(m^*) - R_t$ could be used. In this case, step S90 would include determining the rate for encoding each of the various channels (also considering each of the various blocks of the current segmentation) using the rate-distortion curves (S89) and step S92 would include summing the determined rates to obtain the rate $R_*$ for the frame.
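By way of illustration only, the dichotomy of Figure 14 and the alternative criteria just mentioned can be sketched in Python as follows. The saturating distortion model `toy_distortion`, the numerical values and all function names are assumptions introduced for this sketch; in the process described above the frame distortion is obtained from the optimal rate-distortion curves. The sketch further assumes that the criterion function changes sign exactly once over the initial interval.

```python
def bisect_frame_merit(error, m_low, m_high, accuracy):
    """Halve [m_low, m_high] until it is narrower than `accuracy`.

    `error` plays the role of the criterion function e(m); it is assumed
    to take opposite signs at the two initial bounds.
    """
    sign_low = error(m_low) > 0
    while m_high - m_low >= accuracy:
        m_mid = 0.5 * (m_low + m_high)  # temporary frame merit
        if (error(m_mid) > 0) == sign_low:
            m_low = m_mid   # the zero lies in the second half
        else:
            m_high = m_mid  # the zero lies in the first half
    return 0.5 * (m_low + m_high)


def toy_distortion(m, variance=100.0, knee=10.0):
    """Hypothetical frame distortion, increasing with the merit m and
    saturating at `variance` (stand-in for the rate-distortion curves)."""
    return variance * m / (m + knee)


mu_video, theta = 0.5, 1.0  # assumed video merit and balancing parameter


def video_merit_error(m):
    # Criterion of the second embodiment: mu * D^2(m) - theta * m.
    return mu_video * toy_distortion(m) - theta * m


def target_distortion_error(m, target=50.0):
    # Variation: reach a target distortion instead of a video merit.
    return toy_distortion(m) - target


m_video = bisect_frame_merit(video_merit_error, 1e-3, 1000.0, 1e-6)
m_target = bisect_frame_merit(target_distortion_error, 1e-3, 1000.0, 1e-6)
```

With the toy model, the first criterion vanishes at m = 40 and the second at m = 10, which the dichotomy recovers to within the requested accuracy. Only the criterion function changes between the variants; the interval-halving loop is the same.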

In addition, although the process of Figure 14 has been described in the context of a video sequence with three colour components, it also applies in the context of a video sequence with a single colour component, e.g. luminance, in which case no balancing parameter is used ($\theta_* = 1$, which is incidentally the case for the luminance component in the example just described, where $\theta_Y$ was defined as equal to 1).

Figure 15 shows an exemplary embodiment of an encoding process according to the teachings of the invention. As briefly mentioned above, the process is an optimization process using the processes described above, in particular one of the two embodiments described respectively with reference to Figures 13 and 14.

This process applies here to a video sequence comprising a luminance component Y and two chrominance components U, V.

The process starts at step S100 with determining an initial segmentation for the luminance frame Y based on the content of the blocks of the frame, e.g. in accordance with the initial segmentation method described above using a measure of residual activity. As already explained, this segmentation defines a block type for each block obtained by the segmentation, which block type refers not only to the size of the block but also to other possible parameters, such as a label derived for instance from the measure of residual activity.

It is possible in addition to force this initial segmentation to provide at least one block for each possible block type (except possibly for the block types having a skip-label), for instance by forcing some blocks to have the block types not encountered by use of the segmentation method based on residual activity, whatever the content of these blocks. As will be understood from the following description, forcing the presence of each and every possible block type in the segmentation makes it possible to obtain statistics and optimal quantizers for each and every block type and thus to enlarge the field of the optimization process.

The process then enters a loop (optimization loop).

At step S102, DCT coefficients are computed for blocks defined in the current segmentation (which is the initial segmentation the first time step S102 is implemented) and, for each block type, parameters (GGD statistics) representing the probabilistic distributions of the various DCT channels are computed. This is done in conformity with steps S4 and S6 of Figure 11 described above.

The computation of DCT coefficients and GGD statistics is performed for the luminance frame Y and for chrominance frames U, V (each time using the same current segmentation associating a block type to each block of the segmentation).

Frame merits ($m^Y$, $m^{UV}$ in the first embodiment, $m^*$ in the second embodiment), block merits $m_k$ (for each block type) and optimal quantizers for the various block types and DCT channels can thus be determined at step S104 thanks to either the process of Figure 13 or the process of Figure 14.

These elements can then be used at step S106 in an encoding cost competition between possible segmentations, each defining a block type for each block of the segmentation. It may be noted that block types with a skip label, i.e. corresponding to non-encoded blocks, may easily be introduced at this stage (when they are not considered at the time of determining the initial segmentation) as their distortion equals the distortion of the block in the base layer and their rate is null.

It is proposed here to use a Lagrangian cost of the type $D/\lambda + R$, or in an equivalent manner $D + \lambda \cdot R$ (as an encoding cost in the encoding cost competition), computed from the bit rate needed for encoding by using the quantizers of the concerned (competing) block type and the distortion after quantization and dequantization by using the quantizers of the concerned (competing) block type. As a possible variation, the encoding cost may be estimated differently, such as for instance using only the bit rate just mentioned (i.e. not taking into account the distortion parameter).

The Lagrangian cost generated by encoding blocks having a particular block type will be estimated as follows.

The cost of encoding for the luminance is $\frac{\delta^2_{P,k,Y}}{\lambda} + R_{k,Y}$, where $\delta^2_{P,k,Y}$ is the pixel distortion for the block type k introduced above and $R_{k,Y}$ is the associated rate.

It is known that, as rate and distortion values are constrained on a given rate-distortion curve, Lagrange's parameter can be written as follows: $\lambda = -\frac{\partial \delta^2_{P,k,Y}}{\partial R_{k,Y}}$ and thus approximated as follows: $\lambda \approx \frac{\Delta \delta^2_{P,k,Y}}{\Delta R_{k,Y}} = v_k^Y \cdot m^Y$ (where $v_k^Y$ is the number of blocks of the given block type per area unit in the luminance frame).

It is thus proposed to estimate the luminance cost as follows: $C_{k,Y} = \frac{\delta^2_{P,k,Y}}{v_k^Y \cdot m^Y} + R_{k,Y} + R_{k,QT}$, where $R_{k,QT}$ is the bit rate associated to the parsing of the generalized quad-tree (representing the segmentation) to mark the type of the concerned block in the bit stream. A possible manner to encode the quad tree in the bit stream is described below. This bit rate $R_{k,QT}$ is computed at step S105.

When using the first embodiment (Figure 13), it is proposed to estimate the cost for chrominance components as follows.

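If the cost of encoding for the chrominance is written $\frac{\delta^2_{P,k,UV}}{\lambda} + R_{k,UV}$, Lagrange's parameter is given by $\lambda = -\frac{\partial \delta^2_{P,k,UV}}{\partial R_{k,UV}}$ and can thus be approximated as: $\lambda \approx \frac{\Delta \delta^2_{P,k,UV}}{\Delta R_{k,UV}}$.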

As explained for the process of Figure 13, it is proposed here that:
- the quality of luminance frames is the same as the quality of chrominance frames: $D_Y^2 = D_{UV}^2 = (D_U^2 + D_V^2)/2$, which gives at the block level: $\delta^2_{P,k,UV} = (\delta^2_{P,k,U} + \delta^2_{P,k,V})/2$;
- the merit of U chrominance frames is the same as the merit of V chrominance frames: $m^U = m^V = m^{UV}$, which results in an equal merit $v_k^{UV} \cdot m^{UV}$ for U and V frames at the block level (where $v_k^{UV}$ is the number of blocks of the given block type per area unit in the chrominance frame).

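Thus, Lagrange's parameter can be estimated (based in particular on the definition of the merit, and with $R_{k,UV} = R_{k,U} + R_{k,V}$) as: $\lambda \approx \frac{\Delta \delta^2_{P,k,UV}}{\Delta R_{k,UV}} = \frac{\Delta(\delta^2_{P,k,U} + \delta^2_{P,k,V})}{2 \cdot \Delta R_{k,UV}} = \frac{v_k^{UV} \cdot m^{UV} \cdot \Delta(R_{k,U} + R_{k,V})}{2 \cdot \Delta R_{k,UV}} = \frac{v_k^{UV} \cdot m^{UV}}{2}$.

It is thus proposed to estimate the chrominance cost as follows: $C_{k,UV} = \frac{2 \cdot \delta^2_{P,k,UV}}{v_k^{UV} \cdot m^{UV}} + R_{k,UV}$.

It may be noted that no rate is dedicated to a chrominance quad-tree as it is considered here that the segmentation for chrominance components follows the segmentation for the luminance frame.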

Still in the frame of the first embodiment (Figure 13), the combined cost, taking into account both luminance and chrominance, is the sum of the two associated costs. However, the coupling between luminance and chrominance must also be taken into consideration: the merit of chrominance is computed such that the quality (on the whole frame) of the chrominance matches the quality of the luminance.

As a consequence, a variation of the luminance distortion in one block has a global impact on the average distortion of the chrominance on the whole frame. Due to the quality equality, this impact is $\Delta \delta^2_{P,UV} = \delta^2_{P,k,Y}$ and it is thus proposed to introduce a corresponding coupling term in the combined cost, which can thus be estimated by the following formula: $C_{k,YUV} = \frac{\delta^2_{P,k,Y}}{v_k^Y \cdot m^Y} + \frac{2 \cdot \delta^2_{P,k,UV}}{v_k^{UV} \cdot m^{UV}} + \frac{2 \cdot \delta^2_{P,k,Y}}{v_k^{UV} \cdot m^{UV}} + R_{k,Y} + R_{k,UV} + R_{k,QT}$.

This formula thus makes it possible to compute the Lagrangian cost in the competition between possible segmentations mentioned above and described in more detail below, in the frame of the first embodiment (Figure 13).

When using the second embodiment (Figure 14) where distinct frame merits $m^U$, $m^V$ are determined respectively for the U component and for the V component, the estimation of the Lagrangian cost presented above applies in a similar manner in the case of colour components U, V, except that no rate is dedicated to a chrominance quad-tree as it is considered here that the segmentation for chrominance frames follows the segmentation for the luminance frame. The Lagrangian cost for chrominance components can be estimated as follows: $C_{k,U} = \frac{\delta^2_{P,k,U}}{v_k^U \cdot m^U} + R_{k,U}$ and $C_{k,V} = \frac{\delta^2_{P,k,V}}{v_k^V \cdot m^V} + R_{k,V}$.

The combined cost, taking into account luminance and chrominance, can thus be estimated by the following formula: $C_{k,YUV} = \frac{\delta^2_{P,k,Y}}{v_k^Y \cdot m^Y} + \frac{\delta^2_{P,k,U}}{v_k^U \cdot m^U} + \frac{\delta^2_{P,k,V}}{v_k^V \cdot m^V} + R_{k,Y} + R_{k,U} + R_{k,V} + R_{k,QT}$.

This formula thus makes it possible to compute the Lagrangian cost in the competition between possible segmentations mentioned above and described in more details below, in the frame of the second embodiment (Figure 14).
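By way of illustration only, a combined cost of the kind used in the second embodiment (for each component, the pixel distortion divided by the product of the number of blocks per area unit and the frame merit, plus the rate, with a single quad-tree rate added) can be sketched in Python as follows. The function name, the dictionary layout and the numerical values are assumptions introduced for this sketch.

```python
def combined_cost(delta2, rate, blocks_per_unit, merit, quadtree_rate):
    """Lagrangian cost of one block type over the Y, U and V components.

    delta2, rate, blocks_per_unit and merit are dictionaries keyed by
    component name; quadtree_rate is counted once (luminance quad-tree).
    """
    cost = quadtree_rate
    for component in ("Y", "U", "V"):
        # The Lagrange parameter of a component is approximated by
        # (blocks per area unit) x (frame merit of that component).
        lagrange = blocks_per_unit[component] * merit[component]
        cost += delta2[component] / lagrange + rate[component]
    return cost


# Hypothetical values for one candidate block type.
cost = combined_cost(
    delta2={"Y": 8.0, "U": 2.0, "V": 2.0},
    rate={"Y": 10.0, "U": 3.0, "V": 3.0},
    blocks_per_unit={"Y": 1.0, "U": 2.0, "V": 2.0},
    merit={"Y": 4.0, "U": 2.0, "V": 2.0},
    quadtree_rate=1.0,
)
```

With these assumed values the cost is 2 + 10 + 0.5 + 3 + 0.5 + 3 + 1 = 20. In a competition between block types, this cost would be evaluated once per candidate type and the smallest value retained.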

For both embodiments, the distortions $\delta^2_{P,k,Y}$, $\delta^2_{P,k,U}$ and $\delta^2_{P,k,V}$ (or $\delta^2_{P,k,UV}$ in the first embodiment) are computed in practice by applying the quantizers selected at step S104 for the concerned block type, then by applying the associated dequantization and finally by comparing the result with the original residual. This last step can e.g. be done in the DCT transform domain because the IDCT is an L2 isometry and total distortion in the DCT domain is the same as the total pixel distortion, as already explained above.
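By way of illustration only, the equality between the distortion measured in the transform domain and the distortion measured in the pixel domain can be checked numerically with the following Python sketch. It uses a one-dimensional orthonormal DCT-II of size 8 rather than the two-dimensional block transform; the sample values and function names are assumptions introduced for this sketch.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def transform(matrix, vector):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical original residual and its decoded approximation.
residual = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
decoded = [2.5, -1.5, 4.5, 0.5, -4.5, 8.0, 2.5, -6.5]

T = dct_matrix(8)
pixel_error = squared_error(residual, decoded)
dct_error = squared_error(transform(T, residual), transform(T, decoded))
```

Both errors agree up to floating-point rounding, which is why the comparison with the original residual may be carried out directly on the coefficients.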

Bit rates $R_{k,Y}$, $R_{k,U}$ and $R_{k,V}$ (or $R_{k,UV}$ in the first embodiment) can be evaluated without performing the entropy encoding of the quantized coefficients. This is because one knows the rate cost of each quantum of the quantizers; this rate is simply computed from the probability of falling into this quantum and the probability is provided by the GGD channel modeling associated with the concerned block type.
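By way of illustration only, the evaluation of a rate from quantum probabilities can be sketched in Python as follows. The sketch assumes a uniform quantizer with quanta centred on multiples of the step, an ideal code length of -log2(p) bits for a quantum of probability p, and a simple midpoint-rule integration of the generalized Gaussian density; these choices, the function names and the numerical values are assumptions and do not reproduce the optimal quantizers of the described process.

```python
import math


def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density with scale alpha and shape beta."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))


def quantum_probability(low, high, alpha, beta, steps=200):
    """Probability of falling in [low, high], by the midpoint rule."""
    width = (high - low) / steps
    return width * sum(ggd_pdf(low + (i + 0.5) * width, alpha, beta)
                       for i in range(steps))


def estimated_rate(alpha, beta, step, max_index=50):
    """Return (rate in bits per coefficient, total probability covered)."""
    rate, covered = 0.0, 0.0
    for i in range(-max_index, max_index + 1):
        p = quantum_probability((i - 0.5) * step, (i + 0.5) * step,
                                alpha, beta)
        covered += p
        if p > 0.0:
            rate -= p * math.log2(p)  # ideal code length weighted by p
    return rate, covered


fine_rate, fine_covered = estimated_rate(alpha=2.0, beta=1.0, step=1.0)
coarse_rate, _ = estimated_rate(alpha=2.0, beta=1.0, step=4.0)
```

As expected, the coarser quantizer yields the lower estimated rate, and no entropy coder is run at any point.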

Lastly, the size (more precisely the area) of a block impacts the cost formula through the geometrical parameters $v_k^Y$, $v_k^U$ and $v_k^V$ (or $v_k^{UV}$ in the first embodiment). For instance, in the case of a 16x16-pixel unit area and a 4:2:0 YUV colour format, the number of blocks per area unit for 16x16 blocks is $v_k^Y = 1$ for luminance blocks and $v_k^{UV} = v_k^U = v_k^V = 2$ for chrominance blocks. This last value comes from the fact that one needs two couples of 4x4 UV blocks to cover a unit area of size 16x16 pixels.

Similarly, the number of blocks per area unit for 8x8 blocks is $v_k^Y = 4$ for luminance blocks and $v_k^{UV} = 8$ for chrominance blocks.
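By way of illustration only, the geometrical parameters quoted above can be reproduced by the following Python sketch. The factor of two between the chrominance and luminance counts is read off the values quoted in the text and is an assumption of this sketch, as are the function name and the default unit size.

```python
def blocks_per_area_unit(block_size, unit_size=16):
    """Return (luminance count, chrominance count) per area unit."""
    luminance = (unit_size / block_size) ** 2
    # Assumed: chrominance count is twice the luminance count,
    # consistently with the values 1 -> 2 and 4 -> 8 quoted above.
    chrominance = 2 * luminance
    return luminance, chrominance


v16 = blocks_per_area_unit(16)
v8 = blocks_per_area_unit(8)
v32 = blocks_per_area_unit(32)
```

Under the same assumption, a 32x32 block type gives a luminance count of 0.25 and a chrominance count of 0.5 per 16x16 area unit.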

In the case considered here where possible block sizes are 32x32, 16x16 and 8x8, the competition between possible segmentations performed at step S106 (already mentioned above) seeks to determine for each 32x32 area both:
- the segmentation of this area into 32x32, 16x16 or 8x8 blocks,
- the choice of the type for each block,
such that the cost is minimized.

This may lead to a very big number of possible configurations to evaluate.

Fortunately, by using the classical so-called bottom-to-top competition technique (based on the additivity of costs), one can dramatically decrease the number of configurations to deal with.

As shown in Figure 16 (left part), a 16x16 block is segmented into four 8x8 blocks. By using 8x8 cost competition (where the cost for each 8x8 block is computed based on the above formula for each possible block type of size 8x8, including for the block type having a skip label, for which the rate is nil), the most competitive type (i.e. the type with the smallest cost) can be selected for each 8x8 block. Then, the cost $C_{16,\mathrm{best8x8}}$ associated with the 8x8 (best) segmentation is just the addition of the four underlying best 8x8 costs.

The bottom-to-top process can be used by comparing this best cost $C_{16,\mathrm{best8x8}}$ using 8x8 blocks for the 16x16 block to costs computed for block types of size 16x16.

Figure 16 is based on the assumption (for clarity of presentation) that there are two possible 16x16 block types. Three costs are then to be compared:
- the best 8x8 cost $C_{16,\mathrm{best8x8}}$ deduced from cost additivity;
- the 16x16 cost $C_{16,t1}$ using 16x16 block type 1;
- the 16x16 cost $C_{16,t2}$ using 16x16 block type 2.

The smallest cost among these 3 costs decides the segmentation and the types of the 16x16 block.

The bottom-to-top process is continued at a larger scale (in the present case where 32x32 blocks are to be considered); it may be noted that the process could have started at a lower scale (considering first 4x4 blocks). In this respect, the bottom-to-top competition is not limited to two different sizes, nor even to square blocks.

By doing so for each 32x32 block of the frame, it is thus possible to define a new segmentation, defining a block type for each block of the segmentation (step S108).
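By way of illustration only, the bottom-to-top competition described above can be sketched in Python as a recursion relying on the additivity of costs. The cost function `toy_block_cost`, the block type names and the numerical values are hypothetical and introduced for this sketch; in the described process the cost of a block would be the Lagrangian cost computed for the concerned block type.

```python
def compete(size, x, y, block_cost, types_by_size, min_size=8):
    """Return (best cost, segmentation) for the square area at (x, y).

    The segmentation is a list of (x, y, size, block_type) tuples.
    """
    # Best single block of the current size covering the whole area.
    best_cost, best_type = min(
        (block_cost(size, x, y, t), t) for t in types_by_size[size]
    )
    best_seg = [(x, y, size, best_type)]
    if size > min_size:
        half = size // 2
        split_cost, split_seg = 0.0, []
        for dy in (0, half):
            for dx in (0, half):
                cost, seg = compete(half, x + dx, y + dy,
                                    block_cost, types_by_size, min_size)
                split_cost += cost        # additivity of costs
                split_seg.extend(seg)
        if split_cost < best_cost:
            best_cost, best_seg = split_cost, split_seg
    return best_cost, best_seg


def toy_block_cost(size, x, y, block_type):
    """Hypothetical costs: the 16x16 area at (16, 16) is hard to encode."""
    if size == 32:
        return 100.0
    if size == 16:
        base = 20.0 if block_type == "16-a" else 30.0
        return base + (30.0 if (x, y) == (16, 16) else 0.0)
    return 6.0 if block_type == "8-a" else 9.0


types = {32: ["32-a"], 16: ["16-a", "16-b"], 8: ["8-a", "8-skip"]}
total_cost, segmentation = compete(32, 0, 0, toy_block_cost, types)
```

With these hypothetical costs, three 16x16 areas keep a single 16x16 block (cost 20 each) while the fourth is split into four 8x8 blocks (cost 24), for a total of 84, which beats the single 32x32 block (cost 100). Each area is evaluated once per size, instead of enumerating every configuration.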

Then, if the segmentation does not evolve anymore (i.e. if the new segmentation is the same as the previous segmentation) or if a predetermined number of iterations has been reached, the process quits the loop and step S110 (described below) is proceeded with. Else, the process loops to step S102 where DCT coefficients and GGD statistics will be computed based on the new segmentation.

It may be noted in this respect that the loop is needed because, after the first iteration, the statistics are not consistent anymore with the new segmentation (after having performed block type competition). However, after a small number of iterations (typically from 5 to 10), one observes a convergence of the iterative process to a local optimum for the segmentation.

The block type competition improves the compression performance by about 10%.

At step S110, DCT coefficients are computed for the blocks defined in the (optimized) segmentation resulting from the optimization process (loop just described), i.e. the new segmentation obtained at the last iteration of step S108 and, for each block type defined in this segmentation, parameters (GGD statistics) representing the probabilistic distributions of the various DCT channels are computed. As noted above, this is done in conformity with steps S4 and S6 of Figure 11 described above.

Frame merits ($m^Y$, $m^{UV}$ in the first embodiment, $m^*$ in the second embodiment), block merits $m_k$ (for each block type) and optimal quantizers for the various block types and DCT channels can thus be determined at step S112 thanks to the process of Figure 13 (first embodiment) or Figure 14 (second embodiment), using GGD statistics provided at step S110 and based on the optimized segmentation.

The DCT coefficients of the blocks of the frames (which coefficients were computed at step S110) are then quantized at step S114 using the selected quantizers.

The quantized coefficients are then entropy encoded at step S116 by any known coding technique like VLC coding or arithmetic coding. Context adaptive coding (CAVLC or CABAC) may also be used.

A bit stream to be transmitted can thus be computed based on encoded coefficients. The bit stream also includes parameters $\alpha$, $\beta$ representative of the statistical distribution of coefficients computed at step S110, as well as a representation of the segmentation (quad tree) determined by the optimization process described above.

The bit stream may also include the frame merits $m^Y$, $m^U$, $m^V$ (or $m^Y$, $m^{UV}$ in the first embodiment) determined at step S112.

Transmitting the frame merits makes it possible to select the quantizers for dequantization at the decoder according to a process similar to Figure 12 (with respect to the selection of quantizers), without the need to perform the dichotomy process.

According to a first possible embodiment (as just mentioned), the transmitted parameters may include the parameters defining the distribution for each DCT channel, i.e. the parameter $\alpha$ (or equivalently the standard deviation $\sigma$) and the parameter $\beta$ computed at the encoder side for each DCT channel, as shown in step S22.

Based on these parameters received in the data stream, the decoder may deduce the quantizers to be used (a quantizer for each DCT channel) thanks to the selection process explained above at the encoder side (the only difference being that the parameters $\beta$ are for instance computed from the original data at the encoder side whereas they are received at the decoder side).

Dequantization (step S32 of Figure 4) can thus be performed with the selected quantizers (which are the same as those used at encoding because they are selected the same way).

According to a second possible embodiment, the transmitted parameters may include a flag per DCT channel indicating whether the coefficients of the concerned DCT channel are encoded or not, and, for encoded channels, the parameters $\beta$ and the standard deviation $\sigma$ (or equivalently the parameter $\alpha$). This helps minimize the amount of information to be sent because channel parameters are sent only for encoded channels. According to a possible variation, in addition to flags indicating whether the coefficients of a given DCT channel are encoded or not, information can be transmitted that designates, for each encoded DCT channel, the quantizer used at encoding. In this case, there is thus no need to perform a quantizer selection process at the decoder side.

Dequantization (step S32 of Figure 4) can thus be performed at the decoder by use of the identified quantizers for DCT channels having a received flag indicating the DCT channel was encoded.

Figure 17 shows the adaptive post-filtering applied at the encoder as mentioned above (see also Figure 1) in order to determine the parameters of post-filters to be used at the decoder.

As explained in the context of Figure 1, the enhanced image (i.e. the sum of the image obtained by decoding the base layer and by upsampling, and of the image obtained by decoding the enhancement layer) is reconstructed at the encoder (according to a process similar to the decoding process implemented at the decoder) in order to produce a rough decoded image.

A deblocking filter DBF, a sample adaptive offset filter SAO and an adaptive loop filter ALF are successively applied to obtain the (post-filtered) decoded version of the image. As already noted, parameters of these filters (in particular for the sample offset filter and then for the adaptive loop filter) are selected at this stage such that the post-filtered version is as close as possible to the original image (raw video), according to a proximity criterion, which is for instance in practice a Lagrangian cost (rate-distortion cost).

The deblocking filter DBF is a conventional HEVC deblocking filter as described for instance in JCTVC-H1003. Such a deblocking filter receives as an input a quantization parameter QP. The quantization parameter is for instance used to adjust tap filters in the deblocking filter.

It is proposed here that the quantization parameter QP input to the deblocking filter DBF is determined by a first converter CONV1 based on the luminance frame merit $m^Y$ determined during the encoding process (see above step S112).

The quantization parameter QP input to the deblocking filter is for instance deduced from the luminance frame merit using a high rate asymptotic approximation on uniform quantizers: $QP = \mathrm{INT}(3 \cdot \log_2(m^Y) + 9)$, where INT is the integer truncation.

It may be noted that the same quantization parameter QP is used for the application of the deblocking filter to the luminance and chrominance components.
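By way of illustration only, the conversion performed by the first converter CONV1 for frames other than IDR frames, namely QP = INT(3.log2(m) + 9) with m the luminance frame merit, can be sketched in Python as follows. The function name is an assumption introduced for this sketch, and the integer truncation is implemented with Python's `int`, which truncates towards zero.

```python
import math


def qp_from_luminance_merit(merit):
    """Quantization parameter deduced from the luminance frame merit."""
    return int(3.0 * math.log2(merit) + 9.0)


qp_examples = [qp_from_luminance_merit(m) for m in (1.0, 2.0, 3.0, 8.0)]
```

Doubling the frame merit thus raises the quantization parameter by 3.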

It is also possible to process IDR ("Instantaneous Decoder Refresh") frames in a distinct manner: the blocks of such a frame are processed considering the blocks as intra with their coded block flags set to 0. The quantization parameter QP input to the deblocking filter for IDR frames is computed by the first converter CONV1 in accordance with the following formula: QP = INT(3.log2(-) + 9), where INT is the integer truncation.

The sample adaptive offset filter SAO is a picture-based SAO filter having for instance the features proposed in JCTVC-G490 and JCTVC-G246.

It is proposed here that the rate-distortion slope (also called "lambda parameter") $\lambda_*$ input to the SAO filter for a colour component * is determined by a second converter CONV2 based on the frame merit $m^*$ determined during the encoding process (see above step S112) for the concerned colour component.

The rate-distortion slope is for instance used in the SAO filter when computing the rate-distortion cost C ($C = \text{distortion} + \lambda_* \cdot \text{rate}$, where distortion is the distortion after SAO filtering and rate is the rate of coding the SAO filter parameters) of each of a plurality of configurations of the filter, in order to select the configuration (pair geometry-intensity) having the lower cost.
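By way of illustration only, the selection of a filter configuration by rate-distortion cost can be sketched in Python as follows. The configuration names, distortions and rates are hypothetical values introduced for this sketch and do not correspond to actual SAO configurations.

```python
def select_configuration(candidates, slope):
    """Return the candidate minimising distortion + slope * rate.

    Each candidate is a (name, distortion, rate) tuple.
    """
    return min(candidates, key=lambda c: c[1] + slope * c[2])


# Hypothetical candidates: (name, distortion after filtering, rate of
# coding the filter parameters).
candidates = [
    ("off", 100.0, 0.0),
    ("edge", 80.0, 12.0),
    ("band", 70.0, 30.0),
]
chosen_low = select_configuration(candidates, slope=0.1)[0]
chosen_mid = select_configuration(candidates, slope=1.0)[0]
chosen_high = select_configuration(candidates, slope=5.0)[0]
```

A small slope favours the configuration with the lowest distortion, whereas a large slope favours the configuration that is cheapest to signal, which is how the frame merit steers the filter.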

The second converter CONV2 computes the rate-distortion slope $\lambda_*$ by applying a conversion ratio to the corresponding frame merit $m^*$. This ratio depends on the area unit used when computing the block merits (and hence when taking into account the merit at pixel level) as described in connection with Figure 12.

In the present example, as mentioned above, the area unit is the area of a 16x16 block ($v_k = 1$ for block types of size 16x16). For each component * the rate-distortion slope $\lambda_*$ input to the SAO filter is thus here taken as equal to the frame merit $m^*$. (It may be reminded here that the frame merit for both colour components in the first embodiment described in Figure 13 is the common chrominance merit $m^{UV}$.)

The adaptive loop filter ALF is a conventional picture-based HEVC adaptive loop filter, for instance as described in JCTVC-H0274 and JCTVC-H0068.

It is proposed here that the rate-distortion slope $\lambda_Y$ input to the adaptive loop filter ALF for the luminance component is the rate-distortion slope determined by the second converter CONV2 as described above (i.e. based on the luminance frame merit $m^Y$ determined during the encoding process).

As for the SAO filter, the rate-distortion slope is for instance used in the ALF filter when computing the rate-distortion cost C of each of a plurality of configurations of the filter, in order to select the configuration having the lower cost.

The common rate-distortion slope $\lambda_{UV}$ input to the adaptive loop filter ALF for the chrominance components is determined by the second converter CONV2 based on the frame merit(s) for chrominance components.

In the context of the first embodiment mentioned above (Figure 13), the chrominance rate-distortion slope $\lambda_{UV}$ is taken (by the second converter CONV2) as equal to the common chrominance merit $m^{UV}$.

In the context of the second embodiment mentioned above (Figure 14), the chrominance rate-distortion slope is for instance determined by the second converter CONV2 as the harmonic mean of the two chrominance merits $m^U$, $m^V$ resulting from the encoding process (step S112): $\lambda_{UV} = \frac{2}{\frac{1}{m^U} + \frac{1}{m^V}}$.

Figure 18 shows the post-filtering applied at the decoder as mentioned above with reference to Figure 2.
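By way of illustration only, the determination of the common chrominance rate-distortion slope by the second converter CONV2 at the encoder, as described above for the two embodiments, can be sketched in Python as follows. The function names and numerical values are assumptions introduced for this sketch.

```python
def chroma_slope_first_embodiment(merit_uv):
    """First embodiment: the slope equals the common chrominance merit."""
    return merit_uv


def chroma_slope_second_embodiment(merit_u, merit_v):
    """Second embodiment: harmonic mean of the U and V frame merits."""
    return 2.0 / (1.0 / merit_u + 1.0 / merit_v)


slope_equal = chroma_slope_second_embodiment(4.0, 4.0)
slope_mixed = chroma_slope_second_embodiment(2.0, 6.0)
```

When both chrominance merits are equal, the harmonic mean reduces to that common value, so the two embodiments then agree; for merits 2 and 6 the harmonic mean is 3, which is closer to the smaller merit than the arithmetic mean would be.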

As shown in Figure 2 and explained above, post-filtering is applied to the version of the image resulting from the sum of the decoded and upsampled base layer image and of the enhancement layer (residual) image.

As shown in Figure 18, the deblocking filter DBF, the sample adaptive offset filter SAO and the adaptive loop filter ALF, each adjusted based on the parameters received in the bit stream 22 from the encoder, are successively applied to obtain the (post-filtered) decoded version of the image.

The sample adaptive offset filter SAO and the adaptive loop filter ALF do not need any input other than the received parameters to proceed. The deblocking filter DBF on the other hand must be provided with the quantization parameter QP.

In conformity with what was done at the encoder, the quantization parameter QP input to the deblocking filter DBF is provided by a first converter CONV1 based on the luminance frame merit $m^Y$. As already explained above, the luminance frame merit $m^Y$ is available at the decoder either because it was received in the channel model parameter bit stream 21 or because it is computed at the decoder based on the received statistic parameters $\alpha$ (or $\sigma$) and $\beta$ thanks to a method identical to the method performed at the encoder.

With reference now to Figure 19, a particular hardware configuration of a device for encoding or decoding images able to implement methods according to the invention is now described by way of example.

A device implementing the invention is for example a microcomputer 50, a workstation, a personal digital assistant, or a mobile telephone connected to various peripherals. According to yet another embodiment of the invention, the device is in the form of a photographic apparatus provided with a communication interface for allowing connection to a network.

The peripherals connected to the device comprise for example a digital camera 64, or a scanner or any other image acquisition or storage means, connected to an input/output card (not shown) and supplying image data to the device.

The device 50 comprises a communication bus 51 to which there are connected:
- a central processing unit CPU 52 taking for example the form of a microprocessor;
- a read only memory 53 in which may be contained the programs whose execution enables the methods according to the invention. It may be a flash memory or EEPROM;
- a random access memory 54, which, after powering up of the device 50, contains the executable code of the programs of the invention necessary for the implementation of the invention. As this memory 54 is of random access type (RAM), it provides fast access compared to the read only memory 53. This RAM memory 54 stores in particular the various images and the various blocks of pixels as the processing is carried out (transform, quantization, storage of the reference images) on the video sequences;
- a screen 55 for displaying data, in particular video, and/or serving as a graphical interface with the user, who may thus interact with the programs according to the invention, using a keyboard 56 or any other means such as a pointing device, for example a mouse 57 or an optical stylus;
- a hard disk 58 or a storage memory, such as a memory of compact flash type, able to contain the programs of the invention as well as data used or produced on implementation of the invention;
- an optional diskette drive 59, or another reader for a removable data carrier, adapted to receive a diskette 63 and to read/write thereon data processed or to be processed in accordance with the invention; and
- a communication interface 60 connected to the telecommunications network 61, the interface 60 being adapted to transmit and receive data.

In the case of audio data, the device 50 is preferably equipped with an input/output card (not shown) which is connected to a microphone 62.

The communication bus 51 permits communication and interoperability between the different elements included in the device 50 or connected to it. The representation of the bus 51 is non-limiting and, in particular, the central processing unit 52 may communicate instructions to any element of the device 50 directly or by means of another element of the device 50.

The diskettes 63 can be replaced by any information carrier such as a compact disc (CD-ROM) rewritable or not, a ZIP disk or a memory card. Generally, an information storage means, which can be read by a micro-computer or microprocessor, integrated or not into the device for processing a video sequence, and which may possibly be removable, is adapted to store one or more programs whose execution permits the implementation of the method according to the invention.

The executable code enabling the coding device to implement the invention may equally well be stored in read only memory 53, on the hard disk 58 or on a removable digital medium such as a diskette 63 as described earlier. According to a variant, the executable code of the programs is received by the intermediary of the telecommunications network 61, via the interface 60, to be stored in one of the storage means of the device 50 (such as the hard disk 58) before being executed.

The central processing unit 52 controls and directs the execution of the instructions or portions of software code of the program or programs of the invention, the instructions or portions of software code being stored in one of the aforementioned storage means. On powering up of the device 50, the program or programs which are stored in a non-volatile memory, for example the hard disk 58 or the read only memory 53, are transferred into the random-access memory 54, which then contains the executable code of the program or programs of the invention, as well as registers for storing the variables and parameters necessary for implementation of the invention.

It will also be noted that the device implementing the invention or incorporating it may be implemented in the form of a programmed apparatus. For example, such a device may then contain the code of the computer program(s) in a fixed form in an application specific integrated circuit (ASIC).

The device described here and, particularly, the central processing unit 52, may implement all or part of the processing operations described in relation with Figures 1 to 16, to implement methods according to the present invention and constitute devices according to the present invention.

The above examples are merely embodiments of the invention, which is not limited thereby.

Claims

1. A method for encoding image data representing at least one original image, comprising the steps of: -encoding pixel values into coefficients forming at least a layer; -decoding the coefficients to obtain a rough decoded image corresponding to the original image; -processing the rough decoded image through at least one adaptive post-filter adjustable depending on a parameter, wherein said parameter is derived based on pixel values and input to the adaptive post-filter.
2. A method according to claim 1, wherein the adaptive post-filter is defined to minimise a rate-distortion cost computed based on said parameter.
3. A method according to claim 1 or 2, wherein the adaptive post-filter is a sample adaptive offset filter and wherein the parameter is a rate-distortion slope.
4. A method according to claim 1 or 2, wherein the adaptive post-filter is an adaptive loop filter and wherein the parameter is a rate-distortion slope.
5. A method according to claim 1, wherein the post-filter is a deblocking filter and wherein the parameter is a quantization parameter.
6. A method according to any of claims 1 to 5, wherein the step of encoding pixel values includes encoding a base layer version of the image into base layer coefficients and encoding a residual image, obtained by difference between the original image and the base layer version, into enhancement layer coefficients.
7. A method according to claim 6, wherein the rough decoded image is obtained by summing a first version obtained by decoding the base layer coefficients and a second version obtained by decoding the enhancement layer coefficients.
8. A method according to any of claims 6 or 7, comprising the following steps: -for each block of pixels of a plurality of blocks of pixels comprised in the image, transforming samples representative of pixel values of the residual image into a set of coefficients each having a coefficient type; -for each coefficient type, determining statistic parameters representative of a statistical distribution of the coefficients having the concerned coefficient type; -based on the determined statistics parameters, determining quantizers associated with coefficient types and a corresponding frame merit such that a distortion and/or rate at the image level fulfils a given criterion; -determining the parameter input to the adaptive post-filter based on the frame merit.
9. A method according to claim 8, wherein the quantizers and corresponding frame merit are determined such that a video merit, computed based on the frame merit and on distortions respectively resulting from the use of the quantizers, corresponds to a target video merit.
10. A method according to claim 8 or 9, wherein encoding the pixel values includes quantizing coefficients having a given type with the quantizer determined for this coefficient type.
11. A method for post-filtering reconstructed image data, wherein said image data are representative of at least one original image comprising a plurality of blocks of pixels encoded according to an encoding method, the method comprising the following steps: -for each block of pixels, transforming samples representative of pixel values into a set of coefficients each having a coefficient type; -for each coefficient type, determining statistic parameters representative of a statistical distribution of the coefficients having the concerned coefficient type; -based on the determined statistics parameters, determining quantizers associated with coefficient types and a corresponding frame merit such that a distortion and/or rate at the image level fulfils a given criterion; wherein the post filtering is adjustable as a function of an input parameter depending on the frame merit.
12. A method according to claim 11, wherein the post filtering is performed on an image encoder side.
13. A method according to claim 11, wherein the post filtering is performed on an INTRA image in a video encoder, and the resulting post filtered image is introduced in a temporal prediction loop to be used as a reference image for a next image.
14. A method according to claim 11, wherein the post filtering is performed on an image decoder side.
15. A method according to claim 11, wherein the post filtering is performed on an INTRA image in a video decoder, and the resulting post filtered image is introduced in a temporal prediction loop to be used as a reference image for a next image.
16. A method according to any previous claim from claim 11 to claim 15, wherein the post-filtering is defined to minimise a rate-distortion cost computed based on said parameter.
17. A method according to any previous claim from claim 11 to claim 16, wherein the post-filtering comprises a deblocking filter and wherein the parameter is a quantization parameter.
18. A method according to any previous claim from claim 11 to claim 17, wherein the post-filtering comprises a sample adaptive offset filter and wherein the parameter is a rate-distortion slope.
19. A method according to any previous claim from claim 11 to claim 18, wherein the post-filtering comprises an adaptive loop filter and wherein the parameter is a rate-distortion slope.
20. A method for decoding image data representing at least one original image, wherein the method comprises the method of post filtering as described in claim 11 or any of claims 14 to 19.
21. A device for encoding image data representing at least one original image, comprising: -an encoding module for encoding pixel values into coefficients forming at least a layer; -a decoding module for decoding the coefficients to obtain a rough decoded image corresponding to the original image; -a processing module for processing the rough decoded image through at least one adaptive post-filter adjustable depending on a parameter, wherein the processing module is configured to derive said parameter based on pixel values and to input the derived parameter to the adaptive post-filter.
22. A device according to claim 21, wherein the processing module is configured to define the adaptive post-filter to minimise a rate-distortion cost computed based on said parameter.
23. A device according to claim 21 or 22, wherein the adaptive post-filter is a sample adaptive offset filter and wherein the parameter is a rate-distortion slope.
24. A device according to claim 21 or 22, wherein the adaptive post-filter is an adaptive loop filter and wherein the parameter is a rate-distortion slope.
25. A device according to claim 21, wherein the post-filter is a deblocking filter and wherein the parameter is a quantization parameter.
26. A device according to any of claims 21 to 25, wherein the encoding module is configured to encode a base layer version of the image into base layer coefficients and to encode a residual image, obtained by difference between the original image and the base layer version, into enhancement layer coefficients.
27. A device according to claim 26, wherein the decoding module is configured to obtain the rough decoded image by summing a first version obtained by decoding the base layer coefficients and a second version obtained by decoding the enhancement layer coefficients.
28. A device according to any of claims 26 or 27, comprising: -a transforming module for transforming, for each block of pixels of a plurality of blocks of pixels comprised in the image, samples representative of pixel values of the residual image into a set of coefficients each having a coefficient type; -a statistic determining module for determining, for each coefficient type, statistic parameters representative of a statistical distribution of the coefficients having the concerned coefficient type; -a quantizer determining module for determining, based on the determined statistics parameters, quantizers associated with coefficient types and a corresponding frame merit such that a distortion and/or rate at the image level fulfils a given criterion; -a parameter determining module for determining the parameter input to the adaptive post-filter based on the frame merit.
29. A device according to claim 28, wherein the quantizer determining module is configured to determine the quantizers and corresponding frame merit such that a video merit, computed based on the frame merit and on distortions respectively resulting from the use of the quantizers, corresponds to a target video merit.
30. A device according to claim 28 or 29, wherein the encoding module is configured to quantize coefficients having a given type with the quantizer determined for this coefficient type.
31. A device for post-filtering reconstructed image data, wherein said image data are representative of at least one original image comprising a plurality of blocks of pixels encoded according to an encoding method, the device comprising: -a transforming module for transforming, for each block of pixels, samples representative of pixel values into a set of coefficients each having a coefficient type; -a statistic determining module for determining, for each coefficient type, statistic parameters representative of a statistical distribution of the coefficients having the concerned coefficient type; -a quantizer determining module for determining, based on the determined statistic parameters, quantizers associated with coefficient types and a corresponding frame merit such that a distortion and/or rate at the image level fulfils a given criterion; -an adjusting module for adjusting the post filtering as a function of an input parameter depending on the frame merit.
32. A device according to claim 31, wherein the post filtering is performed on an image encoder side.
33. A device according to claim 31, wherein the post filtering is performed on an INTRA image in a video encoder, and the resulting post filtered image is introduced in a temporal prediction loop to be used as a reference image for a next image.
34. A device according to claim 31, wherein the post filtering is performed on an image decoder side.
35. A device according to claim 31, wherein the post filtering is performed on an INTRA image in a video decoder, and the resulting post filtered image is introduced in a temporal prediction loop to be used as a reference image for a next image.
36. A device according to any previous claim from claim 31 to claim 35, wherein the processing module is configured to define the post-filtering in order to minimise a rate-distortion cost computed based on said parameter.
37. A device according to any previous claim from claim 31 to claim 36, wherein the post-filtering comprises a deblocking filter and wherein the parameter is a quantization parameter.
38. A device according to any previous claim from claim 31 to claim 37, wherein the post-filtering comprises a sample adaptive offset filter and wherein the parameter is a rate-distortion slope.
39. A device according to any previous claim from claim 31 to claim 38, wherein the post-filtering comprises an adaptive loop filter and wherein the parameter is a rate-distortion slope.
40. A device for decoding image data representing at least one original image, wherein the device is configured to perform the method of post filtering as described in claim 11 or any of claims 14 to 19.
41. Information storage means, possibly totally or partially removable, able to be read by a computer system, comprising instructions for a computer program adapted to implement a method according to any one of the Claims 1 to 19, when this program is loaded into and executed by the computer system.
42. Computer program product able to be read by a microprocessor, comprising portions of software code adapted to implement a method according to any one of the Claims 1 to 19, when it is loaded into and executed by the microprocessor.
43. An encoding device for encoding an image substantially as herein described with reference to, and as shown in, Figures 1 and 3 of the accompanying drawings.