GB2492397A - Encoding and decoding residual image data using probabilistic models - Google Patents

Encoding and decoding residual image data using probabilistic models

Info

Publication number
GB2492397A
GB2492397A
Authority
GB
United Kingdom
Prior art keywords
text
video data
coefficients
encoding
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB201111199A
Other versions
GB201111199D0 (en)
Inventor
Fabrice Le Leannec
Sebastien Lasserre
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB201111199A priority Critical patent/GB2492397A/en
Publication of GB201111199D0 publication Critical patent/GB201111199D0/en
Priority to PCT/EP2012/002718 priority patent/WO2013000575A1/en
Publication of GB2492397A publication Critical patent/GB2492397A/en
Withdrawn legal-status Critical Current


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N 19/33 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 - Quantisation
    • H04N 19/126 - Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146 - Data rate or code amount at the encoder output
    • H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/18 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a set of transform coefficients
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/187 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a scalable video layer
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method of encoding video data comprises encoding video data having a first resolution in conformity with HEVC 13 to obtain video data of a base layer 14 and decoding the base-layer video data in conformity with HEVC 15. The decoded base-layer video data is upsampled 16 to generate decoded video data having a second resolution higher than the first resolution. A difference is formed 17 between the generated decoded video data having the second resolution and the original video data to generate data of a residual image. The residual-image data is compressed 18, 19 to generate video data of an enhancement layer 20. The compression of the residual data does not involve temporal or spatial prediction but is carried out by applying a discrete cosine transform to the data and employing a parametric probabilistic model of the discrete cosine transform coefficients. This parametric model is then used to choose quantizers from a pool of available quantizers for quantizing the residual data. The method can be used to provide an ultra-high definition (UHD) codec with low complexity based on scalable encoding.

Description

ENCODING AND DECODING AN IMAGE
FIELD OF THE INVENTION
The present invention concerns methods for encoding and decoding an image, and associated apparatuses and programs.
The invention is particularly useful for the encoding of digital video sequences made of images or "frames".
BACKGROUND OF THE INVENTION
Video compression algorithms, such as those standardized by the standardization organizations ITU, ISO, and SMPTE, exploit the spatial and temporal redundancies of images in order to generate bitstreams of data of smaller size than original video sequences. These powerful video compression tools, known as spatial (or intra) and temporal (or inter) predictions, make the transmission and/or the storage of video sequences more efficient.
ITU-T and ISO/MPEG decided to launch in January 2010 a new video coding standard project, named High Efficiency Video Coding (HEVC).
The HEVC codec design is similar to that of most previous so-called block-based hybrid transform codecs such as H.263, H.264, MPEG-1, MPEG-2, MPEG-4 and SVC. The HEVC codec uses the spatial and temporal redundancies of the images in order to generate data bit streams of reduced size compared with the original video sequences. Such compression makes the transmission and/or storage of the video sequences more efficient.
Video encoders and/or decoders (codecs) are often embedded in portable devices with limited resources, such as cameras or camcorders. Conventional embedded codecs can process at best high definition (HD) digital videos, i.e. 1080x1920 pixel frames.
Real-time encoding and decoding are however constrained by the limited resources of portable devices, especially the slow access to working memory (e.g. random access memory, or RAM) and the capacity of the central processing unit (CPU).
This is particularly striking for the encoding or decoding of ultra-high definition (UHD) digital videos that are about to be handled by the latest cameras. This is because the amount of pixel data to consider for spatial or temporal prediction is huge.
UHD is typically four times (4k2k pixels) the definition of an HD video, which is the current standard video definition. Furthermore, very ultra-high definition, which is sixteen times that definition (i.e. 8k4k pixels), is even being considered in the longer term.
Taking account of these constraints in terms of limited power and memory access bandwidth, it is desirable to provide a UHD codec with low complexity based on scalable encoding.
SUMMARY OF THE INVENTION
<1st to 4th aspects> According to a 1st aspect of the present invention there is provided a method of encoding video data comprising: encoding video data having a first resolution in conformity with HEVC to obtain video data of a base layer; and decoding the base-layer video data in conformity with HEVC, upsampling the decoded base-layer video data to generate decoded video data having a second resolution higher than said first resolution, forming a difference between the generated decoded video data having said second resolution and further video data, having said second resolution and corresponding to said first-resolution video data, to generate data of a residual image, and compressing the residual-image data to generate video data of an enhancement layer.
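The pipeline of this 1st aspect can be pictured with a short sketch in Python. This is a minimal illustration only: hevc_encode and hevc_decode are hypothetical placeholders standing in for a real HEVC encoder and decoder (modelled here as an identity pair), and nearest-neighbour 2x upsampling stands in for whatever upsampling filter an implementation would actually use.

import numpy as np

def hevc_encode(frame):
    # Placeholder: a real HEVC encoder would return a compressed bitstream.
    return frame.copy()

def hevc_decode(bitstream):
    # Placeholder: a real HEVC decoder would reconstruct the frame.
    return bitstream.copy()

def upsample2x(frame):
    # Nearest-neighbour 2x upsampling; a real codec would use a proper filter.
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

def encode_scalable(uhd_frame):
    hd_frame = uhd_frame[::2, ::2]            # first (lower) resolution version
    base_layer = hevc_encode(hd_frame)        # base layer, HEVC-conformant
    decoded_base = hevc_decode(base_layer)    # decode exactly as the decoder would
    prediction = upsample2x(decoded_base)     # second (higher) resolution prediction
    residual = uhd_frame.astype(np.int16) - prediction.astype(np.int16)
    return base_layer, residual               # the residual feeds the enhancement-layer coder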
Preferably, the compression of the residual-image data does not involve temporal prediction.
Preferably, the compression of the residual-image data does not involve spatial prediction.
Preferably, the compression of the residual data employs a method embodying one or more of the 5th, 9th, 11th and 15th aspects of the present invention described hereinafter.
In one embodiment, the compression of the residual-image data comprises applying a discrete cosine transform (DCT) to obtain DCT coefficients.
Preferably the method further comprises employing a parametric probabilistic model of the DCT coefficients.
Such a parametric probabilistic model is preferably obtained for each type of DCT coefficient.
The DCT may be a block-based DCT, in which case such a parametric probabilistic model is preferably obtained for each DCT coefficient position within a DCT block.
In such a case also the method may further comprise fitting the parametric probabilistic model onto respective collocated DCT coefficients of some or all DCT blocks of the residual image.
In one embodiment, the compression of the residual-image data comprises employing the parametric probabilistic model to choose quantizers from a pool of available quantizers and quantizing the residual data using the chosen quantizers.
The available quantizers of said pool are preferably pre-computed quantizers dedicated to a parametric probabilistic model.
In another embodiment the compression of the residual-image data comprises entropy encoding of quantized symbols and employing a parametric probabilistic model to obtain a probabilistic distribution of possible symbols of an alphabet associated with each DCT coefficient, which alphabet is used for the entropy encoding.
The same parametric probabilistic model is preferably employed both for choosing quantizers and for entropy encoding.
The parametric probabilistic model is a Generalised Gaussian Distribution GGD(α, β) having a zero mean. This requires only two parameters, which makes it possible to supply the parameters to the decoder with low overhead.
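As an illustration, one common way to estimate the two parameters of a zero-mean GGD(α, β) from observed DCT coefficients is moment matching on E|X| and E[X²]; the patent does not specify the fitting method, so the following sketch is only one plausible choice.

import numpy as np
from scipy.special import gamma

def fit_ggd(samples):
    # Moment matching: for GGD(alpha, beta), E|X|^2 / E[X^2] depends on beta only.
    m1 = np.mean(np.abs(samples))
    m2 = np.mean(np.asarray(samples, dtype=float) ** 2)
    rho = m1 * m1 / m2
    betas = np.linspace(0.1, 5.0, 4000)
    r = gamma(2.0 / betas) ** 2 / (gamma(1.0 / betas) * gamma(3.0 / betas))
    beta = betas[np.argmin(np.abs(r - rho))]   # invert rho(beta) by table lookup
    alpha = m1 * gamma(1.0 / beta) / gamma(2.0 / beta)
    return alpha, beta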
Preferably, parameters of the parametric probabilistic model are determined in dependence upon one or more of: a video content; an index of the DCT coefficient within a DCT block; an encoding mode used for a collocated block of the base layer; and a size of block to encode.
Preferably, information about parameters of the parametric probabilistic model is supplied to a decoder.
In one embodiment, information about a set of saturated coefficients determined at the encoder to minimize a rate is supplied to a decoder.
In another embodiment, information about the chosen quantizers is supplied to a decoder.
Preferably, the quantizers are chosen based on a rate-distortion criterion.
According to a 2nd aspect of the present invention there is provided a method of decoding a scalable bitstream, comprising: decoding, in conformity with HEVC, encoded video data of a base layer of the bitstream to obtain video data having a first resolution, and upsampling the first-resolution video data to generate video data having a second resolution higher than said first resolution; and decoding compressed video data of an enhancement layer of the bitstream to obtain data of a residual image, and forming a sum of the generated second-resolution video data and the residual-image data to generate decoded video data having said second resolution.
Preferably, the decoding of the compressed video data of the enhancement layer does not involve temporal prediction.
Preferably, the decoding of the compressed video data of the enhancement layer does not involve spatial prediction.
Preferably, the decoding of the compressed video data of the enhancement layer employs a method embodying one or more of the 6th, 13th, 17th and 20th aspects of the present invention described hereinafter.
In one embodiment, the compressed video data of the enhancement layer comprises encoded discrete cosine transform (DCT) coefficients.
In such a case, preferably, the method further comprises employing a parametric probabilistic model of the DCT coefficients.
Preferably, such a parametric probabilistic model is obtained for each type of DCT coefficient.
The DCT may be a block-based DCT, in which case such a parametric probabilistic model is obtained for each DCT coefficient position within a DCT block.
In one embodiment, the decoding of the compressed video data of the enhancement layer comprises employing the parametric probabilistic model to choose quantizers from a pool of available quantizers and using the chosen quantizers for inverse quantization of the encoded DCT coefficients.
Preferably, the available quantizers of said pool are pre-computed quantizers dedicated to a parametric probabilistic model.
In another embodiment, the decoding of the compressed video data of the enhancement layer comprises entropy decoding of encoded quantized symbols obtained from the compressed video data to generate quantized DCT coefficients and employing a parametric probabilistic model to obtain a probabilistic distribution of possible symbols of an alphabet associated with each DCT coefficient, which alphabet is used for the entropy decoding.
The same parametric probabilistic model is preferably employed both for choosing quantizers and for entropy decoding.
In one embodiment, the parametric probabilistic model is a Generalised Gaussian Distribution GGD(α, β) having a zero mean. Such a model has only two parameters, which means that information about the parameters can be received from the encoder with low overhead.
Information about parameters of the parametric probabilistic model can be received from an encoder.
Alternatively, information about a set of saturated coefficients determined at an encoder to minimize a rate is received from the encoder.
In one embodiment, information about the chosen quantizers is received from an encoder.
The quantizers are preferably chosen based on a rate-distortion criterion.
According to a 3rd aspect of the present invention, there is provided apparatus for encoding video data comprising: means for encoding video data having a first resolution in conformity with HEVC to obtain video data of a base layer; means for decoding the base-layer video data in conformity with HEVC; means for upsampling the decoded base-layer video data to generate decoded video data having a second resolution higher than said first resolution; means for forming a difference between the generated decoded video data having said second resolution and further video data, having said second resolution and corresponding to said first-resolution video data, to generate data of a residual image; and means for compressing the residual-image data to generate video data of an enhancement layer.
Preferably, the means for compressing the residual data comprises apparatus embodying one or more of the 7th, 10th, 12th and 16th aspects of the present invention described hereinafter.
According to a 4th aspect of the present invention, there is provided apparatus for decoding a scalable bitstream, comprising: means for decoding, in conformity with HEVC, encoded video data of a base layer of the bitstream to obtain video data having a first resolution; means for upsampling the first-resolution video data to generate video data having a second resolution higher than said first resolution; means for decoding compressed video data of an enhancement layer of the bitstream to obtain data of a residual image; and means for forming a sum of the generated second-resolution video data and the residual-image data to generate decoded video data having said second resolution.
Preferably, the means for decoding of the compressed video data of the enhancement layer comprises apparatus embodying one or more of the 8th, 14th, 18th and 21st aspects of the present invention described hereinafter.
<5th to 8th aspects> 5th to 8th aspects of the present invention, set out below, may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
These aspects of the invention are particularly advantageous when encoding images without prediction.
The features of the 5th to 8th aspects of the invention may be provided in combination with the features of the 1st to 4th aspects but this is not essential and it is possible to use the features of the 5th to 8th aspects independently of the features of the 1st to 4th aspects.
According to the 5th aspect of the present invention, there is provided a method for encoding at least one block of pixels, the method comprising: -transforming pixel values in the spatial domain for said block into a plurality of coefficients each having a coefficient type; -determining a set of quantizers each associated with a corresponding coefficient type such that an estimated quality parameter in the spatial domain, expressed as a function of estimated distortions each associated with a quantizer of the set, meets a predetermined criterion and such that the sum of rates each associated with a quantizer of the set is minimised; -quantizing at least one coefficient having a given coefficient type into a quantized symbol using the quantizer determined for said given coefficient type; -encoding the quantized symbol.
Thus, although an optimal set of quantizers is determined for quantizing coefficients (in the frequency domain), optimisation is performed in view of a quality criterion in the spatial domain, where the image is eventually to be shown to the user.
The step of transforming pixel values corresponds for instance to a transformation from the spatial domain (pixels) to the frequency domain (e.g. into coefficients each corresponding to a specific spatial frequency). For example, the transforming step includes applying a block-based Discrete Cosine Transform; each of said coefficient types may then correspond to a respective coefficient index or position. Said estimated quality parameter depends for instance on the sum of the squared estimated distortions associated with quantizers of the set. Such a parameter is linked to the PSNR as explained below and is thus of particular interest.
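A minimal sketch of such a block-based DCT, gathering the coefficients by type, i.e. by position within the block, so that per-type statistics (and later per-type quantizers) can be derived; the 8x8 block size is an assumption for illustration. Because the orthonormal DCT preserves squared error, the spatial-domain distortion equals the sum over coefficient types of the quantization distortions, which is what links the quality parameter above to the PSNR.

import numpy as np
from scipy.fftpack import dct

def block_dct_by_type(image, n=8):
    # Split the image into n x n blocks, apply a 2-D DCT-II to each block,
    # and gather the coefficients by their (row, col) position in the block.
    h, w = image.shape
    coeffs = {}
    for y in range(0, h - h % n, n):
        for x in range(0, w - w % n, n):
            block = image[y:y + n, x:x + n].astype(float)
            c = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')
            for i in range(n):
                for j in range(n):
                    coeffs.setdefault((i, j), []).append(c[i, j])
    return coeffs  # one entry per coefficient type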
In the embodiment described below, the method comprises a step of computing said pixel values by subtracting values obtained by decoding a base layer from values representing pixels of an image. In this application, the pixel values to be encoded thus correspond to residual values and form, once encoded, an enhancement layer. When the base layer uses at least two distinct encoding modes, a set of quantizers may be determined for each of said two encoding modes. As explained below, the optimal quantizer may vary greatly depending on the encoding mode concerned.
It may be provided that the step of determining the set of quantizers includes selecting, among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type such that the sum of the rates associated with selected quantizers is minimal and the estimated quality parameter resulting from the distortions associated with said selected quantizers corresponds to a predetermined target. Selection among optimal quantizers is an interesting practical solution to perform the necessary optimisation.
In the same respect, in accordance with the embodiment proposed here and described below, said selecting the optimal quantizers for each coefficient type may for instance be performed by minimisation under Karush-Kuhn-Tucker necessary conditions, possibly by a fixed point algorithm.
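For illustration, the sketch below performs a standard Lagrangian rate allocation consistent with the Karush-Kuhn-Tucker conditions just mentioned: it assumes each coefficient type comes with a list of pre-computed (rate, distortion) pairs describing its optimal quantizers, and bisects the Lagrange multiplier until the total distortion meets the target. It is not a reproduction of the embodiment's exact fixed-point algorithm.

def allocate_quantizers(rd_curves, target_distortion):
    # rd_curves: one list of (rate, distortion) pairs per coefficient type.
    def pick(lmbda):
        # For a fixed multiplier, each type independently minimises R + lambda * D.
        choice = [min(c, key=lambda rd: rd[0] + lmbda * rd[1]) for c in rd_curves]
        return choice, sum(d for _, d in choice)

    lo, hi = 1e-6, 1e6                     # bracket for the Lagrange multiplier
    for _ in range(60):                    # geometric bisection
        mid = (lo * hi) ** 0.5
        _, dist = pick(mid)
        if dist > target_distortion:
            lo = mid                       # distortion too high: weight it more
        else:
            hi = mid
    return pick(hi)[0]                     # one chosen quantizer per coefficient type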
The method may also include, for at least one coefficient type, a step of determining a probabilistic model for coefficients of said at least one coefficient type based on a plurality of values of coefficients of said at least one coefficient type, wherein optimal quantizers, among which the optimal quantizer for said at least one coefficient type is selected, are determined based on said probabilistic model. The optimisation process may thus be performed based on the probabilistic model identified for the concerned coefficient type, which is a convenient way to take into account effective values of the coefficient in the optimisation process.
Said probabilistic model is for instance one of a plurality of probabilistic models each defined by model parameters; parameters defining optimal quantizers may be stored in association with corresponding model parameters. Optimal quantizers are thus tabulated depending on the various probabilistic models that may be encountered. Distinct probabilistic models may be determined for the plurality of coefficient types. This improves flexibility in the optimisation process.
According to the embodiment proposed below, optimal quantizers, associated with a rate and distortion, and among which the optimal quantizer for said at least one coefficient type is selected, are determined off-line, i.e. in an apparatus possibly distinct from the encoding device and prior to the step of transforming pixel values. Parameters defining the resulting optimal quantizers may then be stored as mentioned above in the encoding device in order to perform the optimisation process.
A quantizer is for instance defined by at least one parameter taken among a number of quantization intervals, a limit value for a quantization interval and a centroid of a quantization interval.
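A minimal sketch of a quantizer described by exactly those parameters, with arbitrary example values; quantization maps a coefficient to the index of its interval and dequantization returns the interval's centroid.

import numpy as np

class Quantizer:
    def __init__(self, limits, centroids):
        self.limits = np.asarray(limits, dtype=float)        # interval boundaries
        self.centroids = np.asarray(centroids, dtype=float)  # one per interval
    def quantize(self, x):
        return np.searchsorted(self.limits, x)  # symbol = index of the interval
    def dequantize(self, symbol):
        return self.centroids[symbol]

# Example: three intervals (-inf, -1], (-1, 1], (1, +inf) with their centroids.
q = Quantizer(limits=[-1.0, 1.0], centroids=[-2.0, 0.0, 2.0])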
Quantization and encoding may be performed for each and every coefficient resulting from the transforming step.
However, in the embodiment proposed here and described below, the method comprises a step of determining, for each coefficient type, an estimated value representative of a ratio between a distortion variation (e.g. here decrease) provided by encoding a coefficient having the concerned type and a rate increase resulting from encoding said coefficient; the step of quantizing coefficients may then be applied only to a subset of said plurality of coefficients (i.e. the coefficients subjected to the quantization step form such a subset) and the estimated values for coefficient types of coefficients in said subset may then be larger than the highest estimated value over coefficient types of coefficients not included in said subset.
Thanks to the estimated value or ratio for each coefficient type, it is possible to order the various coefficient types by decreasing estimated value, i.e. by decreasing merit of encoding as explained below, and the encoding process may be applied only to coefficients having the higher values (i.e. ratios), forming the subset defined above.
The step of determining the set of quantizers may include a step of determining sets of quantizers each associated with a possible subset of said plurality of coefficients and a step of selecting, among said sets of quantizers, a set for which the sum of rates each associated with a quantizer of the set is minimal, thus selecting an associated subset. The number of coefficients to be quantized and encoded is thus determined during the optimisation process. A plurality of possible subsets are considered; however, as the coefficient types are ordered by decreasing encoding merit, only N+1 subsets need be considered if N is the total number of coefficients.
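A sketch of that search: with coefficient types already ordered by decreasing merit, only the N+1 prefixes of the order are candidate subsets. Here rate_for_subset is a hypothetical callback that would return the minimal rate at which the quality target is met for a given subset (or None when the target is unreachable with that subset).

def best_prefix_subset(types_by_merit, rate_for_subset):
    best_subset, best_rate = None, float('inf')
    for k in range(len(types_by_merit) + 1):     # the N+1 candidate prefixes
        subset = types_by_merit[:k]
        rate = rate_for_subset(subset)
        if rate is not None and rate < best_rate:
            best_subset, best_rate = subset, rate
    return best_subset, best_rate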
According to the 6th aspect of the present invention, there is provided a method for decoding a set of data representing at least one block of pixels, the method comprising: -decoding said data into a plurality of symbols each corresponding to a coefficient type; -determining a set of quantizers each associated with a corresponding coefficient type such that an estimated quality parameter in the spatial domain, expressed as a function of estimated distortions each associated with a quantizer of the set, meets a predetermined criterion and such that the sum of rates each associated with a quantizer of the set is minimised; -dequantizing at least one symbol corresponding to a given coefficient type into a dequantized coefficient using the quantizer determined for said given coefficient type; -transforming a plurality of dequantized coefficients into pixel values in the spatial domain for said block.
The step of determining the set of quantizers may include estimating distortion for a coefficient type based on a parameter defining a distribution of the values of coefficients of said coefficient type and included in the set of data.
According to the 7th aspect of the present invention, there is provided apparatus for encoding at least one block of pixels, comprising: -means for transforming pixel values in the spatial domain for said block into a plurality of coefficients each having a coefficient type; -means for determining a set of quantizers each associated with a corresponding coefficient type such that an estimated quality parameter in the spatial domain, expressed as a function of estimated distortions each associated with a quantizer of the set, meets a predetermined criterion and such that the sum of rates each associated with a quantizer of the set is minimised; -means for quantizing at least one coefficient having a given coefficient type into a quantized symbol using the quantizer determined for said given coefficient type; -means for encoding the quantized symbol.
According to the 8th aspect of the present invention, there is provided apparatus for decoding a set of data representing at least one block of pixels, comprising: -means for decoding said data into a plurality of symbols each corresponding to a coefficient type; -means for determining a set of quantizers each associated with a corresponding coefficient type such that an estimated quality parameter in the spatial domain, expressed as a function of estimated distortions each associated with a quantizer of the set, meets a predetermined criterion and such that the sum of rates each associated with a quantizer of the set is minimised; -means for dequantizing at least one symbol corresponding to a given coefficient type into a dequantized coefficient using the quantizer determined for said given coefficient type; and -means for transforming a plurality of dequantized coefficients into pixel values in the spatial domain for said block.
Optional features proposed above in connection with the encoding method may also apply to the decoding method, encoding device and decoding device just mentioned.
<9th & 10th aspects> 9th and 10th aspects of the present invention, set out below, may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
These aspects of the invention are particularly advantageous when encoding images without prediction.
The features of the 9th and 10th aspects of the invention may be provided in combination with the features of the 1st to 4th and/or 5th to 8th aspects but this is not essential and it is possible to use the features of the 9th and 10th aspects independently of the features of the 1st to 4th aspects and of the 5th to 8th aspects.
According to the 9th aspect of the present invention, there is provided a method for encoding at least one block of pixels, the method comprising: -transforming pixel values for said block into a set of coefficients each having a coefficient type; -determining, for each coefficient type, an estimated value representative of a ratio between a distortion variation provided by encoding a coefficient having the concerned type and a rate increase resulting from encoding said coefficient; -subjecting coefficients of said set to a quantization step to produce quantized symbols, wherein the subjected coefficients form a subset of said set and wherein the estimated values for coefficient types of coefficients in said subset are larger than the highest estimated value over coefficient types of coefficients not included in said subset; and -encoding the quantized symbols.
Thanks to the estimated value or ratio for each coefficient type, it is possible to order the various coefficient types by decreasing estimated value, i.e. by decreasing merit of encoding as explained below, and the encoding process may be applied only to coefficients having the higher values (i.e. ratios), forming the subset defined above.
Said distortion variation is for instance provided when no prior encoding has occurred for the coefficient having the concerned type. This amounts to taking into account the initial merit which makes it possible to keep an optimal result, as shown below.
The method may include a step of computing said pixel values by subtracting values obtained by decoding a base layer from values representing pixels of an image. The pixel values are for instance representative of residual data to be encoded into an enhancement layer.
The estimated value for the concerned coefficient type may be computed depending on a coding mode of a corresponding block in the base layer (i.e. for each of a plurality of such coding modes).
The following steps may also be included: -for each of a plurality of possible subsets each comprising a respective number of first coefficients when coefficients are ordered by decreasing estimated value of their respective coefficient type, selecting quantizers for coefficients of the concerned possible subset such that the distortions associated with the selected quantizers meet a predetermined criterion; -selecting, among said possible subsets, the subset minimising the rate obtained by using the quantizers selected for said subset, wherein the subset of subjected coefficients is the selected subset.
The number of coefficients to be quantized and encoded is thus determined during the optimisation process. A plurality of possible subsets are considered; however, as the coefficient types are ordered by decreasing encoding merit, only N+1 subsets need be considered if N is the total number of coefficients.
For instance, a quantizer is selected for each of a plurality of coefficient types, for each of a plurality of block sizes and for each of a plurality of base layer coding modes.
The step of selecting quantizers may be performed by selecting, among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type associated with the possible subset concerned, such that the sum of the rates associated with selected quantizers is minimal and the global distortion resulting from the distortions associated with said selected quantizers corresponds to a predetermined distortion. Such an implementation is particularly interesting in practice.
For each coefficient type, an optimal quantizer may be selected for each of a plurality of block sizes and for each of a plurality of base layer coding modes.
The method may further include, for at least one coefficient type, a step of determining a probabilistic model for coefficients of said at least one coefficient type based on a plurality of values of coefficients of said at least one coefficient type, wherein said estimated value for said at least one coefficient type is computed based on said probabilistic model. Ordering of coefficients according to their encoding merit may thus be performed based on the probabilistic model identified for the various coefficient types, which is a convenient way to take into account effective values of the coefficient in the process.
Said estimated value for a given coefficient type may in practice be computed using a derivative of a function associating rate and distortion of optimal quantizers for said coefficient type. Such rate and distortion of optimal quantizers are for example stored for a great number of possible values of the various parameters, as explained below. This allows a practical implementation.
Precisely, said estimated value for a given coefficient type n may be determined by computing σn²/fn′(0), where σn is the standard deviation among coefficients of said type n and fn is a function associating the rate Rn and distortion Dn of optimal quantizers for coefficients of said type n, defined as follows: Rn = fn(-ln(Dn/σn²)). For instance, fn′(0) is determined using values stored in association with the determined probabilistic model.
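Assuming the reconstruction above, fn′(0) can be approximated numerically from the stored (rate, distortion) points of the optimal quantizers; the sketch below takes a forward finite difference at t = 0 and assumes the stored curve includes the no-encoding point (rate 0, distortion σn²).

import numpy as np

def initial_merit(sigma2, rates, distortions):
    # Change of variable t = -ln(D / sigma^2), so that R = f(t) and t = 0
    # corresponds to the coefficient not being encoded at all.
    t = -np.log(np.asarray(distortions, dtype=float) / sigma2)
    r = np.asarray(rates, dtype=float)
    order = np.argsort(t)
    t, r = t[order], r[order]
    fprime0 = (r[1] - r[0]) / (t[1] - t[0])   # forward difference near t = 0
    return sigma2 / fprime0                   # distortion decrease per bit spent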
The step of transforming pixel values corresponds for instance to a transformation from the spatial domain (pixels) to the frequency domain (e.g. into coefficients each corresponding to a specific spatial frequency). The transforming step includes for instance applying a block based Discrete Cosine Transform; each of said coefficient types may then correspond to a respective coefficient index.
According to the 10th aspect of the present invention, there is provided apparatus for encoding at least one block of pixels, comprising: -means for transforming pixel values for said block into a set of coefficients each having a coefficient type; -means for determining, for each coefficient type, an estimated value representative of a ratio between a distortion variation provided by encoding a coefficient having the concerned type and a rate increase resulting from encoding said coefficient; -means for subjecting coefficients of said set to a quantization step to produce quantized symbols, wherein the subjected coefficients form a subset of said set and wherein the estimated values for coefficient types of coefficients in said subset are larger than the highest estimated value over coefficient types of coefficients not included in said subset; and -means for encoding the quantized symbols.
Optional features proposed above in connection with the encoding method may also apply to the encoding apparatus just mentioned.
<11th to 14th aspects> 11th to 14th aspects of the present invention, set out below, may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
These aspects of the invention are particularly advantageous when encoding images without prediction.
The features of the 11th to 14th aspects of the invention may be provided in combination with the features of the 1st to 4th and/or 5th to 8th and/or 9th and 10th aspects but this is not essential and it is possible to use the features of the 11th to 14th aspects independently of the features of the 1st to 4th aspects and of the 5th to 8th aspects and of the 9th and 10th aspects.
According to the 11th aspect of the invention, there is provided a method for encoding a first image comprising blocks of pixels, the method comprising: -transforming blocks of pixel data into blocks of transformed block coefficients; -quantizing the transformed block coefficients into quantized symbols; -obtaining alphabets of symbols, each alphabet defining the possible symbols of an associated quantized transformed block coefficient; -for each alphabet, obtaining a probabilistic distribution of the possible symbols defined therein; -using the probabilistic distributions to group at least two alphabets into a new alphabet, the new alphabet defining, as symbols, the combinations of possible symbols of the at least two transformed block coefficients associated with the grouped alphabets; -obtaining binary entropy codes for the possible symbols of each remaining alphabet; and -encoding the quantized symbols using the obtained binary codes.
This aspect of the invention is applicable to any block-based video codec for which a probabilistic distribution of transformed coefficients, such as DCT coefficients, is available. The probabilistic distribution of the corresponding quantized symbols may then be directly obtained, based on parameters of the quantization (i.e. of the used quantizer).
The grouping of alphabets is to be understood as a product of those alphabets, as set out for example in the book "Elements of Information Theory" (T.M. Cover, J.A. Thomas, Second Edition, Wiley, 2006).
For instance, the alphabet resulting from the grouping of two alphabets A = {ai} and B = {bj} is made of all the pairs of symbols (ai, bj).
Thanks to this aspect of the invention, the mean length (i.e. entropy) of the binary codes used is substantially improved compared to the conventional approaches, and is made closer to the theoretical entropy of the probabilistic distributions. Better compression of the image is thus obtained.
This is due to the grouping of alphabets as recited above, which corresponds to using only one symbol (one binary code) to encode together at least two quantized coefficients. Indeed, the grouping of alphabets makes it possible to drastically reduce the maximum overhead for each symbol to encode, which is one bit for the conventional Huffman coding. For example, by grouping N same alphabets, the maximum overhead for one symbol to encode falls to 1/N bit on average for each grouped alphabet.
Furthermore, according to this aspect of the invention, the grouping of alphabets is based on the probabilistic distributions of the symbols, enabling efficient grouping of the alphabets to improve the mean length of the binary codes used. This is because, as will become apparent below, the probabilistic distributions make it possible to distinguish the alphabets having small entropy and those having high entropy, to adapt the grouping. An alphabet with small entropy (i.e. generally having the worst mean overhead) may then be "hidden" into a new alphabet with an alphabet having high entropy (which has a low mean overhead).
According to the 12th aspect of the present invention, there is provided apparatus for encoding a first image comprising blocks of pixels, the apparatus comprising: -a transformation module for transforming blocks of pixel data into blocks of transformed block coefficients; -a quantization filter for quantizing the transformed block coefficients into quantized symbols; -an alphabet generator for generating alphabets of symbols, each alphabet defining the possible symbols of an associated transformed block coefficient due to the quantization; -a probabilistic module for obtaining, for each alphabet, a probabilistic distribution of the possible symbols defined therein; -a grouping module configured to group, using the probabilistic distributions, at least two alphabets into a new alphabet, the new alphabet defining, as symbols, the combinations of possible symbols of the at least two transformed block coefficients associated with the grouped alphabets; -a binary code generator for generating, for each remaining alphabet, binary entropy codes for the possible symbols of each remaining alphabet; and -an encoding module for encoding the quantized symbols using the obtained binary codes.
According to the 13th aspect of the present invention, there is provided a method for decoding a data bit-stream of an encoded first image, the method comprising: -obtaining, from the bit-stream, blocks of encoded symbols corresponding to quantized block coefficients; -obtaining alphabets of symbols, each alphabet defining the possible symbols of an associated quantized block coefficient; -for each alphabet, obtaining a probabilistic distribution of the possible symbols defined therein; -using the probabilistic distributions to group at least two alphabets into a new alphabet, the new alphabet defining, as symbols, the combinations of possible symbols of the at least two quantized block coefficients associated with the grouped alphabets; -obtaining binary entropy codes for the possible symbols of each remaining alphabet; -decoding the encoded symbols using the obtained binary codes.
According to the 14th aspect of the present invention, there is provided apparatus for decoding a data bit-stream of an encoded first image, the apparatus comprising: -a parser for obtaining, from the bit-stream, blocks of encoded symbols corresponding to quantized block coefficients; -an alphabet generator for generating alphabets of symbols, each alphabet defining the possible symbols of an associated quantized block coefficient; -a probabilistic module for obtaining, for each alphabet, a probabilistic distribution of the possible symbols defined therein; -a grouping module configured to group, using the probabilistic distributions, at least two alphabets into a new alphabet, the new alphabet defining, as symbols, the combinations of possible symbols of the at least two quantized block coefficients associated with the grouped alphabets; -a binary code generator for generating binary entropy codes for the possible symbols of each remaining alphabet; and -a decoding module for decoding the encoded symbols using the obtained binary codes.
The encoding/decoding apparatuses and the decoding method may have features and advantages that are analogous to those set out above and below in relation to the encoding method embodying the 11th aspect, in particular that of improving the mean length of the binary codes used.
In particular, the encoding method may further comprise determining an entropy value for each of the alphabets, and the grouping of at least two alphabets comprises grouping the alphabet having the smallest entropy with the alphabet having the highest entropy.
The grouping can then be achieved with few computations, since the entropy values can be determined directly from the probabilistic distribution of the corresponding symbols.
Here, the basic idea is that the alphabets with the largest overhead are most of the time the alphabets with the smallest entropy.
This is because the coding overhead for conventional encoding (i.e. without grouping of alphabets) is mainly due to symbols with high probability of occurrence.
Clearly, probabilities close to one occur only in alphabets with small entropy.
On the other hand, alphabets with high entropy cannot have high probabilities.
Finally, in the product alphabet (i.e. after grouping), the largest probability is bounded by the largest probability of the alphabet with high entropy. So, it is ensured that no high probability remains after grouping and the coding overhead is thus reduced.
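A sketch of one grouping step following this rule: merge the alphabet of smallest entropy into the alphabet of highest entropy. The distribution of the product alphabet is taken as the outer product of the two distributions, i.e. the two coefficients are assumed to be coded as independent.

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def group_min_with_max(dists):
    # dists: one probability vector per alphabet.
    h = [entropy(p) for p in dists]
    i_min, i_max = int(np.argmin(h)), int(np.argmax(h))
    if i_min == i_max:
        return dists                              # a single alphabet: nothing to group
    joint = np.outer(dists[i_min], dists[i_max]).ravel()
    rest = [p for k, p in enumerate(dists) if k not in (i_min, i_max)]
    return rest + [joint]                         # the new alphabet replaces the two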
In a variant, the grouping of at least two alphabets comprises grouping two alphabets which maximizes a reduction in distance between the entropy of the alphabets and the entropy of the binary codes constructed from the alphabets.
The entropy of the binary codes may also be seen as the effective mean length of those codes.
The reduction may be understood as a difference between the above-defined distance when using both alphabets before grouping, and the above-defined distance when using the resulting new alphabet.
This approach is more exhaustive since the decision to group alphabets is directly taken from the overhead due to not grouping those alphabets.
According to a first particular feature, each distance relating to an alphabet is weighted by the entropy of that alphabet.
This strategy gives priority to grouping alphabets having small entropy rather than alphabets with larger entropy, when those alphabets have binary codes with similar distances to the entropy.
According to a second particular feature, the entropy of the binary codes constructed from the alphabets is modelled by the entropy of a corresponding Shannon code.
Usually the binary codes are generated using a Huffman code. However the entropy (or mean length) of Huffman codes cannot be anticipated without constructing the Huffman tree itself, which is a costly and impractical operation, in particular when an iterative grouping process is implemented.
Using the entropy of Shannon codes to model the entropy of the binary codes used is therefore a low-cost approach, which furthermore provides an efficient determination of the reductions in distance. This is because the distance of the Shannon code to the entropy provides an upper bound of the corresponding Huffman code distance to the entropy.
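A sketch of this low-cost model: the expected Shannon code length is the sum of pi * ceil(-log2 pi), which upper-bounds the Huffman mean length, so the overhead reduction offered by grouping two alphabets can be estimated without building any tree. Independence of the two coded coefficients is assumed when forming the product distribution.

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def shannon_length(p):
    # Expected length of a Shannon code: code words of length ceil(-log2 p_i).
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float((p * np.ceil(-np.log2(p))).sum())

def grouping_gain(pa, pb):
    # Reduction of the (mean length - entropy) overhead when coding jointly.
    joint = np.outer(pa, pb).ravel()
    before = (shannon_length(pa) - entropy(pa)) + (shannon_length(pb) - entropy(pb))
    after = shannon_length(joint) - entropy(joint)
    return before - after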
In one embodiment of the invention, the method further comprises iteratively grouping alphabets using the probabilistic distributions, wherein the probabilistic distribution of a new alphabet is calculated based on the probabilistic distributions of the at least two alphabets involved in the corresponding grouping.
This iterative process makes it possible to group alphabets several times, thus further reducing the mean coding overhead per symbol. The encoding of the first image is therefore made more efficient in terms of bitrate.
According to a particular feature, alphabets are iteratively grouped as long as a resulting new alphabet comprises fewer symbols than a predefined number.
Such a limit on the number of symbols is defined to ensure that the associated Huffman trees (or equivalent) are small enough to be handled in real time by the codec. Indeed, an alphabet with a very large number of symbols will require a lot of memory to generate a corresponding Huffman tree.
In another embodiment of the invention, the step of grouping groups at least two alphabets associated with at least two transformed block coefficients of the same block of pixels.
This strategy defines an intra-block grouping.
Grouping within a block maintains the random spatial accessibility of the encoding stream with a precision of one block, in particular when temporal and spatial predictions are disabled.
In particular, the same grouping of alphabets is performed for a plurality of blocks within the first image. This plurality may comprise all the blocks of the first image, or a set of blocks that are collocated with base layer blocks encoded with the same coding mode (e.g. Intra prediction, P image Inter prediction, B image Inter prediction, Skip mode). Indeed, such blocks often use the same quantizations, and then the same alphabets for their symbols to encode.
A result of this is that, with few calculations, entropy encoding is improved for a substantial part of the image.
According to a particular feature, the at least two transformed block coefficients are adjacent coefficients within the same block.
According to a particular embodiment, the encoding method further comprises grouping at least two alphabets associated with transformed block coefficients collocated in at least two blocks of pixels.
Of course, those grouped alphabets may be alphabets already grouped by the intra-block grouping (i.e. representing several block coefficients within a block).
This strategy defines an inter-block grouping.
It takes advantage of some Markov-like correlation between neighbouring or adjacent blocks to efficiently reduce the mean overhead per symbol.
In particular, the at least two blocks of pixels are from the same macroblock dividing the first image.
This maintains the spatial random accessibility of the encoding stream with a precision of a macroblock, in particular when temporal and spatial predictions are avoided.
Furthermore, the non-dependency from macroblock to macroblock also enables massive parallelization of the encoding and decoding processes, as well as low memory bandwidth consumption, since all encoding/decoding operations can be performed locally in each macroblock, independently from the other macroblocks.
This results in minimizing the amount of data transfer in memory, which is crucial for a low-complexity video codec design.
According to a particular feature, a new alphabet resulting from the grouping of at least two alphabets replaces the at least two grouped alphabets.
According to another particular feature, the probabilistic distribution is the same for the alphabets associated with transformed block coefficients collocated within a plurality of the blocks of the first image. The plurality may comprise all the blocks of the first image, but also a part of it (e.g. related to the same base coding mode).
This reduces the amount of information to handle.
In particular, the obtaining of the probabilistic distribution for an alphabet associated with a transformed block coefficient comprises fitting a Generalized Gaussian Distribution (GGD) model onto the transformed block coefficients collocated with said transformed block coefficient within said plurality of blocks. The probabilistic distribution for the alphabet can then be directly deduced from the fitted GGD, using the quantization parameters.
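For illustration, once a zero-mean GGD(α, β) has been fitted, the probability of each quantized symbol follows by integrating that density over the quantization intervals, for which the GGD has a closed-form CDF in terms of the regularized incomplete gamma function. The interval limits below are assumed to be those of the quantizer in use.

import numpy as np
from scipy.special import gammainc

def ggd_cdf(x, alpha, beta):
    # CDF of a zero-mean GGD(alpha, beta); gammainc is the regularized
    # lower incomplete gamma function.
    x = np.asarray(x, dtype=float)
    g = 0.5 * gammainc(1.0 / beta, (np.abs(x) / alpha) ** beta)
    return np.where(x >= 0, 0.5 + g, 0.5 - g)

def symbol_probabilities(limits, alpha, beta):
    # Probability mass of each quantization interval delimited by `limits`.
    edges = np.concatenate(([-np.inf], np.asarray(limits, dtype=float), [np.inf]))
    return np.diff(ggd_cdf(edges, alpha, beta))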
According to an embodiment of the invention, the encoding method further comprises, prior to grouping, discarding alphabets having associated entropy or a size (in terms of number of symbols) less than a predefined value.
This is because the impact of these alphabets on the PSNR (Peak signal-to-noise ratio) is negligible. In that case, it is not worth grouping those alphabets.
According to another feature of the invention, the obtaining of binary codes comprises obtaining a Huffman code from each of the remaining alphabets and from their associated probabilistic distributions.
Applying the Huffman coding to the grouped alphabets provides the best entropy encoding for those alphabets.
In one embodiment of the invention, the encoding method further comprises encoding a low resolution version of the first image into an encoded low resolution image; generating a residual enhancement image by subtracting an upsampled decoded version of the encoded low resolution image from said first image; and the step of transforming is applied to the blocks of pixels forming the residual enhancement image.
In other words, the main steps of the encoding method (i.e. the transforming of blocks, the quantizing of coefficients, the obtaining of alphabets, the obtaining of probabilistic distributions, the grouping of alphabets, the obtaining of binary codes and the encoding of the symbols) are applied to the residual enhancement image.
In this particular situation, the invention drastically reduces the complexity of encoding that residual enhancement image compared to conventional encoding schemes, while keeping a good compression ratio if the temporal and spatial predictions are disabled for that encoding.
In particular, the alphabet associated with a transformed block coefficient of a block depends on a coding mode of the block collocated with that block in the encoded low resolution image.
Taking into account the coding mode of the corresponding blocks in the base layer enables fitting a GGD distribution to the effective values taken by the various transformed block coefficients (DCT coefficients) with more accuracy. Therefore, as the probabilities of symbols are closer to reality, the entropy coding gets closer to the theoretical entropy of the data source to code.
According to a particular feature, the same grouping of alphabets is performed for the blocks within the residual enhancement image that are collocated with blocks encoded with the same encoding mode in the encoded low resolution image.
This reduces the amount of calculation for several blocks while providing an efficient improvement of the entropy coding, since the same alphabet is then used to encode blocks corresponding to the same base coding mode.
The above optional features may also apply optionally to the decoding method.
Furthermore, according to an embodiment of the invention, the decoding method may further comprise, for each alphabet of symbols, obtaining parameters from the bit-stream and applying these parameters to a probabilistic distribution model to obtain the probabilistic distribution of the possible symbols defined in said alphabet. By transmitting only parameters, the bitrate of the bit-stream can be reduced.
<15th to 19th aspects> 15th to 19th aspects of the present invention, set out below, may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
These aspects of the invention are particularly advantageous when encoding images without prediction.
The features of the 15th to 19th aspects of the invention may be provided in combination with the features of the 1st to 4th and/or 5th to 8th and/or 9th and 10th and/or 11th to 14th aspects but this is not essential and it is possible to use the features of the 15th to 19th aspects independently of the features of the 1st to 4th aspects and of the 5th to 8th aspects and of the 9th and 10th aspects and of the 11th to 14th aspects.
These aspects of the present invention seek to improve entropy encoding, in particular with the aim of getting closer to the entropy rate.
According to the 15th aspect of the present invention, there is provided a method for encoding a first image comprising blocks of pixels, the method comprising: -obtaining blocks of quantized coefficients corresponding to blocks of the first image; -obtaining alphabets of values, each defining the possible values of an associated quantized coefficient; -associating a respective flag with each block of quantized coefficients, said flag specifying a magnitude; -restricting alphabets corresponding to the quantized coefficients of each block, based on the magnitude specified in the flag associated with that block; -generating binary entropy codes for the possible values of each restricted alphabet; and -encoding the quantized coefficients using the generated binary codes.
Thanks to this aspect of the invention, the mean length (i.e. entropy) of the binary codes used is substantially improved compared to the conventional approaches, and is made closer to the theoretical entropy of the probabilistic distributions. Better compression of the image is thus obtained.
This is due to the use of a block flag as defined above, since the magnitude specified therein makes it possible to restrict the alphabets of values (or "symbols") used to encode those quantized coefficients.
Thanks to this restriction of the alphabets, the number of values therein is reduced, also reducing the mean length of the corresponding binary codes. In other words, the bits that would be required for enabling the encoding of unused values (for example those above a maximum magnitude of the block) are avoided with the present aspect of the invention.
As will become apparent below, the overhead resulting from the encoding of the flag may be substantially limited in order to obtain, thanks to the present aspect of the invention, an overall gain in compression of about 5%.
One may note that a flag specifying a particular magnitude (for example equal to zero) may have the same function as the Skip flag=1 of the prior art. In such a case, the encoding of the corresponding block may be avoided. The flag according to the present aspect of the invention is furthermore used to restrict the alphabets in order to further decrease the entropy of the generated binary codes.
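To make the mechanism concrete, the following sketch (Python; the name restrict_alphabet and all numeric values are hypothetical illustrations, not the claimed encoder) shows a flag carrying a block's maximum magnitude and the resulting alphabet restriction:

```python
def restrict_alphabet(alphabet, flag_magnitude):
    """Keep only the values whose magnitude can actually occur in the block."""
    return [a for a in alphabet if abs(a) <= flag_magnitude]

alphabet = list(range(-7, 8))          # possible quantized values [-MAX..MAX]
block = [0, 2, -1, 0, 3, 0, 0, -2]     # quantized coefficients of one block
flag = max(abs(a) for a in block)      # flag specifies the maximum magnitude: 3

restricted = restrict_alphabet(alphabet, flag)
print(len(alphabet), len(restricted))  # 15 -> 7: shorter entropy codes on average
# A flag specifying magnitude 0 plays the role of Skip: no block data is encoded.
```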
According to the 16th aspect of the present invention, there is provided apparatus for encoding a first image comprising blocks of pixels, the encoding apparatus comprising: -a processing module for generating blocks of quantized coefficients corresponding to blocks of the first image; -an alphabet generator for generating alphabets of values, each defining the possible values of an associated quantized coefficient; -a flag generator for associating a respective flag with each block of quantized coefficients, said flag specifying a magnitude; -an alphabet restriction means for restricting alphabets corresponding to the quantized coefficients of each block, based on the magnitude specified in the flag associated with that block; -a binary code generator for generating binary entropy codes for the possible values of each restricted alphabet; and -an encoding module for encoding the quantized coefficients using the generated binary codes.
According to the 17th aspect of the present invention, there is provided a method for decoding a data bit-stream of an encoded first image, the method comprising: -obtaining, from the bit-stream, blocks of encoded quantized coefficients; -obtaining alphabets of values for the encoded quantized coefficients, each alphabet defining the possible values taken by a quantizer associated with an encoded quantized coefficient; -obtaining, from the bit-stream, a flag associated with each block of encoded quantized coefficients, said flag specifying a magnitude; -restricting alphabets corresponding to the encoded quantized coefficients of each block, based on the magnitude specified in the flag associated with that block; -generating binary entropy codes for the possible values of each restricted alphabet; -decoding the encoded quantized coefficients using the generated binary codes.
According to the 18th aspect of the present invention, there is provided apparatus for decoding a data bit-stream of an encoded first image, the decoding apparatus comprising: -means for obtaining, from the bit-stream, blocks of encoded quantized coefficients; -an alphabet generator for generating alphabets of values for the encoded quantized coefficients, each alphabet defining the possible values taken by a quantizer associated with an encoded quantized coefficient; -means for obtaining, from the bit-stream, a flag associated with each block of encoded quantized coefficients, said flag specifying a magnitude; -an alphabet restriction means for restricting alphabets corresponding to the encoded quantized coefficients of each block, based on the magnitude specified in the flag associated with that block; -a binary code generator for generating binary entropy codes for the possible values of each restricted alphabet; and -a decoding module for decoding the encoded quantized coefficients using the generated binary codes.
According to the 19th aspect of the present invention, there is provided a bit-stream of an encoded first image, comprising: -blocks of binary codes corresponding to blocks of encoded quantized coefficients; -data (for example parameters of a probabilistic distribution model of the quantized coefficients as described below in more detail) used to determine quantizers associated with the encoded quantized coefficients, each quantizer defining an alphabet of possible values; -binary flags, each associated with a block of binary codes and specifying a magnitude; wherein the binary codes of a block belong to binary entropy codes generated from the alphabets restricted based on the magnitude specified in the flag associated with that block.
The bitstream may be carried by a carrier medium such as a signal. It may also be recorded on a recording medium.
The encoding/decoding apparatuses and the decoding method may have features and advantages that are analogous to those set out above and below in relation to the encoding method embodying the 15th aspect of the present invention, in particular that of improving the mean length of the binary codes used. In particular, regarding the encoding method, each flag is selected from a set (or alphabet) of flags, each flag of the set having an associated probability of occurrence.
This makes it possible for the flag associated with each block to be entropy encoded based on the probabilities of occurrence of the flags of the set.
Thanks to these provisions, the overhead due to the flags is limited. This provides an overall improvement thanks to the present aspect of the invention.
According to a particular feature, the set of flags is a restricted set from a larger set of flags. This provision is useful since the number of possible values for the quantized coefficients is very large, thus defining a huge number of possible magnitudes (and therefore flags) for the blocks. Reducing this number to a restricted number enables the flag encoding overhead to be kept as low as possible compared to the compression gain obtained by the restriction of the alphabets as set out above.
According to a feature of this aspect of the invention, the encoding method may further comprise restricting the larger set of flags based on probabilities of occurrence of the flags of that larger set, a probability of occurrence of a flag corresponding to the probability that a block of quantized coefficients has a maximum magnitude equal to the magnitude specified in the flag.
This makes it possible to keep, in the restricted set, the most significant flag values with respect to the probabilities of occurrence.
In particular, restricting the larger set of flags comprises selecting flags of the larger set that define intervals of magnitudes corresponding to a probability of occurrence greater than a predefined amount, for example 10%. The probability of occurrence of an interval corresponds to the probability that a block of quantized coefficients has a maximum magnitude comprised within that interval. It may be determined from the probabilities of occurrence of the flags of the larger set.
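One possible reading of this selection rule is sketched below (Python; the probabilities, the greedy accumulation of magnitude intervals and the forced retention of the largest magnitude are all assumptions made for illustration):

```python
def restrict_flag_set(magnitude_probs, threshold=0.10):
    """Keep flags whose magnitude intervals occur with probability > threshold.
    magnitude_probs maps a flag magnitude to the probability that a block's
    maximum magnitude equals that value (hypothetical input)."""
    kept, acc = [], 0.0
    for mag in sorted(magnitude_probs):
        acc += magnitude_probs[mag]
        if acc > threshold:
            kept.append(mag)     # this flag closes an interval with p > threshold
            acc = 0.0
    top = max(magnitude_probs)
    if top not in kept:
        kept.append(top)         # keep the largest magnitude for lossless coding
    return kept

probs = {0: 0.55, 1: 0.20, 2: 0.12, 3: 0.06, 4: 0.04, 5: 0.03}
print(restrict_flag_set(probs))  # [0, 1, 2, 5]
```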
According to a particular feature, the magnitude specified in a flag associated with a block is determined based on the quantized coefficient values of that block. This ensures restricting the alphabets taking into account how they are used within the considered block.
In particular, the magnitude specified in a flag associated with a block represents a maximum magnitude of the quantized coefficient values of that block. This makes it possible to restrict the alphabets in a way that deletes the values above the maximum magnitude (i.e. unused values). The specified maximum magnitude can be taken by one of the quantized coefficient values, but can also be an upper bound as suggested below.
According to a particular feature, the larger set comprises at least flags specifying, as magnitude, all the possible values defined in the alphabets. It is then easy to select from this set a flag corresponding to the maximum magnitude of a block.
In that case of maximum magnitude, restricting the alphabets of a block may comprise deleting, from the alphabets, the possible values that have a magnitude (i.e. absolute value) larger than the magnitude specified in the flag associated with the block.
In a variant to the maximum magnitude, the magnitude specified in a flag associated with a block represents the value M_B − d_B, where M_B is the maximum magnitude of the possible values of all the alphabets and d_B is a block distance to the maximum magnitude defined by

d_B = \min_{i \,:\, a_{m_i} \neq 0,\ a_{m_i} \in A_i} \left( M_i - |a_{m_i}| \right),

with a_{m_i} representing the quantized coefficient values of the block, A_i representing one of the alphabets, M_i being the maximum magnitude of the possible values of the alphabet A_i, and d_B = M_B when all the quantized coefficient values a_{m_i} of the block are zero. This provision ensures that the flag used still takes the value zero for a skipped block.
In another variant, the magnitude specified in a flag associated with a block represents the value d_B, where d_B is defined by

d_B = \min_{i \,:\, a_{m_i} \neq 0,\ a_{m_i} \in A_i} \left( M_i - |a_{m_i}| \right),

with a_{m_i} representing the quantized coefficient values of the block, A_i representing one of the alphabets, M_i being the maximum magnitude of the possible values of the alphabet A_i, and d_B = M_B when all the quantized coefficient values a_{m_i} of the block are zero. This value d_B defines the minimum number of possible values that can be deleted from each edge (or extreme) of the intervals of possible values of the alphabets, while keeping a lossless entropy encoding of the quantized coefficient values a_{m_i}. This is because, due to the above construction of d_B, it is certain that the d_B most leftward and d_B most rightward values of those intervals (i.e. the d_B most positive and negative values) are not used in the considered block.
In those cases, restricting the alphabets of a block may comprise deleting the d_B most negative and d_B most positive possible values from each of the alphabets.
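A minimal sketch of this d_B variant (Python; the coefficient values and alphabet sizes are invented) computes the block distance and deletes d_B values from each end of a sorted alphabet:

```python
def block_distance(coeffs, max_mags):
    """d_B = min over non-zero coefficients of (M_i - |a_m|), one M_i per
    alphabet; equals the maximum magnitude when the whole block is zero."""
    gaps = [m - abs(a) for a, m in zip(coeffs, max_mags) if a != 0]
    return min(gaps) if gaps else max(max_mags)

def restrict_by_distance(alphabet, d_b):
    """Delete the d_B most negative and d_B most positive values (sorted list)."""
    return alphabet[d_b:len(alphabet) - d_b] if d_b > 0 else alphabet

coeffs   = [5, -3, 0, 1]                  # quantized coefficients of one block
max_mags = [7, 7, 7, 7]                   # M_i of each coefficient's alphabet
d_b = block_distance(coeffs, max_mags)    # min(7-5, 7-3, 7-1) = 2
print(restrict_by_distance(list(range(-7, 8)), d_b))   # [-5 .. 5]
```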
According to a particular feature, the number of flags in the restricted set is at most three. It appears that the best results are obtained in that situation. As a variant, four flags could however be kept.
In particular, one flag in the restricted set is the flag specifying the largest magnitude in the larger set. This ensures that every block can be entropy encoded without loss, because, when this flag is used for a block, the alphabets are not restricted at all when encoding that block.
In a variant or in combination, one flag in the restricted set specifies a magnitude equal to zero. In particular when the flag magnitude represents the maximum magnitude of all the quantized coefficients in the block, this specific flag has the same function as the skip flag=1 defined in H.264, meaning that all the quantized coefficients of the block are equal to zero. This provision is thus compliant with the Skip mode of H.264. Furthermore, using such a flag has the advantage that there is no need to encode the corresponding blocks, saving bits.
In one embodiment of the invention, the flags associated with the blocks of the same macroblock are concatenated into one flag associated with the macroblock.
This ensures a more efficient encoding of that information, based on the grouping of alphabets as described below with more detail.
In that case, the flag associated with the macroblock may be entropy encoded based on probabilities of occurrence of the possible values for such a macroblock flag, those possible values resulting from the product of the sets of flags from which the concatenated flags are selected.
This is because the grouping (or product) of alphabets (or sets) when encoding shares the maximum one-bit Huffman coding overhead between the grouped alphabets. For example, by grouping N same alphabets, the maximum overhead for one value to encode falls to 1/N bit on average for each grouped alphabet.
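The overhead sharing can be checked with a toy Huffman experiment (Python; the three flag probabilities are invented, and the four blocks are treated as independent purely to build the product alphabet, whereas the document notes below that real neighbouring blocks are correlated):

```python
import heapq, itertools

def huffman_lengths(probs):
    """Code lengths of a Huffman code built over the given probabilities."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1            # symbols in a merged node get 1 bit deeper
        heapq.heappush(heap, (p1 + p2, min(s1 + s2), s1 + s2))
    return lengths

def mean_length(probs):
    return sum(p * l for p, l in zip(probs, huffman_lengths(probs)))

flag_probs = [0.85, 0.10, 0.05]        # hypothetical 3-flag restricted set
single = mean_length(flag_probs)       # bits per flag when coded alone
macro = [a * b * c * d for a, b, c, d in itertools.product(flag_probs, repeat=4)]
per_flag = mean_length(macro) / 4      # bits per flag when 4 are concatenated
print(single, per_flag)                # the grouped code is closer to entropy
```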
In particular, the encoding method further comprises obtaining at least one probability p_MB,n (in particular p_MB,0) of occurrence that a macroblock comprises only quantized coefficients having a magnitude less than a given corresponding value n (i.e. a macroblock with only coefficients equal to zero when considering p_MB,0); and the probabilities of occurrence of the possible values for a macroblock flag depend on said at least one probability p_MB,n.
This is because, due to the content dependence between neighbouring blocks within an image, the probability p_MB,0 of a skipped macroblock (i.e. when the corresponding four flags have a maximum magnitude equal to 0) is much higher than the product of the four probabilities p_B,0 of having each block of the macroblock skipped: p_MB,0 >> (p_B,0)^4. The same applies when considering other maximum magnitudes: p_MB,n >> (p_B,n)^4. The above provision then takes into account this situation to normalize the probabilities of each macroblock flag. The entropy encoding of the macroblock flags based on those probabilities is therefore closer to the true entropy.
According to a particular feature, the sets of flags from which the concatenated flags are selected are the same set restricted from a larger set of flags.
Only few probabilities associated with the flags of that restricted set have to be handled in that case.
In another embodiment of the present aspect of the invention, the encoding method further comprises encoding a low resolution version of the first image into an encoded low resolution image; generating a residual enhancement image by subtracting an upsampled decoded version of the encoded low resolution image from said first image; and wherein the blocks of quantized coefficients are obtained from the residual enhancement image.
In other words, this means that the encoding method of the present aspect of the invention based on restricting the alphabets applies to the residual enhancement image.
In this particular situation, the present aspect of the invention reduces the complexity of encoding that residual enhancement image compared to conventional encoding schemes, while keeping a good compression ratio even though the temporal and spatial predictions are disabled for that encoding.
In particular, the flag associated with a block depends on a coding mode of the block collocated with that block in the encoded low resolution image.
Taking into account the coding mode of the corresponding blocks in the base layer enables a (GGD) distribution to be fitted more accurately to the effective values taken by the various transformed block coefficients (DCT coefficients). As the probabilities of symbols are closer to reality, the entropy coding gets closer to the theoretical entropy of the data source to code.
Optional features are also provided for the decoding method.
For example, the decoding method may further comprise obtaining, from the bit-stream, probabilities associated with each of a set of possible flags; and entropy decoding encoded flags from the bit-stream based on the obtained probabilities to obtain said flag associated with each block.
In one embodiment in which the flag represents a maximum magnitude of the quantized coefficients in the associated block, restricting the alphabets of a block may comprise deleting, from the alphabets, the possible values that have a magnitude (i.e. absolute value) larger than the magnitude specified in the flag associated with the block.
In another embodiment in which the flag may convey information d_B about the block distance to the maximum magnitude (i.e. d_B depends on the magnitude specified in the flag associated with the block), restricting the alphabets of a block may comprise deleting the d_B most negative and d_B most positive possible values from each of the alphabets.
According to a feature of the present aspect of the invention, the number of flags in the set is restricted to three at most.
According to a particular feature, one flag in the restricted set specifies a magnitude equal to zero.
In one embodiment, the decoding method further comprises obtaining an encoded macroblock flag associated with a macroblock made of blocks; entropy decoding said encoded macroblock flag based on the obtained probabilities; and splitting the decoded macroblock flag into flags associated with the blocks of the macroblock. As mentioned above, using such macroblock flags concatenating the flags for the blocks of the macroblock substantially improves the entropy encoding of the flags.
In another embodiment of the present aspect of the invention, the decoding method further comprises decoding a first bit-stream to obtain a decoded low resolution version of the first image; obtaining a decoded residual enhancement image from the decoded quantized coefficients; and adding the obtained decoded residual enhancement image to an upsampled version of the decoded low resolution version of the first image, to obtain a decoded first image.
<20th & 21st aspects> In the absence of temporal prediction in the enhancement layer, decoding a first enhancement image may involve using the motion information of the corresponding first base image in order to obtain residual blocks from a "reference" decoded UHD image (temporally corresponding to the decoded reference base image used for predicting the first base image). Such blocks may then be used to correct the enhancement image data directly obtained from the bit-stream.
Using temporal prediction information of the base layer to decode the enhancement layer is known from the standard SVC (standing for "Scalable Video Coding"). One may note that such an approach cannot be applied to blocks which are not temporally predicted in the base layer (i.e. the so-called Intra images or the intra-predicted blocks).
However, the 20th and 21st aspects of the invention intend to improve the efficiency of a decoding method based on predicting the enhancement layer. This aims at improving the quality of reconstructed high resolution (e.g. UHD) images, while keeping low complexity at the encoding and decoding sides.
The features of the 20th and 21st aspects of the invention may be provided in combination with the features of the 1st to 4th and/or 5th to 8th and/or 9th and 10th and/or 11th to 14th and/or 15th to 19th aspects but this is not essential and it is possible to use the features of the 20th and 21st aspects independently of the features of the 1st to 4th aspects and of the 5th to 8th aspects and of the 9th and 10th aspects and of the 11th to 14th aspects and of the 15th to 19th aspects.
According to the 20th aspect of the present invention, there is provided a method for decoding a scalable video bit-stream, comprising decoding a base layer from the bit-stream, decoding an enhancement layer from the bit-stream and adding the enhancement layer to the base layer to obtain a decoded video of high resolution images, wherein decoding the enhancement layer comprises: -decoding, from the bit-stream, and dequantizing encoded transformed coefficients of the enhancement layer; -using motion information of the base layer to predict residual blocks of coefficients of the enhancement layer, and transforming the predicted residual blocks of coefficients into transformed residual blocks; -obtaining at least one first probabilistic distribution of the transformed coefficients; -obtaining at least one second probabilistic distribution of the differences between the dequantized transformed coefficients and the coefficients of the transformed residual blocks; and -merging the dequantized transformed coefficients and the coefficients of the transformed residual blocks, based on the obtained probabilistic distributions.
In more detail, the method for decoding a scalable video bit-stream, comprises: -decoding a low resolution version of the video, the decoding of the low resolution version comprising using motion information to temporally predict blocks of a low resolution image from blocks of a decoded reference low resolution image; -decoding an enhancement version of the video, each enhancement image of the enhancement version having a high resolution and temporally corresponding to a low resolution image of the low resolution video; and -adding each decoded enhancement image to an up-sampled version of the corresponding low resolution image, to obtain a decoded video of decoded high resolution images; wherein decoding a first enhancement image temporally corresponding to a first low resolution image comprises: -decoding, from the bit-stream, and dequantizing blocks of encoded quantized transformed coefficients of the first enhancement image; -obtaining at least one first probabilistic distribution of the quantized transformed coefficients; -using the motion information to obtain residual blocks from a decoded reference high resolution image temporally corresponding to the decoded reference low resolution image; and transforming said residual blocks into transformed residual blocks; -obtaining at least one second probabilistic distribution of the differences between the coefficients of the transformed residual blocks and the dequantized transformed coefficients; and -merging the dequantized blocks of dequantized transformed coefficients with the transformed residual blocks, based on the first and second probabilistic distributions.
The blocks of the encoded enhancement layer obtained from the bit-stream and the residual blocks obtained using the motion information of the base layer are merged together to form parts of the decoded enhancement image. As explained above this decoded enhancement image is then added to an up-sampled decoded base image to obtain a decoded high resolution (e.g. UHD) image.
This approach refines the quality of the decoded transformed (i.e. DCT) coefficients in the decoder.
According to the present aspect of the invention, the quality of the decoded high resolution image is improved compared to known techniques. This is due to the use of two probabilistic distributions that model both the original transformed coefficients and an error of temporal prediction, when merging the transformed coefficients (e.g. DCT coefficients).
The first probabilistic distribution corresponding to the transformed coefficients encoded in the bit-stream may be obtained from the bit-stream itself, for instance from parameters contained therein. These may represent statistical modelling of the original transformed coefficients (i.e. before quantization and encoding).
The second probabilistic distributions, which correspond to the blocks predicted using the motion information of the base layer, provide information about the noise of temporal prediction. In particular they provide modelled information on the difference between those predicted coefficients and the transformed coefficients. Since the original transformed coefficients are not known by the decoder, the decoded and dequantized transformed coefficients known at the decoding side are used in place of the original transformed coefficients. The inventors have observed that using those coefficients rather than the original ones provides modelling that is quite close to reality.
Since the decoded transformed coefficients and the transformed predicted coefficients (or residual blocks) both bring relevant information about the original transformed coefficients (DCT coefficients before encoding), using the above probabilistic distributions enables the obtaining of transformed coefficients to be statistically optimized so as to be closer to the original values than in the known techniques.
In particular, probabilistic estimates, such as the expectation in the example below, provide good statistical results.
For example, for low bitrate (meaning large quantization intervals), the temporally predicted blocks may more often provide relevant information on the original DCT coefficients than the quantization level obtained by the dequantization. For high bitrate, the opposite occurs.
The invention allows gains of up to several dB in rate-distortion performance at almost no cost of additional complexity at the decoder, and at the cost of zero additional rate when the parameters have already been transmitted.
As a further advantage, the approach according to the present aspect of the invention does not necessarily have to be performed at the decoding side. For example, it may be switched off in case of very low complexity decoders. Further, the encoding is independent of the switching decision.
According to the 21st aspect of the present invention, there is provided apparatus for decoding a scalable video bit-stream, comprising a base layer decoder configured to decode a base layer from the bit-stream, an enhancement layer decoder configured to decode an enhancement layer from the bit-stream and a video building unit configured to add the enhancement layer to the base layer to obtain a decoded video, wherein the enhancement layer decoder is further configured to: -decode, from the bit-stream, and dequantize encoded transformed coefficients of the enhancement layer; -use motion information of the base layer to predict residual blocks of coefficients of the enhancement layer, and transform the predicted residual blocks of coefficients into transformed residual blocks; -obtain at least one first probabilistic distribution of the transformed coefficients; -obtain at least one second probabilistic distribution of the differences between the dequantized transformed coefficients and the coefficients of the transformed residual blocks; and -merge the dequantized transformed coefficients and the coefficients of the transformed residual blocks, based on the obtained probabilistic distributions.
In more detail, the decoding apparatus comprises: -a base decoder configured to decode a low resolution version of the video, using motion information to temporally predict blocks of a low resolution image from blocks of a decoded reference low resolution image; -an enhancement decoder configured to decode an enhancement version of the video, each enhancement image of the enhancement version having a high resolution and temporally corresponding to a low resolution image of the low resolution video; and -an image building unit configured to add each decoded enhancement image to an up-sampled version of the corresponding low resolution image, to obtain a decoded video of decoded high resolution images; wherein the enhancement decoder is further configured to: -decode, from the bit-stream, and dequantize blocks of encoded quantized transformed coefficients of the first enhancement image; -obtain at least one first probabilistic distribution of the quantized transformed coefficients; -use the motion information to obtain residual blocks from a decoded reference high resolution image temporally corresponding to the decoded reference low resolution image; and transform said residual blocks into transformed residual blocks; -obtain at least one second probabilistic distribution of the differences between the coefficients of the transformed residual blocks and the dequantized transformed coefficients; and -merge the dequantized blocks of dequantized transformed coefficients with the transformed residual blocks, based on the first and second probabilistic distributions.
For example, the step of merging may merge a dequantized transformed coefficient with a collocated coefficient in the transformed residual blocks (meaning collocated blocks and collocated coefficients within those blocks), using first and second probabilistic distributions associated with these collocated coefficients, on a quantization interval associated with the value of the corresponding quantized transformed coefficient (i.e. the value before the quantized transformed coefficient is dequantized).
This ensures an accurate merged transformed coefficient to be provided, given its quantized value that has been transmitted in the encoded bit-stream.
In particular, the first and second probabilistic distributions are integrated using Riemann sums over that quantization interval during the merging step. This provision makes it possible to perform a probabilistic merger of transformed coefficients, on low complexity decoders.
According to a particular feature, the step of merging comprises calculating the expectation of a block coefficient, given the quantization interval associated with the value of the corresponding quantized transformed coefficient and given its corresponding value in the transformed residual blocks, based on the first and second probabilistic distributions.
In particular, calculating the expectation x̂_i of a block coefficient i comprises calculating the following value:

\hat{x}_i = \frac{\int_{Q_m} x \,\mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,dx}{\int_{Q_m} \mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,dx}

where PDF_i is the first probabilistic distribution associated with the block coefficient i, PDF_N is the second probabilistic distribution, Y_0 is the value of the coefficient collocated with said block coefficient i in the transformed residual blocks, and Q_m is the quantization interval associated with the value of the quantized transformed coefficient collocated with said block coefficient i. These approaches combine the probabilities of occurrence of the considered coefficient in the quantization interval (i.e. the first probabilistic distribution), the predicted value (Y_0) and the noise modelling of the prediction (i.e. the second probabilistic distribution).
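A minimal numerical sketch of this expectation (Python, using the GGD model defined below and a midpoint Riemann sum, consistent with the Riemann-sum integration mentioned above; every numeric value, and the function names, are illustrative assumptions):

```python
import math

def ggd_pdf(x, alpha, beta):
    """Zero-mean generalized Gaussian density, as defined in this document."""
    return beta / (2.0 * alpha * math.gamma(1.0 / beta)) * math.exp(-abs(x / alpha) ** beta)

def merge_coefficient(q_low, q_high, y0, pdf_i, pdf_n, steps=1000):
    """Expectation of the coefficient over the quantization interval
    [q_low, q_high], weighting the coefficient model pdf_i by the
    prediction-noise model pdf_n centred on the predicted value y0."""
    dx = (q_high - q_low) / steps
    num = den = 0.0
    for k in range(steps):
        x = q_low + (k + 0.5) * dx        # midpoint Riemann sum
        w = pdf_i(x) * pdf_n(x - y0)
        num += x * w * dx
        den += w * dx
    return num / den if den > 0.0 else 0.5 * (q_low + q_high)

# Invented numbers: coefficient model GGD(3.0, 1.2), noise model GGD(1.0, 2.0),
# predicted value Y0 = 5.1 and quantization interval Q_m = [2, 6].
x_hat = merge_coefficient(2.0, 6.0, 5.1,
                          lambda x: ggd_pdf(x, 3.0, 1.2),
                          lambda x: ggd_pdf(x, 1.0, 2.0))
print(round(x_hat, 3))
```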
The probabilistic best value is thus obtained for the transformed coefficients when reconstructing the decoded high resolution images. These images prove to be statistically improved with respect to quality.
In one embodiment of the invention, the probabilistic distributions are generalized Gaussian distributions:

\mathrm{GGD}(\alpha_i, \beta_i, x) = \frac{\beta_i}{2\alpha_i\Gamma(1/\beta_i)} \exp\left(-\left|x/\alpha_i\right|^{\beta_i}\right),

where α_i and β_i are two parameters. This parametric model is well-suited for modelling noise, such as the residuals.
In particular, the obtaining of the second probabilistic distribution comprises fitting a Generalized Gaussian Distribution model onto the differences between the coefficients in the transformed residual blocks and the dequantized transformed coefficients. In that case, the second probabilistic distribution is statistically obtained based on the coefficients that are actually handled by the decoder.
According to a particular feature, the obtaining of the first probabilistic distribution comprises obtaining parameters from the bit-stream and applying these parameters to a probabilistic distribution model.
In one particular embodiment of the present aspect of the invention, the low resolution or base image temporally corresponding to a first enhancement image to decode is an image bi-directionally predicted from reference low resolution or base images using motion information in each of the two directions, and the decoding of the first enhancement image comprises obtaining transformed residual blocks for each direction and merging together the transformed residual blocks in both directions with the dequantized blocks of dequantized transformed coefficients.
This applies for example to enhancement images corresponding to B-type images of the base layer.
This approach proves to be more precise than an approach which first determines a single transformed residual block based on prediction in both directions.
This is because a motion prediction noise estimation in each direction is separately obtained, improving a probabilistic merger.
Similarly to the case briefly described above, the merging can be based on calculating an expectation.
For example, the step of merging may comprise calculating the merger value x̂_i of a block coefficient i using the formula:

\hat{x}_i = \frac{\int_{Q_m} x \,\mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,\mathrm{PDF}_{N'}(x - Y'_0)\,dx}{\int_{Q_m} \mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,\mathrm{PDF}_{N'}(x - Y'_0)\,dx}

where PDF_i is the first probabilistic distribution associated with the block coefficient i, PDF_N and PDF_N' are the second probabilistic distributions for respectively each of the two directions, Y_0 and Y'_0 are the values of the coefficient collocated with said block coefficient i in the transformed residual blocks in respectively each of the two directions, and Q_m is the quantization interval associated with the value of the quantized transformed coefficient collocated with said block coefficient i.

In one embodiment of the present aspect of the invention, obtaining residual blocks comprises: -obtaining, using the motion information, motion predictor blocks from a decoded reference high resolution image; -up-sampling the low resolution image temporally corresponding to the first enhancement image to decode into high resolution, to obtain up-sampled blocks; -subtracting each motion predictor block from a corresponding (i.e. collocated) up-sampled block to obtain the residual blocks.
These steps define the temporal prediction of the enhancement layer based on the images already reconstructed. They produce another enhancement layer (since each obtained block is the difference with the base layer) from which a modelling of the temporal prediction noise can be performed.
In one embodiment of the present aspect of the invention, before using the motion information, that motion information is up-sampled (or interpolated) into high resolution. This is because the reference image on which that information is about to be used is of high resolution.
According to a particular feature, the motion information that is up-sampled comprises, for a given block, a motion vector and a temporal residual block; and the obtaining of the motion predictor blocks comprises -obtaining blocks of the decoded reference high resolution image using the up-sampled motion information, and -adding the up-sampled temporal residual block to the obtained blocks.
It may further comprise a reference image index identifying said reference low resolution image, when the encoding with temporal prediction uses multiple reference images.
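A sketch of this predictor construction (Python with numpy; integer-pel motion after up-sampling is assumed for brevity, and every name is hypothetical):

```python
import numpy as np

def motion_predictor_block(ref_hr, mv_up, residual_up, x, y, size=8):
    """Fetch the block of the decoded reference high resolution image pointed
    to by the up-sampled motion vector, then add the up-sampled temporal
    residual block."""
    dx, dy = mv_up
    block = ref_hr[y + dy: y + dy + size, x + dx: x + dx + size]
    return block + residual_up

ref_hr = np.zeros((64, 64))        # stand-in decoded reference HR image
residual_up = np.ones((8, 8))      # stand-in up-sampled temporal residual
pred = motion_predictor_block(ref_hr, (2, -2), residual_up, x=16, y=16)
print(pred.shape)                  # (8, 8) motion predictor block
```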
In another embodiment of the present aspect of the invention, the decoding method may further comprise filtering, using a deblocking filter, the obtained decoded high resolution images; wherein parameters (e.g. the filter strength parameter or the quantization-dependent parameter) of the deblocking filter depend on the first and second probabilistic distributions used during the merger.
This makes it possible to locally adjust the post-processing for filtering discontinuities resulting from the modelling according to the invention.
In yet another embodiment, the second probabilistic distributions are obtained for enhancement image blocks collocated with blocks of the corresponding low resolution or base image that are encoded with the same coding mode. The coding mode of the base (low resolution) layer is for example the INTER mode, which may be further subdivided into an INTER P-prediction mode and an INTER B-prediction mode, or the SKIP mode (as defined in H.264).
In another embodiment, first probabilistic distributions are obtained for respectively each of a plurality of channels, wherein a channel is associated with collocated coefficients having the same block coefficient position in their respective blocks. Furthermore, a channel may be restricted to the blocks collocated with base layer blocks having the same coding mode.
<Other aspects> The sets of aspects described above (5th to 8th aspects, 9th & 10th aspects, 11th to 14th aspects, 15th to 19th aspects and 20th & 21st aspects) can be applied independently of one another but any two or more sets can be combined. As noted above, too, they can also be applied independently of the 1st to 4th aspects or in combination with the 1st to 4th aspects.
In the 1st to 4th aspects of the present invention the encoding and decoding of the base layer are in conformity with HEVC. However, in other aspects of the present invention, it is possible to encode and decode the base layer using techniques other than HEVC, for example H.264. Other preferred features of the 1st to 4th aspects of the invention, as described above, can still be applied when HEVC encoding and decoding are not used. Combinations with the sets of aspects described above (5th to 8th aspects, 9th & 10th aspects, 11th to 14th aspects, 15th to 19th aspects and 20th & 21st aspects) can be made when HEVC encoding and decoding are not used.
<Program aspects> Any of the methods and apparatuses embodying the aforesaid 1st to 18th, 20th and 21st aspects of the present invention and the Other Aspects may be implemented in software.
Accordingly, further aspects of the present invention relate to programs which, when executed by a computer or processor, cause the computer or processor to carry out encoding methods embodying any of, or any combination of, the 1st, 5th, 9th, 11th and 15th aspects of the present invention.
Further aspects of the present invention relate to programs which, when executed by a computer or processor, cause the computer or processor to carry out decoding methods embodying any of, or any combination of, the 2nd, 6th, 13th, 17th and 20th aspects of the present invention.
Further aspects of the present invention relate to programs which, when loaded into a computer or processor, cause the computer or processor to function as the encoding apparatuses of any of, or any combination of, the 3rd, 7th, 10th, 12th and 16th aspects of the present invention.
Further aspects of the present invention relate to programs which, when loaded into a computer or processor, cause the computer or processor to function as the decoding apparatuses of any of, or any combination of, the 4th, 8th, 14th, 18th and 21st aspects of the present invention.
The programs may have features and advantages that are analogous to those set out above.
A program embodying the invention may be provided by itself or may be carried in, on or by a carrier medium.
The carrier medium may be a storage medium or recording medium.
Preferably, the storage medium is computer-readable.
The carrier medium may also be a transmission medium such as a signal.
Such a signal may be transmitted through a network such as the Internet to enable a program embodying the present invention to be distributed via the network.
BRIEF DESCRIPTION OF THE DRAWINGS
Other particularities and advantages of the invention will also emerge from the following description, illustrated by the accompanying drawings, in which:
- Figure 1 schematically shows an encoder for a scalable codec;
- Figure 2 schematically shows the corresponding decoder;
- Figure 3 schematically illustrates the enhancement video encoding module of the encoder of Figure 1;
- Figure 3a is a more detailed schematic illustration of the enhancement video encoding module of the encoder of Figure 1 according to the invention;
- Figure 4 schematically illustrates the enhancement video decoding module of the decoder of Figure 2;
- Figure 4a is a more detailed schematic illustration of the enhancement video decoding module of the decoder of Figure 2 according to the invention;
- Figure 5 illustrates the performance of current entropy coding using Huffman codes;
- Figure 6 illustrates a structure of a 4:2:0 macroblock;
- Figure 7 illustrates an example of a quantizer based on Voronoi cells;
- Figure 8 schematically illustrates the alphabet of symbols and associated probabilities for the quantizer of Figure 7;
- Figure 9 illustrates the video stream structure for the luminance, with indication of the alphabets involved for DCT coefficients;
- Figure 10 shows general steps of the processing of alphabets according to the invention;
- Figure 11 illustrates the intra-block grouping of Figure 10;
- Figure 12 illustrates the inter-block grouping of Figure 10;
- Figure 13 shows steps of an exemplary method for the intra-block grouping of Figure 11;
- Figure 14 shows steps of an exemplary method for the inter-block grouping of Figure 12;
- Figure 15 shows steps of an entropy encoding method according to the invention;
- Figure 16 shows steps of a corresponding decoding method;
- Figure 17 illustrates the performance of grouping alphabets, in comparison to the performances illustrated in Figure 5;
- Figure 18 illustrates the spatial random access property;
- Figure 19 illustrates an implementation of entry points to allow spatial random access as illustrated in Figure 18;
- Figure 20 shows a particular hardware configuration of a device able to implement methods according to the invention;
- Figure 21 shows general steps of the restricting operations according to the invention;
- Figure 22 illustrates the restricted alphabets;
- Figure 23 illustrates the coding structure of the bitstream with macroblock flags according to one embodiment of the invention;
- Figure 24 shows the correspondence between data in the spatial domain (pixels) and data in the frequency domain;
- Figure 25 illustrates an exemplary distribution over two quanta;
- Figure 26 shows exemplary rate-distortion curves, each curve corresponding to a specific number of quanta;
- Figure 27 shows the rate-distortion curve obtained by taking the upper envelope of the curves of Figure 26;
- Figure 28 depicts several rate-distortion curves obtained for various possible parameters of the DCT coefficient distribution;
- Figure 29 shows the domain where the optimisation is carried out;
- Figure 30 depicts curves showing the convergence of the optimisation process used to select the quantizers to be used;
- Figure 31 is a more detailed schematic illustration of the decoder of Figure 2 according to another embodiment of the invention;
- Figure 32 illustrates the prediction of the enhancement layer in the decoder of Figure 31;
- Figure 33 illustrates the probabilistic merging in the decoder of Figure 31; and
- Figure 34 illustrates the performance of the Figure 31 decoder.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Overview/Principle

As set out in the introduction, given constraints in terms of limited power and memory access bandwidth, an embodiment of the present invention seeks to provide a UHD codec with low complexity based on scalable encoding. It should be noted that, although the description below refers to UHD and HD, HD is merely an example of a first (lower) resolution and UHD is merely an example of a second (higher) resolution. The invention can be applied to any case in which two or more different resolutions of video data are available or can be obtained. Video data having the first (lower) resolution can, for example, be obtained from video data having the second (higher) resolution by downsampling. Alternatively, some video sources, such as video cameras, may deliver simultaneously video data having the first and second resolutions.
Basically, the UHD video is encoded into a base layer and one or more enhancement layers.
The base layer results from the encoding of a low resolution version (version having the first resolution) of the UHD images, in particular having a HD resolution, with a standard existing codec (e.g. H.264 or HEVC -High Efficiency Video Coding). As stated above, the compression efficiency of such a codec relies on spatial and temporal predictions.
Furthermore, the conventional block-based codec, such as H.264, implements a Skip mode as described for example in US application No 2009/0262835. According to the Skip mode, when a macroblock (or block) residual resulting from the prediction is full of zeros (and possibly when the motion vector for the macroblock is zero [i.e. no motion]), a Skip Macroblock flag is set to 1 in the bit stream, instead of coding the motion vectors or the residual. This mainly reduces the number of bits to be coded, and improves compression rate.
Basically, this Skip mode takes advantage of the dependence between the quantized DCT coefficients, in particular of the strong correlation between the small values of the quantized DCT coefficients. For instance, it is generally observed that there are many more DCT blocks in which all the quantized DCT coefficients are zeroes (so-called skipped blocks) than expected from a theory without dependence. In other words, the probability of a skipped block is much higher than the product of probabilities of having a zero for all quantized DCT coefficients.
Further to the encoding of the base layer, an enhancement image is obtained by subtracting an interpolated (or up-scaled or upsampled) decoded image of the base layer from the corresponding original UHD image. The enhancement images, which are residuals or pixel differences with UHD resolution, are then encoded into an enhancement layer.
Figure 1 illustrates such an approach at the encoder 10.
An input raw video 11, in particular a UHD video, is down-sampled 12 to obtain a so-called base layer, for example with HD resolution, which is encoded by a standard base video coder 13, for instance H.264/AVC or HEVC. This results in a base layer bit-stream 14.
To generate the enhancement layer, the encoded base layer is decoded 15 and up-sampled 16 into the initial resolution (UHD in the example) to obtain the up-sampled decoded base layer.
The latter is then subtracted 17, in the pixel domain, from the original raw video to get the residual enhancement layer X. The information contained in X is the error or pixel difference due to the base layer encoding and the up-sampling. It is also known as a "residual".
A conventional block division is then applied, for instance a homogeneous 8x8 block division (but other divisions with non-constant block size are also possible).
Next, a block-based DCT transform 18 is applied to each block to generate DCT blocks forming the DCT image XDCT having the initial UHD resolution.
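For reference, a compact sketch of such a block transform (Python with numpy; the orthonormal DCT-II matrix below is standard, while the 8x8 size and the random stand-in residual are merely illustrative):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    c = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * n)) for j in range(n)]
                  for i in range(n)]) * np.sqrt(2.0 / n)
    c[0, :] /= np.sqrt(2.0)            # first basis vector is rescaled
    return c

def block_dct(block):
    """2-D DCT of one residual block: coefficients = C @ B @ C.T."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

residual_block = np.random.randn(8, 8)  # stand-in 8x8 residual pixel block
coeffs = block_dct(residual_block)
print(coeffs.shape)                      # (8, 8) block of DCT coefficients
```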
This DCT image XDCT is then encoded by an enhancement video encoding module 19 into an enhancement layer bit-stream 20.
The encoded bit-stream EBS resulting from the encoding of the raw video 11 is made of: -the base layer bit-stream 14 produced by the base video encoder 13; -the enhancement layer bit-stream 20 encoded by the enhancement video encoder 19; and -parameters 21 determined and used by the enhancement video encoder.
Examples of those parameters are given here below.
Figure 2 illustrates the associated processing at the decoder 30 receiving the encoded bit-stream EBS.
Part of the processing consists in decoding the base layer bit-stream 14 by the standard base video decoder 31 to produce a decoded base layer. The decoded base layer is then up-sampled 32 into the initial resolution, i.e. UHD resolution.
In another part of the processing, both the enhancement layer bit-stream 20 and the parameters 21 are used by the enhancement video decoding and dequantization module 33 to generate the dequantized DCT image. That image is the result of the quantization and then the inverse quantization of the original image X. An inverse DCT transform 34 is then applied to each block of the dequantized image to obtain the decoded residual (of UHD resolution) in the pixel domain. Each decoded residual block is then added 35 to the corresponding block in the up-sampled decoded base layer to obtain decoded images of the video.
Filter post-processing, for instance with a deblocking filter 36, is finally applied to obtain the decoded video 37 which is output by the decoder 30.
Reducing UHD encoding and decoding complexity relies on simplifying the encoding of the enhancement images at the enhancement video encoding module 19 compared to the conventional encoding scheme.
To that end, the inventors dispense with the temporal prediction and possibly the spatial prediction when encoding the UHD enhancement images. This is because the temporal prediction is very expensive in terms of memory bandwidth consumption, since it often requires accessing other enhancement images as reference images. Low-complexity codecs may then be designed, in particular at the encoding side.
While this simplification reduces the slow memory random access bandwidth consumption during the encoding process by 80%, not using those powerful video compression tools may deteriorate the compression efficiency compared to the conventional standards.
In this context, the inventors have developed several additional tools for increasing the efficiency of the encoding of those enhancement images.
Figure 3 illustrates an embodiment of the enhancement video encoding module 19 (or "enhancement layer encoder") that is provided by the inventors.
In this embodiment, the enhancement layer encoder models 190 the statistical distribution of the DCT coefficients within the DCT blocks of a current enhancement image by fitting a parametric probabilistic model.
This fitted model becomes the channel model of DCT coefficients and the fitted parameters are output in the parameter bit-stream 21 coded by the enhancement layer encoder. As will become more clearly apparent below, a channel model may be obtained for each DCT coefficient position within a DCT block, based on fitting the parametric probabilistic model onto the corresponding collocated DCT coefficients throughout all the DCT blocks of the image XDCT or part of it.
Based on the channel models, quantizers may be chosen 191 from a pool of pre-computed quantizers dedicated to each parametric channel.
The chosen quantizers are used to perform the quantization 192 of the DCT image XDCT to obtain the quantized DCT image XDCT,Q.
Lastly, an entropy encoder 193 is applied to the quantized DCT image XDCT,Q to compress data and generate the encoded DCT image which constitutes the enhancement layer bit-stream 20.
The associated enhancement video decoder 33 is shown in Figure 4.
From the received parameters 21, the channel models are reconstructed and quantizers are chosen 330 from the pool of quantizers.
An entropy decoder 331 is applied to the received enhancement layer bit-stream 20 to obtain the quantized DCT image.
A dequantization 332 is then performed using the chosen quantizers, to obtain the dequantized DCT image.
The channel modelling, the selection of quantizers and the adaptation of entropy coding are some of the additional tools as introduced above.
As will become apparent from the explanation below, those additional tools may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
As briefly introduced above, the invention is particularly advantageous when encoding images without prediction.
Entropy Encoding

The following section of the description focuses on the entropy coding 193. In this respect, predefined quantizers may be used instead of the selection 191 of optimal quantizers.
Figure 5 illustrates the performance of conventional entropy encoding.
The y-axis corresponds to the average picture quality obtained when decoding an encoded video bit-stream. The x-axis corresponds to the average bitrate obtained in the encoding of the corresponding video.
The dashed curve shows the rate-distortion curve obtained with bitrate values equal to the theoretical entropy of the coded source. As is commonly known, the entropy bitrate is the lowest rate bound that can be achieved by any entropy coder.
The plain curve illustrates the rate-distortion curve obtained when a conventional Huffman coding/decoding process is used, for example as defined in the standard H.264.
From these two curves, it can be seen that the bitrate usually obtained is substantially higher than the entropy rate. In particular, this is due to the fact that the Huffman coding may be as much as one bit away from the theoretical entropy for a single symbol to encode.
The present invention seeks to improve this entropy encoding situation, in particular with the aim of getting closer to the entropy rate.
For the detailed description below, focus is made on the encoding and the decoding of a UHD video as introduced above with reference to Figures 1 to 4. It is however to be recalled that the invention applies to the encoding of any image from which a probabilistic distribution of transformed block coefficients can be obtained (e.g. statistically). In particular, it applies to the encoding of an image without temporal prediction and possibly without spatial prediction.
Referring again to Figure 3, a low resolution version of the initial image has been encoded into an encoded low resolution image, referred above as the base layer; and a residual enhancement image has been obtained by subtracting an interpolated high resolution (or up-sampled) decoded version of the encoded low resolution image from said initial image.
Conventionally, that residual enhancement image is then transformed from the spatial domain (i.e. pixels) into the (spatial) frequency domain, using for example a block-based DCT transform, to obtain an image of transformed block coefficients. In the Figure, that image is referenced XDCT, which comprises a plurality of DCT blocks, each comprising DCT coefficients.
As an example, the residual enhancement image has been divided into blocks Bk, for instance 8x8 blocks but other divisions may be considered, on which the DCT transform is applied. Within a block, the DCT coefficients are associated with a block coefficient position or "index" i (e.g. i = 1 to 64), along a zigzag scan for successive handling when encoding, for example.
Blocks are grouped into macroblocks MBk. A very common case for so-called 4:2:0 YUV video streams is a macroblock made of 4 blocks of luminance Y, 1 block of chrominance U and 1 block of chrominance V, as illustrated in Figure 6. Here too, other configurations may be considered.
In the example developed below, a macroblock MBk is made of 16x16 pixels of luminance Y, and the chrominance has been down-sampled by a factor of two both horizontally and vertically to obtain 8x8 pixels of chrominance U and 8x8 pixels of chrominance V. The four luminance blocks within a macroblock MBk are referenced B1, B2, B3 and B4. To simplify the explanations, only the coding of the luminance component is described below. However, the same approach can be used for coding the chrominance components.
Starting from the image XDCT, a probabilistic distribution P_i of each DCT coefficient i is determined using a parametric probabilistic model. This step is referenced 190 in the Figure.
Since, in the present example, the image XDCT is a residual image, i.e. the information is about a noise residual, it is efficiently modelled by Generalized Gaussian Distributions (GGD) having a zero mean: DCT(X) ~ GGD(α, β), where α, β are two parameters to be determined and the GGD follows the two-parameter distribution:

\mathrm{GGD}(\alpha, \beta, x) = \frac{\beta}{2\alpha\Gamma(1/\beta)} \exp\left(-\left|x/\alpha\right|^{\beta}\right),

where Γ is the well-known Gamma function:

\Gamma(z) = \int_0^{+\infty} t^{z-1} e^{-t} \, dt.

The DCT coefficients cannot all be modelled by the same parameters and, practically, the two parameters α, β may depend on: -video content. This means that the parameters must be computed for each image or every n images for instance; -the index i of the DCT coefficient within a DCT block Bk. Indeed, each DCT coefficient has its own behaviour. A DCT channel is thus defined as the DCT coefficients collocated (i.e. having the same index) within a plurality of DCT blocks (possibly all the blocks of the image). A DCT channel can therefore be identified by the corresponding index i; and/or -the encoding mode used for the collocated block of the base layer, referred to in the present document as the "base coding mode". Typically, Intra blocks of the base layer do not behave the same way as Inter blocks. Blocks with a coded residual in the base layer do not behave the same way as blocks without such a residual (i.e. Skipped blocks). And blocks coded with non-nil texture data according to the coded-block-pattern syntax element as defined in H.264/AVC do not behave the same way as those blocks without non-nil texture data.
It is to be noted that, due to the down-sampling of the base layer, the collocation of blocks should take into account that down-sampling. For example, the four blocks of the n-th macroblock in the residual enhancement layer with UHD resolution are collocated with the n-th block of the base layer having a HD resolution.
That is why, generally, all the blocks of a macroblock in an enhancement image have the same base coding mode.
For illustrative purposes, if the residual enhancement image XDCT is divided into 8x8 pixel blocks, the modelling 190 has to determine the parameters of 64 DCT channels for each base coding mode.
In addition, since the luminance component Y and the chrominance components U and V have dramatically different source contents, they must be encoded in different DCT channels. For example, if it is decided to encode the luminance component Y on one channel and the chrominance components UV on another channel, 128 channels are needed for each base coding mode.
At least 64 pairs of parameters for each base coding mode may appear as a substantial amount of data to transmit to the decoder (see parameters 21). However, experience proves that this is quite negligible compared to the volume of data needed to encode the residuals of Ultra High Definition (4k2k or more) videos. As a consequence, one may understand that such a technique is preferably implemented on large videos, rather than on very small videos because the parametric data would be too costly.
For the sake of simplicity of explanation, a set of DCT blocks corresponding to the same base coding mode is now considered. The invention may then be applied to each set corresponding to each base coding mode. Furthermore, as suggested above, the invention may be directly applied to the entire image, regardless of the base coding modes.
To obtain the two parameters α_i, β_i defining the probabilistic distribution P_i for a DCT channel i, the Generalized Gaussian Distribution model is fitted onto the DCT block coefficients of the DCT channel, i.e. the DCT coefficients collocated within the DCT blocks with the same base coding mode. Since this fitting is based on the values of the DCT coefficients before quantization (of the DCT blocks having the same base coding mode in the example), the probabilistic distribution is a statistical distribution of the DCT coefficients within a considered channel i.
For example, the fitting may be simply and robustly obtained using the moment of order k of the absolute value of a GGD: E(|GGD(α, β)|^k) = α^k Γ((1+k)/β) / Γ(1/β), k ∈ ℝ.
Indeed, E(|GGD(α_i, β_i)|^k) = ∫ |x|^k GGD(α_i, β_i, x) dx = α_i^k Γ((1+k)/β_i) / Γ(1/β_i). Determining the moments of order 1 and of order 2 from the DCT coefficients of channel i makes it possible to directly obtain the value of the parameter β_i: M_2 / M_1² = Γ(1/β_i) Γ(3/β_i) / Γ(2/β_i)². The value of the parameter β_i can thus be estimated by computing the above ratio of the first and second moments, and then the inverse of the above function of β_i.
Practically, this inverse function may be tabulated in a memory of the encoder instead of computing Gamma functions in real time, which is costly.
The second parameter α_i may be determined from the first parameter and the second moment, using the equation: M_2 = σ² = α_i² Γ(3/β_i) / Γ(1/β_i).
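For illustration only, the moment-based fitting described above may be sketched as follows; this is a minimal sketch with illustrative names, in which the inverse of the Gamma-ratio function is obtained by bisection rather than from the tabulated inverse mentioned above:

```python
import math

def gamma_ratio(beta):
    # R(beta) = Gamma(1/beta) * Gamma(3/beta) / Gamma(2/beta)^2, the model value of M2 / M1^2
    return math.gamma(1.0 / beta) * math.gamma(3.0 / beta) / math.gamma(2.0 / beta) ** 2

def fit_ggd(coeffs, lo=0.1, hi=5.0, iters=60):
    """Estimate (alpha, beta) of a zero-mean GGD from the DCT coefficients of one channel."""
    m1 = sum(abs(c) for c in coeffs) / len(coeffs)   # moment of order 1 (absolute value)
    m2 = sum(c * c for c in coeffs) / len(coeffs)    # moment of order 2
    target = m2 / (m1 * m1)
    # gamma_ratio decreases with beta, so beta is recovered by bisection
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma_ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # M2 = alpha^2 * Gamma(3/beta) / Gamma(1/beta), hence alpha
    alpha = math.sqrt(m2 * math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return alpha, beta
```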
The two parameters α_i, β_i being determined for the DCT coefficient i, the probabilistic distribution P_i of each DCT coefficient i in a considered block is defined by: P_i(x) = GGD(α_i, β_i, x) = β_i / (2α_i Γ(1/β_i)) · exp(−|x/α_i|^β_i). Still referring to Figure 3, a quantization 192 of the DCT coefficients is then performed, to obtain quantized DCT coefficients (i.e. symbols or values).
As shown in the Figure, the quantization of those coefficients may involve optimal quantizers chosen (step 191) for each DCT channel i based on the corresponding probabilistic distribution P_i(x) of the DCT coefficients.
In a variant, the quantizers may be predefined prior to the encoding. Since the quantization is not the core of the present invention, it is here assumed that a quantizer is selected for each DCT channel and each base coding mode as defined above, meaning that various quantizers are generally used for quantizing various DCT coefficients.
Figure 7 illustrates an exemplary Voronoi cell based quantizer.
A quantizer is made of M Voronoi cells distributed over the values of the DCT coefficients. Each cell corresponds to an interval [t_m, t_m+1[, called the quantum Q_m.
Each cell has a centroid c_m, as shown in the Figure.
The intervals are used for quantization: a DCT coefficient comprised in the interval [t_m, t_m+1[ is quantized by the symbol a_m associated with that interval.
The centroids are used for de-quantization: a symbol a_m associated with an interval is de-quantized into the centroid value c_m of that interval.
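By way of illustration, such a quantizer may be sketched as follows (a minimal sketch with illustrative names; in practice the thresholds t_m and centroids c_m would come from the quantizer selection of step 191):

```python
import bisect

class VoronoiQuantizer:
    """Scalar quantizer: thresholds t[m] bound the quanta, centroids c[m] de-quantize."""
    def __init__(self, thresholds, centroids):
        # thresholds has one more entry than centroids: quantum m is [t[m], t[m+1][
        assert len(thresholds) == len(centroids) + 1
        self.t = thresholds
        self.c = centroids

    def quantize(self, x):
        m = bisect.bisect_right(self.t, x) - 1
        m = max(0, min(m, len(self.c) - 1))     # clamp to the outermost quanta
        return m - len(self.c) // 2             # symbols are integers centred on 0

    def dequantize(self, symbol):
        return self.c[symbol + len(self.c) // 2]

# e.g. five symbols -2..2 with unit quanta centred on 0
q = VoronoiQuantizer([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5], [-2, -1, 0, 1, 2])
assert q.quantize(0.3) == 0 and q.dequantize(0) == 0
```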
Figure 5 illustrates what is named an alphabet of quantization for a given quantizer.
This Figure shows an example of the symbols a_m corresponding to the Voronoi cells. In practice, those symbols are consecutive integer values centred on 0, i.e. values from [-MAX; MAX] as shown in the Figure.
The alphabet A comprises all the symbols or values a_m defined by the Voronoi cells of the quantizer used. In other words, the alphabet A comprises all the possible symbols or values for a DCT coefficient due to the quantization performed on that coefficient.
Considering the quantizer used for each DCT coefficient in the DCT image X_DCT, an alphabet A_i is obtained for each DCT coefficient i. This is schematically illustrated by Figure 9, in which A_i denotes the alphabet available for the i-th DCT coefficient within the DCT blocks B_k.
One may note that the alphabets of two DCT coefficients belonging to the same DCT channel (i.e. collocated DCT coefficients from different blocks B_k) are identical, since the same quantizer is used. An illustrative example is given in Figure 9, where the four blocks B of a macroblock MB_k use the same alphabets A_1, A_2, A_3, etc. for their quantized DCT symbols.
It is to be recalled here that due to different coding modes of the base layer macroblocks, different kinds of alphabets, indexed by the corresponding base coding mode, are used to code the residual macroblocks (i.e. according to the corresponding DCT channel).
However, it is to be recalled here that the explanations below concentrate on macroblocks associated with the same coding mode, i.e. using the same alphabets.
This is because the residual enhancement image may be segmented according to the base coding modes, and then processed segment by segment, each segment corresponding to the same coding mode.
Still referring to Figure 3, based on the probabilistic distribution P_i(x) associated with a DCT channel i (for a base coding mode), the probabilities p_{i,m} of occurrence of each symbol or value a_m of the corresponding alphabet A_i are calculated.
This may be done by computing the integral ∫_{Q_m} P_i(x) dx on the quantum Q_m.
The probabilities {p_{i,m}} for an alphabet A_i thus define the probabilistic distribution of the possible symbols or values defined therein.
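For illustration, the computation of {p_{i,m}} may be sketched as follows; the crude numerical integration used here is an assumption made for simplicity (a closed form via the incomplete Gamma function, or the off-line tabulation mentioned further below, could equally serve):

```python
import math

def ggd_pdf(x, alpha, beta):
    return beta / (2.0 * alpha * math.gamma(1.0 / beta)) * math.exp(-abs(x / alpha) ** beta)

def symbol_probabilities(thresholds, alpha, beta, steps=2000):
    """Integrate the channel's GGD over each quantum [t_m, t_m+1[ (midpoint rule)."""
    probs = []
    for t0, t1 in zip(thresholds[:-1], thresholds[1:]):
        h = (t1 - t0) / steps
        p = sum(ggd_pdf(t0 + (k + 0.5) * h, alpha, beta) for k in range(steps)) * h
        probs.append(p)
    total = sum(probs)              # renormalize the mass lost in the truncated tails
    return [p / total for p in probs]
```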
Given the considered DCT channel i for a base coding mode, the probabilistic distribution is the same for the alphabets associated with the DCT coefficients collocated within a plurality of the blocks of the image.
Such probabilities may be computed off-line and stored in a memory of the encoder, in order to decrease the complexity of real-time encoding. This is for example possible when the parameters α, β for modelling the distribution of the DCT coefficients are chosen from a limited number of possible parameters, and when the possible quantizers are known in advance.
The probabilities {p_{i,m}} of each alphabet A_i and the quantized symbols or values a_m obtained from each DCT coefficient in the DCT image X_DCT are then provided for entropy coding 193 as shown in Figure 3. In this respect, the quantized DCT coefficients are generally processed according to a zigzag scan.
Figure 3a gives more details about the present invention. Before generating binary entropy codes for the possible values or symbols of each alphabet, some processing of those alphabets may be performed in order to optimize their entropy when entropy encoding the DCT coefficients. The same references are used in Figures 3 and 3a.
Two additional processing operations are illustrated in the Figure, namely the restriction 194 of alphabets and the grouping 195 of alphabets. These two processing operations can be implemented separately, one without the other.
As will become apparent below, the restriction 194 of alphabets leads to generating an additional stream with the encoded bit-stream EBS, namely the magnitude probability bit-stream 22. The grouping 195 of alphabets does not need such an additional stream.
The second processing operation, i.e. the grouping of alphabets before entropy encoding the DCT coefficients based on the resulting grouped alphabets, is explained below.
The restriction of alphabets 194 is an extension of the Skip flag as introduced in H.264 for example. It results in restricted alphabets that fit well the encoding of the DCT blocks, given the maximum magnitude of their quantized DCT coefficients.
Similarly to the Skip flag of H.264, a flag is associated with each block of quantized DCT coefficients.
That flag for a DCT block specifies a magnitude, which may be determined based on the quantized coefficient values in the associated block. The magnitude is the extended information compared to conventional Skip flags.
And thanks to that extended information, the encoding method allows restriction of the alphabets A_i corresponding to the quantized DCT coefficients of each block, based on the magnitude specified in the flag associated with that block.
By restricting the alphabets, the number of values therein is reduced, also reducing the entropy of the corresponding binary codes. This is because the bits that would be required for enabling the encoding of unused values (those that are deleted thanks to the restriction of alphabets) are avoided.
More details about a first implementation are now given.
In this first implementation, the magnitude specified in a flag associated with a block represents a maximum magnitude of the quantized coefficient values of that block. This specified maximum magnitude may be the maximum magnitude taken by one of the quantized coefficient values, but may also be an upper bound of those values.
Let's define the probability P_{B,n} that, in a given block B, the maximum magnitude of the quantized DCT coefficients is n: P_{B,n} = p(max_m |a_m| = n), where the a_m are the quantized DCT coefficients (in fact their values or symbols). These probabilities are computed after the quantization step based on the values of the quantized DCT coefficients, once the encoder knows the quantized value of each DCT coefficient.
Let's also define, for a macroblock MB, the probability P_{MB,n} that the maximum magnitude of the quantized DCT coefficients of the (four) blocks of macroblock MB is n. These probabilities P_{MB,n} are computed by the encoder after the quantization process, based on the actual state of the macroblocks (either from the entire image or from the part of the image corresponding to the same base coding mode).
Due to the correlation between neighbouring blocks, it appears that P_{MB,0} >> (P_{B,0})^4, and more generally that P_{MB,n} >> (P_{B,n})^4.
Figure 21 illustrates an embodiment of the main steps of the restriction 194.
The method begins at step 2100 with the alphabets A_i of the quantized DCT coefficients considered (e.g. from the blocks associated with the same base coding mode) and their associated probabilistic distributions {p_{i,m}}.
Step 2105 consists in computing the probabilities P_{B,n} as defined above.
This enables definition of a new alphabet F of the possible flags f_n for blocks, each flag f_n being associated with the probability P_{B,n}.
Thus, the alphabet F comprises the flags f_n, where n defines the maximum magnitude (i.e. the maximum absolute value) of the quantized DCT coefficients a_m in a block. In other words, the alphabet F comprises the flags specifying, as maximum magnitude, all the possible values defined in the alphabets A_i.
At step 2110, a flag f_n from the alphabet F is associated with each of the DCT blocks B_k, based on the maximum magnitude n taken by a quantized DCT coefficient therein. Those flags specify the maximum magnitudes taken by one of the quantized coefficient values of their respective associated block. Those flags will be entropy encoded as further described below.
As a variant, this step may follow step 2115, and then a flag from the restricted alphabet F' as defined below may be selected and directly associated with each DCT block rather than a flag from the larger alphabet F. In that case, the flags may specify only an upper bound of the magnitude taken by the quantized coefficient values of their respective associated block.
The next step 2115 consists in restricting this set or alphabet F of flags into a restricted alphabet F', based on the probabilities of occurrence P_{B,n} of the flags f_n of F. This is because, since the block flag f_n may take many different values for a block (i.e. the alphabet F may be large), the encoding cost of the flag may turn out bigger than the overhead reduction obtained by using the Huffman coding based on the restricted alphabets A_i as defined below.
The alphabet F is thus restricted to the most significant flag values, which will be used to restrict the alphabets A_i as further described below.
An example of selecting the most significant flag values is now given, even though other strategies may be implemented in the context of the invention.
In particular, the number of flags selected to constitute the restricted alphabet F' is at most three. This appears to be a good tradeoff between the costs of encoding the flags and of encoding the DOT coefficients based on the restricted alphabets A as defined below.
Furthermore, in order to be able to losslessly encode all quantized DCT blocks, one of the flags in the restricted alphabet F' is the flag specifying the largest magnitude in the alphabet F. Let's denote the three values by a, b and ∞, with a < b < ∞, ∞ standing for the largest magnitude of the alphabet F: thus F' = {f_a, f_b, f_∞}. The values a and b may be selected to define intervals of magnitudes (i.e. [0,a], ]a,b]) corresponding to a probability of occurrence greater than a predefined amount, for example 10%. The probability of occurrence is the probability that a block has a maximum magnitude comprised in [0,a] or ]a,b].
For example, a may be chosen as the smallest value with at least 10% probability of occurrence, and thus computed based on the following formula: a = min{ t : 10% ≤ P(n ≤ t), where the block flag is f_n }, where P(n ≤ t) means the probability of occurrence of a block having a maximum magnitude equal to or less than t: P(n ≤ t) = Σ_{n ≤ t} P_{B,n}. Similarly, the other value b may be chosen as the smallest value with the next at least 10% probability of occurrence from the previous value a; b verifies the following formula: b = min{ t : 10% ≤ P(a < n ≤ t), where the block flag is f_n }, where P(a < n ≤ t) = Σ_{a < n ≤ t} P_{B,n}. One may observe that such conditions do not necessarily lead to three different values a < b < ∞. They may result in a restricted alphabet F' having only two values, or even only one value: f_0 if all blocks are zero-skipped, or f_∞ if only very few blocks do not reach the largest magnitude.
In a variant where only two values a and ∞ are considered, a predefined amount of 40% may be used to determine a.
The conventional Skip flag corresponds to f_0. In that case, one flag in the restricted alphabet F' specifies a (maximum) magnitude equal to zero.
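A minimal sketch of this selection of a and b, under the 10% criterion given above (p_block[n] standing for P_{B,n}; all names are illustrative):

```python
def select_flag_thresholds(p_block, amount=0.10):
    """p_block[n] = probability that a block's maximum magnitude is n (P_B,n)."""
    n_max = len(p_block) - 1
    a = n_max
    cum = 0.0
    for n in range(n_max + 1):          # a: smallest t with P(n <= t) >= amount
        cum += p_block[n]
        if cum >= amount:
            a = n
            break
    b = n_max
    cum = 0.0
    for n in range(a + 1, n_max + 1):   # b: smallest t with P(a < n <= t) >= amount
        cum += p_block[n]
        if cum >= amount:
            b = n
            break
    # duplicates collapse F' to two flags, or even one
    return sorted(set([a, b, n_max]))
```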
Following step 2115, step 2120 consists in restricting the alphabets A_i corresponding to each quantized DCT coefficient in a current block, based on the (maximum) magnitude specified in the flag associated with the current block to process.
In particular, the restricting operation is as follows:
- all blocks whose flag value is f_n with n ≤ a are restricted based on the flag value f_a;
- all blocks whose flag value is f_n with a < n ≤ b are restricted based on the flag value f_b;
- all blocks whose flag value is f_n with b < n are restricted based on the flag value f_∞.
Restricting an alphabet A_i means calculating the intersection between that alphabet and the interval [-n,n] corresponding to the associated flag f_n (A_i ∩ [-n,n]). In other words, the values of A_i that correspond to a magnitude larger than n are deleted.
In that case, the probabilities {p_{i,m}} associated with the remaining possible values are normalized given that restriction, in order to enable further entropy coding.
One may note that the alphabets with all values less than n are not restricted but kept as such.
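For illustration, the restriction of one alphabet and the associated renormalization of its probabilities may be sketched as follows (illustrative names; not the patent's own code):

```python
def restrict_alphabet(symbols, probs, n):
    """Keep only the values of magnitude <= n and renormalize their probabilities."""
    kept = [(s, p) for s, p in zip(symbols, probs) if abs(s) <= n]
    total = sum(p for _, p in kept)
    return [s for s, _ in kept], [p / total for _, p in kept]

# e.g. flag f_1 applied to a five-symbol alphabet
syms, ps = restrict_alphabet([-2, -1, 0, 1, 2], [0.05, 0.2, 0.5, 0.2, 0.05], 1)
assert syms == [-1, 0, 1] and abs(sum(ps) - 1.0) < 1e-12
```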
However, as an alternative to restricting the alphabets A_i, all the alphabets corresponding to the quantized DCT coefficients of a given block may be replaced by a single alphabet made of the values up to the maximum magnitude specified in the flag associated with that block. Thus, the same alphabet is used when encoding all the quantized DCT coefficients of the block, possibly requiring the building of only one Huffman tree.
As shown in Figure 3a, the probabilities P_{B,n} and P_{MB,n} (n = a, b, ∞), as well as the values a, b and ∞, for the restricted alphabet F' are included in the bit-stream 22, where the probabilities P_{B,n} for the restricted alphabet are the sums of the previous probabilities P_{B,n} over respectively the intervals [0,a], ]a,b] and ]b,∞[.
Sometimes, P_{MB,n} with n ≠ 0 does not really need to be transmitted, as is apparent from the examples below.
In a second implementation of the flags, the distance of the DCT coefficient magnitudes to the maximum possible magnitude allowed by the quantizers (when they may be determined) is used. In this implementation, the magnitude specified in a flag associated with a block represents the value M_B − d_B, where M_B is the maximum magnitude of the possible values of all the alphabets and d_B is a block distance to the maximum magnitude defined below.
Let's define M_i as the maximum value of the alphabet A_i of the quantizer of the i-th DCT coefficient; this means that the values taken by this quantizer are in the interval [−M_i, M_i]. Let M_B be the maximum possible magnitude of the blocks: M_B = max_i M_i. The block distance to the maximum magnitude is defined as: d_B := min_{i; a_m ≠ 0, a_m ∈ A_i} (M_i − |a_m|), with a_m representing the quantized coefficient values of the block, and d_B = M_B when all the quantized coefficient values a_m of the block are zero.
The flag f_n can therefore be defined for each block, where n = M_B − d_B (which is referred to below as a "plane level").
One can see that the common skipped block full of zeroes corresponds to a nil plane level n = 0.
A plane level n = 1 means that the quantizers have been planed M_B − 1 times; one iteration of planing means that the two outermost quanta are dropped, until only the central 0 quantum remains.
A plane level n means that the quantizers have been planed M_B − n times.
In particular, n = M_B means that the quantizers have not been planed.
Figure 22 illustrates such plane levels for the quantizers of six DCT coefficients.
In this Figure, restricting the alphabets of a block comprises deleting the d_B most negative and d_B most positive possible values from each of the alphabets. This corresponds to dropping the d_B outermost quanta from each alphabet, until the sole central quantum remains.
Similarly to the above, the probabilities P_{B,n} and P_{MB,n} may then be defined (respectively as the probability that the plane level of a given block is equal to n, and as the probability that the plane level of a whole macroblock is equal to n).
A similar alphabet F as above is also obtained, which can be restricted as also explained above.
In a variant to n = M_B − d_B, n may be equal to d_B as defined above. This has the same result, except that n = 0 does not correspond to the skip mode as defined in H.264.
Based on the restricted alphabets obtained at step 2125 (when the restricting operation 194 is implemented) or on the initial alphabets A_i (otherwise), and their associated probabilities, the grouping 195 may be applied.
The probabilistic distributions (i.e. the probabilities {p_{i,m}}) of the alphabets are used to group at least two alphabets into a new alphabet, the new alphabet replacing the at least two alphabets and defining, as symbols, the combinations of possible symbols for the at least two transformed block coefficients associated with the grouped alphabets.
The grouping of alphabets is to be understood as a product of those alphabets, as set out for example in the book "Elements of Information Theory" (T.M. Cover, J.A. Thomas, Second Edition, Wiley, 2006).
For instance, the product or grouping of two alphabets A = {a_i} and B = {b_j} is made of all the pairs of symbols c_ij = (a_i, b_j), with which a corresponding probability is associated: p(c_ij) = p(a_i) × p(b_j).
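A minimal sketch of this product of alphabets (assuming, as in the text, independent probabilities):

```python
from itertools import product

def group_alphabets(a_syms, a_probs, b_syms, b_probs):
    """Product of two alphabets: symbols become pairs, probabilities multiply."""
    syms = list(product(a_syms, b_syms))
    probs = [pa * pb for pa, pb in product(a_probs, b_probs)]
    return syms, probs
```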
The idea here is to consider together several DCT symbols resulting from the quantization as a new single symbol for which a corresponding binary code will be generated using an entropy coding scheme. It appears that the binary codes generated from the new alphabet (resulting from the grouping) give a better mean length relative to each alphabet that is grouped.
This improvement is due to the fact that the variable-length-code (VLC) binary codes generated from an alphabet alone have an integer number of bits, moving their mean length away from the theoretical entropy of the alphabet (by up to one bit). On the other hand, the product of several alphabets makes it possible to dilute this one-bit cost into several alphabets, decreasing the mean length associated with the part of the binary codes for each alphabet.
For illustrative purposes, without grouping of alphabets, the VLC code of each alphabet is not worse than one bit (the above "one bit cost") from the entropy and satisfies the following inequalities: H(A) ≤ L(A) < H(A) + 1, where the entropy H(A) is defined by H(A) = −Σ_i p_i log2(p_i), and the mean length of the VLC code is defined by L(A) = Σ_i p_i L_i (L_i being the length in terms of number of bits of the generated binary code associated with the symbol a_i).
When grouping several alphabets, let's say N times the alphabet A: B = A^N, and if we consider that there is no dependence between occurrences of symbols of the alphabet A (which is the worst case scenario), we have L(B) < H(B) + 1 = N·H(A) + 1. Since H(B) ≤ L(B) < H(B) + 1, it may be inferred that L'(A) := L(B)/N < H(A) + 1/N, proving that the mean length associated with the part of the binary codes for each alphabet (when the binary codes are generated for the product of the alphabets) may be substantially improved.
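This effect can be checked numerically; the following minimal sketch (illustrative, using a skewed three-symbol alphabet) compares the Huffman mean length of an alphabet A with the per-coefficient mean length obtained on the product A × A:

```python
import heapq
import math

def huffman_lengths(probs):
    """Build a Huffman tree and return the code length of each symbol."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1          # symbols under the merged node gain one bit
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mean_length(probs):
    return sum(p * l for p, l in zip(probs, huffman_lengths(probs)))

pA = [0.9, 0.05, 0.05]
pAA = [p * q for p in pA for q in pA]        # product alphabet A x A
print(entropy(pA), mean_length(pA))          # ~0.569 bit vs 1.1 bits: a large overhead
print(mean_length(pAA) / 2)                  # per-coefficient length moves toward H(A)
```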
In theory, by taking N large, the entropy encoding may be made as close as wanted to the theoretical entropy, but this would not be practical since large Huffman trees would then have to be handled by the encoder.
Different strategies for grouping the alphabets may be implemented as described below, providing VLC binary codes that are closer, to a greater or lesser extent, to the theoretical entropy of the alphabets.
Figure 10 illustrates an overall method of processing the alphabets associated with the quantized DCT symbols, according to an embodiment of the invention.
The method begins at step 1000 with the alphabets A_i of the quantized DCT symbols considered (e.g. from the blocks associated with the same base coding mode) and their associated probabilistic distributions. At step 1005, which is optional, alphabets having an associated entropy or a size (in terms of number of symbols) less than a predefined value are discarded from consideration for grouping alphabets.
For example, the very small alphabets A_i with H(A_i) < 0.05 bit are discarded. These alphabets can be dropped because their impact on the PSNR is negligible.
Optionally, when an alphabet is discarded, the encoding method may avoid generating corresponding VLC codes (e.g. Huffman trees) as described below, and may avoid encoding the DCT symbols coming from this alphabet.
Next, at step 1010, it is decided to group alphabets according to an intra-block grouping. It is recalled here that the grouping of alphabets is a product of those alphabets.
Such intra-block grouping groups at least two alphabets associated with at least two DCT coefficients of the same block of pixels.
Such a situation is illustrated in Figure 11, in which a first grouping groups the alphabets A_1 and A_2 for the DCT coefficients i=1 and i=2 of the same DCT block B_k.
A similar grouping is made with the alphabets A_3, A_5 and A_6.
In the example of this Figure, the at least two DCT coefficients for which the grouping is performed are adjacent coefficients within the same block.
Given that several DCT blocks are using the same alphabets, when it is decided to perform an intra-block grouping, the same grouping of alphabets may be performed for a plurality of blocks within the first image, in particular the blocks of the same macroblock (because they are associated with the same base coding mode).
Figure 11 illustrates this situation for two adjacent blocks B_k and B_k+1.
More details of the intra-block grouping, in particular regarding the criteria to select alphabets to group, are given below with reference to Figure 13.
Further to the intra-block grouping, step 1015 is an inter-block grouping of the alphabets remaining after step 1010, i.e. they may be initial alphabets resulting from the quantization and/or new alphabets resulting from the intra-block grouping.
Such inter-block grouping groups at least two alphabets associated with DCT coefficients collocated in at least two blocks of pixels. In particular, the at least two blocks of pixels may be from the same macroblock dividing the DCT image X_DCT.
This is illustrated in Figure 12, in which the intra-block groupings of A_3, A_5 and A_6 in the four blocks of the macroblock MB_k are also grouped to form a new alphabet (A_3 × A_5 × A_6)^4.
More details of the inter-block grouping, in particular regarding the criteria to select alphabets to group, are given below with reference to Figure 14.
It can be seen that a new alphabet resulting from intra or inter-block grouping replaces the alphabets that are grouped, and that a probabilistic distribution of that new alphabet is also computed as defined above (i.e. based on the probabilistic distributions of the at least two alphabets involved in the grouping). This is because such probabilistic distribution of the new alphabet may be useful when considering whether or not that new alphabet should be grouped again, but is also required to generate VLC codes.
At the end of step 1015, a set of alphabets and associated probabilistic distributions is obtained (step 1020).
Intra-block grouping and then inter-block grouping are performed iteratively as long as a resulting new alphabet comprises fewer symbols than a predefined number.
That means that the iterative process is continued up to a given upper limit of the cardinal of the grouped alphabets. This is in order to ensure that the associated VLC codes, e.g. conventional Huffman trees, are small enough to be handled in real time by the codec.
Several methods for the intra-block grouping will now be discussed.
With reference to Figure 13, a first method for intra-block grouping of alphabets is based on the entropy of those alphabets. This is because the alphabets with very small entropy are those with the largest overhead, due to the upper bound of one bit that may be asymptotically reached. On the contrary, the alphabets with high entropy have generally a lower overhead.
Given this, the method comprises determining an entropy value for each of the alphabets, and the grouping of at least two alphabets comprises grouping the available alphabet having the smallest entropy with the available alphabet having the highest entropy. As known, the entropy value derives from the probabilistic distribution of the alphabet.
When it is desired to ensure that the resulting new alphabet comprises fewer symbols than a predefined number, the smallest (and respectively the largest) entropy alphabet may not be the alphabet having the smallest (and respectively the largest) entropy as such, but among those alphabets that satisfy this constraint. For example, the alphabet with the smallest entropy is chosen for a grouping with the compatible (i.e. whose alphabet product size is smaller than the given upper limit) alphabet having the highest entropy.
The grouping based on the probabilistic distributions (indirectly the entropy) of the alphabets is iteratively repeated until there are no more alphabets satisfying the constraint. It is to be noted that a group of alphabets resulting from the intra-block grouping is considered as an alphabet for the next intra-block grouping iteration.
The initial state 1300 comprises the alphabets A_1 ... A_n of a DCT block that is to be processed, as well as their associated probabilistic distributions {p_{1,m1}} ... {p_{n,mn}} (m_i being an index varying from 0 to the number of symbols within the corresponding alphabet A_i). To each alphabet A_i is appended the list of the corresponding DCT coefficients, initially only the DCT coefficient with index i.
At step 1305, the entropy H(A_i) of each alphabet is computed, and the alphabet A_min having the minimum entropy is selected.
At step 1310, the alphabet A_max which has the maximum entropy amongst the other alphabets, while satisfying the constraint on the cardinal of the product A_min × A_max (cardinal less than an upper limit denoted MAX_ALPHABET_SIZE in the Figure), is then selected.
In case such a pair (A_min, A_max) is found (test 1315), step 1320 is executed, during which:
- the product A_min × A_max is considered as a new alphabet for the next iteration, by determining its corresponding symbols c; A_min and A_max taken alone are discarded from the list of the alphabets for the next iteration;
- the probabilistic distribution {p_new,m} of the new alphabet is calculated based on the probabilistic distributions of the grouped alphabets: p_new,m = p_min,m1 × p_max,m2;
- a list of the DCT coefficients associated with the new alphabet is appended to that alphabet.
In case no pair is found at test 1315, the intra-block grouping ends at step 1325 with a plurality of alphabets (some grouping several initial alphabets A_i) and their associated probabilistic distributions.
Such intra-block grouping efficiently improves the entropy of VLC codes generated from the resulting alphabets. This is because in the resulting alphabet, the largest probability (which induces a large overhead) is bounded by the highest probability of the alphabet with high entropy. Therefore, the remaining highest probability induces reduced overhead.
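A minimal sketch of this first method (illustrative names; each alphabet is carried as (symbols, probabilities, appended coefficient indices), mirroring steps 1305 to 1320):

```python
import math

MAX_ALPHABET_SIZE = 512

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def intra_block_grouping(alphabets):
    """alphabets: list of (symbols, probs, coeff_indices) tuples."""
    while True:
        order = sorted(range(len(alphabets)), key=lambda k: entropy(alphabets[k][1]))
        lo = order[0]                                   # smallest-entropy alphabet
        hi = next((k for k in reversed(order)           # largest compatible entropy
                   if k != lo and
                   len(alphabets[k][0]) * len(alphabets[lo][0]) <= MAX_ALPHABET_SIZE),
                  None)
        if hi is None:
            return alphabets                            # no compatible pair left
        (s1, p1, c1), (s2, p2, c2) = alphabets[lo], alphabets[hi]
        merged = ([(a, b) for a in s1 for b in s2],     # product alphabet
                  [pa * pb for pa in p1 for pb in p2],  # product distribution
                  c1 + c2)                              # appended coefficient list
        alphabets = [a for k, a in enumerate(alphabets) if k not in (lo, hi)] + [merged]
```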
A second method for intra-block grouping of alphabets is now described.
Compared to the first method, this method is more efficient but requires more calculation. Instead of only calculating the entropy of the alphabets, the second method selects alphabets such that the reduction of coding overhead is maximum, meaning that the steps 1305 and 1310 are different (but the other steps are the same).
To that end, the second method comprises grouping two alphabets which maximizes a reduction in distance between the entropy of the alphabets and the entropy of the binary codes constructed from the alphabets. Generally, the binary codes are generated from the alphabet and the associated probabilistic distribution, based on a Huffman code.
Unfortunately, the distance of a Huffman code to the entropy cannot be anticipated without a costly and impractical operation of constructing the Huffman tree itself. In order to reduce the computational complexity, the entropy of the binary codes constructed from the alphabets is modelled by the entropy of a corresponding Shannon code. This is because the distance d(A_i) as defined below provides an upper bound of the Huffman code distance to the entropy of the alphabet A_i.
The entropy of the Shannon code is defined by H_S(A) = Σ_i p_i ⌈−log2(p_i)⌉, where ⌈·⌉ is the integer ceiling function.
In the second method, this entropy is used to estimate the overhead or distance to the entropy of the alphabet. This distance of the Shannon code to the entropy of the alphabet A is defined as follows: d(A) = Σ_i p_i (⌈−log2(p_i)⌉ + log2(p_i)). The reduction in overhead provided by the grouping of two alphabets A_i and A_j is the following: Reduction = d(A_i) + d(A_j) − d(A_i × A_j).
Therefore, the second method for the intra-block grouping consists in selecting (corresponding to the two steps 1305 and 1310) the two alphabets A_i and A_j which together provide the maximum reduction while satisfying the constraint on cardinality: ArgMax over {(i,j) : card(A_i × A_j) ≤ MAX_ALPHABET_SIZE} of [ d(A_i) + d(A_j) − d(A_i × A_j) ]. If such a pair is found, step 1320 is performed, ensuring that the group of alphabets resulting from the intra-block grouping is considered as an alphabet for the next intra-block grouping iteration, with an associated probabilistic distribution.
As for the first method, the grouping of the alphabets is iteratively repeated until there are no more alphabets satisfying the constraint on the size of A_i × A_j. A third method for the intra-block grouping derives from the second method, wherein each distance relating to an alphabet is weighted by the entropy of that alphabet. This gives priority to grouping alphabets having small entropy rather than alphabets with larger entropy, when those alphabets have binary codes with similar distances to the entropy.
Compared to the second method, the alphabets to group are selected based on the following formula: ArgMax over {(i,j) : card(A_i × A_j) ≤ MAX_ALPHABET_SIZE} of [ d(A_i)/H(A_i) + d(A_j)/H(A_j) − d(A_i × A_j)/H(A_i × A_j) ].
Whatever the method selected for the intra-block grouping, step 1010 ends with a set of M alphabets (some grouping several initial alphabets A_i) and associated probabilistic distributions.
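For illustration, the selection criterion of the second method may be sketched as follows (the third method would simply divide each distance by the corresponding entropy):

```python
import math

def shannon_distance(probs):
    """d(A) = sum_i p_i * (ceil(-log2 p_i) + log2 p_i), an upper bound on the Huffman overhead."""
    return sum(p * (math.ceil(-math.log2(p)) + math.log2(p)) for p in probs if p > 0)

def best_pair(prob_lists, max_size=512):
    best, best_gain = None, 0.0
    for i in range(len(prob_lists)):
        for j in range(i + 1, len(prob_lists)):
            if len(prob_lists[i]) * len(prob_lists[j]) > max_size:
                continue
            prod = [p * q for p in prob_lists[i] for q in prob_lists[j]]
            gain = (shannon_distance(prob_lists[i]) + shannon_distance(prob_lists[j])
                    - shannon_distance(prod))
            if gain > best_gain:
                best, best_gain = (i, j), gain
    return best
```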
Figure 14 illustrates an example of inter-block grouping that may be applied to these alphabets obtained at the end of step 1010. Let's denote those alphabets by A'_i when beginning the inter-block grouping 1015.
In this example, the inter-block grouping is performed between blocks B_k of the same macroblock MB_k. Such a situation has been previously introduced with reference to Figure 12.
The inter-block grouping is performed iteratively by making the product alphabet of an alphabet A'_i multiplied by itself, due to the same base coding mode within the macroblock (which permits the same intra-block grouping to be performed in all the blocks of that macroblock).
If it is grouped zero times, the result is one alphabet per block B_k.
If it is grouped once, the result is one alphabet per 2 blocks (i.e. half the macroblock).
If it is grouped twice, the result is one alphabet per 4 blocks (i.e. per macroblock). This is the case in Figure 12, in which the alphabet A' = (A_3 × A_5 × A_6) is grouped twice to obtain the new alphabet (A_3 × A_5 × A_6)^4 for the macroblock MB_k. In this example, while (A_3 × A_5 × A_6)^4 is only one alphabet, its symbols represent 12 DCT coefficients at the same time, namely the DCT coefficients with indexes 3, 5 and 6 of the four blocks B_k1, B_k2, B_k3 and B_k4.
With reference to Figure 14, the initial state 1400 comprises the M alphabets A'_i, as well as their associated probabilistic distributions {p'_{i,m}}. nbCoeffs defines the number of DCT coefficients that are appended to a considered alphabet.
Step 1405 initialises an index i enabling the iterative processing of the inter-block grouping through steps 1425 and 1430. Each iteration processes the alphabet A'_i, trying to group it with itself, once or twice (output "no" of test 1420 driving the second grouping of this alphabet).
In the example, the only condition for allowing a grouping is the given upper limit (MAX_ALPHABET_SIZE) in the cardinality on the inter-block grouped alphabet.
However, other conditions may be implemented. They are checked at step 1410.
When a grouping of the alphabet A'_i with itself is possible, step 1415 is performed. In particular, this step updates the appended list of DCT coefficients concerned by the new alphabet obtained. This list doubles each time the alphabet is grouped with itself.
Step 1415 also updates the probabilistic distribution of that new alphabet.
This iterative processing is over when all the alphabets A' have been processed, meaning that they have been grouped twice or they cannot satisfy the inter-block grouping constraint regarding cardinality.
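A minimal sketch of this self-grouping of one alphabet A'_i over the macroblock (illustrative; the size limit plays the role of MAX_ALPHABET_SIZE):

```python
def inter_block_grouping(symbols, probs, max_size=512):
    """Group an alphabet with itself up to twice, covering 2 then 4 blocks."""
    nb_block = 1
    while nb_block < 4 and len(symbols) ** 2 <= max_size:
        symbols = [(a, b) for a in symbols for b in symbols]
        probs = [pa * pb for pa in probs for pb in probs]
        nb_block *= 2       # one symbol now covers twice as many blocks
    return symbols, probs, nb_block
```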
Step 1015 of inter-block grouping ends with a set of M' alphabets (some grouping several initial alphabets A_i over several blocks) and associated probabilistic distributions.
Such a set is obtained for a specific macroblock with which a specific base coding mode is associated, as introduced above.
Following the grouping, binary entropy codes for the symbols defined in the M' alphabets are generated, in such a way that the quantized DCT symbols are then encoded using those generated binary codes.
The same order of the DCT symbols is applied at the encoding and at the decoding, since the decoder performs the same operations as the encoder (grouping based on the probabilities and probabilistic distribution parameters that it will receive).
For example, this order may be the zigzag scanning order, taking into account the grouping of alphabets (i.e. the grouping of DCT coefficients): the smallest index of a DCT coefficient amongst the grouped indexes may be used to represent the encoding/decoding position of the grouped coefficients, along the zigzag order.
The binary codes may be VLC codes obtained from the probabilistic distribution, based on a Huffman code. Huffman trees are thus generated to obtain the corresponding VLC binary codes for each of the M' alphabets.
The entropy coding of the quantized DCT coefficients of a macroblock is illustrated in Figure 15. The process shown in this Figure is based on the fact that it is possible to alternate between many Huffman trees during the encoding without altering the decoding capability: the generated bit-stream 20 is still a prefix code and there is no need to add extra syntax to specify which DCT coefficients of the blocks have been grouped within a single alphabet (i.e. which DCT coefficients are encoded by the same binary code), since the decoder will perform the same grouping operations and obtain the same grouped alphabets.
The algorithm of Figure 15 has to be processed for each macroblock of the DCT image XDCT to encode the entire residual enhancement image.
The process begins (step 1500) with the obtaining of the restricted flag alphabet F' as constructed above, based on the current base coding mode.
At step 1502, the flags f_n of the macroblock are encoded with the flag value f_a, f_b or f_∞, depending on which interval [0,a], ]a,b] or ]b,∞[ the value n belongs to. This means that one of f_a, f_b and f_∞ is thus associated with each block of the current macroblock. This substitution between f_n and one of f_a, f_b and f_∞ can however be done previously, when restricting the alphabets.
In particular, the flag associated with each block is entropy encoded based on the probabilities of occurrence of the flags of the alphabet F', used to construct VLC (e.g. Huffman) codes.
However, to further improve the encoding of those flags, a grouping as defined above can be implemented: the flags associated with the blocks of the same macroblock are concatenated into one flag associated with the macroblock. And the flag associated with the macroblock is entropy encoded based on the probabilities of occurrence of the possible values for such a macroblock flag f_MB, those possible values resulting from the product of the alphabet F' of flags from which the concatenated flags are selected: f_MB ∈ F_MB := F' × F' × F' × F'.
Furthermore, the probabilities of occurrence of the possible values for a macroblock flag are calculated based on the probabilities P_{MB,n} defined above. This is because, since P_{MB,n} >> (P_{B,n})^4, the other probabilities for a macroblock flag have to be normalized. For example, if only P_{MB,0} is considered for that normalization: p(f_0, f_0, f_0, f_0) = P_{MB,0} and p(f_i, f_j, f_k, f_l) = γ · P_{B,i} P_{B,j} P_{B,k} P_{B,l}, with (i,j,k,l) ∈ {0,a,b,∞}^4 and not equal to (0,0,0,0), and γ = (1 − P_{MB,0}) / Σ_{(i,j,k,l) ≠ (0,0,0,0)} P_{B,i} P_{B,j} P_{B,k} P_{B,l} to normalize the probabilities.
In a variant, the probabilities P_{MB,n} may also be used, in which case γ is adapted: γ = (1 − Σ_n P_{MB,n}) / Σ_{(i,j,k,l) ≠ (n,n,n,n)} P_{B,i} P_{B,j} P_{B,k} P_{B,l}, and the probabilities P_{MB,n} are provided in the bit-stream to the decoder.
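For illustration, and under the normalization reconstructed above (treating only P_{MB,0} specially, which is an assumption of this sketch), the probability table of the macroblock flag f_MB may be built as follows:

```python
from itertools import product

def macroblock_flag_probs(p_block, p_mb_zero):
    """p_block: {flag value: P_B,n} over the restricted flags (0 included)."""
    flags = sorted(p_block)
    rest = sum(p_block[i] * p_block[j] * p_block[k] * p_block[l]
               for i, j, k, l in product(flags, repeat=4)
               if (i, j, k, l) != (0, 0, 0, 0))
    gamma = (1.0 - p_mb_zero) / rest
    table = {}
    for combo in product(flags, repeat=4):
        if combo == (0, 0, 0, 0):
            table[combo] = p_mb_zero              # P_MB,0 taken as-is
        else:
            i, j, k, l = combo
            table[combo] = gamma * p_block[i] * p_block[j] * p_block[k] * p_block[l]
    return table
```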
For each macroblock, the macroblock flag f_MB is thus encoded using a Huffman coding based on the above probabilities.
Figure 23 illustrates the order of coding: a flag f_MB followed by the Huffman coding of its associated macroblock data, then another flag followed by the Huffman coding of the associated next macroblock, and so on.
Such a structure preserves the spatial random access capability in the stream.
It is to be noted that if a block has the flag f_0 associated with it, the encoding of that block is skipped because the decoder will know from that flag that all values are zeroes in that block.
Further to step 1502, step 1504 consists in obtaining the set S of M' symbol groups (i.e. of the M' alphabets issued from the intra and inter-block grouping) corresponding to that coding mode.
For the following explanations:
- g represents the current alphabet whose symbols are being encoded for the current macroblock;
- g.nbDCTCoeffs represents the total number of DCT coefficients coded by each symbol of the current alphabet g (i.e. the number of DCT coefficients appended to that alphabet);
- g.nbBlock represents the value nbBlock obtained during the inter-block grouping for the alphabet g (see Figure 14);
- g.factor_i represents the cardinal of the individual alphabets A_i that were grouped together to form the current alphabet g; and
- quantized_coef[g.dct_i + block_index*64] is the value of the i-th quantized DCT coefficient stored in the list appended to the current alphabet g, for the current symbol (since a symbol may represent several initial DCT coefficients). The current symbol being encoded is represented by the block_index variable.
The M' alphabets are processed iteratively through steps 1505, 1545 (which detects the last alphabet) and 1550 (which selects another alphabet).
For the current alphabet g, the algorithm encodes one or more symbols of that alphabet, depending on the number of 8x8 DCT blocks (variable nbBlock obtained from Figure 14 for that alphabet) associated with the current alphabet.
For instance, if the current alphabet g results from two inter-block groupings (i.e. A'_i has been grouped with itself twice), then one symbol of that alphabet encodes the appended DCT coefficients in the four blocks of the current macroblock (nbBlock = 4).
If it results from only one inter-block grouping, then it encodes the appended DCT coefficients of two consecutive 8x8 blocks (nbBlock = 2).
If it has not been inter-block grouped, then it encodes the appended DCT coefficients of each block of the macroblock (nbBlock = 1).
Therefore, the number of symbols coded for the current alphabet depends on the number of blocks it covers. Thus, a loop is performed to encode the symbols of that current alphabet g, through steps 1510 (which initializes block_index), 1530 (which increments block_index according to nbBlock) and 1535 (which checks whether all the blocks of the macroblock have been processed).
For each symbol to code (i.e. occurring in the macroblock), the symbol value symbolVal is computed at step 1515. This consists in combining quantized DCT coefficient values together so as to obtain the right symbol value.
Once the symbol value symbolVal is obtained, the Huffman code associated with that symbol value is obtained at step 1520 from the constructed Huffman trees. Then that Huffman code is written in the output video bitstream 20 at step 1525.
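The exact packing performed at step 1515 is not spelled out in the text; a natural choice, sketched here purely as an assumption, is a mixed-radix combination in which g.factor_i (the cardinal of each grouped alphabet A_i) plays the role of the radix:

```python
def combine_symbol_value(indices, factors):
    """indices[i]: position of coefficient i's quantized value inside its alphabet A_i;
    factors[i]: cardinal of A_i (the radix), a hypothetical stand-in for g.factor_i."""
    symbol_val, weight = 0, 1
    for idx, radix in zip(indices, factors):
        symbol_val += idx * weight
        weight *= radix
    return symbol_val

assert combine_symbol_value([2, 1], [5, 3]) == 2 + 1 * 5   # two coefficients, radices 5 and 3
```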
Thanks to the use of the flags f_n (or f_MB), the DCT coefficients are encoded using the restricted alphabets. In particular:
- all blocks whose flag value is f_n with n ≤ a are encoded with flag value f_a, i.e. with alphabets having magnitudes less than a (for the first implementation of the flag as described above);
- all blocks whose flag value is f_n with a < n ≤ b will be encoded with flag value f_b, i.e. with alphabets having magnitudes between a and b;
- all blocks whose flag value is f_n with b < n will be encoded with flag value f_∞, i.e. without restricting the alphabets.
As explained above, this reduces the average number of bits (entropy) needed to encode the DCT coefficients. One may note that it does not greatly increase the computational complexity, since at most two extra Huffman trees will be constructed (associated with a and b).
For the sake of completeness, with reference to Figure 16, the corresponding decoding of the bitstream 20 will now be described, as schematically shown in Figure 4a, where 333 references the restricting of alphabets thanks to the flags f_n for decoding, and 334 references the grouping of alphabets for decoding.
The decoding is to be performed successively for each encoded macroblock.
This process is similar to that of Figure 15 regarding the loops on the alphabets and on the block_index.
One difference between the coding and the decoding algorithm consists in step 1602, which is the decoding of the macroblock flag f_MB, and step 1604, which includes obtaining the alphabets restricted given the flags f_n.
During this decoding of the flags, the process comprises:
- obtaining alphabets of values for the encoded quantized DCT coefficients, each alphabet defining the possible values of a quantizer associated with an encoded quantized coefficient;
- obtaining, from the bit-stream, a flag associated with each block of encoded quantized coefficients, said flag specifying a magnitude;
- restricting the alphabets corresponding to the encoded quantized coefficients of each block, based on the magnitude specified in the flag associated with that block.
In particular, from the bit-stream 22, probabilities associated with each flag of the alphabet F' of possible flags are obtained; and the encoded flags are entropy decoded from the bit-stream based on the obtained probabilities to obtain said flag associated with each block. Those probabilities make it possible to reconstruct the Huffman trees. They are the probabilities associated with the flags f_a, f_b and f_∞.
As mentioned above, the macroblock flag concatenates several block flags.
Therefore, the decoding process comprises splitting the decoded macroblock flag into flags associated with the blocks of the macroblock.
Thanks to the decoded flags and the known quantizers, it is possible to determine the restricted alphabets (A_i ∩ [-n,n]), based on which the decoding of the encoded quantized coefficients is performed.
In particular, the decoder performs the same restriction as the one performed by the encoder:
- with the first implementation of the flags, restricting the alphabets of a block comprises deleting, from the alphabets, the possible values that have a magnitude (i.e. absolute value) larger than the magnitude specified in the flag associated with the block;
- with the second implementation (the flag conveys information about the block distance to the maximum magnitude, referred to as d_B), restricting the alphabets of a block comprises deleting the d_B most negative and d_B most positive possible values from each of the alphabets. In case the magnitude n specified in the flag equals M_B − d_B, it is to be noted that M_B can be retrieved from the probabilistic distributions of the DCT coefficients (i.e. from the parameters α_i, β_i) and the associated quantizers, to deduce d_B.
Another difference between the coding and the decoding algorithm consists in the two decoding steps illustrated by the dashed lines in the Figure.
These two steps consist in the decoding of a Huffman code, through the use of the Huffman tree associated with the current grouped alphabet g. The decoded Huffman code provides a symbol of the alphabet g.
This symbol then provides the values of the quantized DCT coefficients that are associated with the current alphabet g, in a way symmetrical to step 1515. In particular, the decoding may comprise:
- obtaining, from the bit-stream EBS, blocks of encoded symbols a_m corresponding to quantized block DCT coefficients;
- obtaining alphabets A_i of symbols, each alphabet defining the possible symbols of an associated quantized block coefficient;
- for each alphabet, obtaining a probabilistic distribution of the possible symbols defined therein;
- using the probabilistic distributions to group at least two alphabets into a new alphabet, the new alphabet defining, as symbols, the combinations of possible symbols of the at least two quantized block coefficients associated with the grouped alphabets;
- obtaining binary entropy codes for the possible symbols of each remaining alphabet;
- decoding the encoded symbols using the obtained binary codes.
More particularly, the parameters (α_i, β_i) of each alphabet A_i can be obtained from the bit-stream 21 and then applied to the above defined probabilistic distribution model GGD(α_i, β_i) to obtain the probabilistic distribution of each alphabet (requiring integrating that model on the considered quanta Q_m as explained above). Given the knowledge of the quantizers by the decoder, the latter is then able to perform the same grouping operations as the encoder, resulting in the same grouped alphabets.
The improvement of the entropy coding thanks to the grouping of alphabets is illustrated in Figure 17, which is similar to Figure 5 in which a curve of the rate-distortion performance obtained with the grouping according to the invention is shown (dotted curve).
The Figure shows that a bitrate close to the theoretical entropy rate is achieved due to the intra and inter-block grouping of the invention.
The compression gain resulting from the use of the above flags f_n is about 5%, when both the restriction of alphabets and the entropy encoding of those flags are implemented.
The entropy coding scheme according to the present embodiment also has spatial random access properties, when the video encoding process is without inter frame (temporal) and intra block (spatial) predictions.
This is because the entropy coding according to the present embodiment has no dependence between macroblocks. In particular, if an entry point of the generated bit-stream 20 and the index of the associated macroblock are given, it is possible to perform the entropy decoding from that point, without decoding other parts of the encoded video.
It is said that the bit-stream has the random spatial access property because it is possible to decode only a part of the image (a region of interest) once the associated entry points are given.
Figure 18 illustrates how the residual enhancement image may be subdivided into spatial zones made of macroblocks, with entry points in order to allow efficient random access compliant coding.
The position of the entry points may be encoded in the header of the bit-stream 20 in order to facilitate easy extraction from the server side and allow the reconstruction of a valid stream on the decoder side.
Figure 19 shows the meta-organization of a bit-stream header. For example, the slice header shown in the Figure re-uses the slice header of the H.264/AVC video compression standard. However, to provide the entry points, a new field ("coded slice length"), which indicates the length in bytes of the coded slice, is added at the beginning of each slice header. The entry points can therefore be easily computed from the "coded slice length" fields.
Another advantage of this independence between macroblocks of the residual enhancement image is the possibility to perform parallel entropy decoding on the decoder side. Each decoding thread starts decoding from one of the entry points as defined above.
This invention provides a simple, efficient and block based entropy coder of the quantized DCT coefficients. In addition, the possible absence of intra prediction between blocks and the judicious choice of the entropy coder provides random spatial access into the video bitstream, i.e. the bitstream can be decoded from any entry point in the bit-stream.
Exemplary Encoding/Decoding Apparatus With reference now to Figure 20, a particular hardware configuration of a device for encoding or decoding images able to implement methods according to the invention is now described by way of example.
A device implementing the invention is for example a microcomputer 50, a workstation, a personal digital assistant, or a mobile telephone connected to various peripherals. According to yet another embodiment of the invention, the device is in the form of a photographic apparatus provided with a communication interface for allowing connection to a network.
The peripherals connected to the device comprise for example a digital camera 64, or a scanner or any other image acquisition or storage means, connected to an input/output card (not shown) and supplying image data to the device.
The device 50 comprises a communication bus 51 to which there are connected:
- a central processing unit CPU 52 taking for example the form of a microprocessor;
- a read only memory 53 in which may be contained the programs whose execution enables the methods according to the invention. It may be a flash memory or EEPROM;
- a random access memory 54, which, after powering up of the device 50, contains the executable code of the programs of the invention necessary for the implementation of the invention. As this memory 54 is of random access type (RAM), it provides fast access compared to the read only memory 53. This RAM memory 54 stores in particular the various images and the various blocks of pixels as the processing is carried out (transform, quantization, storage of the reference images) on the video sequences;
- a screen 55 for displaying data, in particular video, and/or serving as a graphical interface with the user, who may thus interact with the programs according to the invention, using a keyboard 56 or any other means such as a pointing device, for example a mouse 57 or an optical stylus;
- a hard disk 58 or a storage memory, such as a memory of compact flash type, able to contain the programs of the invention as well as data used or produced on implementation of the invention;
- an optional diskette drive 59, or another reader for a removable data carrier, adapted to receive a diskette 63 and to read/write thereon data processed or to process in accordance with the invention; and
- a communication interface 60 connected to the telecommunications network 61, the interface 60 being adapted to transmit and receive data.
In the case of audio data, the device 50 is preferably equipped with an input/output card (not shown) which is connected to a microphone 62.
The communication bus 51 permits communication and interoperability between the different elements included in the device 50 or connected to it. The representation of the bus 51 is non-limiting and, in particular, the central processing unit 52 may communicate instructions to any element of the device 50 directly or by means of another element of the device 50.
The diskettes 63 can be replaced by any information carrier such as a compact disc (CD-ROM) rewritable or not, a ZIP disk or a memory card. Generally, an information storage means, which can be read by a micro-computer or microprocessor, integrated or not into the device for processing a video sequence, and which may possibly be removable, is adapted to store one or more programs whose execution permits the implementation of the method according to the invention.
The executable code enabling the coding device to implement the invention may equally well be stored in read only memory 53, on the hard disk 58 or on a removable digital medium such as a diskette 63 as described earlier. According to a variant, the executable code of the programs is received by the intermediary of the telecommunications network 61, via the interface 60, to be stored in one of the storage means of the device 50 (such as the hard disk 58) before being executed.
The central processing unit 52 controls and directs the execution of the instructions or portions of software code of the program or programs of the invention, the instructions or portions of software code being stored in one of the aforementioned storage means. On powering up of the device 50, the program or programs which are stored in a non-volatile memory, for example the hard disk 58 or the read only memory 53, are transferred into the random-access memory 54, which then contains the executable code of the program or programs of the invention, as well as registers for storing the variables and parameters necessary for implementation of the invention.
It will also be noted that the device implementing the invention or incorporating it may be implemented in the form of a programmed apparatus. For example, such a device may then contain the code of the computer program(s) in a fixed form in an application specific integrated circuit (ASIC).
The device described here and, particularly, the central processing unit 52, may implement all or part of the processing operations described in relation with Figures 1 to 19 and 21 to 34, to implement methods according to the present invention and constitute devices according to the present invention.
Channel Modelling and Selection of Quantizers The following description focuses on channel modelling and selection of quantizers.
The quality of a video or still image may be measured by the so-called Peak-Signal-to-Noise-Ratio or PSNR, which is dependent upon a measure of the L2-norm of the error of encoding in the pixel domain, i.e. the sum over the pixels of the squared difference between the original pixel value and the decoded pixel value. It may be recalled in this respect that the PSNR may be expressed in dB as: PSNR = 10·log10(MAX² / MSE), where MAX is the maximal pixel value (in the spatial domain) and MSE is the mean squared error (i.e. the above sum divided by the number of pixels concerned).
However, as noted above, most video codecs compress the data in the DCT-transformed domain, in which the energy of the signal is much better compacted.
The direct link between the PSNR and the error on DCT coefficients is now explained.
For a residual block, we note y',. its inverse DCT (or IDCT) pixel base in the pixel domain as shown on Figure 24. If one uses the so-called IDCT Ill for the inverse transform, this base is orthonormal: = 1.
On the other hand, in the OCT domain, the unity coefficient values form a base q which is orthogonal. One writes the OCT transform of the pixel block X as follows: X0 = where d' is the value of the n-th DCT coefficient. A simple base change leads to the expression of the pixel block as a function of the DCT coefficient values: x = IDCT(X) = TDCTcI"Q,, =crfDCTQp) =d"y,,.
If the value of the de-quantized coefficient d^n after decoding is denoted d̂^n, one sees that (by linearity) the pixel error block is given by: ε = Σ_n (d^n − d̂^n) ψ_n. The mean L2-norm error on all blocks is thus: E(‖ε‖²) = E(Σ_n |d^n − d̂^n|²) = Σ_n E(|d^n − d̂^n|²) = Σ_n D_n², where D_n² is the mean quadratic error of quantization on the n-th DCT coefficient, or squared distortion for this type of coefficient. The distortion is thus a measure of the distance between the original coefficient (here the coefficient before quantization) and the decoded coefficient (here the dequantized coefficient).
It is thus proposed below to control the video quality by controlling the sum of the quadratic errors on the DCT coefficients. In particular, this control is preferable compared to the individual control of each of the DCT coefficients, which is a priori sub-optimal.
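The equality between the pixel-domain error and the sum of quadratic errors on the DCT coefficients can be checked numerically with an orthonormal transform; the following sketch assumes SciPy's orthonormal 2-D DCT and a crude illustrative quantizer:

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))           # original residual block
d = dctn(x, norm="ortho")             # DCT coefficients (orthonormal base)
d_hat = np.round(d * 2) / 2           # crude quantization/dequantization
x_hat = idctn(d_hat, norm="ortho")    # decoded block

# With an orthonormal base, the pixel error equals the coefficient error.
assert np.isclose(np.sum((x - x_hat) ** 2), np.sum((d - d_hat) ** 2))
```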
In the embodiment described here, it is proposed to determine (i.e. to select in step 191 of Figure 3) a set of quantizers (each to be used for a corresponding DCT channel), the use of which results in a mean quadratic error having a target value D_T² while minimising the rate obtained.
In view of the above correspondence between PSNR and the mean quadratic error D_T² on DCT coefficients, these constraints can be written as follows: minimize R = Σ_n R_n s.t. Σ_n D_n² = D_T² (A), where R is the total rate made of the sum of the individual rates R_n for each DCT coefficient. In case the quantization is made independently for each DCT coefficient, the rate R_n depends only on the distortion D_n of the associated n-th DCT coefficient.
It may be noted that the above minimization problem (A) may only be fulfilled by optimal quantizers which are solutions of the problem: minimize R_n(D_n) s.t. E(|d^n − d̂^n|²) = D_n² (B).
This statement is simply proven by the fact that, assuming a first quantizer would not be optimal following (B) but would fulfil (A), then a second quantizer with less rate but the same distortion can be constructed (or obtained). So, if one uses this second quantizer, the total rate R has been diminished without changing the total distortion; this is in contradiction with the first quantizer being a minimal solution of the problem (A).
As a consequence, the rate-distortion minimization problem (A) can be split into two consecutive sub-problems without losing the optimality of the solution: -first, determining optimal quantizers and their associated rate-distortion curves R_n(D_n) following the problem (B), which will be done in the present case for GGD channels as explained below; -second, by using optimal quantizers, the problem (A) is changed into the problem (A_opt): minimize R = Σ_n R_n(D_n) s.t. Σ_n D_n² = D_T² and R_n(D_n) is optimal (A_opt).
Based on this analysis, it is proposed as further explained below: -to compute off-line optimal quantizers adapted to possible probabilistic distributions of each DCT channel (thus resulting in the pool of quantizers of Figure 3); -to select one of these pre-computed optimal quantizers for each DCT channel (i.e. each type of DCT coefficient) such that using the set of selected quantizers results in a global distortion corresponding to the target distortion D_T² with a minimal rate (i.e. a set of quantizers which solves the problem A_opt).
A possible embodiment for the first step of computing optimal quantizers for possible probabilistic distributions, here Generalised Gaussian Distributions, is now described.
It is proposed to change the previous complex formulation of problem (B) into the so-called Lagrange formulation of the problem: for a given parameter λ > 0, we determine the quantization in order to minimize a cost function such as C = D² + λR.
We thus get an optimal rate-distortion couple (D_λ, R_λ). In case of a rate control (i.e. rate minimisation) for a given target distortion Δ, the optimal parameter λ_Δ > 0 is determined by λ_Δ = argmin_{λ: D_λ = Δ} R_λ (i.e. the value of λ for which the rate is minimum while fulfilling the constraint on distortion) and the associated minimum rate is R_Δ = R_{λ_Δ}. As a consequence, by solving the problem in its Lagrange formulation, for instance following the method proposed below, it is possible to plot a rate-distortion curve associating a resulting minimum rate to each distortion value (Δ, R_Δ), which may be computed off-line, as well as the associated quantization, i.e. quantizer, making it possible to obtain this rate-distortion pair.
It is precisely proposed here to reformulate problem (B) into a continuum of problems (B_lambda) having the following Lagrange formulation: minimize D² + λR(D) s.t. E(|d^n − d̂^n|²) = D² (B_lambda).
The well-known Chou-Lookabaugh-Gray algorithm is a good practical way to perform the required minimisation. It may be used with any distortion distance d; we describe here a simplified version of the algorithm for the L2-distance. This is an iterative process starting from any given guessed quantization.
As noted above, this algorithm is performed here for each of a plurality of possible probabilistic distributions (in order to obtain the pre-computed optimal quantizers for the possible distributions to be encountered in practice), and for a plurality of possible numbers M of quanta. It is described below when applied for a given probabilistic distribution P and a given number M of quanta.
In this respect, as the parameter α (or equivalently the standard deviation σ) of the Generalized Gaussian Distribution can be moved out of the distortion parameter D because it is a homothetic parameter, only optimal quantizers with unity standard deviation σ = 1 need to be determined in the pool of quantizers.
Taking advantage of this remark, in the proposed embodiment, the GGD representing a given DCT channel will be normalized before quantization (i.e. homothetically transformed into a unity standard deviation GGD), and will be de-normalized after de-quantization. Of course, this is possible because the parameters (in particular here the parameter α, or equivalently the standard deviation σ) of the concerned GGD model are sent to the decoder in the video bit-stream.
Before describing the algorithm itself, the following should be noted.
The position of the centroids c_m is such that they minimize the distortion δ_m inside a quantum; in particular, one must verify that ∂δ_m/∂c_m = 0 (as the derivative is zero at a minimum).
As the distortion δ_m of the quantization on the quantum Q_m is the mean error E(d(x; c_m)) for a given distortion function or distance d, the distortion on one quantum when using the L2-distance is given by δ_m = ∫_{Q_m} |x − c_m|² P(x) dx, and the nullification of the derivative thus gives: c_m = ∫_{Q_m} x P(x) dx / P_m, where P_m is the probability of x being in the quantum Q_m, which is simply the following integral: P_m = ∫_{Q_m} P(x) dx.
Turning now to the minimisation of the cost function C = D² + λR, and considering that the rate reaches the entropy of the quantized data, R = −Σ_m P_m ln P_m (expressed here in nats), the nullification of the derivative of the cost function with respect to a quantum boundary t_m for an optimal solution can be written as: 0 = ∂C/∂t_m = ∂D²/∂t_m + λ ∂R/∂t_m. Let us set P_{t_m} = P(t_m), the value of the probability distribution at the point t_m. From simple variational considerations (see Figure 25), we get ∂P_m/∂t_m = P_{t_m} and ∂P_{m+1}/∂t_m = −P_{t_m}. Then, a bit of calculation leads to ∂/∂t_m ∫_{Q_m} |x − c_m|² P(x) dx = P_{t_m} |t_m − c_m|² and ∂/∂t_m ∫_{Q_{m+1}} |x − c_{m+1}|² P(x) dx = −P_{t_m} |t_m − c_{m+1}|², as well as λ ∂R/∂t_m = λ P_{t_m} (ln P_{m+1} − ln P_m).
As the derivative of the cost is now explicitly calculated, its cancellation gives: 0 = |t_m − c_m|² − |t_m − c_{m+1}|² + λ(ln P_{m+1} − ln P_m), which leads to a useful relation between the quantum boundaries t_m and the centroids c_m: t_m = (c_m + c_{m+1})/2 − λ(ln P_{m+1} − ln P_m)/(2(c_{m+1} − c_m)). Thanks to these formulae, the Chou-Lookabaugh-Gray algorithm can be implemented by the following iterative process:
1. Start with arbitrary quanta Q_m defined by a plurality of limits t_m;
2. Compute the probabilities P_m by the formula P_m = ∫_{Q_m} P(x) dx;
3. Compute the centroids c_m by the formula c_m = ∫_{Q_m} x P(x) dx / P_m;
4. Compute the limits t_m of new quanta by the formula t_m = (c_m + c_{m+1})/2 − λ(ln P_{m+1} − ln P_m)/(2(c_{m+1} − c_m));
5. Compute the cost C = D² + λR;
6. Loop to 2. until convergence of the cost C.
When the cost C has converged, the current values of the limits t_m and centroids c_m define a quantization, i.e. a quantizer, with M quanta, which solves the problem (B_lambda), i.e. minimises the cost function for a given value λ, and has an associated rate value R_λ and a distortion value D_λ. Such a process is implemented for many values of the Lagrange parameter λ (for instance 100 values comprised between 0 and 50). It may be noted that for λ equal to 0 there is no rate constraint, which corresponds to the so-called Lloyd quantizer.
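A minimal sketch of this iterative process in Python follows (NumPy assumed; the sampled-density approach, grid bounds and convergence threshold are illustrative choices, not part of the described method):

```python
import numpy as np

def clg_quantizer(pdf, lam, M=15, lo=-8.0, hi=8.0, samples=1 << 14, iters=200):
    """Entropy-constrained scalar quantizer design (Chou-Lookabaugh-Gray, L2 distance)."""
    x = np.linspace(lo, hi, samples)
    p = pdf(x)
    p /= np.trapz(p, x)                              # normalize the sampled density
    t = np.linspace(lo, hi, M + 1)                   # 1. arbitrary starting limits
    cost_prev = np.inf
    for _ in range(iters):
        idx = np.clip(np.searchsorted(t, x) - 1, 0, M - 1)
        P = np.array([np.trapz(np.where(idx == m, p, 0), x) for m in range(M)])          # 2.
        P = np.maximum(P, 1e-12)
        c = np.array([np.trapz(np.where(idx == m, x * p, 0), x) for m in range(M)]) / P  # 3.
        t[1:-1] = (0.5 * (c[:-1] + c[1:])                                                # 4.
                   - lam * (np.log(P[1:]) - np.log(P[:-1])) / (2 * (c[1:] - c[:-1])))
        D2 = np.trapz((x - c[idx]) ** 2 * p, x)      # quadratic distortion
        R = -np.sum(P * np.log2(P))                  # rate taken as the symbol entropy
        cost = D2 + lam * R                          # 5. cost
        if abs(cost_prev - cost) < 1e-10:            # 6. loop until convergence
            break
        cost_prev = cost
    return t, c, D2, R
```

Sweeping lam over many values, with pdf set to a unit-standard-deviation GGD density, would yield one rate-distortion curve of the pool described below.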
In order to obtain optimal quantizers for a given parameter β of the corresponding GGD, the problems (B_lambda) are to be solved for various odd (by symmetry) values of the number M of quanta and for the many values of the parameter λ. A rate-distortion diagram for the optimal quantizers with varying M is thus obtained, as shown on Figure 26.
It turns out that, for a given distortion, there is an optimal number M of needed quanta for the quantization associated to an optimal parameter λ. In brief, one may say that the optimal quantizers of the general problem (B) are those associated to a point of the upper envelope of the rate-distortion curves making up this diagram, each point being associated with a number of quanta (i.e. the number of quanta of the quantizer leading to this point of the rate-distortion curve). This upper envelope is illustrated on Figure 27. At this stage, we have now lost the dependency on λ of the optimal quantizers: to a given rate (or a given distortion) corresponds only one optimal quantizer, whose number of quanta M is fixed.
Based on observations that the GGD modelling provides a value of β almost always between 0.5 and 2 in practice, and that only a few discrete values are enough for the precision of encoding, it is proposed here to tabulate β every 0.1 in the interval between 0.2 and 2.5. Considering these values of β (i.e. here for each of the 24 values of β taken in consideration between 0.2 and 2.5), rate-distortion curves, depending on β, are obtained as shown on Figure 28. It is of course possible to obtain according to the same process rate-distortion curves for a larger number of possible values of β.
Each curve may in practice be stored in the encoder in a table containing, for a plurality of points on the curve, the rate and distortion (coordinates) of the point concerned, as well as features defining the associated quantizer (here the number of quanta and the values of the limits t_m and centroids c_m for the various quanta). For instance, a few hundred quantizers may be stored for each β up to a maximum rate, e.g. of 5 bits per DCT coefficient, thus forming the pool of quantizers mentioned in Figure 3. It may be noted that a maximum rate of 5 bits per coefficient in the enhancement layer makes it possible to obtain good quality in the decoded image.
Generally speaking, it is proposed to use a maximum rate per DCT coefficient equal to or less than 10 bits, for which value near-lossless coding is provided.
Before turning to the selection of quantizers, for the various DCT channels and among these optimal quantizers stored in association with their corresponding rate and distortion when applied to the concerned distribution (GGD with a specific parameter β), it is proposed here to possibly encode only part of the DCT channels.
Based on the observation that the rate decreases monotonically as a function of the distortion induced by the quantizer, precisely in each case in the manner shown by the curves just mentioned, it is possible to write the relationship between rate and distortion as follows: R_n = f_n(−ln(D_n/σ_n)), where σ_n is the normalization factor of the DCT coefficient, i.e. the GGD model associated to the DCT coefficient has σ_n for standard deviation, and where f_n is non-decreasing (f_n′ ≥ 0) in view of the monotonicity just mentioned.
In particular, no encoding (equivalently zero rate) leads to a quadratic distortion of value σ_n², and we deduce that f_n(0) = 0.
Finally, one observes that the curves are convex for parameters β lower than two (β ≤ 2). It is proposed here to consider the merit of encoding a DCT coefficient.
More encoding basically results in more rate R_n (in other words, the corresponding cost) and less distortion D_n (in other words, the resulting gain or advantage).
Thus, when dedicating a further bit to the encoding of the video (rate increase), it should be determined on which DCT coefficient this extra rate is the most efficient. In view of the analysis above, an estimation of the merit M_n of encoding may be obtained by computing the ratio of the benefit on distortion to the cost of encoding: M_n = ΔD_n²/ΔR_n. Considering the distortion decreases by an amount ε, a first order development of distortion and rate gives: (D_n − ε)² = D_n² − 2εD_n + o(ε) and R_n(D_n − ε) = f_n(−ln((D_n − ε)/σ_n)) = f_n(−ln(D_n/σ_n) − ln(1 − ε/D_n)) = f_n(−ln(D_n/σ_n)) + ε f_n′(−ln(D_n/σ_n))/D_n + o(ε). As a consequence, the ratio of the first order variations provides an explicit formula for the merit of encoding: M_n(D_n) = 2D_n²/f_n′(−ln(D_n/σ_n)). If the initial merit M_n⁰ is defined as the merit of encoding at zero rate, i.e. before any encoding, this initial merit can thus be expressed as follows using the preceding formula: M_n⁰ = M_n(σ_n) = 2σ_n²/f_n′(0) (because, as noted above, no encoding leads to a quadratic distortion of value σ_n²).
It is thus possible, starting from the pre-computed and stored rate-distortion curves, to determine the function f_n associated with a given DCT channel and to compute the initial merit M_n⁰ of encoding the corresponding DCT coefficient (the value f_n′(0) being determined by approximation thanks to the stored coordinates of the rate-distortion curves).
It may further be noted that, for β lower than two (which is in practice almost always true), the convexity of the rate-distortion curves teaches us that the merit is an increasing function of the distortion.
In particular, the initial merit is thus an upper bound of the merit: M_n(D_n) ≤ M_n⁰. In view of this, it is proposed to order the DCT coefficients by decreasing initial merit: M_{n1}⁰ ≥ M_{n2}⁰ ≥ ... ≥ M_{nk}⁰ ≥ ..., and to encode only coefficients (non-nil-rate encoded DCT coefficients) whose indexes form a left segment of the tuple (n1, n2, ..., nN). Said differently, after ordering the DCT channels in decreasing initial merit order, it is proposed to encode only a certain number of the first DCT channels taken in this order, as sketched below. This number may range from 0 (nothing is encoded) to the total number N of DCT channels considered (in which case each and every coefficient is in fact encoded).
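As a sketch of this ordering (the function names and table layout are hypothetical), the initial merit M_n⁰ = 2σ_n²/f_n′(0) may be approximated from the first two stored points of each rate-distortion curve:

```python
import numpy as np

def order_by_initial_merit(sigma2, rd_curves):
    """sigma2[n]: variance of DCT channel n; rd_curves[n] = (rates, dists), arrays
    sorted by increasing rate with rates[0] = 0 and dists[0] = sqrt(sigma2[n])."""
    merits = []
    for n, (rates, dists) in enumerate(rd_curves):
        # f_n'(0) approximated by a finite difference in the variable u = -ln(D/sigma)
        du = -np.log(dists[1] / np.sqrt(sigma2[n]))
        fprime0 = (rates[1] - rates[0]) / du
        merits.append(2.0 * sigma2[n] / fprime0)
    return sorted(range(len(merits)), key=lambda n: -merits[n])  # decreasing merit
```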
The exact number of DCT channels to be encoded is determined during the quantizer selection process, as explained below.
The optimality of this solution (encoding only the first coefficients ordered by initial merit) can be proven as follows.
If we assume that a first given DCT coefficient n1 is encoded and that there is another, second DCT coefficient n2 which is not encoded and which has a higher initial merit, then an infinitesimal amount of coding rate can be taken from the first coefficient n1 to encode the second coefficient n2. Because one has M_{n2}⁰ ≥ M_{n1}⁰ ≥ M_{n1}(D_{n1}), it is clear that one gets a lower distortion for the same rate. So, the encoding of the DCT coefficients is not optimal, and one understands that, if a coefficient is encoded, then all coefficients with higher initial merits must be encoded.
As a corollary, if there are N DCT coefficients (or channels) per block, the number of possible configurations that should be envisaged when deciding which coefficients to encode drops from 2^N (a decision for each coefficient whether it should be encoded) to N + 1 (after ordering by decreasing initial merit, the number of encoded coefficients may vary from 0 to N).
The encoding priority just mentioned does not specify whether a DCT coefficient is more or less encoded than another DCT coefficient; it indicates however that, if a DCT coefficient is encoded at a non-zero rate, then all coefficients with higher priority must be encoded at a non-zero rate.
The encoding priority provides an optimal encoding order that may be compared to the non-optimal conventional zigzag scan coding order used in MPEG, JPEG, H.264 and HEVC standard video coding.
Based on the pre-computed optimal quantizers determined above and the possible sets of DCT channels to be encoded as just explained, it is possible to solve the optimisation problem (A_opt), i.e. to select one of these pre-computed optimal quantizers for each DCT channel to be encoded such that using the set of selected quantizers results in a global distortion corresponding to the target distortion D_T² with a minimal rate, as follows. (This selection step corresponds to the choice referenced 191 in Figure 3.) The domain of optimization is as shown on Figure 29. The quality constraint Σ_n D_n² = D_T² can be rewritten as h = 0 with h(D_1, D_2, ...) := Σ_n D_n² − D_T². The distortion of each DCT coefficient is upper bounded by the distortion without coding: D_n ≤ σ_n, and the domain of definition of the problem is thus a multi-dimensional box Q = {(D_1, D_2, ...); D_n ≤ σ_n} = {(D_1, D_2, ...); g_n ≤ 0}, defined by the functions g_n(D_n) := D_n − σ_n.
Thus, the problem can be restated as follows: minimize R(D_1, D_2, ...) s.t. h = 0, g_n ≤ 0 (A_opt').
Such an optimization problem under inequality constraints is for instance solved using the so-called Karush-Kuhn-Tucker (KKT) necessary conditions of optimality.
To this end, the relevant KKT function Λ is defined as follows: Λ := R − λh − Σ_n μ_n g_n. The KKT necessary conditions of minimization are: -stationarity: dΛ = 0, -equality: h = 0, -inequality: g_n ≤ 0, -dual feasibility: μ_n ≥ 0, -saturation: μ_n g_n = 0.
It may be noted that the parameter λ in the KKT function above is unrelated to the parameter λ used above in the Lagrange formulation of the optimisation problem meant to determine optimal quantizers.
If g_n = 0, the n-th condition is said to be saturated. In the present case, it indicates that the n-th DCT coefficient is not encoded.
By using the specific formulation R_n = f_n(−ln(D_n/σ_n)) of the rate depending on the distortion discussed above, the stationarity condition gives: 0 = ∂Λ/∂D_n = ∂R_n/∂D_n − λ ∂h/∂D_n − μ_n ∂g_n/∂D_n = −f_n′/D_n − 2λD_n − μ_n, i.e. 2λD_n² = −μ_n D_n − f_n′. (*) By summing on n and taking benefit of the equality condition, this leads to 2λD_T² = −Σ_n μ_n D_n − Σ_n f_n′. In order to take into account the possible encoding of only part of the coefficients as proposed above, the various possible indices n are distributed into two subsets: -the set I⁰ = {n; μ_n = 0} of non-saturated DCT coefficients (i.e. of encoded DCT coefficients), for which we have μ_n D_n = 0 and D_n² = −f_n′/(2λ), and -the set I⁺ = {n; μ_n > 0} of saturated DCT coefficients (i.e. of DCT coefficients not encoded), for which we have μ_n D_n = μ_n σ_n = −f_n′ − 2λσ_n².
From (*), we deduce 2λD_T² = Σ_{n∈I⁰} (−f_n′) + 2λ Σ_{n∈I⁺} σ_n², and by gathering the λ terms: 2λ(D_T² − Σ_{n∈I⁺} σ_n²) = −Σ_{n∈I⁰} f_n′. As a consequence, for a non-saturated coefficient (n ∈ I⁰, i.e. a coefficient to be encoded), we obtain: D_n² = (D_T² − Σ_{m∈I⁺} σ_m²) f_n′(−ln(D_n/σ_n)) / Σ_{m∈I⁰} f_m′(−ln(D_m/σ_m)). It may be noted that this is an implicit system of equations, because the derivatives depend on the distortions D_n. It is proposed to solve this implicit system numerically by a fixed point algorithm.
For a given set I⁰ of non-saturated coefficients, the above system can be rewritten D = F(D), using a continuous vector function F and with D = (D_1, D_2, ...). A fixed point method may be used to solve such a system by defining a series D(t + 1) = F(D(t)), with D(0) arbitrary among the possible solutions (in the sub-space of the box Q with dimensions corresponding to the set I⁰). If this series converges to a limit D(∞), by continuity of the function F, this limit is a solution of the problem.
It may be noted in addition that, by theorem, the series converges if the function F is a contracting function, i.e. if its differential is smaller than one. As this is not always the case, it is possible to force the convergence using a penalization method: the fixed point problem D = F(D) can be rewritten as another fixed point problem: D = G(D) := θF(D) + (1 − θ)D.
By taking the parameter θ close to zero, one can force the differential of G to be as close to one as wanted, ensuring the contraction of G. However, the use of a very small θ leads to a very slow convergence, and a balance must thus be found in practice.
In view of the above, the practical algorithm for solving the implicit system defined above is as follows:
I. For each non-saturation set I⁰ of the N + 1 possible non-saturation sets (provided by the priority of encoding as explained above), the iterative fixed point method is performed:
1. if D_T² − Σ_{n∉I⁰} σ_n² < 0, encoding is impossible with the concerned non-saturation set I⁰;
2. start with arbitrary distortions D_n(0) (for n ∈ I⁰), for example with D_n(0) = σ_n;
3. determine the distortions D_n(t + 1) (for n ∈ I⁰) thanks to the formula D̃_n² = (D_T² − Σ_{m∉I⁰} σ_m²) f_n′(−ln(D_n(t)/σ_n)) / Σ_{m∈I⁰} f_m′(−ln(D_m(t)/σ_m)) and the penalization D_n(t + 1) = θD̃_n + (1 − θ)D_n(t) for a fixed parameter θ. Compute the associated rate R(t + 1) = Σ_{n∈I⁰} R_n(t + 1), where R_n(t + 1) is the rate associated with the distortion D_n(t + 1);
4. loop on 3. until convergence of the rates R(t + 1), and store the final rate under R_{I⁰};
II. determine the minimum rate among all the rates R_{I⁰} (taking into account the N + 1 possible non-saturation sets I⁰). The optimal DCT distortions D_n (for values of n belonging to the set I⁰ for which the minimum rate is obtained) are those associated with this minimum rate and were determined during the execution of the previous algorithm (I above). For each DCT channel to be encoded, the selected quantizer is the quantizer associated with the corresponding optimal distortion just obtained. It may be recalled in this respect that the features (number of quanta, centroids and limit values) defining this quantizer are stored at the encoder in association with the distortion it generates, as already explained.
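A compact sketch of this two-level search (outer loop on the N + 1 candidate sets, inner penalized fixed point) is given below; fprime and rate_of stand for interpolations of the stored rate-distortion curves, and all names are hypothetical:

```python
import numpy as np

def select_distortions(sigma, DT2, fprime, rate_of, order, theta=0.1, iters=500):
    """sigma: per-channel standard deviations; order: channels by decreasing
    initial merit; fprime(n, u) and rate_of(n, D) interpolate the stored curves."""
    N = len(sigma)
    best_rate, best = np.inf, None
    for k in range(N + 1):                       # encode only the k first channels
        I0 = order[:k]
        slack = DT2 - sum(sigma[n] ** 2 for n in order[k:])
        if slack < 0:
            continue                             # 1. encoding impossible with this set
        D = np.array([sigma[n] for n in I0])     # 2. start from no-coding distortions
        for _ in range(iters):                   # 3. penalized fixed-point iterations
            if not k:
                break
            fp = np.array([fprime(n, -np.log(D[i] / sigma[n]))
                           for i, n in enumerate(I0)])
            D_tilde = np.sqrt(slack * fp / fp.sum())
            D = theta * D_tilde + (1 - theta) * D
        total = sum(rate_of(n, D[i]) for i, n in enumerate(I0))   # 4. final rate
        if total < best_rate:                    # II. keep the minimum-rate set
            best_rate, best = total, (I0, D)
    return best_rate, best
```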
An example of convergence of the algorithm is shown on Figure 30. One clearly understands that the optimal non-saturation set is not trivial, because it does not correspond to the smallest encodable one. Actually, the smallest encodable non-saturation set implies a lot of encoding effort for the encoded DCT coefficients and leads to a big encoding rate. By encoding more coefficients, i.e. taking a set bigger than the smallest encodable one, one does not need to encode each coefficient too much and the rate is smaller. On the other hand, encoding all coefficients is far from being optimal, as seen on the figure; little rate is used on too many coefficients and finally this leads to a big total rate. The optimal non-saturation set is a non-trivial balance between the number of encoded coefficients and the amount of encoding on each coefficient.
Once the distortion target D_n (depending on the base mode) of each DCT coefficient has been determined by the above process, one chooses the best optimal quantizer associated to this distortion. For instance, one may take, from the list of optimal quantizers corresponding to the associated parameter β of the DCT channel model, the quantizer with the least rate among the quantizers having a distortion less than or equal to the target distortion D_n. Then, quantization is performed by the chosen (or selected) quantizers to obtain the quantized data X_DCT,Q representing the DCT image. Practically, these data are symbols corresponding to the index of the quantum (or interval, or Voronoi cell in 1D) in which the value of the concerned coefficient of X_DCT falls.
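Choosing, for one channel, the stored quantizer with the least rate among those meeting the distortion target could then look like the following sketch (the (rate, distortion, quantizer) table layout is an assumption):

```python
def pick_quantizer(table, target_distortion):
    """table: iterable of (rate, distortion, quantizer) entries for one beta value."""
    feasible = [e for e in table if e[1] <= target_distortion]
    return min(feasible, key=lambda e: e[0])[2]  # least rate among feasible entries
```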
The entropy coding may be performed by any known coding technique like VLC coding or arithmetic coding. Context adaptive coding (CAVLC or CABAC) may also be used.
The encoded data can then be transmitted together with parameters allowing in particular the decoder to use the same quantizers as those selected and used for encoding as described above.
According to a first possible embodiment, the transmitted parameters may include the parameters defining the distribution for each DCT channel, i.e. the parameter α (or equivalently the standard deviation σ) and the parameter β computed at the encoder side for each DCT channel.
Based on these parameters received in the data stream, the decoder may deduce the quantizers to be used (a quantizer for each DCT channel) thanks to the selection process explained above at the encoder side (the only difference being that the parameters β, for instance, are computed from the original data at the encoder side whereas they are received at the decoder side).
Dequantization (step 332 of Figure 4) can thus be performed with the selected quantizers (which are the same as those used at encoding because they are selected the same way).
According to a possible variation of this first embodiment, the parameters transmitted in the data stream include a parameter representative of the set I⁰ of non-saturated coefficients which was determined at the encoder side to minimize the rate (i.e. the set for which the minimum rate was obtained). In this variation, it is thus unnecessary to seek the relevant non-saturated coefficient set by optimisation, and the process of selecting the quantizers to be used is thus faster (part I of the process described above only).
According to a second possible embodiment, the transmitted parameters may include identifiers of the various quantizers used in the pool of quantizers (this pool being common to the encoder and the decoder) and the standard deviation σ (or equivalently the parameter α). Dequantization (step 332 of Figure 4) can thus be performed at the decoder by use of the identified quantizers.
Further embodiment of decoder
The following description relates to a further embodiment of a decoder in accordance with the present invention.
Referring again to Figure 4, which illustrates the associated enhancement video decoder 33, it will be recalled that, from the received parameters 21, the channel models are reconstructed, meaning that a probabilistic distribution GGD(α_i, β_i) is known for each encoded DCT coefficient of a channel i. Quantizers are chosen 330 from the pool of quantizers, possibly based on these probabilistic distributions.
Next, the blocks of encoded quantized DCT coefficients of the first enhancement image are decoded from the bit-stream and dequantized.
This comprises the following operations: -an entropy decoder 331 is applied to the received enhancement layer bit-stream 20 to obtain the quantized DCT image X_DCT,Q. As suggested above, conventional Huffman codes can be used, possibly taking into account the probabilistic distributions; and -a dequantization (or inverse quantization) 332 is then performed by using the chosen quantizers for each coefficient, to obtain a dequantized version of the DCT image. The dequantized version is referenced X̂_DCT, since it is different from the original version due to the lossy quantization.
The present embodiment particularly focuses on the rest of the decoding process, from the dequantized DCT coefficients of that dequantized image, as described now with reference to Figure 31.
As shown in this Figure, the decoding method according to the present embodiment comprises a step of merging 38 the dequantized DCT blocks X̂_DCT of dequantized DCT coefficients with DCT residual blocks (or "predictor blocks") Y. The DCT residual blocks Y are generated by an enhancement prediction module 40, which is further detailed below.
The dequantized DCT blocks X̂_DCT form a first version of the residual enhancement image currently decoded, while the DCT residual blocks Y form, at least partly, a second version of the same residual enhancement image, that is temporally predicted based on base layer motion information and an already decoded UHD image of the video, as explained below.
The merger of the blocks X̂_DCT with the blocks Y may be a probabilistic merging process that is based on the parameters 21 (i.e. the probabilistic distributions of the DCT coefficients as determined by the encoder) and on a second probabilistic distribution that characterizes the temporal prediction of the enhancement layer by the module 40. In particular, the second probabilistic distribution is a probabilistic distribution of the differences between the coefficients of the DCT residual blocks Y and the dequantized DCT coefficients of the dequantized DCT blocks X̂_DCT.
Figure 32 illustrates the generation of the DCT residual blocks Y, i.e. of the transformed residual blocks of the enhancement image associated with a current image I to decode.
This prediction successively consists in temporally predicting the current enhancement image in the pixel domain (thanks to up-sampled motion information), computing the pixel difference data between the temporally predicted image and the up-sampled reconstructed base image, and then applying a DCT transform on the difference image.
It is first assumed that the image I_B of the base layer corresponding to the current image I to decode has already been decoded by the base layer decoder 31 using temporal prediction based on an already-decoded reference image I_RB of the base layer. For each block B or macroblock MB of the image I_B (depending on the granularity of the prediction), motion information MI is known and temporarily stored when decoding the base layer, for the needs of the present invention.
This motion information comprises, for each block or macroblock, a base motion field BMF (including a motion vector and a reference image index) and a base residual BR, as is well known to one skilled in the art of video coding.
Each of the base blocks or base macroblocks that have been temporally predicted in the base layer is now successively considered.
It may be noted that the present invention also applies when not all the images I_B of the base layer are encoded in the bit-stream EBS. For example, an image I_B of the base layer may be obtained by interpolating other decoded base images. In that case, the available motion information for those other decoded base images may also be interpolated to provide motion information specific to blocks or macroblocks of the interpolated base image I_B. The following explanation also applies to such a base image I_B.
First, the corresponding motion information is up-sampled 400 into high resolution corresponding to the resolution of the enhancement layer (e.g. UHD). It is shown in the Figure by the references UMF (up-sampled motion field) and UR (up-sampled residual).
In particular, this up-sampling comprises, for each base macroblock, as sketched below: -the up-sampling of the macroblock partitioning (into blocks) to the resolution level of the enhancement layer. In the above example of a dyadic spatial scalability (UHD vs. HD), up-sampling the partition consists in multiplying the width and height of macroblock partitions by a factor of 2; -the up-sampling, by a factor of 2 (in width and height), of the base residual associated with the base macroblock. This texture up-sampling process may use an interpolation filter that is identical to that used in the inter-layer residual prediction mechanisms of the SVC scalable video compression standard; and -the up-sampling, by two (x- and y-coordinates), of the motion vector associated with the base macroblock.
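A sketch of this dyadic up-sampling follows, with simple pixel repetition standing in for the SVC interpolation filter (an assumption made here for brevity, not the filter the text describes):

```python
import numpy as np

def upsample_motion_info(base_mv, base_residual):
    """Dyadic up-sampling of base-layer motion information (illustrative only)."""
    umf = 2 * np.asarray(base_mv)          # motion vector: scale x and y by 2
    ur = np.repeat(np.repeat(base_residual, 2, axis=0), 2, axis=1)  # 2x2 texture upsample
    return umf, ur
```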
Once the up-sampling 400 has been performed, the generation of a DCT residual macroblock Y comprises a motion compensated prediction step 405 from the decoded UHD image that temporally corresponds to the reference base image I_RB used for the decoding of the base layer, and based on the up-sampled motion information UMF and UR. That decoded UHD image is considered, for the temporal prediction, as the reference decoded image I_RU.
It may for example be the reconstructed UHD image that temporally precedes the current image to decode, as shown in the Figure.
This motion compensation 405, in the pixel domain, leads to obtaining, using the motion information, motion predictor blocks from the decoded reference high resolution image I_RU. In particular, the up-sampled prediction information is applied to the reference decoded image I_RU to determine predicted macroblocks.
One may note that the motion compensation results in a partially-reconstructed image. This is because the macroblocks reconstructed by prediction are obtained at spatial positions corresponding to INTER macroblocks in the base image I_B only (because there is no motion information for other macroblocks). In other words, no predicted block is generated for the macroblocks collocated with INTRA macroblocks in the base layer.
Next, residual blocks are obtained by subtracting 410 each motion predictor block from a corresponding (i.e. collocated) up-sampled block in the up-sampled decoded base image (which is obtained by the up-sampling 32 of Figure 2). This step calculates the difference image (or residual) between the temporally predicted image and the up-sampled reconstructed base layer image. This difference image has the same nature as the residual enhancement image.
The module 40 ends by applying 415 a block-based transform, e.g. a DCT on 8x8 blocks, to the obtained residual blocks to obtain transformed residual blocks that are the DCT residuals Y discussed above.
Therefore, a plurality of DCT residual macroblocks Y is obtained for the current image to decode, which generally represents a partial predicted enhancement image.
The next steps of the decoding method according to the invention may be applied to the entirety of that plurality of macroblocks Y, or to a part of it, depending for example on the base coding mode (P image Inter prediction, B image Inter prediction, Skip mode), in which case only the DCT predictor macroblocks Y and the dequantized DCT macroblocks X̂_DCT collocated with base macroblocks having the same coding mode are handled together.
For the rest of the description, the macroblocks Y collocated with P, B and SKIP base macroblocks are considered separately, as was done at the encoder when determining the probabilistic distribution of each DCT channel.
Based on the considered DCT residual macroblocks Y, a probabilistic distribution of the differences between the coefficients in the transformed residual blocks Y and the dequantized transformed coefficients is calculated. This aims to model the noise associated with the motion prediction 405.
A probabilistic distribution may be obtained for the entire set of coefficients of the considered DCT residual macroblocks, or for each DCT channel i, in which case the explanation below should be applied to the DCT coefficients of the same channel.
Each DCT residual macroblock Y made of DCT coefficients for the current image to decode is considered as a version of the original DCT coefficients that would have been altered through a communication channel. It has been observed that the quantity Y − X_DCT (i.e. the noise of the residual Y compared to the DCT coefficients before encoding) can be well modelled by a generalized Gaussian distribution as introduced above: Y − X_DCT ~ GGD(α_N, β_N). By knowing the statistical distribution of the predictor noise (Y − X_DCT), it is therefore possible to retrieve a good approximation of the original DCT coefficients of the blocks X_DCT.
Since the exact coefficients of the DCT image X_DCT are not known by the decoder (because of the quantization 192 at the encoding side), the exact prediction noise cannot be modelled. However, the inventors have observed that using the dequantized DCT macroblocks X̂_DCT instead of the original DCT macroblocks X_DCT provides a GGD modelling that is close to the theoretical modelling with X_DCT.
For this, the modelling of the predictor noise thus comprises fitting a Generalized Gaussian Distribution model onto the differences between the coefficients in the transformed residual blocks Y and the dequantized transformed coefficients X̂_DCT. The same mechanisms based on the first and second moments as described above can be applied to obtain the two parameters α_N, β_N (either for all the considered macroblocks Y and X̂_DCT, or for each DCT channel of coefficients in those macroblocks).
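A sketch of such a moment-based fit for a zero-mean GGD (solving the standard moment-ratio equation for β by root finding; helper names are illustrative):

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import brentq

def fit_ggd(samples):
    """Fit a zero-mean GGD(alpha, beta) by matching E|x| and E[x^2]."""
    m1 = np.mean(np.abs(samples))          # first absolute moment
    m2 = np.mean(np.square(samples))       # second moment
    ratio = m1 * m1 / m2
    # moment-ratio function: Gamma(2/b)^2 / (Gamma(1/b) * Gamma(3/b)) - ratio
    g = lambda b: np.exp(2 * gammaln(2 / b) - gammaln(1 / b) - gammaln(3 / b)) - ratio
    beta = brentq(g, 0.1, 10.0)            # root finding on the shape parameter
    alpha = m1 * np.exp(gammaln(1 / beta) - gammaln(2 / beta))
    return alpha, beta
```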
Next, the merging 38 of the considered macroblocks Y and X̂_DCT is performed, in the DCT domain.
As introduced above, it is based on the probabilistic distribution of the DCT coefficients of X_DCT that is obtained through the parameters (α, β) 21 of each DCT channel: P(X = x) = GGD(α, β; x), and on the probabilistic distribution of the predictor noise as determined above (parameters α_N, β_N, possibly for the considered DCT channel): P(Z = z) = GGD(α_N, β_N; z), where Z is the estimated noise Y − X_DCT.
For each coefficient i of the considered macroblocks, the merged value can take the form of a probabilistic estimation of the original DCT coefficient value, given the known quantization interval of this DCT coefficient and an aside approximation of the coefficient resulting from its motion compensated temporal prediction (blocks Y).
For example, a merged value according to the invention, denoted x̂_i, may be the expectation (or the "expected value") of the considered coefficient, given the quantization interval Q_m associated with the value of the corresponding quantized transformed coefficient in X_DCT,Q and given its corresponding value Y_0 in the residual blocks Y. The quantization interval Q_m is directly retrieved from the quantized DCT coefficient obtained from the bit-stream 20, since its value a_m is the index of the quantization interval Q_m given the quantizer used.
Such an expectation is calculated based on the probabilistic distributions mentioned previously, for example: x̂_i = ∫_{Q_m} x GGD(α, β; x) GGD(α_N, β_N; x − Y_0) dx / ∫_{Q_m} GGD(α, β; x) GGD(α_N, β_N; x − Y_0) dx. The probabilistic calculation of x̂_i is illustrated in Figure 33.
In this Figure, the GGD distribution of X_DCT as well as the statistical distribution of the prediction noise X_DCT − Y_0 have been drawn. The quantization interval Q_m associated with the considered DCT coefficient i is also indicated.
The two distributions are multiplied over the interval Q_m to calculate the desired conditional expected value x̂_i. The integrals of those distributions can be computed using Riemann sums over the quantization interval.
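A sketch of this Riemann-sum computation (the GGD density helper and the interval bounds t_low, t_high are assumptions):

```python
import numpy as np
from scipy.special import gamma

def ggd_pdf(x, alpha, beta):
    """Zero-mean Generalized Gaussian density."""
    return beta / (2 * alpha * gamma(1 / beta)) * np.exp(-np.abs(x / alpha) ** beta)

def merged_coefficient(t_low, t_high, y0, a, b, aN, bN, samples=256):
    """Conditional expectation of X over [t_low, t_high], given the prior
    GGD(a, b) on X and the prediction noise model X - y0 ~ GGD(aN, bN)."""
    x = np.linspace(t_low, t_high, samples)
    w = ggd_pdf(x, a, b) * ggd_pdf(x - y0, aN, bN)   # product of the two densities
    return np.sum(x * w) / np.sum(w)                 # Riemann-sum conditional mean
```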
Thanks to the invention, there is no need to force the obtained value Y_0 to be within the current quantization interval Q_m. On the contrary, the fact that it could be outside that interval (as shown in the Figure) is thus taken into account to refine the decoded DCT coefficients.
The values x̂_i calculated for the DCT coefficients of all the considered macroblocks are stored in memory to form, at least partially, the merged enhancement image corresponding to the current image to decode.
Since no fitting and merging step is applied to INTRA macroblocks (because there is no motion information), their values obtained in the dequantized DCT macroblocks X̂_DCT may be used to form the remaining part of the merged enhancement image. Finally, an entire merged enhancement image is obtained, which is input to the inverse transform IDCT 34 as shown in Figures 2 and 31.
The present embodiment has been illustrated and provides significant improvements in rate-distortion, of several dBs. As explained in detail, the present embodiment relies on: -statistically modelling, at the encoder, the DCT coefficients of the residual enhancement image and then transmitting the obtained model parameters to the decoder; -statistically modelling, at the decoder, the DCT coefficients of a predicted residual enhancement image (issued from temporal prediction using the motion information of the base layer); and -probabilistically estimating (through a conditional expectation calculation) the combined DCT coefficients of these two residual enhancement images, based on their statistical modelling.
Figure 34 illustrates the performance of the present embodiment, in which the rate-distortion curves are plotted when the merging according to the present embodiment is respectively not implemented and implemented.
The Figure shows that an improvement in the codec rate-distortion performance is obtained, especially at low bitrates. This may be understood intuitively: since the quantization intervals get larger as the bitrate decreases, the relevant information brought by the temporal DCT residuals Y increases compared to the quantization level obtained by the dequantization step 332.
One may then note that the present embodiment also works for a zero bitrate (meaning that no enhancement layer bitstream 20 is encoded or received by the decoder). In that case, the parameters 21 (α, β for each DCT channel) are received and are used with the parameters α_N, β_N calculated according to the present embodiment to obtain an improvement of the decoding quality of the base layer by several dBs.
Furthermore, the above performance is obtained with no complexity cost at the encoding side and with no additional bitrate when the parameters 21 are already needed and transmitted (e.g. for selecting the quantizers and/or entropy decoding).
The complexity increase due to the merging step remains reasonable at the decoding side.
The above examples are merely embodiments of the invention, which is not limited thereby.

Claims (1)

  1. <claim-text>CLAIMS: 1. A method of encoding video data comprising: encoding video data having a first resolution in conformity with HEVC to obtain video data of a base layer; and decoding the base-layer video data in conformity with HEVC, upsampling the decoded base-layer video data to generate decoded video data having a second resolution higher than said first resolution, forming a difference between the generated decoded video data having said second resolution and further video data, having said second resolution and corresponding to said first-resolution video data, to generate data of a residual image, and compressing the residual-image data to generate video data of an enhancement layer.</claim-text> <claim-text>2. A method as claimed in claim 1, wherein the compression of the residual-image data does not involve temporal prediction.</claim-text> <claim-text>3. A method as claimed in claim 1 or 2, wherein the compression of the residual-image data does not involve spatial prediction.</claim-text> <claim-text>4. A method as claimed in claim 1, 2 or 3, wherein the compression of the residual-image data comprises applying a discrete cosine transformation (DCT) to obtain DCT coefficients.</claim-text> <claim-text>5. A method as claimed in claim 4, further comprising employing a parametric probabilistic model of the DCT coefficients.</claim-text> <claim-text>6. A method as claimed in claim 5, wherein such a parametric probabilistic model is obtained for each type of DCT coefficient.</claim-text> <claim-text>7. A method as claimed in claim 5, wherein the DCT is a block-based DCT and such a parametric probabilistic model is obtained for each DCT coefficient position within a DCT block.</claim-text> <claim-text>8. A method as claimed in claim 7, further comprising fitting the parametric probabilistic model onto respective collocated DCT coefficients of some or all DCT blocks of the residual image.</claim-text> <claim-text>9. A method as claimed in any one of claims 5 to 8, wherein the compression of the residual-image data comprises employing the parametric probabilistic model to choose quantizers from a pool of available quantizers and quantizing the residual data using the chosen quantizers.</claim-text> <claim-text>10. A method as claimed in claim 9, wherein the available quantizers of said pool are pre-computed quantizers dedicated to a parametric probabilistic model.</claim-text> <claim-text>11. A method as claimed in any one of claims 5 to 10, wherein the compression of the residual-image data comprises entropy encoding of quantized symbols and employing a parametric probabilistic model to obtain a probabilistic distribution of possible symbols of an alphabet associated with each DCT coefficient, which alphabet is used for the entropy encoding.</claim-text> <claim-text>12. A method as claimed in claim 11 when read as appended to claim 9 or 10, wherein the same parametric probabilistic model is employed both for choosing quantizers and for entropy encoding.</claim-text> <claim-text>13. A method as claimed in any one of claims 5 to 12, wherein the parametric probabilistic model is a Generalised Gaussian Distribution GGD(α, β) having a zero mean.</claim-text> <claim-text>14. 
A method as claimed in any one of claims 5 to 13, wherein parameters of the parametric probabilistic model are determined in dependence upon one or more of: -a video content; -an index of the DCT coefficient within a DCT block; -an encoding mode used for a collocated block of the base layer; and -a size of block to encode.</claim-text> <claim-text>15. A method as claimed in any one of claims 5 to 14, wherein information about parameters of the parametric probabilistic model is supplied to a decoder.</claim-text> <claim-text>16. A method as claimed in any one of claims 5 to 14, wherein information about a set of saturated coefficients determined at the encoder to minimize a rate is supplied to a decoder.</claim-text> <claim-text>17. A method as claimed in claim 9 or 10, wherein information about the chosen quantizers is supplied to a decoder.</claim-text> <claim-text>18. A method as claimed in claim 9 or 10, wherein the quantizers are chosen based on a rate-distortion criterion.</claim-text> <claim-text>19. A method of decoding a scalable bitstream, comprising: decoding, in conformity with HEVC, encoded video data of a base layer of the bitstream to obtain video data having a first resolution, and upsampling the first-resolution video data to generate video data having a second resolution higher than said first resolution; and decoding compressed video data of an enhancement layer of the bitstream to obtain data of a residual image, and forming a sum of the generated second-resolution video data and the residual-image data to generate decoded video data having said second resolution.</claim-text> <claim-text>20. A method as claimed in claim 19, wherein the decoding of the compressed video data of the enhancement layer does not involve temporal prediction.</claim-text> <claim-text>21. A method as claimed in claim 19 or 20, wherein the decoding of the compressed video data of the enhancement layer does not involve spatial prediction.</claim-text> <claim-text>22. A method as claimed in claim 19, 20 or 21, wherein the compressed video data of the enhancement layer comprises encoded discrete cosine transformed (DCT) coefficients.</claim-text> <claim-text>23. A method as claimed in claim 22, further comprising employing a parametric probabilistic model of the DCT coefficients.</claim-text> <claim-text>24. A method as claimed in claim 23, wherein such a parametric probabilistic model is obtained for each type of DCT coefficient.</claim-text> <claim-text>25. A method as claimed in claim 23, wherein the DCT is a block-based DCT and such a parametric probabilistic model is obtained for each DCT coefficient position within a DCT block.</claim-text> <claim-text>26. A method as claimed in any one of claims 23 to 25, wherein the decoding of the compressed video data of the enhancement layer comprises employing the parametric probabilistic model to choose quantizers from a pool of available quantizers and using the chosen quantizers for inverse quantization of the encoded DCT coefficients.</claim-text> <claim-text>27. A method as claimed in claim 26, wherein the available quantizers of said pool are pre-computed quantizers dedicated to a parametric probabilistic model.</claim-text> <claim-text>28. 
A method as claimed in any one of claims 22 to 27, wherein the decoding of the compressed video data of the enhancement layer comprises entropy decoding of encoded quantized symbols obtained from the compressed video data to generate quantized DCT coefficients and employing a parametric probabilistic model to obtain a probabilistic distribution of possible symbols of an alphabet associated with each DCT coefficient, which alphabet is used for the entropy decoding.</claim-text> <claim-text>29. A method as claimed in claim 28 when read as appended to claim 26 or 27, wherein the same parametric probabilistic model is employed both for choosing quantizers and for entropy decoding.</claim-text> <claim-text>30. A method as claimed in any one of claims 23 to 29, wherein the parametric probabilistic model is a Generalised Gaussian Distribution GGD(α, β) having a zero mean.</claim-text> <claim-text>31. A method as claimed in any one of claims 23 to 30, wherein information about parameters of the parametric probabilistic model is received from an encoder.</claim-text> <claim-text>32. A method as claimed in any one of claims 23 to 30, wherein information about a set of saturated coefficients determined at an encoder to minimize a rate is received from the encoder.</claim-text> <claim-text>33. A method as claimed in claim 26 or 27, wherein information about the chosen quantizers is received from an encoder.</claim-text> <claim-text>34. A method as claimed in claim 26 or 27, wherein the quantizers are chosen based on a rate-distortion criterion.</claim-text> <claim-text>35. Apparatus for encoding video data comprising: means for encoding video data having a first resolution in conformity with HEVC to obtain video data of a base layer; means for decoding the base-layer video data in conformity with HEVC; means for upsampling the decoded base-layer video data to generate decoded video data having a second resolution higher than said first resolution; means for forming a difference between the generated decoded video data having said second resolution and further video data, having said second resolution and corresponding to said first-resolution video data, to generate data of a residual image; and means for compressing the residual-image data to generate video data of an enhancement layer.</claim-text> <claim-text>36. Apparatus for decoding a scalable bitstream, comprising: means for decoding, in conformity with HEVC, encoded video data of a base layer of the bitstream to obtain video data having a first resolution; means for upsampling the first-resolution video data to generate video data having a second resolution higher than said first resolution; means for decoding compressed video data of an enhancement layer of the bitstream to obtain data of a residual image; and means for forming a sum of the generated second-resolution video data and the residual-image data to generate decoded video data having said second resolution.</claim-text> <claim-text>37. A program which, when executed by a computer or processor, causes the computer or processor to carry out a method of encoding as claimed in any one of claims 1 to 18.</claim-text> <claim-text>38. A program which, when executed by a computer or processor, causes the computer or processor to carry out a method of decoding as claimed in any one of claims 19 to 34.</claim-text> <claim-text>39. 
A method, apparatus or computer program for encoding video data substantially as hereinbefore described with reference to the accompanying drawings.</claim-text> <claim-text>40. A method, apparatus or computer program for decoding a scalable bitstream substantially as hereinbefore described with reference to the accompanying drawings.</claim-text>
GB201111199A 2011-06-30 2011-06-30 Encoding and decoding residual image data using probabilistic models Withdrawn GB2492397A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB201111199A GB2492397A (en) 2011-06-30 2011-06-30 Encoding and decoding residual image data using probabilistic models
PCT/EP2012/002718 WO2013000575A1 (en) 2011-06-30 2012-06-28 Methods and devices for scalable video coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB201111199A GB2492397A (en) 2011-06-30 2011-06-30 Encoding and decoding residual image data using probabilistic models

Publications (2)

Publication Number Publication Date
GB201111199D0 GB201111199D0 (en) 2011-08-17
GB2492397A true GB2492397A (en) 2013-01-02

Family

ID=44511904

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201111199A Withdrawn GB2492397A (en) 2011-06-30 2011-06-30 Encoding and decoding residual image data using probabilistic models

Country Status (2)

Country Link
GB (1) GB2492397A (en)
WO (1) WO2013000575A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2506348A (en) * 2012-09-14 2014-04-02 Canon Kk Image coding with residual quantisation using statistically-selected quantisers
RU2725843C2 (en) * 2015-07-28 2020-07-06 ФОРД ГЛОУБАЛ ТЕКНОЛОДЖИЗ, ЭлЭлСи Vehicle with hyperlapse video and social networks

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9491459B2 (en) 2012-09-27 2016-11-08 Qualcomm Incorporated Base layer merge and AMVP modes for video coding
WO2016032033A1 (en) * 2014-08-29 2016-03-03 전자부품연구원 Cloud-based 4k uhd content streaming device and method
CN114630129A (en) * 2022-02-07 2022-06-14 浙江智慧视频安防创新中心有限公司 Video coding and decoding method and device based on intelligent digital retina
CN116778002A (en) * 2022-03-10 2023-09-19 华为技术有限公司 Encoding and decoding method, apparatus, device, storage medium, and computer program product
CN115550669B (en) * 2022-11-30 2023-03-24 摩尔线程智能科技(北京)有限责任公司 Video transcoding method and device, electronic equipment and storage medium
CN117459737B (en) * 2023-12-22 2024-03-29 中国科学技术大学 Training method of image preprocessing network and image preprocessing method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004180057A (en) * 2002-11-28 2004-06-24 Toa Corp Method and device for encoding digital data
WO2006013478A1 (en) * 2004-07-26 2006-02-09 Koninklijke Philips Electronics N.V. Method and apparatus for spatial scalable compression of a video stream
WO2006064422A1 (en) * 2004-12-13 2006-06-22 Koninklijke Philips Electronics N.V. Scalable picture encoding
US20070091997A1 (en) * 2003-05-28 2007-04-26 Chad Fogg Method And Apparatus For Scalable Video Decoder Using An Enhancement Stream
US20090141809A1 (en) * 2007-12-04 2009-06-04 Sony Corporation And Sony Electronics Inc. Extension to the AVC standard to support the encoding and storage of high resolution digital still pictures in parallel with video
US20090225830A1 (en) * 2008-03-07 2009-09-10 International Business Machines Corporation Method and system for coding mode selection in video compression systems
CN101577825A (en) * 2009-05-15 2009-11-11 武汉大学 Interactive quantized noise calculating method in compressed video super-resolution

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6700933B1 (en) * 2000-02-15 2004-03-02 Microsoft Corporation System and method with advance predicted bit-plane coding for progressive fine-granularity scalable (PFGS) video coding
EP2262269B1 (en) 2001-12-17 2018-01-24 Microsoft Technology Licensing, LLC Skip macroblock coding
KR20060126984A (en) * 2003-12-08 2006-12-11 코닌클리케 필립스 일렉트로닉스 엔.브이. Spatial scalable compression scheme with a dead zone

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004180057A (en) * 2002-11-28 2004-06-24 Toa Corp Method and device for encoding digital data
US20070091997A1 (en) * 2003-05-28 2007-04-26 Chad Fogg Method And Apparatus For Scalable Video Decoder Using An Enhancement Stream
WO2006013478A1 (en) * 2004-07-26 2006-02-09 Koninklijke Philips Electronics N.V. Method and apparatus for spatial scalable compression of a video stream
WO2006064422A1 (en) * 2004-12-13 2006-06-22 Koninklijke Philips Electronics N.V. Scalable picture encoding
US20090141809A1 (en) * 2007-12-04 2009-06-04 Sony Corporation And Sony Electronics Inc. Extension to the AVC standard to support the encoding and storage of high resolution digital still pictures in parallel with video
US20090225830A1 (en) * 2008-03-07 2009-09-10 International Business Machines Corporation Method and system for coding mode selection in video compression systems
CN101577825A (en) * 2009-05-15 2009-11-11 武汉大学 Interactive quantized noise calculating method in compressed video super-resolution

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2506348A (en) * 2012-09-14 2014-04-02 Canon Kk Image coding with residual quantisation using statistically-selected quantisers
GB2506348B (en) * 2012-09-14 2017-03-08 Canon Kk Method for encoding an image of pixels, corresponding decoding method and device
RU2725843C2 (en) * 2015-07-28 2020-07-06 ФОРД ГЛОУБАЛ ТЕКНОЛОДЖИЗ, ЭлЭлСи Vehicle with hyperlapse video and social networks
US10870398B2 (en) 2015-07-28 2020-12-22 Ford Global Technologies, Llc Vehicle with hyperlapse video and social networking

Also Published As

Publication number Publication date
WO2013000575A1 (en) 2013-01-03
GB201111199D0 (en) 2011-08-17

Similar Documents

Publication Publication Date Title
GB2492397A (en) Encoding and decoding residual image data using probabilistic models
US10462494B2 (en) Video encoding method for encoding division block, video decoding method for decoding division block, and recording medium for implementing the same
US6671322B2 (en) Video transcoder with spatial resolution reduction
US7170932B2 (en) Video transcoder with spatial resolution reduction and drift compensation
US7391807B2 (en) Video transcoding of scalable multi-layer videos to single layer video
US7088780B2 (en) Video transcoder with drift compensation
US20050226335A1 (en) Method and apparatus for supporting motion scalability
US20050195896A1 (en) Architecture for stack robust fine granularity scalability
US20150063436A1 (en) Method for encoding and decoding an image, and corresponding devices
GB2501139A (en) Segmenting an Image to Optimize Encoding Costs
GB2492396A (en) Decoding a Scalable Video Bit-Stream
US6898241B2 (en) Video transcoder with up-sampling
GB2492394A (en) Image block encoding and decoding methods using symbol alphabet probabilistic distributions
US20130230096A1 (en) Methods for encoding and decoding an image, and corresponding devices
US20130230102A1 (en) Methods for encoding and decoding an image, and corresponding devices
KR20110070687A (en) Encoding apparatus and method capable of multi-dimension transforming and quantization, and decoding apparatus and method
GB2492395A (en) Entropy encoding and decoding methods using quantized coefficient alphabets restricted based on flag magnitude
GB2492392A (en) Quantiser selection using rate-distortion criteria
US20130230101A1 (en) Methods for encoding and decoding an image, and corresponding devices
GB2501495A (en) Selection of image encoding mode based on preliminary prediction-based encoding stage
Li et al. Hybrid bit-stream rewriting from scalable video coding to H. 264/AVC
GB2506854A (en) Encoding, Transmission and Decoding a Stream of Video Data
Kao A block-based scalable motion model for highly scalable video coding
GB2501493A (en) Encoding an image signal using quantizers based on statistics relating to the image signal
CN111107355A (en) Adaptive image compression encoding method and storage medium

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)