GB2523076A - Improved palette mode in HEVC for the encoding process - Google Patents

Info

Publication number
GB2523076A
GB2523076A GB1322616.2A GB201322616A GB2523076A GB 2523076 A GB2523076 A GB 2523076A GB 201322616 A GB201322616 A GB 201322616A GB 2523076 A GB2523076 A GB 2523076A
Authority
GB
United Kingdom
Prior art keywords
colour
mode
block
palette
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1322616.2A
Other versions
GB201322616D0 (en)
GB2523076B (en)
Inventor
Patrice Onno
Guillaume Laroche
Christophe Gisquet
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB1322616.2A priority Critical patent/GB2523076B/en
Publication of GB201322616D0 publication Critical patent/GB201322616D0/en
Priority to PCT/EP2014/078606 priority patent/WO2015091879A2/en
Priority to US15/105,521 priority patent/US10972742B2/en
Publication of GB2523076A publication Critical patent/GB2523076A/en
Application granted granted Critical
Publication of GB2523076B publication Critical patent/GB2523076B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/93Run-length coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Coding a block of a picture in a video sequence comprises, for at least one portion of the block, encoding a colour based on a colour palette with several colour elements. The method comprises, for a current portion, inserting said portion's current colour in the colour palette as a new colour element 1409 if the current colour is not present in the colour palette and the absolute difference between the current colour and each and every colour element in the palette exceeds a predetermined threshold 1407, wherein said threshold is adaptive. Preferably the coding of the block comprises using a given mode, preferably a transform skip mode, having a quantization step based on quantization parameters. Further independent claims are included for: encoding the colour of a current portion by allocating to it a colour element from the colour palette before updating the definition of the colour element based on the colours of all portions to which said colour element had been allocated; and for encoding the colour of a current portion by determining a prediction mode among at least two prediction modes.

Description

Improved palette mode in HEVC for the encoding process
FIELD OF THE INVENTION
The invention is related to video coding and decoding. More precisely, the present invention is dedicated to a palette mode coding method. The Palette mode is a new coding method that has been presented in the scope of the Range Extension. This method is quite efficient for video coding targeting "screen content" video sequences. Yet the proposed method can be improved.
The current invention improves the coding efficiency of the Palette mode by using more efficient encoding selections.
BACKGROUND OF THE INVENTION
One general principle of video compression standards is to benefit from spatial and temporal redundancies to reduce the video bitrate. In Figures 1 and 2, we have respectively represented the different steps performed in a video encoder and in a video decoder.
Figure 1 represents the architecture of an HEVC encoder. In the video sequence encoder, the original sequence (101) is divided into blocks of pixels (102). A coding mode is then assigned to each block. There are two families of coding modes: the modes based on spatial prediction (INTRA) (103) and the modes based on temporal prediction (INTER, Bidir, Skip) (104, 105). An INTRA block is generally predicted from the encoded pixels at its causal boundary by a process called INTRA prediction.
Temporal prediction first consists in finding in a previous or future frame (i.e. a reference frame (116)) the reference area which is the closest to the block to encode (motion estimation (104)) and secondly in predicting this block using the selected area (motion compensation (105)).
In both cases (spatial and temporal prediction), a residual is computed by subtracting the prediction from the original predicted block.
In the INTRA prediction, a prediction direction is encoded. In the temporal prediction, at least one motion vector is encoded. However, in order to further reduce the bitrate cost related to motion vector encoding, a motion vector is not directly encoded. Indeed, assuming that motion is homogeneous, it is particularly interesting to encode a motion vector as a difference between this motion vector and a motion vector in its surrounding. In H.264 for instance, motion vectors are encoded with respect to a median vector computed between 3 blocks located above and on the left of the current block. Only the difference (also called residual motion vector) computed between the median vector and the current block motion vector is encoded in the bitstream. This is processed in the module Mv prediction and coding (117). The value of each encoded vector is stored in the motion vector field (118). The neighbouring motion vectors, used for the prediction, are extracted from (118). For HEVC, the motion vector coding process is slightly different and is detailed in the following sections.
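The H.264 median prediction described above can be sketched as follows. This is a simplified illustration only: the function names are ours, and the actual H.264 rules for neighbour availability and block partitioning are more involved.

```python
def median_mv_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of three neighbouring motion vectors
    (left, above, above-right), as in H.264 median MV prediction."""
    med = lambda x, y, z: sorted((x, y, z))[1]
    return (med(mv_a[0], mv_b[0], mv_c[0]),
            med(mv_a[1], mv_b[1], mv_c[1]))

def mv_residual(mv, predictor):
    """Only this difference (the residual motion vector) is coded."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])
```

For instance, with neighbours (1, 2), (3, 4) and (2, 0), the predictor is the component-wise median (2, 2), and a current vector (5, 5) is coded as the residual (3, 3).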
Then, the mode optimizing the rate distortion performance is selected (106). In order to further reduce the redundancies, a transform (DCT) is applied to the residual block (107), and a quantization is applied to the coefficients (108). The quantized block of coefficients is then entropy coded (109) and the result is inserted in the bitstream (110). The encoder then performs a decoding of the encoded frame for the future motion estimation (111 to 116). These steps allow the encoder and the decoder to have the same reference frames.
To reconstruct the coded frame, the residual is inverse quantized (111) and inverse transformed (112) in order to provide the "decoded" residual in the pixel domain. According to the encoding mode (INTER or INTRA), this residual is added to the INTER predictor (114) or to the INTRA predictor.
Then, this first reconstruction is filtered (115) by one or several kinds of post filtering. These post filters are integrated in the encoding and decoding loop.
It means that they need to be applied on the reconstructed frame at the encoder and decoder side in order to use the same reference frame at both sides. The aim of this post filtering is to remove compression artifacts.
For example, H.264/AVC uses a deblocking filter. This filter can remove blocking artifacts due to the DCT quantization of residual and to block motion compensation. In the current HEVC standard, 3 types of loop filters are used: deblocking filter, sample adaptive offset (SAO) and adaptive loop filter (ALF).
These loop filters are described in the following section.
Figure 2 represents the architecture of an HEVC decoder. The video stream (201) is first entropy decoded (202). The residual data are then inverse quantized (203) and inverse transformed (204) to obtain pixel values. The mode data are also entropy decoded and, depending on the mode, an INTRA type decoding or an INTER type decoding is performed. In the case of INTRA mode, an INTRA predictor is determined according to the Intra prediction mode specified in the bitstream (205). If the mode is INTER, the motion information is extracted from the bitstream (202). This is composed of the reference frame index and the motion vector residual. The motion vector predictor is added to the motion vector residual to obtain the motion vector (210). The motion vector is then used to locate the reference area in the reference frame (206). Note that the motion vector field data (211) is updated with the decoded motion vector in order to be used for the prediction of the next decoded motion vectors. This first reconstruction of the decoded frame is then post filtered (207) with exactly the same post filter as used at the encoder side. The output of the decoder is the uncompressed video (209).
Figure 3 illustrates the causal concept in a video encoder or video decoder. When encoding index A, all or part of B can be used to improve the coding efficiency for A. C cannot be accessed.
Figure 4 illustrates the new video formats that the future HEVC Range Extension will support. The HEVC Range Extension, also commonly called HEVC RExt, is an extension under definition of the new video coding standard HEVC. The aim of this extension is to provide additional tools to code video sequences with additional colour formats and bit-depths. More specifically, this extension will support the 4:2:2 colour format as well as the 4:4:4 video format. While the current HEVC standard is able to deal with the 4:2:0 colour format with 8 and 10 bits per colour sample, the HEVC Range Extension should additionally support the 4:2:2 and 4:4:4 video formats with an extended bit depth ranging from 8 bits up to 14 bits.
A colour picture is generally made of three colour components R, G and B. These components are generally correlated, and it is very common in image and video compression to decorrelate the colour components prior to processing the pictures. The most common format is the YUV colour format.
YUV signals are typically created from RGB pictures, by applying a linear transform to the 3 input pictures R, G and B. Y is usually called the Luma component, U and V are generally called the Chroma components. The term 'YCbCr' is also commonly used in place of the term 'YUV'.
It is very common to use different sampling ratios for the three colour components. The subsampling scheme is commonly expressed as a three-part ratio J:a:b (e.g. 4:2:2), that describes the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high. The parts are (in their respective order):
J: horizontal sampling reference (width of the conceptual region) (usually 4).
a: number of chrominance samples (Cr, Cb) in the first row of J pixels.
b: number of (additional) chrominance samples (Cr, Cb) in the second row of J pixels.
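Assuming this J:a:b convention, the dimensions of the chroma planes can be derived as in the small sketch below (the function name and interface are illustrative, not part of any standard):

```python
def chroma_dimensions(width, height, j, a, b):
    """Derive the chroma plane size from a J:a:b subsampling ratio.
    Horizontal subsampling factor is J/a; vertically the chroma is
    halved when no chroma samples are sent on the second row (b == 0)."""
    chroma_width = width * a // j
    chroma_height = height // (2 if b == 0 else 1)
    return chroma_width, chroma_height
```

For a W x H luma plane this gives W/2 x H/2 for 4:2:0, W/2 x H for 4:2:2, and W x H for 4:4:4, matching the formats of Figure 4.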
Figure 4 illustrates the different considered Chroma formats in HEVC RExt. These formats are different due to a different picture size of the three colour components, and to a different type of the colour components.
In the 4:2:0 YUV Chroma format, illustrated in (401), the pictures are made of 3 colour components: Y (401.1), U (401.2) also noted Cb, and V (401.3) also noted Cr. If the Y component picture is of width W pixels and of height H pixels, the U and V component pictures are of width W/2 pixels and of height H/2 pixels.
In the 4:2:2 YUV Chroma format, illustrated in (402), the pictures are made of 3 colour components: Y (402.1), U (402.2) also noted Cb, and V (402.3) also noted Cr. If the Y component picture is of width W pixels and of height H pixels, the U and V component pictures are of width W/2 pixels and of height H pixels.
In the 4:4:4 YUV Chroma format, illustrated in (403), the pictures are made of 3 colour components: Y (403.1), U (403.2) also noted Cb, and V (403.3) also noted Cr. The three components are of the same width W and height H. In the 4:4:4 RGB Chroma format, also illustrated in (403), the pictures are made of 3 colour components: R (403.1), G (403.2), and B (403.3). The three components are of the same width W and height H. When a picture is monochrome, its format is named 4:0:0.
The bit-depth of input samples is the number of bits used to represent each pixel for each colour component. HEVC version 1 is defined for 8-bit input samples in the 4:2:0 format. When the input samples are represented with 8 bits, each sample can take 2^8 = 256 values. For some applications, it is useful to extend the bit-depth of input samples in order to have a larger dynamic range. Generally, the aim is to increase the visual quality of the video. The known bit-depths are 8, 10, 12, 14 or 16 bits. The Range Extension (RExt) of HEVC is dedicated to these larger dynamic ranges in addition to the extended colour formats.
Moreover the Range Extension (RExt) of HEVC is able to encode losslessly the input sequences. The goal of a lossless codec (as opposed to a lossy one) is to have a decoded output strictly identical to the input. To achieve this, a number of things must be modified or added, compared to a lossy codec.
Here is a non-exhaustive list of specific things required for HEVC to work losslessly: removal of the quantization (the main source of errors); forced activation of the transform bypass, as normal cosine/sine transforms may introduce errors (in addition to no longer being suited for lossless coding); removal of tools specifically tailored to compensating quantization noise, such as DBF and SAO.
Moreover, additional tools have been added to, or are under consideration for, the Range Extension (RExt) of HEVC to efficiently encode "screen content" video sequences in addition to natural sequences. The "screen content" sequences refer to particular sequences which have a very specific content. The "screen content" video sequences correspond to those captured from a personal computer or any other device, containing for example text, PowerPoint presentations, Graphical User Interfaces, or tables. These particular sequences have quite different statistics compared to natural video sequences. In video coding, the performance of traditional video coding tools is sometimes inefficient for this specific video content.
The current tools under consideration are the Intra Block Copy mode and the Colour Palette mode. They have demonstrated a good efficiency over the traditional methods targeting natural video sequences. The Colour Palette mode is the tool that we are considering in the scope of this invention in order to further improve the coding efficiency of the HEVC Range Extension when targeting screen content video sequences.
Figure 5 illustrates the Coding Tree Block splitting into Coding Units and the scan order decoding of these Coding Units. In the HEVC standard, the block structure is organized by Coding Tree Blocks (CTB). A frame contains several non-overlapped and square Coding Tree Blocks. The size of a Coding Tree Block can range from 16x16 to 64x64. This size is determined at the sequence level. The most efficient size, in terms of coding efficiency, is the largest one: 64x64. Please note that all Coding Tree Blocks have the same size except at the image border. In that case, the size is adapted according to the amount of pixels.
Each Coding Tree Block contains one or more square Coding Units (CU).
The Coding Tree Block is split based on a quad-tree structure into several Coding Units. The coding or decoding order of each Coding Unit in the Coding Tree Block follows the quad-tree structure based on a raster scan order. Figure 5 shows an example of the decoding order of Coding Units. In this figure, the number in each Coding Unit gives the decoding order of the Coding Units of this Coding Tree Block.
Figure 6 represents HEVC syntax coding. In the HEVC standard or in HEVC RExt, several methods are used to code the different syntax elements.
HEVC uses several types of entropy coding like the Context-based Adaptive Binary Arithmetic Coding (CABAC), Golomb-Rice codes, or a simple binary representation called Fixed Length Coding. Most of the time, a binarisation process is performed before encoding to represent the different syntax elements.
This binarisation process is also very specific and depends on the different syntax elements.
For example, the syntax element called "coeff_abs_level_remaining" contains the absolute value, or a part of the absolute value, of the coefficient residual.
The idea of this specific coding is to use Golomb Rice code for the first values and Exponential Golomb for the higher values. More specifically, depending on a given parameter called Golomb parameter, this means that for representing the first values (for example from 0 to 3) a Golomb-Rice code is used, then for higher values (from 4 and above) an Exponential Golomb code is used.
Figure 6 illustrates the principle of this specific decoding process. The input data of this process are the bitstream (601) and rParam, which is known as the Rice Golomb parameter, or the order. The output of this process is the decoded symbol (612).
The prefix value is set equal to 0 (602), then 1 bit is extracted from the bitstream (601) and the variable flag is set equal to the decoded value (603). If this flag is equal to 0 (604), the Prefix value is incremented (605) and another bit is extracted from the bitstream (603). When the flag value is equal to 1, the decision module (606) checks if the value Prefix is strictly less than 3. If it is, then N = rParam bits are extracted (608) from the bitstream (601) and assigned to the variable "codeword". This corresponds to the Golomb-Rice representation.
The Symbol value (612) is then set equal to ((prefix << rParam) + codeword) as depicted in step (609), where << is the left shift operator.
If the Prefix is greater than or equal to 3 at step (606), the next step is (610) where N = (prefix - 3 + rParam) bits are extracted from the bitstream and assigned to the variable "codeword" (610). The symbol value (611) is set equal to (((1 << (prefix - 3)) + 2) << rParam) + codeword. This corresponds to the Exp-Golomb representation.
In the following, this decoding process (or, in a symmetric way, the encoding process described in that figure) is called Golomb_H with an input parameter Param. It can be noted in a simple way Golomb_H(Param).
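As a rough illustration, the Golomb_H(Param) decoding process of Figure 6 might be transcribed as follows. The bit-reading interface and the prefix-bit polarity (zero bits terminated by a one bit) are our reading of the figure description; this is a sketch, not the normative HEVC process.

```python
def golomb_h_decode(bits, r_param):
    """Decode one symbol with the Golomb_H(Param) scheme described above.
    `bits` is an iterator yielding 0/1 values read from the bitstream.
    Prefix values below 3 use the Golomb-Rice representation; values of
    3 and above switch to the Exp-Golomb escape representation."""
    prefix = 0
    while next(bits) == 0:          # unary prefix terminated by a 1 bit
        prefix += 1
    if prefix < 3:                  # Golomb-Rice part (steps 608-609)
        n = r_param
        base = prefix << r_param
    else:                           # Exp-Golomb part (steps 610-611)
        n = prefix - 3 + r_param
        base = ((1 << (prefix - 3)) + 2) << r_param
    codeword = 0
    for _ in range(n):              # read the N-bit suffix codeword
        codeword = (codeword << 1) | next(bits)
    return base + codeword
```

Note that the two branches produce contiguous symbol ranges: with rParam = 0, prefixes 0, 1, 2 decode to 0, 1, 2, and the first Exp-Golomb prefix (3) decodes to 3.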
In the HEVC standard and the HEVC Range Extension, the Golomb parameter 'Param' is updated according to a formula in order to adapt the entropy coding to the signal to be encoded. This formula tries to reduce the Golomb code size by increasing the Golomb parameter when the coefficients have large values. In the HEVC standard, the update is given by the following formula: Param = Min( cLastRiceParam + ( cLastAbsLevel > ( 3 * ( 1 << cLastRiceParam ) ) ? 1 : 0 ), 4 )
where cLastRiceParam is the last used Param and cLastAbsLevel is the last decoded coeff_abs_level_remaining. Please note that for the first parameter to be encoded or decoded, cLastRiceParam and cLastAbsLevel are set equal to 0.
Moreover, please note that the parameter Param cannot exceed the value of 4.
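The HEVC version 1 update formula above can be transcribed directly (the function name is ours):

```python
def update_rice_param_hevc(c_last_rice_param, c_last_abs_level):
    """HEVC version 1 update of the Golomb parameter Param:
    increment by 1 when the last decoded level exceeds 3 * 2^Param,
    and cap the result at 4."""
    inc = 1 if c_last_abs_level > (3 * (1 << c_last_rice_param)) else 0
    return min(c_last_rice_param + inc, 4)
```

Starting from the initial value 0, a decoded level of 4 (which exceeds 3 * 2^0 = 3) raises Param to 1, and the parameter saturates at 4 however large the levels become.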
For the HEVC Range Extension, this formula has been updated in order to deal with higher bit-depths and take into account the very high quality required by applications dealing with video compression of extended formats (4:2:2 and 4:4:4), including lossless coding. For the Range Extension, the update formula has been changed as follows: Param = Min( cLastRiceParam + ( cLastAbsLevel >> ( 2 + cLastRiceParam ) ), 7 ).
With this formula, the maximum value of Param is 7. Moreover, for the first coding of the coeff_abs_level_remaining of a sub-block of a Transform block, the Golomb parameter is set equal to: Param = Max( 0, cRiceParam - ( transform_skip_flag || cu_transquant_bypass_flag ? 1 : 2 ) )
where:
the variable "transform_skip_flag" is set to 1 if the transform is skipped for the current CU and 0 if the transform is used;
the variable "cu_transquant_bypass_flag" is set to 1 if the CU is lossless encoded and 0 otherwise;
the variable "cRiceParam" is set equal to the last used Param from another sub-block of the whole block, otherwise it is set to 0.
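The two Range Extension formulas above can likewise be transcribed as follows (function names are ours):

```python
def update_rice_param_rext(c_last_rice_param, c_last_abs_level):
    """Range Extension update of Param, capped at 7: the increment
    grows with the magnitude of the last decoded level."""
    return min(c_last_rice_param + (c_last_abs_level >> (2 + c_last_rice_param)), 7)

def initial_rice_param(c_rice_param, transform_skip_flag, cu_transquant_bypass_flag):
    """Initial Param for the first coeff_abs_level_remaining of a
    sub-block: subtract 1 in transform-skip or lossless CUs, else 2,
    clipped at 0."""
    offset = 1 if (transform_skip_flag or cu_transquant_bypass_flag) else 2
    return max(0, c_rice_param - offset)
```

For example, a last decoded level of 8 with Param = 0 gives an increment of 8 >> 2 = 2, and the initial Param for a lossless sub-block with cRiceParam = 3 is 3 - 1 = 2.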
The Palette mode is a new coding method that has been presented in the scope of the Range Extension. This method is quite efficient for video coding targeting "screen content" video sequences. The palette method proposed for the HEVC Range Extension is a prediction mode. It means that the Palette method is used to build a predictor for the coding of a given CU, similarly to a prediction performed by Motion prediction (Inter case) or by an Intra prediction. After the generation of the prediction, a residual CU is transformed, quantized and coded.
A palette is generally represented by a table containing a set of N-tuple of colours, each colour being defined by its components in a given colour space.
For example, in a typical RGB format, the palette is composed of a list of P elements of N-tuple (where N=3 for a RGB). More precisely, each element corresponds to a fixed triplet of colour in the RGB format. Of course this is not limited to a RGB or YUV colour format. Any other colour format can be represented by a palette and can use a smaller or a higher number of colours (meaning that N is different from 3).
At the encoder side, the Palette mode, under consideration in RExt, consists in transforming the pixel values of a given input CU into a finite number of indexes (or levels). These indexes are those of an associated palette which defines a limited number of colours. After applying the Palette mode method, the resulting CU is composed of levels and is then transmitted to the decoder with the associated palette (table of limited triplets used to represent the CU).
To apply the Palette mode at the encoder side, we may proceed in a variety of fashions, but the simplest way to represent a block of pixels could be summarized as follows: first, find the P triplets describing at best the CU of pixels to code (e.g. by minimizing the overall distortion); then associate with each pixel the closest colour among the P triplets: the value to encode is then the corresponding index.
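A naive sketch of such an encoder-side palette construction is given below, using a threshold test to decide when to insert a new colour element, in the spirit of the method summarized in the abstract. The function name, the maximum-component distance metric, and the fixed threshold are all illustrative assumptions; a real encoder would optimize the P triplets, e.g. under a rate-distortion criterion, and could make the threshold adaptive.

```python
def build_palette_and_levels(pixels, max_palette_size, threshold):
    """Grow a palette of colour triplets over the pixels of a CU and map
    each pixel to the level (index) of its closest palette entry.
    A pixel colour is inserted as a new element when its distance to
    every existing element exceeds `threshold` (illustrative metric:
    maximum absolute component difference)."""
    palette, levels = [], []
    for px in pixels:
        best, best_dist = None, None
        for i, colour in enumerate(palette):
            d = max(abs(c1 - c2) for c1, c2 in zip(px, colour))
            if best_dist is None or d < best_dist:
                best, best_dist = i, d
        if best is None or (best_dist > threshold and len(palette) < max_palette_size):
            palette.append(px)          # insert current colour as a new element
            best = len(palette) - 1
        levels.append(best)
    return palette, levels
```

For example, two near-black pixels and one bright pixel with threshold 2 produce a two-entry palette, and a pixel within the threshold of an existing entry reuses that entry's level instead of creating a new one.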
At the decoder side, the Palette mode consists in operating the conversion in the reverse way. This means that for each decoded index associated to each pixel of the CU, the process consists in reconstructing the CU by using the palette encoded in the bitstream for each CU. It means that each index associated to each pixel is replaced by the colour in order to reconstruct the corresponding colour for each pixel of the CU. As already mentioned, the Palette mode is a prediction mode, this means that a residual coding can be associated. It is then added to the prediction to build the final reconstructed CU.
Figure 7 further illustrates the principle of Palette mode prediction at the decoder side under investigation in the Range Extension of HEVC. The prediction mode for the current CU is extracted in step (702) from the bitstream (701). Currently, the Palette mode is identified by a flag located before the skip flag in the bitstream. This flag is CABAC coded using a single context. If this mode is the palette mode (703) then the related syntax of the palette mode (705) is extracted (704) from the bitstream (701).
Then, during the step (706), two elements are built: the palette (707) and the block/CU of levels (708). According to this block/CU of levels and the associated palette, the block/CU predictor (in pixel domain) (710) is built (709).
It means that for each level of the block/CU, a (RGB or YUV) colour is associated to each pixel.
Then the CU residue is decoded (711) from the bitstream (701). In the current implementation of the Palette mode under investigation in the Range Extension, the residual associated with a Palette mode is coded using the common HEVC Inter residual coding method. To obtain the residue of the CU, the traditional inverse quantization and inverse transformation are performed.
The block predictor (710) is added (713) to this block residual (712) in order to form the reconstructed CU (714).
Figure 8 illustrates the principle of the Palette method at encoder side.
The current CU 801 is converted into a block 802 of the same size which contains levels instead of pixels with 3 values (Y, U, V) or (R, G, B). The Palette associated to this block or CU 803 is built and contains for each level entry the related pixel colour values. Please note that for monochrome application, the pixel value can contain only one component.
As mentioned in Figure 7, the palette is coded and inserted in the bitstream for each CU. In the same way, the block/CU of levels is coded and inserted in the bitstream and an example is given in Figure 9. In this example, the CU is scanned in a horizontal order.
The block/CU (91) of levels is exactly the same as the one in Figure 8 (802).
The tables (92) and (93) describe the successive steps to process the block/CU (91). Table (93) should be read as the continuation of table (92). The colours used in these tables correspond to the eight steps for processing the pixels of block/CU (91) having the same colours.
These two tables depict the current syntax associated to the Palette mode.
These syntax elements correspond to the encoded information associated in the bitstream with the CU (91). In these tables, 3 main syntax elements are used to fully represent the operations of the Palette mode and are used as follows:
"Pred mode" flag: when this flag is equal to "0", this means that a new level is used for the current pixel; the level is immediately signalled after this flag. When this flag is equal to "1", this means that a "copy up" mode is used. More specifically, this means that the current pixel level corresponds to the pixel level located on the line immediately above (starting at the same position for a raster scan order). In that case of a "Pred mode" flag equal to "1", there is no need to signal a level right after.
"Level": this syntax element indicates the level value of the palette for the current pixel.
"Run": this syntax element has a different meaning which depends on the "Pred mode" flag.
When the "Pred mode" flag is equal to "0": this syntax element indicates the number of consecutive pixels where the same level is applied right after the current one. For example, if Run=8 this means that the current level is applied on the current sample (pixel location) and the following 8 samples which corresponds to 9 samples in total.
When the "Pred mode" flag is equal to "1": this syntax element indicates the number of consecutive pixels where the "copy up" mode is applied right after the current one. For example, if Run=31 this means that the level of the current sample is copied from the sample of the line above as well as the following 31 samples which corresponds to 32 samples in total.
Tables (92) and (93) represent the eight steps used to represent the block/CU (91) by using the Palette mode. Each step starts with the coding of the "Pred mode" flag, which is followed by the "level" syntax element if the "Pred mode" flag is equal to "0", or by the "run" syntax element if the "Pred mode" flag is equal to "1". The "level" syntax element is always followed by a "run" syntax element.
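The generation of these ("Pred mode", "Level", "Run") steps for a raster-scanned block of levels could be sketched as follows. This is a greedy illustration of our own that always prefers "copy up" when the sample matches the one above; a real encoder would compare the lengths of the competing runs before choosing a mode.

```python
def encode_levels(levels, width):
    """Produce the "Pred mode"/"Level"/"Run" steps for a block of levels
    of the given width, scanned in raster order. "Run" counts the
    additional samples covered after the current one, as described above."""
    out, j, n = [], 0, len(levels)
    while j < n:
        # "copy up" is only possible from the second row onward
        if j >= width and levels[j] == levels[j - width]:
            run = 0
            while j + run + 1 < n and levels[j + run + 1] == levels[j + run + 1 - width]:
                run += 1
            out.append(('pred_mode', 1, run))              # no level signalled
        else:
            run = 0
            while j + run + 1 < n and levels[j + run + 1] == levels[j]:
                run += 1
            out.append(('pred_mode', 0, levels[j], run))   # explicit level
        j += run + 1                                       # current + run samples
    return out
```

On a 3-wide block whose second row repeats the first, the whole second row collapses into a single "copy up" step with Run = 2 (3 samples in total).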
Figure 10 represents the syntax elements decoding of the Palette mode.
When the prediction mode decoded for the current block is the palette mode, the decoder first decodes the syntax related to this block and then applies the reconstruction process for the CU described in Figure 11. Figure 10 illustrates in detail the decoding process of the syntax elements related to the Palette mode. First, the size of the palette is extracted and decoded (1002) from the bitstream (1001). The exact size of the palette (Palette_size) is obtained by adding 1 to this size value decoded at step (1002).
Indeed, the size is coded by using a unary code for which the value 0 has the smallest number of bits (1 bit) and the size of the palette cannot be equal to 0, otherwise no pixel value can be used to build the block predictor.
Then the process corresponding to the palette values decoding starts. A variable i corresponding to the index of the palette is set equal to 0 (1004) then a test is performed in step 1005 to check if i is equal to the palette size (Palette_size). If it is not the case, one palette element is extracted from the bitstream (1001) and decoded (1006) and is then added to the palette with the related level/index equal to i. Then the variable i is incremented through the step (1007). If i is equal to the palette size (1005), the palette has been completely decoded.
Then the process corresponding to the decoding of the block of levels is performed. First, the variable j, corresponding to a pixel counter, is set to 0 as well as the variable syntax_i (1008). Then a check is performed to know if the pixel counter corresponds to the number of pixels contained in the block/CU. If the answer is yes at step (1009) the process ends with step (1017); otherwise the value of the flag "Pred mode" corresponding to one prediction mode is extracted from the bitstream (1001) and decoded (1010).
The value of "Pred mode" is added to a table at the index syntax_i containing all "Pred mode" value decoded. If the value of this "Pied mode" is equal to 0 (1011), the syntax element corresponding to "Level" is extracted from the bitstream (1001) and decoded (1012). This variable "Level" is added to a table at the index syntax_i containing all levels decoded. The variable corresponding to the pixel counter is incremented by one (1013).
Then the "Run" syntax element is decoded in step (1014). If the syntax element "Pred Mode" is equal to 1 (1011), the "Run" value is also decoded in step (1014). This syntax element "Run" is added to a table at the index syntax_i containing all the runs decoded.
Then in step (1015), the value j is incremented by the value of the run decoded in step (1014). The variable syntax_i is incremented. If the counter j is equal to the number of pixels in the block then the syntax to build the palette predictor is finished (1017). At the end of this process related to the Palette, the decoder knows the palette, and the tables containing the list of all the "Pred mode", "Level" and "Run" syntax elements associated to the Palette prediction mode of this CU. The decoder can then proceed with the reconstruction process of the CU.
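The decoding loop of Figure 10 can be sketched as follows. This is a hypothetical Python model, not the normative process: the entropy decoder is replaced by an iterator of already-decoded integers, each palette element is modelled as a single value rather than a (Y, U, V) triplet, and for the "copy up" mode the pixel counter advances by run + 1 samples, consistently with the run semantics described earlier (the flowchart text is terse on this point):

```python
def decode_palette_mode_syntax(tokens, n_pixels):
    """Sketch of Figure 10: parse palette, then (Pred mode, Level, Run)."""
    it = iter(tokens)
    palette_size = next(it) + 1                 # steps 1002/1003: decoded size + 1
    palette = [next(it) for _ in range(palette_size)]   # steps 1004-1007
    pred_modes, levels, runs = [], [], []
    j = 0                                       # pixel counter (step 1008)
    while j < n_pixels:                         # step 1009
        pred_mode = next(it)                    # step 1010
        pred_modes.append(pred_mode)
        if pred_mode == 0:                      # step 1011: "left prediction"
            levels.append(next(it))             # step 1012: decode the level
            j += 1                              # step 1013
            run = next(it)                      # step 1014
            j += run                            # step 1015
        else:                                   # "copy up": no level signalled
            levels.append(None)
            run = next(it)                      # step 1014
            j += run + 1                        # run + 1 samples copied from above
        runs.append(run)
    return palette, pred_modes, levels, runs
```

For example, tokens [1, 10, 20, 0, 0, 0, 1, 2] describe a 4-pixel block: a 2-entry palette (10, 20), one "left prediction" element (level 0, run 0), then one "copy up" element (run 2) covering the remaining 3 samples.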
The following table gives the entropy code used for each syntax element.
Syntax element                          Entropy code
Palette element (Y, U, V) or            3 binary codes corresponding to the 3 colours
(R, G, B) (step 1006)                   (Y, U, V) or (R, G, B). The binary code length
                                        corresponds to the bit depth of each colour
                                        component. Example: 8 bits to represent 256 values.
"Palette size" (step 1002)              Unary code.
"Pred mode"                             1 bit.
"Level"                                 Binary code of b bits, where b is the integer such
                                        that 2^b is the smallest power of two equal or
                                        superior to Palette_size, i.e. for Palette_size = 14,
                                        b = 4 and for Palette_size = 4, b = 2.
"Run"                                   Golomb_H(Param = 3) as described in Figure 6.
Figure 11 illustrates the reconstruction process to build the CU which will be used as predictor. The input data of this process are the tables containing the list of "Pred mode", "Level" and "Run", and the block/CU size.
In a first step (1101), the variable i, representing a pixel counter, is set equal to 0 and the variable j is also set equal to 0. Then the element extracted from the table of "Pred mode" at the index j is checked to see if it is equal to 0 (1104).
If it is equal to 0, this means that a new level is encoded for this current pixel i.
So the value of the pixel at position i is set equal to the level at the index j from the table of levels (1105) (Block[i] =Level[j]).
The variable i is incremented by one (1106). The variable k is set equal to 0 (1107). A check is performed in (1108) to know if k is equal to the "Run" element of the table of runs at the index j. If it is not the case, the value (level value) of the pixel at position i is set equal to the value (level value) of the pixel at position i-1 (1109), corresponding to the following expression Block[i] = Block[i-1]. The variable i (1110) and the variable k (1111) are then incremented by one. If the check in step 1108 returns yes, the propagation of the left level value is finished and step 1120 is performed.
When the check at step 1104 returns the value No, the "copy up" mode starts. The variable k is set equal to 0 (1112). Step 1113 checks if (k-1) is equal to the "Run" element of the table of runs at the index j. If it is not the case, the value (level value) of the pixel at position i is set equal to the value (level value) of the pixel at position i of the above line (1114). This corresponds to the following expression Block[i] = Block[i-width]. The value "width" corresponds to the width of the block/CU. The variable i and the variable k are then incremented by one in steps 1115 and 1116. If the check at step 1113 returns the value yes, the prediction mode "copy up" is completed and the next step 1120 is performed.
At step 1120, a check is performed to know if the variable i is equal to the amount of pixels in the block/CU. If it is not the case the variable j is incremented by one (1121) and the process continues with step (1104) already described above.
If the check is positive at step (1120), all the levels have been assigned to the pixels of the block in step (1122).
Then the final stage (1123) consists in converting each level into a colour value according to the palette content. This process assigns a pixel value (Y, U, V) or (R, G, B) to each block position according to the level of this position in the block and the entries of the palette.
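The reconstruction of Figure 11 (steps 1104 to 1123) can be sketched as below, assuming well-formed input tables (a Python sketch for illustration, not the normative process):

```python
def reconstruct_levels(pred_modes, levels, runs, width):
    """Rebuild the block of levels from the decoded syntax tables."""
    block = []
    for pred_mode, level, run in zip(pred_modes, levels, runs):
        if pred_mode == 0:                      # "left prediction" mode
            block.append(level)                 # Block[i] = Level[j]
            for _ in range(run):                # propagation: Block[i] = Block[i-1]
                block.append(block[-1])
        else:                                   # "copy up" mode: run + 1 samples
            for _ in range(run + 1):            # Block[i] = Block[i-width]
                block.append(block[len(block) - width])
    return block

def levels_to_colours(block, palette):
    """Step 1123: convert each level into the palette colour."""
    return [palette[level] for level in block]
```

For a 2x4 block whose second row repeats the first, the syntax (Pred mode 0, level 1, run 3) followed by (Pred mode 1, run 3) reconstructs eight samples of level 1.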
Figure 12 illustrates the palette determination algorithm at encoder side.
The input data of this process are the original block/CU and the block/CU size.
In this example, we are creating a YUV palette but an RGB palette could be built in the same way.
In a first step (1201), the variable "i" representing a pixel counter and the variable "Palette_size" are set to 0. A variable "TH" representative of a threshold is set to 9. Then in step (1203) the pixel pi is read from the original block (1204). Then the variable j is set equal to 0 (1205) and at step (1206) a check is performed to know if the palette size is equal to the variable "j".
If it is the case, the palette at the index "j" is set equal to the pixel value pi (1209). This means that the current pixel Pi becomes an entry of the palette.
More precisely the following assignment is performed: PALY[j] = (Yi) PALU[j] = (Ui) PALV[j] = (Vi).
The palette size (Palette_size) is incremented by one (1210) and an occurrence table is set equal to 1 for the index "Palette_size" (1211). Then the variable i is incremented (1213) to consider the next pixel "i" of the block/CU.
A check is performed in step (1214) to know if all pixels of the block/CU have been processed. If "yes", the process is completed by the ordering process (1215) explained later otherwise the next pixel is considered in step (1203) which was already described.
Coming back to step (1206), when the check returns the value "No", the next step is step (1207) where the absolute difference for each colour component between pi and the palette element at the index j is computed. If all the absolute differences are strictly inferior to the threshold TH (set to 9 in this embodiment), the occurrence counter regarding the entry "j" in the palette is incremented by one (1212). After step (1212), the next step is step (1213) which was already described.
In step (1207) when the check returns a negative answer, the variable j is incremented (1208) to compare the other palette elements to the current pixel (1207). If no element in the palette respects the criterion (1207) a new element is added to the palette (1209, 1210, 1211).
Please note that the decision module (1207) can compare each colour component for 4:4:4 (YUV or RGB) sequences and can compare only the Luma colour component for 4:2:0 sequences.
At the end of this process, the table "Counter" contains the occurrences of the palette elements. Then the palette elements are ordered in step (1215) according to their occurrences so that the most frequent element is in the first position in the palette.
Moreover, the palette size can exceed the maximum size of the palette, which is fixed to 24 in the current implementation. If the palette size exceeds 24, the palette size is set equal to 24 and the remaining elements are removed from the palette. At the end of these processes the palette is built. It is only an example of implementation, and it is possible to build the palette directly with this algorithm.
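A minimal sketch of the Figure 12 palette determination, under the simplifying assumption that every colour component is compared (the 4:4:4 case mentioned above); this is an illustration of the algorithm, not the reference implementation:

```python
def build_palette(pixels, th=9, max_size=24):
    """Encoder-side palette determination (Figure 12 sketch).

    `pixels` is a flat list of (Y, U, V) tuples; an (R, G, B) palette
    is built the same way.
    """
    palette, counter = [], []
    for p in pixels:
        for j, entry in enumerate(palette):
            # step 1207: all component differences strictly below TH
            if all(abs(c - e) < th for c, e in zip(p, entry)):
                counter[j] += 1              # step 1212
                break
        else:                                # no entry matched
            palette.append(p)                # steps 1209-1211: new element
            counter.append(1)
    # step 1215: order by decreasing occurrence, then truncate to max_size
    order = sorted(range(len(palette)), key=lambda j: -counter[j])
    return [palette[j] for j in order][:max_size]
```

For instance, four near-identical dark pixels and one bright pixel yield a two-entry palette with the dark colour first.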
Figure 13 illustrates the process selection of the current implementation of the block of levels, prediction modes and runs. The input data of this process are the original block/CU, the palette and the block/CU size.
In a first step, the variable "i" relative to a pixel counter is set to 0 (1301).
The two modes of prediction ("Pred mode" = 0 and "Pred mode" = 1) are evaluated independently.
For the "copy up" prediction (corresponding to "Pred mode" = 1), the variable "icopy" is set equal to 0 (1303). Then step (1304) checks if the current level at pixel location Block[i + icopy] is equal to the level of the pixel in the above line, corresponding to the location Block[i + icopy - width] where "width" corresponds to the width of the current block/CU considered. If the answer is positive, the variable "icopy" is incremented by one in step (1305).
Please note that the level of each index of the block/CU is determined in parallel in step (1308). This consists in assigning to the pixel at the position i the closest palette element (1308). This process uses the position i, the palette (1306) and the original block (1307). In step (1308), a level is associated to the sample located at position "i" in the block/CU.
If at step (1304) the answer is "No", the variable "icopy" is transmitted to the decision module 1314. This variable "icopy" corresponds to the number of values copied from the up line.
In parallel (or sequentially), the loop to determine the run value for the left prediction mode is processed. First the variables "iStart" and "j" are set equal to "i" and "ileft" is set equal to 0 (1309). At step (1310) a check is performed to know if j != 0 and if "Pred_mode[j-1]" = 0 and if Block[j] = Block[j-1]. If it is true, the variable "ileft" is incremented by one (1311) and the variable j is incremented (1312).
When at step 1310 the answer is "No", the next step is (1313) where a new check is performed: whether "j" is superior to "iStart" (1313). This checks whether the current value is not the first value, which is not predicted and is directly encoded with the level (j = 0 or Pred_mode[j-1] = 0). If it is not the first value, the variable "ileft" is transmitted to the decision module 1314. If it is the first checked level (1313), only the variable j is incremented. In the same way as in the loop for "copy up" prediction, the level at the index i, Block[i], is determined in the same loop (1308).
After computing the maximum run for the "left prediction" and the "copy up" modes, the variables "ileft" and "icopy" are compared (1314) to know if "icopy" != 0 and if "icopy" + 2 is superior to "ileft". It means that if "icopy" is equal to 0, the "left prediction" mode is selected (1315). If "icopy" is different from 0 and "icopy" + 2 is superior to "ileft", the "copy up" mode is selected (1316).
For the left prediction" mode the "PredMode" variable is set equal to 0 and the run to "ileft" (1315). For the copy up' mode the "PredMode" variable is set equal to 1 and the run to "icopy"-l' (1316).
Then the tables containing the "Pred mode" and the "Run" are updated with the current values of "Pred mode" and "run" (1317). Please note that the next position to consider in step (1318) corresponds to the current position incremented by the "run" value. Then a check is performed in step (1319) to know if the last pixels have been processed. If yes the process is completed (1320); otherwise the two prediction modes "left prediction" and "copy up" are evaluated for the next pixel position.
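The Figure 13 selection loop can be approximated as below. This Python sketch simplifies the left-run determination (it ignores the iStart / Pred_mode[j-1] subtlety of steps 1309 to 1313) but reproduces the decision of module 1314 and the run values of steps 1315/1316:

```python
def choose_runs(block, width):
    """Per-position choice between "copy up" and "left prediction"
    over an already-determined block of levels (Figure 13 sketch)."""
    n = len(block)
    pred_modes, runs = [], []
    i = 0
    while i < n:
        # maximum "copy up" length starting at position i (steps 1303-1305)
        icopy = 0
        while (i + icopy < n and i + icopy - width >= 0
               and block[i + icopy] == block[i + icopy - width]):
            icopy += 1
        # simplified maximum left run: identical consecutive levels after i
        ileft = 0
        while i + 1 + ileft < n and block[i + 1 + ileft] == block[i]:
            ileft += 1
        # decision module 1314: icopy != 0 && icopy + 2 > ileft
        if icopy != 0 and icopy + 2 > ileft:
            pred_modes.append(1)
            runs.append(icopy - 1)       # "copy up": run = icopy - 1 (1316)
            i += icopy
        else:
            pred_modes.append(0)
            runs.append(ileft)           # "left prediction": run = ileft (1315)
            i += 1 + ileft
    return pred_modes, runs
```

On a 2x4 block whose second row repeats the first with no horizontal repetition, the first row is coded with four "left prediction" elements and the second row with a single "copy up" run.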
At the end of this process, the encoder knows the levels for each sample of the block/CU, and is able to encode the corresponding syntax of the block/CU based on the content of the two tables "Pred mode" and "Run".
To determine the block prediction, the encoder then converts the defined block of levels according to the palette.
SUMMARY OF THE INVENTION
The present invention has been devised to improve the coding efficiency of the palette mode. More specifically, the current invention improves the coding efficiency by selecting efficiently the parameters of the Palette mode.
According to a first aspect, the invention provides a method for coding a block of a picture in a video sequence, said method comprising, for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, said method comprising, for a current portion, inserting said portion's current colour in the colour palette as a new colour element, if the current colour is not present in the colour palette and if the absolute difference between the current colour and at least one element of the colour palette is superior to a predetermined threshold. Said threshold is adaptive.
In an embodiment, the threshold is adapted according to a predetermined quality parameter.
In an embodiment, the coding of the block comprises using a given coding mode having a quantization step based on quantization parameters, the threshold being adapted according to the value taken by the quantization parameters.
In an embodiment, the quantization step is applied on a residual obtained from the picture's block and wherein said quantization step is realized in portion's domain.
In an embodiment, the given mode is the Transform Skip mode.
In an embodiment, the threshold is adapted according to a rate-distortion criteria applied on the picture's block.
In an embodiment, a rate-distortion function J is defined by J = D + λ.R, where D is a measure of the distortion applied on the picture's block, R a coding cost of the bits and λ a Lagrangian parameter, wherein the threshold is adapted according to the value taken by λ.
According to another aspect of the invention, it is provided a method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, each colour element being defined according to a weighted combination of several colour components, said encoding the colour of a current portion comprising: allocating to the at least one portion of the block, a colour element selected among the colour elements of the colour palette, the definition of said selected colour element being the closest to the definition of said portion's colour, and updating the definition of at least one colour element of the colour palette with a value based on the real colours of all the portions of the block whose said colour element has been allocated.
In an embodiment, the definition of at least one colour element of the colour palette is updated with a value corresponding to the average of the real colours of all the portions of the block whose said colour element has been allocated.
In an embodiment, the definition of at least one colour element of the colour palette is updated with a value corresponding to the median of the real colours of all the portions of the block whose said colour element has been allocated.
According to another aspect of the invention, it is provided a method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, each colour element being defined according to a weighted combination of several colour components, wherein encoding the colour of a current portion comprising: -Considering all the colour elements of the colour palette, and -Encoding the colour of the current portion by using the colour element whose definition is the closest to the current portion colour's definition.
According to another aspect of the invention, it is provided a method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, each colour element being defined according to a weighted combination of several colour components, wherein encoding the colour of a current portion comprising determining a prediction mode among at least two prediction modes, the determining step comprising: -Determining for each mode the value of a syntax element indicating the number of consecutive portions following the current portion having the same colour, and -If the value of the syntax element for the first mode is non-null and superior to the value of the syntax element for the second mode then selecting the first mode, else selecting the second mode.
According to another aspect of the invention, it is provided a method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, each colour element being defined according to a weighted combination of several colour components, wherein encoding the colour of a current portion comprising determining a prediction mode among at least two prediction modes, the first mode being associated to a syntax value indicating the number of consecutive portions following the current portion having the same colour, and the second mode being associated to a syntax value indicating the number of consecutive portions following the current portion having the same colour and a value corresponding to a colour element of the colour palette, the determining step comprising: -selecting for each mode the value of a syntax element only, and -If the value of the syntax element for the first mode is non-null and superior to the value of the syntax element for the second mode then selecting the first mode, else selecting the second mode.
In an embodiment, the first mode is the "copy up" mode where the colour of the current portion is copied from the colour of the portion located immediately above the current portion and the second mode is the "left prediction" mode where the colour of the current portion is predicted from the colour of the portion located at the left of the current portion.
For example, the portion of the block is a pixel of the block.
According to another aspect of the invention, it is provided a coding device configured to implement a coding method as mentioned above.
At least part of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a USB key, a memory card, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which: Figure 1 illustrates the architecture of a video encoder.
Figure 2 illustrates the architecture of a video decoder.
Figure 3 illustrates the concept of the causal area.
Figure 4 illustrates Chroma format supported in HEVC Range Extension.
Figure 5 illustrates the CTB splitting in CUs and the scan order decoding of these CU.
Figure 6 illustrates the decoding process for the coeff_abs_level_remaining of HEVC.
Figure 7 illustrates the principle of palette mode prediction.
Figure 8 illustrates an example of the palette and block/CU of levels derivation.
Figure 9 illustrates an example of the syntax palette element generated.
Figure 10 illustrates the decoding process of the palette mode syntax.
Figure 11 illustrates the construction of the block/CU of levels at decoder side.
Figure 12 illustrates the palette determination algorithm at encoder side.
Figure 13 illustrates the process selection of the current implementation of the block of levels, prediction modes and runs, and Figure 14 illustrates one embodiment of the invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
In one embodiment, the threshold used in the palette determination algorithm at encoder side is adapted to the quality. As described in Figure 12, the threshold "TH" used in module 1207 is fixed and set equal to 9. In this embodiment, this threshold depends on the quality targeted for the current block. In one embodiment, the threshold "TH" is set equal to the quantization step q used in the Transform Skip mode. For a Transform Skip block, all coefficients are de-quantized uniformly. In the HEVC standard, the quantization is quite complex and it depends on several parameters. An easy way to obtain the quantization step q for the threshold "TH" is to use the de-quantization function of the transform skip with coefficient 1. Indeed the de-quantized coefficient 1 gives the quantization step q. Please note that the quantization step depends on the size of the transform. So to simplify the implementation the transform size is always set equal to the 4x4 size. In that case, if the QP is set for the whole frame, the quantization step q is determined only once.
In another embodiment, the "TH" is set equal to the a.q. In a preferred embodiment, a is equal to 3/4. In one embodiment, a depends on the size of the current block.
In one embodiment, the threshold is set equal to a formula which depends on the Lambda of the rate distortion criterion used to compare the palette mode to other coding modes.
In one embodiment, the value of each palette element is modified in order to find an optimal value.
In one other embodiment, the average of the pixels related to one element is selected instead of the first element. Figure 14 illustrates one possible way to implement it. This figure is the same as Figure 12, except that the modules 1401, 1411 and 1412 are changed. In 1401, the variable "Sum" containing the sum of all pixels for each colour component is initialized to 0. When a palette element is added to the Palette and the variable counter for this new element is set equal to 1, the variable "Sum" is set equal to the pixel pi for each colour component. When the decision module 1407 decides that the pixel pi is close to the palette element j, the counter at the index j is incremented (1412) as in 1212 and each colour component of the pixel pi is added to each colour sum of the variable Sum at the index j: "Sum[j] += pi". When all the pixels of the block are processed, module 1417 computes the average value for each palette element.
This process takes place just before the palette reordering module 1415. So, each palette element Pal[j] is set equal to Sum[j]/Counter[j] (the average value of the pixels associated to this palette element index j) (1417). Of course this average is computed independently for each colour component.
In one embodiment, the average value can be updated for each new pixel associated to a class. This consists in removing the module 1417 of Figure 14 and changing the module 1412. In this module 1412, after adding the current pixel pi to the sum Sum[j], Pal[j] (the palette element at the index j) is updated for each colour component to the average: Sum[j]/Counter[j].
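The running-average variant (module 1412 updated, module 1417 removed) can be sketched as follows; `assign_and_update` is a hypothetical helper name, and the per-component threshold test mirrors module 1207:

```python
def assign_and_update(pixel, palette, counters, sums, th=9):
    """Assign `pixel` to a palette class and refresh that class's value.

    When the pixel joins class j, Pal[j] becomes the running mean of
    all pixels assigned to it so far; otherwise a new class is opened.
    Returns the index of the class used.
    """
    for j, entry in enumerate(palette):
        if all(abs(c - e) < th for c, e in zip(pixel, entry)):
            counters[j] += 1
            sums[j] = tuple(s + c for s, c in zip(sums[j], pixel))
            # module 1412 (modified): update Pal[j] to Sum[j] / Counter[j]
            palette[j] = tuple(s / counters[j] for s in sums[j])
            return j
    palette.append(pixel)        # new element: counter = 1, Sum = pixel
    counters.append(1)
    sums.append(pixel)
    return len(palette) - 1
```

For example, assigning (10, 10, 10) and then (14, 14, 14) to the same class moves that palette entry to (12, 12, 12).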
In another embodiment, the average is replaced by the median value of the pixels associated to a palette element.
In another embodiment, the average is replaced by the most probable value. It means that the pixel value associated to a palette element which is the most often selected is used as the value of the palette element.
In the current implementation, the level associated to a pixel is the first element in the palette. The selection of the level for the block of levels is processed in module 1308 of Figure 13. In one embodiment, the encoder looks for the closest palette element instead of the first closest element in the list. Of course, for this embodiment all palette elements need to be compared to the current pixel but it improves the coding efficiency.
In the current implementation, the selection between the two prediction modes is based on the criterion "icopy != 0 && icopy + 2 > ileft" as shown in module 1314 of Figure 13. This means that, if the run of the copy up mode + 2 is superior to the run of the left prediction mode and if the "copy up" run is different from 0, the "copy up" mode is selected. The operation "+ 2" tries to compensate the rate cost of the level of the left mode. Yet this is not optimal and it should depend on the number "Palette_size" which is directly related to the cost of the level. In a preferred embodiment, the operation "+ 2" of the criterion of module 1314 is removed in order to become: "icopy != 0 && icopy > ileft". This embodiment improves the coding efficiency.
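The two variants of the criterion of module 1314 can be contrasted directly (a Python sketch for illustration):

```python
def select_copy_up(icopy: int, ileft: int, compensate_level_cost: bool) -> bool:
    """Decision of module 1314: True selects "copy up", False selects
    "left prediction".  The preferred embodiment drops the "+ 2" term
    that compensated the rate cost of the left mode's level."""
    if compensate_level_cost:
        return icopy != 0 and icopy + 2 > ileft   # current implementation
    return icopy != 0 and icopy > ileft           # preferred embodiment
```

With a copy-up run of 3 against a left run of 4, the current criterion picks "copy up" (3 + 2 > 4) while the preferred one picks "left prediction".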

Claims (16)

  1. CLAIMS1. A method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, said method comprising for a current portion, inserting said portion's current colour in the colour palette as a new colour element, if the current colour is not present in the colour palette and if the absolute difference between the current colour and at least one element of the colour is superior to a predetermined threshold, wherein said threshold is adaptive.
  2. 2. A method according to claim 1, wherein the threshold is adapted according to a predetermined quality parameter.
  3. 3. A method according to claim 2, wherein the coding of the block comprises using a given coding mode having a quantization step based on quantization parameters, the threshold being adapted according to the value taken by the quantization parameters.
  4. 4. A method according to claim 3, wherein the quantization step is applied on a residual obtained from the picture's block and wherein said quantization step is realized in portion's domain.
  5. 5. A method according to claim 3 or claim 4, wherein the given mode is the Transform Skip mode.
  6. 6. A method according to claim 1, the threshold is adapted according to a rate-distortion criteria applied on the picture's block.
  7. 7. A method according to claim 5, wherein a rate-distortion function J is defined by J=D+A.R, where D is a measure of the distortion applied on the picture's block, R a coding cost of the bits and A a Lagrangian parameter, wherein the threshold is adapted according to the value taken by A.
  8. 8. A method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, each colour element being defined according to a weighted combination of several colour components, said encoding the colour of a current portion comprising: allocating to the at least one portion of the block, a colour element selected among the colour elements of the colour palette, the definition of said selected colour element being the closest to the definition of said portion colour's definition, and updating the definition of at least one colour element of the colour palette with a value based on the real colours of all the portions of the block whose said colour element has been allocated.
  9. 9. A method according to claim 8, wherein the definition of at least one colour element of the colour palette is updated with a value corresponding to the average of the real colours of all the portions of the block whose said colour element has been allocated.
  10. 10. A method according to claim 8, wherein the definition of at least one colour element of the colour palette is updated with a value corresponding to the median of the real colours of all the portions of the block whose said colour element has been allocated.
  11. 11. A method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, each colour element being defined according to a weighted combination of several colour components, wherein encoding the colour of a current portion comprising: -Considering all the colour elements of the colour palette, and -Encoding the colour of the current portion by using the colour element whose definition is the closest to the current portion colour's definition.
  12. 12. A method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, each colour element being defined according to a weighted combination of several colour components, wherein encoding the colour of a current portion comprising determining a prediction mode among at least two prediction modes, the determining step comprising: -Determining for each mode the value of a syntax element indicating the number of consecutive portions following the current portion having the same colour, and -If the value of the syntax element for the first mode is non-null and superior to the value of the syntax element for the second mode then selecting the first mode, else selecting the second mode.
  13. 13. A method for coding a block of a picture in a video sequence, said method comprising for at least one portion of the block, encoding a colour based on a colour palette with several colour elements, each colour element being defined according to a weighted combination of several colour components, wherein encoding the colour of a current portion comprising determining a prediction mode among at least two prediction modes, the first mode being associated to a syntax value indicating the number of consecutive portions following the current portion having the same colour, and the second mode being associated to a syntax value indicating the number of consecutive portions following the current portion having the same colour and a value corresponding a colour element of the palette colour, the determining step comprising: -selecting for each mode the value of a syntax element only, and -If the value of the syntax element for the first mode is non-null and superior to the value of the syntax element for the second mode then selecting the first mode, else selecting the second mode.
  14. 14. A method according to claim 12 or 13, wherein the first mode is the "copy up mode" where the colours of the current portion is copied from the colour of the portion located immediately above the current portion and the second mode is the "left prediction" mode where the colour of the current portion is predicted from the colour of the portion located at the left of the current portion.
15. A method according to any preceding claim, wherein the portion of the block is a pixel of the block.
16. A coding device configured to implement a coding method according to any one of the preceding claims.
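The mode decision described in claims 12 to 14 can be sketched as follows. This is an illustrative sketch only, not the HEVC reference implementation: the block is modelled as a 2-D array of palette indices, the exact run-counting convention is an assumption, and the function names (`run_copy_up`, `run_left`, `select_mode`) are hypothetical.

```python
def run_copy_up(block, row, col):
    """Run for the "copy up" mode: count consecutive pixels, starting at
    (row, col), whose palette index equals the pixel immediately above.
    A first-row pixel has no portion above, so the run is null (0)."""
    if row == 0:
        return 0
    width = len(block[0])
    run = 0
    while col + run < width and block[row][col + run] == block[row - 1][col + run]:
        run += 1
    return run

def run_left(block, row, col):
    """Run for the "left prediction" mode: count consecutive portions
    following (row, col) that repeat the colour of the current portion."""
    width = len(block[0])
    run = 0
    while col + 1 + run < width and block[row][col + 1 + run] == block[row][col]:
        run += 1
    return run

def select_mode(block, row, col):
    """Claim 12's test: pick "copy up" only when its run is non-null and
    strictly greater than the left-prediction run; else "left"."""
    up = run_copy_up(block, row, col)
    left = run_left(block, row, col)
    return ("copy_up", up) if up > 0 and up > left else ("left", left)
```

For example, in a two-row block whose second row starts by repeating the row above, the copy-up run exceeds the left run and "copy up" is chosen; on the first row the copy-up run is null, so "left prediction" is always selected.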
GB1322616.2A 2013-12-19 2013-12-19 Improved palette mode in HEVC for the encoding process Active GB2523076B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1322616.2A GB2523076B (en) 2013-12-19 2013-12-19 Improved palette mode in HEVC for the encoding process
PCT/EP2014/078606 WO2015091879A2 (en) 2013-12-19 2014-12-18 Improved encoding process using a palette mode
US15/105,521 US10972742B2 (en) 2013-12-19 2014-12-18 Encoding process using a palette mode

Publications (3)

Publication Number Publication Date
GB201322616D0 GB201322616D0 (en) 2014-02-05
GB2523076A true GB2523076A (en) 2015-08-19
GB2523076B GB2523076B (en) 2016-08-10

Family

ID=50071167


Country Status (1)

Country Link
GB (1) GB2523076B (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111131824B (en) * 2015-01-15 2023-10-03 株式会社Kt Method for decoding video signal and method for encoding video signal
CN111510721B (en) * 2020-04-10 2022-11-01 华侨大学 Multi-description coding high-quality edge reconstruction method based on spatial downsampling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014165789A1 (en) * 2013-04-05 2014-10-09 Qualcomm Incorporated Determining palettes in palette-based video coding
WO2015006724A2 (en) * 2013-07-12 2015-01-15 Qualcomm Incorporated Palette prediction in palette-based video coding

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10972742B2 (en) 2013-12-19 2021-04-06 Canon Kabushiki Kaisha Encoding process using a palette mode
GB2528431A (en) * 2014-05-21 2016-01-27 Canon Kk Improved encoding process using a palette mode
GB2528431B (en) * 2014-05-21 2020-11-18 Canon Kk Improved encoding process using a palette mode
US12096017B2 (en) 2019-08-26 2024-09-17 Lg Electronics Inc. Image or video coding based on palette mode

