GB2524477A - Method and device for decoding or encoding a frame - Google Patents

Method and device for decoding or encoding a frame

Info

Publication number
GB2524477A
GB2524477A GB1404666.8A GB201404666A
Authority
GB
United Kingdom
Prior art keywords
range
value
sample values
ranges
values
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1404666.8A
Other versions
GB201404666D0 (en)
Inventor
Guillaume Laroche
Christophe Gisquet
Patrice Onno
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB1404666.8A priority Critical patent/GB2524477A/en
Publication of GB201404666D0 publication Critical patent/GB201404666D0/en
Publication of GB2524477A publication Critical patent/GB2524477A/en
Withdrawn legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness

Abstract

Processing a set of reconstructed samples 1101a-1108a of an image frame comprising obtaining a range 1109a of possible sample values and setting all reconstructed sample values within the obtained range to be the same single value 1104a. In the embodiments this processing is part of a video encoding or decoding algorithm as a sample adaptive offset (SAO) filter that is applied to samples reconstructed after HEVC transform encoding, and is useful in the encoding of computer generated or screen content images which have a small number of discrete colours. The filtering rounds the reconstructed sample value to the nearest colour of the original colour palette. Preferably the whole range of possible sample values is divided into a number of contiguous ranges 1109a-1111a, each of which has an associated single value. The associated value may be calculated based on a value associated with the frame, or may be the modal reconstructed sample value within each obtained range.

Description

METHOD AND DEVICE FOR DECODING OR ENCODING A FRAME
FIELD OF THE INVENTION
The present invention relates to a method and device for decoding an encoded frame into a decoded frame. Particularly, but not exclusively, the invention relates to a method and device for providing a filtered reconstructed frame in an HEVC decoder.
BACKGROUND OF THE INVENTION
Video data is typically composed of a series of still images or frames which are shown rapidly in succession as a video sequence to give the idea of a moving image. Video applications are continuously moving towards higher and higher resolution. A large quantity of video material is distributed in digital form over broadcast channels, digital networks and packaged media, with a continuous evolution towards higher quality and resolution (e.g. higher number of pixels per frame, higher frame rate, higher bit-depth or extended colour gamut). This technological evolution puts higher pressure on the distribution networks that are already facing difficulties in bringing HDTV resolution and high data rates economically to the end user.
Video coding techniques typically use spatial and temporal redundancies of images in order to generate data bit streams of reduced size compared with the video sequences. Spatial prediction techniques (also referred to as INTRA coding) exploit the mutual correlation between neighboring image pixels, while temporal prediction techniques (also referred to as INTER coding) exploit the correlation between sequential images. Such compression techniques render the transmission and/or storage of the video sequences more effective since they reduce the capacity required of a transfer network, or storage device, to transmit or store the bit-stream code.
An original video sequence to be encoded or decoded generally comprises a succession of digital images or frames which may be represented by one or more matrices the coefficients of which represent pixels. An encoding device is used to code the video frames into a bit stream, with an associated decoding device being available to reconstruct the frames from the bit stream for display and viewing.
Common standardized approaches have been adopted for the format and method of the coding process. One of the more recent standards is HEVC, in which a video frame is partitioned into smaller portions of pixels known as Coding Units (sometimes referred to as macroblocks or blocks of pixels or frame areas). The coding units can be partitioned and adjusted according to the characteristics of the original frame segment under consideration. This allows more detailed coding of areas of the video frame which contain relatively more information and less coding effort for those areas with fewer features.
Both encoding and decoding processes involve a decoding process of an encoded frame. This process is typically performed at the encoder side for the purpose of future motion estimation which enables an encoder and a corresponding decoder to have the same reference frames.
A filtering process may be applied to the encoded frame to improve the visual quality of the frame as well as the motion compensation of subsequent frames.
In the current HEVC standard, loop filters which are employed include: a deblocking filter, an adaptive loop filter (ALF) and a sample adaptive offset filter (SAO).
SAO filtering enables correction of the pixels of a group, named a class of pixels, by adding the same integer value to each pixel of the class. In the current SAO applied in HEVC, pixel classification/grouping is determined according to the type of SAO: Edge Offset type or Band Offset type.
In addition, for the Edge Offset type, the Edge Offset class enables the edge forms of an SAO partition to be identified according to a given direction.
The Band Offset type splits the whole range of pixel values into bands of pixels. Each band contains four classes, each corresponding to a sub-range of contiguous pixel values. The encoder selects the best integer offset value for each class. The offset values are then inserted in the bitstream for each SAO partition.
Conventional SAO filtering is fully adapted to the encoding of natural sequences but appears poorly adapted to the encoding of "screen content" video sequences. The "screen content" video sequences refer to particular video sequences having computer- or machine-generated content, for example text, PowerPoint (RTM) presentations, Graphical User Interfaces, Electronic Program Guides, tables (e.g. screen shots). Such screen content may be characterised by having relatively few colours or by having discrete colours spaced out at relatively wide intervals in the colour spectrum.
In video coding, performance of conventional video coding tools, including HEVC, proves sometimes to be underwhelming when processing such "screen content". This is because these particular "screen content" video sequences have quite different statistics compared to natural video sequences. For example, the colour components, e.g. luma Y and chroma U, V components, of pixels within the frame or within a part of a frame have fewer values that actually occur, thus resulting in an original histogram of pixel values that has peaks for some specific pixel values.
However, the quantization of the conventional encoding tends to disperse the pixel values in the vicinity of these specific pixel values, and conventional SAO filtering does not correct this loss of sharpness due to the quantization.
Improved forms of sample adaptive filtering are required to improve the efficiency and quality of the encoding and decoding processes.
The present invention has been devised to address one or more of the foregoing concerns.
According to a first aspect of the invention there is provided a method of processing a set of reconstructed samples of a frame, the reconstructed samples having respective sample values, the method comprising: obtaining at least one range of sample values forming a subpart of a full range of all the possible sample values, and setting all reconstructed samples whose sample values are within such an obtained range to one and the same filtered sample value.
The present invention improves the coding efficiency in particular for screen content such as screen shots.
In natural video sequences, the histogram of the original pixel values in the sequence is distributed over the full range of the possible pixel values. In video sequences based on screen content, the histogram of the original pixel values is quite different. In particular, the number of pixel values actually present in the video frames is small, so that the original histogram of pixel values has peaks at some pixel values only, which pixel values are usually clearly separated one from the other. However, quantization in the conventional coding process causes this histogram to be spread or dispersed over the neighboring values of these few original pixel values.
Thanks to the invention, and particularly to the use of one and the same filtered sample or pixel value to replace the pixel values spread within the obtained range, i.e. dispersed around one original pixel value, it is possible to efficiently reconstruct the original pixel value corresponding to them and thereby match the original histogram.
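For illustration only, the following sketch (in Python, with hypothetical names; an actual codec would operate on CTB sample arrays) shows this core operation of collapsing every reconstructed sample falling in an obtained range onto the single filtered value associated with that range:

def collapse_ranges(samples, ranges):
    # Replace each reconstructed sample whose value falls inside one of
    # the obtained ranges by the single filtered value of that range.
    # `ranges` is a list of (low, high, filtered_value) tuples; the names
    # and data layout are illustrative, not taken from the patent.
    out = []
    for s in samples:
        for low, high, filtered_value in ranges:
            if low <= s <= high:
                s = filtered_value
                break
        out.append(s)
    return out

# Reconstructed values dispersed around an original colour 128 are
# snapped back to 128, restoring the peaked histogram of screen content.
print(collapse_ranges([125, 127, 128, 130, 200], [(124, 131, 128)]))
# -> [128, 128, 128, 128, 200]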
According to a second aspect of the invention, there is provided a device for processing a set of reconstructed samples of a frame, the reconstructed samples having respective sample values, the device comprising: a range determining module configured for obtaining at least one range of sample values forming a subpart of a full range of all the possible sample values, and a filtering module configured for setting all reconstructed samples whose sample values are within such an obtained range to one and the same filtered sample value.
The device provides similar advantages to the above-defined method.
Optional features of the method or of the device are defined in the appended claims and summarized below.
In some embodiments, the sample values in a range are contiguous sample values. This is for example compliant with the ranges in SAO filtering.
In some embodiments, the method further includes obtaining a value associated with the frame, and the one and the same filtered sample value is obtained by adding the obtained value to one of the sample values that is representative of the obtained range.
This provision advantageously uses the same offset syntax as in the conventional SAO filtering. As a consequence, this reduces the adaptation of the conventional encoding/decoding standard to implement the invention.
In some specific embodiments, the representative value is one of the sample values that corresponds to the middle of the obtained range.
This may be the mean integer value for the range. This provision advantageously does not require transmission of such information between the encoder and the decoder, thus reducing coding costs.
According to specific features, the representative value corresponds to a sample value outside the obtained range. In some cases, this provision provides better coding efficiency.
In some other embodiments, the method further includes obtaining a value associated with the frame, and the one and the same filtered sample value is the obtained value.
This provision gives high freedom in choosing the value of the filtered sample value to be applied to all the pixels whose values belong to the range considered.
In some other embodiments, the method further includes obtaining an n-bit value associated with the frame, n being less than a bitdepth of the sample values, and adding the decoded n-bit value to one of the sample values that is representative of the obtained range, to obtain the one and the same filtered sample value.
This provision significantly reduces coding costs since the filtered sample value to be used is expressed relatively to the representative sample value, thus reducing the number of bits required to encode the value in the bitstream.
In some specific embodiments, n is the logarithm of the number of sample values forming the obtained range with respect to base 2 (i.e. log2), and the representative value is the starting sample value forming the obtained range.
This results in having the one and the same filtered sample value belonging to the obtained range.
One may note that the starting pixel value forming the range may be obtained from any pixel value "p" of the frame that belongs to said range, using the bitwise AND operator, denoted &, as follows: (p & MASK), where MASK is made of k ones ("1"s) followed by n zeros ("0"s), k+n being the bitdepth of the pixel values.
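This bitwise computation can be sketched as follows (Python; the function name is illustrative):

def range_start(p, bitdepth, n):
    # MASK is made of (bitdepth - n) ones followed by n zeros, so the
    # bitwise AND clears the n least significant bits of p, yielding the
    # starting value of the 2**n-wide range containing p.
    mask = ((1 << bitdepth) - 1) & ~((1 << n) - 1)
    return p & mask

print(range_start(141, 8, 3))  # 8-bit pixels, ranges of 8 values -> 136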
In some other specific embodiments, n is strictly higher than the logarithm of the number of sample values forming the obtained range with respect to base 2.
This provision makes it possible to have the filtered sample value taking value from the range but also from pixel values neighboring the range. The neighboring pixel values depend on which representative value is used.
Preferably, the representative value is selected as being one of the pixel values that corresponds to the middle of the obtained range. The neighboring pixel values thus include pixel values from both external sides of the range considered.
Of course, for the very first range and the last range of all the possible pixel values (i.e. the range including the possible value 0 and the range including the last possible value 2^bitdepth - 1), the representative value is selected to define neighboring pixel values in the sole external side that exists for the range considered.
Also preferably, n is selected as being the above logarithm plus 1.
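The text does not spell out exactly how the n-bit value is anchored to the representative value; the sketch below assumes one plausible convention, in which the decoded n-bit value is re-centered on the middle-of-range representative so that the filtered value can also land on pixel values neighboring the range, as described above:

def filtered_value(range_start, range_size, nbit_value):
    # Assumptions: range_size is a power of 2, n = log2(range_size) + 1,
    # the representative value is the middle of the range, and the n-bit
    # value is an unsigned index centered on the representative. This
    # anchoring convention is an assumption, not taken from the text.
    n = range_size.bit_length()  # log2(range_size) + 1 for powers of two
    representative = range_start + range_size // 2
    return representative - (1 << (n - 1)) + nbit_value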
In some other embodiments, the one and the same filtered sample value has a fixed length.
In some other embodiments, the sample value of the obtained range that has the highest number of occurrences in the reconstructed samples of the frame is selected as the one and the same filtered sample value.
The selection of the filtered sample value may thus be performed at both encoder and decoder without additional information compared to conventional encoding. Therefore, the above provision advantageously reduces the bits to be sent to the decoder, i.e. reduces coding costs.
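Such a derivation might look as follows (Python sketch; both encoder and decoder can run the same computation on the reconstructed samples, so nothing extra needs to be signaled):

from collections import Counter

def modal_filtered_value(reconstructed, low, high):
    # Return the sample value within [low, high] that occurs most often
    # among the reconstructed samples, used as the single filtered value
    # for that range; None if no sample falls in the range.
    counts = Counter(s for s in reconstructed if low <= s <= high)
    return counts.most_common(1)[0][0] if counts else None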
In yet some other embodiments, the method further comprises determining, based on at least one item of information, whether or not the operation of setting samples to one and the same filtered sample value is to be applied to reconstructed samples whose sample values belong to a given range of sample values.
This configuration allows a plurality of filterings to coexist.
In some specific embodiments, when the item of information indicates that the operation of setting samples is to be applied, the one and the same filtered sample value for the obtained range is calculated based on said item of information.
This provision avoids use of an additional bit to signal such choice of applying or not the invention to reconstructed samples.
Of course, in a variant, a flag may be used to signal which range or ranges of contiguous possible values are concerned with the sample adaptive filtering of the invention.
In yet some other embodiments, the method further comprises determining the number of sample values belonging to each of a plurality of ranges splitting the full range of the possible sample values, and obtaining at least one range of sample values includes selecting the range or ranges having the highest number of sample values.
This means that these selected ranges are those for which the sample values of each are modified into one and the same filtered sample value associated with the range considered.
The number of ranges to be selected may be predefined. In a variant, the selection may be conducted on the ranges that have a number of sample values above a predefined threshold.
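Assuming equal ranges of 2**n contiguous values, this selection can be sketched as follows (Python; the count k of ranges to keep is the predefined parameter mentioned above, and all names are illustrative):

def most_populated_ranges(reconstructed, bitdepth, n, k):
    # Split the full range [0, 2**bitdepth) into equal ranges of 2**n
    # values, count the reconstructed samples falling in each range, and
    # return the k range indices with the highest populations.
    counts = [0] * (1 << (bitdepth - n))
    for s in reconstructed:
        counts[s >> n] += 1
    order = sorted(range(len(counts)), key=counts.__getitem__, reverse=True)
    return order[:k]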
In yet some other embodiments, the method comprises: obtaining a second set of reconstructed samples of the frame; obtaining, for the second set of reconstructed samples, at least one second range of sample values, forming part of the full range of the possible sample values; and setting all reconstructed samples of the other set whose sample values are within such obtained second range to one and the same second filtered sample value, wherein the ranges obtained for the two sets of reconstructed samples and/or the two filtered sample values are different.
For example, each set of reconstructed samples may correspond to a coding tree block (CTB) according to HEVC. In this example, the above provision defines different ranges and filtered values for different CTBs.
The invention also concerns a method of decoding a frame comprising: receiving encoded data comprising encoded sample values; decoding the encoded sample values to provide a set of reconstructed samples; and processing the set of reconstructed samples using the method as described above.
In some embodiments, the or each obtained range is read from the encoded data received from an encoder.
In variants, the or each range is obtained based on an analysis of the reconstructed samples.
In some embodiments, the one and the same filtered sample value for the obtained range is obtained from the encoded data.
The invention also concerns a method of encoding a frame comprising: encoding a set of samples of the frame; decoding the encoded samples to provide a set of reconstructed samples, the reconstructed samples having respective sample values; processing the set of reconstructed samples using the method as described above; and transmitting encoded data comprising encoded sample values of the encoded samples.
In some embodiments, transmitting encoded data comprises transmitting, to a decoder, information from which the decoder can obtain the or each range.
In some other embodiments, the or each range is selected using a rate-distortion criterion.
In yet some other embodiments, a plurality of sets including contiguous ranges of sample values, or a plurality of ranges of sample values, for which the respective operations of setting samples to one and the same filtered sample value are applied, are signaled in the encoded data.
This provision makes it possible not to have only contiguous ranges as in conventional HEVC. This is to take advantage of the high contrasts of screen content.
In some specific embodiments, only two sets of contiguous ranges are signaled in the encoded data using an m-bit identifier, the ranges of a first set belonging to the first half of ranges among all the ranges splitting the full range of the possible sample values and the ranges of the second set belonging to the second half.
Therefore, m can be chosen as the logarithm of the number of ranges splitting the full range of the possible sample values with respect to base 2, less one.
This provision reduces coding costs since one bit is saved compared to conventional identification of a band (i.e. compared to the length of the sao_band_position field in HEVC).
In some other specific embodiments, the full range of the possible sample values is split into N ranges having the same number of sample values; the plurality of ranges or of sets of contiguous ranges signaled in the encoded data are ordered according to an increasing range order among all the ranges splitting the full range of the possible sample values; and each time a range or a set of contiguous ranges is identified in the last half of the ordered ranges, the next set or range is signaled in the encoded data using an (n-1)-bit identifier, where n = log2(N).
This provision makes it possible to save an increasing number of bits for signaling the sets (i.e. bands in SAO) or ranges as the sets or ranges approach the end of the possible sample values. Consequently, the coding costs are reduced.
Another aspect of the invention relates to a non-transitory computer-readable medium storing a program which, when executed by a microprocessor or computer system in a device, causes the device to perform the steps of the above-defined method.
The non-transitory computer-readable medium may have features and advantages that are analogous to those set out above and below in relation to the method of decoding, in particular that of improving SAO filtering for screen content.
At least parts of the method according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects which may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium, for example a tangible carrier medium or a transient carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Other particularities and advantages of the invention will also emerge from the following description, illustrated by the accompanying drawings, in which:
- Figure 1 illustrates a video encoder, compliant with the HEVC standard for video compression, in which embodiments of the invention may be implemented;
- Figure 2 illustrates a corresponding video decoder in which embodiments of the invention may be implemented;
- Figures 3A and 3B illustrate examples of the Edge Offset class for the Edge Offset type in HEVC;
- Figure 4 illustrates an example of the sample adaptive Band Offset classification for the Band Offset type of HEVC;
- Figure 5 is a flow chart illustrating steps of a process for decoding SAO parameters;
- Figure 6 is a flow chart illustrating an example of the conventional process for SAO parameter syntax reading;
- Figure 7 is a flow chart of the conventional SAO decoding process;
- Figure 8 is a flow chart illustrating the process of offsets determination at the encoder side for the conventional SAO of HEVC;
- Figure 9 is a flow chart illustrating the process of optimal offsets determination in terms of RD criterion at the encoder side for the conventional SAO of HEVC;
- Figure 10 is a flow chart illustrating a process of SAO band position determination at the encoder side for the conventional SAO Band Offset of HEVC;
- Figures 11a to 11d illustrate ranges of pixel values and the possible filtered pixel value vj that can be associated with a given range in some embodiments of the invention;
- Figures 12a and 12b illustrate two configurations where two or more ranges are signaled to the decoder for implementation of the invention;
- Figure 13 is a flowchart illustrating steps of decoding the signaling of a method according to the invention;
- Figure 14 is a flowchart illustrating steps for selecting parameters of the method according to the invention;
- Figure 15 schematically illustrates a data communication system in which one or more embodiments of the invention may be implemented; and
- Figure 16 schematically illustrates a processing device configured to implement at least one embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Figure 1 illustrates a block diagram of video encoding device 10 of a generic type conforming to a HEVC video compression system. The HEVC encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 1611 of device 1600 represented in Figure 16, at least one corresponding step of a method implementing at least one embodiment of encoding an image of a sequence of images.
An original sequence of digital images or frames 101 to be compressed is received as an input by the encoder. The original video sequence 101 is a succession of digital images "images i". As is known per se, a digital image or frame is represented by one or more matrices of which the coefficients or samples represent pixels.
A coded bitstream 110 is output by the encoder after implementation of the encoding process. The encoder successively performs the steps described below to provide the coded bitstream. The bitstream 110 comprises a plurality of coding units or slices, each coding unit or slice comprising a header for transmitting encoding values of encoding parameters used to encode the coding unit or slice, and a payload body comprising encoded video data.
A first image or frame to be encoded (compressed) is divided into blocks of pixels referred to as coding units (CU) in the HEVC standard. The first frame is thus split into blocks or macroblocks 102. Each block or macroblock is an area in the frame.
A coding unit of an HEVC frame corresponds to a square block of that frame, and can have a size in a pixel range from 8x8 to 64x64. A coding unit which has the highest size authorized for the considered frame is also called a Largest Coding Unit (LCU) or CTB (coding tree block). The CTBs may be defined in the bitstream using a quadtree that reflects a recursive breakdown of the frame into square-shaped regions of pixels.
A coding mode is then assigned to each block. There are two families of coding modes: the modes based on spatial prediction (INTRA) (103) and the modes based on temporal prediction (INTER, Bidir, Skip) (104, 105).
An INTRA block is generally predicted from the encoded pixels at its causal boundary by a process called INTRA prediction. An INTRA prediction residual is computed as the difference between the block to be predicted and its block predictor. A prediction direction defining where the block predictor can be found relatively to the block and the INTRA prediction residual is encoded in the bitstream 110 if INTRA prediction is elected.
An INTER block is generally predicted from a reference block predictor selected by a motion estimation operation 104 from a reference frame 116 stored in a dedicated memory buffer. An INTER prediction residual is computed as the difference between the block to be predicted and its reference block predictor, through motion compensation 105. In addition to the INTER prediction residual, motion information (comprising the motion vector and possibly an identifier of the reference frame used) is encoded in the bitstream 110 if INTER prediction is elected.
However, in order to further reduce the bitrate cost related to motion vector encoding, the motion vector is not directly encoded. This is because, assuming that motion is quite homogeneous amongst neighboring blocks, it is particularly interesting to encode the motion vector relatively to the motion vectors used for the neighboring blocks, for example to encode the motion vector as a difference between this motion vector and a motion vector in its dyadic surrounding.
In H.264 for instance, the motion vectors are encoded with respect to a median vector computed between the three adjacent blocks located above and on the left of the current block. Only a difference (also called motion vector residual) computed between the median vector and the current block motion vector is encoded in the bitstream. This is processed in module Mv prediction and coding (117). The value of each encoded vector is stored in the motion vector field (118) to be used for computation of the median vector.
In HEVC, a slightly different process is implemented to predict the motion vector.
The INTRA and INTER prediction residual are supplied and compared in a module for selecting the best coding mode 106. The mode optimizing the rate distortion performance is selected.
In order to further reduce the redundancies, the prediction residual selected by the choice module 106 is then transformed (107) in the frequency domain, by means of a discrete cosine transform (DCT), and then quantized (108). The DCT transform and the quantization usually use blocks of smaller size than the LCUs or CTBs, for example of 4x4 or 8x8 pixels.
Finally, a last entropy coding step (109) is performed to encode the selected coding mode, the motion information in case of INTER prediction or the prediction direction in case of INTRA prediction, as well as the quantized DCT coefficients previously calculated. This entropy coder encodes each of these data into their binary form and encapsulates the so-encoded block into a container referred to as a NAL unit (Network Abstraction Layer). A NAL unit contains all encoded coding units from a given slice. A coded HEVC bit-stream 110 is composed of a series of NAL units.
In the remainder of the document, reference will mainly be made to entropy coding. However, a person skilled in the art is capable of replacing it with arithmetic coding or any other suitable coding.
As is known per se, the bit stream corresponding to an encoded block (LCU or CTB) comprises a first part made of syntax elements and a second part made of encoded data for each data block.
The second part generally includes the encoded data corresponding to the encoded data blocks, i.e. the encoded residuals together with their associated motion vectors and reference frame indexes (INTER prediction) or with the prediction direction (INTRA prediction).
In order to calculate the "INTRA" block predictors or to make the motion estimation for the "INTER" block predictors, the encoder performs decoding of the blocks already encoded by means of a so-called "decoding" loop (111, 112, 113, 114, 115, 116) in order to obtain reference frames 116 for the future motion estimations.
This decoding loop makes it possible to reconstruct the blocks and frames from quantized transformed residuals.
It ensures that the coder and decoder use the same reference frames.
Thus the quantized transformed residual is dequantized (111) by application of a quantization operation which is inverse to the one provided at step 108, and is then reconstructed (112) by application of the transformation that is the inverse of the one at step 107.
If the quantized transformed residual comes from an "Intra" coding 103, the "Intra" predictor used is added to that residual (113) in order to obtain a reconstructed block corresponding to the original block modified by the losses resulting from the quantization operation.
If on the other hand the quantized transformed residual comes from an "Inter" coding 105, the block pointed to by the current motion vector (this block belongs to the reference frame 116 referred to in the coded motion information) is added to this decoded residual (114). In this way the original block is obtained, modified by the losses resulting from the quantization operations.
In order to attenuate, within the same frame, the block effects created by strong quantization of the obtained residuals, the encoder includes a post filtering module 115, the objective of which is to eliminate compression artifacts, in particular block effects such as the artificial high frequencies introduced at the boundaries between blocks.
For example, H.264/AVC uses a deblocking filter that removes blocking artifacts due to the DCT quantization of residuals and to block motion compensation.
The deblocking filter 115 smoothes the borders between the blocks or "frame areas" in order to visually attenuate these high frequencies created by the coding. In the current HEVC standard, three types of loop filters are proposed: the deblocking filter, a sample adaptive offset (SAO) filter and an adaptive loop filter (ALF).
Parameters of the post filtering applied can also be inserted in the bitstream 110, for example as syntax elements.
The present invention more specifically focuses on the sample adaptive offset (SAO) filter which is further described below with reference to Figures 3 to 8.
It may be noted that the loop filtering can be applied block by block or CTB by CTB in the HEVC standard. A CTB can be considered as a square area of pixels within the frame for a color component. The post filtered pixels of a CTB are not used as reference pixels for Intra prediction.
The filtered frames, also referred to as reconstructed frames, are then stored as reference frames 116 in order to allow subsequent "Inter" predictions based on these frames to take place during the compression of the following frames in the current video sequence.
Figure 2 is a block diagram of a standard HEVC decoding system 20 in which one or more embodiments of the invention may be implemented. This decoding process of a bit-stream 201 starts by the entropy decoding 202 of each block or "frame area" (array of pixels) of each coded frame in the bit-stream. This entropy decoding 202 provides the coding mode, the motion information (reference picture indexes, motion vectors of INTER coded blocks) or the intra prediction directions, the residual data and post filtering parameters (e.g. SAO parameters as described below). The residual data comprises quantized and transformed DCT coefficients. Next, these quantized DCT coefficients undergo inverse quantization (203) and inverse transform operations 204.
Depending on the coding mode that is decoded from the bitstream, an INTRA type decoding or an INTER type decoding is performed for the current block.
If the coding mode is INTRA, an INTRA predictor is determined using the decoded prediction direction (205).
If the coding mode is INTER, the motion information is extracted from the bitstream (202). This is composed of the reference frame index and the motion vector residual. The motion vector predictor (determined in a similar way as the encoder) is added to the motion vector residual to obtain the motion vector (210). The motion vector is then used to locate the reference block predictor in the reference frame (206).
Note that the motion vector field data (211) is updated with the decoded motion vector in order to be used for the next motion vectors to be decoded (i.e. for the next blocks in the frame).
This first reconstruction of the decoded frame is then post filtered (207) with exactly the same post filter as used at encoder side. The output of the decoder is the un-compressed video (209).
The decoded residual is added to the INTRA predictor or the INTER reference block predictor to provide a reconstructed block.
The reconstructed block finally undergoes one or more in-loop post-filtering processes 207, similarly to the decoding loop of the encoder of Figure 1, e.g. deblocking and SAO, which aim at reducing the blocking artifacts inherent to any block-based video codec (deblocking) and at improving the quality of the decoded picture.
The full filtered reconstructed frame is then stored in the Decoded Picture Buffer (DPB), represented by the frame memory 208, which stores frames that will serve as references to predict future frames to decode. The decoded frames 209 are also ready to be displayed on screen.
Details of the conventional SAO filtering are now given with reference to Figures 3 to 10.
The aim of Sample Adaptive Offset (SAO) filtering is to improve the quality of the reconstructed frame by sending additional data in the bitstream, in contrast to the deblocking filter where no information is transmitted.
In a conventional SAO filter, each pixel is classified into a predetermined class or group and the same offset value is added to every pixel sample of the same class/group. One offset is encoded in the bitstream for each class. SAO loop filtering has two SAO types for a Coding Tree Block (CTB): the Edge Offset type and the Band Offset type. An example of the Edge Offset type is schematically illustrated in Figure 3, and an example of the Band Offset type is schematically illustrated in Figure 4.
SAO filtering is applied CTB by CTB. In this case the parameters are selected for each CTB at the encoder side and parameters are decoded and/or derived for each CTB at the decoder side. This offers the possibility of easily encoding and decoding the video sequence by processing each CTB at once, without introducing delays in the processing of the whole frame. Moreover, when SAO filtering is enabled, only one SAO type is used: either the Edge Offset type filter or the Band Offset type filter, according to the related parameters transmitted in the bitstream for each classification. These parameters can be copied from the upper and left CTB, for example, instead of transmitting all the SAO data.
SAO filtering may be applied independently for different color components (e.g. YUV) of the frame. For example, one set of SAO parameters may be provided for the luma component Y and another set of SAO parameters may be provided for both chroma components U and V in common.
A description of the Edge Offset type is now provided with reference to Figure 3.
Edge Offset type involves determining an edge index for each pixel by comparing its pixel value to the values of two neighboring pixels. Moreover, these two neighboring pixels depend on a parameter which indicates the direction of these two neighboring pixels with respect to the current pixel. These directions are the 0-degree (horizontal direction), 45-degree (diagonal direction), 90-degree (vertical direction) and 135-degree (second diagonal direction). These four directions are schematically illustrated in Figure 3A.
The table of Figure 3B gives the offset value to be applied to the pixel value of a particular pixel "C" according to the values of the two neighboring pixels Cn1 and Cn2 at the decoder side.
When the value of C is less than the two values of its neighboring pixels Cn1 and Cn2, the offset to be added to the pixel value of the pixel C is "+ O1". When the pixel value of C is less than one of the pixel values of its neighbors (either Cn1 or Cn2) and the pixel value of C is equal to the value of its other neighbor, the offset to be added to this pixel sample value is "+ O2".
When the pixel value of C is greater than one of the pixel values of its neighbors (Cn1 or Cn2) and the pixel value of C is equal to the value of its other neighbor, the offset to be applied to this pixel sample is "- O3". When the value of C is greater than the two values of Cn1 and Cn2, the offset to be applied to this pixel sample is "- O4".
When none of the above conditions is met on the current sample and its neighbors, no offset value is added to the current pixel C as depicted by the Edge Index value "2" of the table.
It is important to note that for the particular case of the Edge Offset type, the absolute value of each offset (O1, O2, O3, O4) is encoded in the bitstream. The sign to be applied to each offset depends on the edge index (the Edge Index in the HEVC specifications) to which the current pixel belongs. According to the table represented in Figure 3B, for Edge Index 0 and for Edge Index 1 (O1, O2) a positive offset is applied. For Edge Index 3 and Edge Index 4 (O3, O4), a negative offset is applied to the current pixel.
In the HEVC specifications, the direction for the Edge Offset amongst the four directions of Figure 3A is specified in the bitstream by a "sao_eo_class_luma" field for the luma component and a "sao_eo_class_chroma" field for both chroma components U and V. The SAO Edge Index corresponding to the index value is obtained by the following formula: EdgeIndex = sign(C - Cn2) - sign(Cn1 - C) + 2, where the function sign() is defined by the following relationships: sign(x) = 1 when x > 0; sign(x) = -1 when x < 0; sign(x) = 0 when x = 0.
In order to simplify the Edge Offset determination for each pixel, the difference between the pixel value of C and the pixel values of both its neighboring pixels Cn1 and Cn2 can be shared between the current pixel C and its neighbors. Indeed, when SAO Edge Offset filtering is applied using a raster scan order of pixels of the current CTB or frame, the term sign(Cn1 - C) has already been computed for the previous pixels (to be precise, it was computed as sign(C' - Cn2') at a time when the current pixel C' of that time was the present neighboring pixel Cn1 and its neighboring pixel Cn2' was what is now the current pixel C). As a consequence this sign(Cn1 - C) does not need to be computed again.
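For illustration, the formula and the sign-sharing optimization translate directly into the following Python sketch:

def sign(x):
    # sign(x) = 1 when x > 0, -1 when x < 0, 0 when x = 0
    return (x > 0) - (x < 0)

def edge_index(c, cn1, cn2):
    # EdgeIndex = sign(C - Cn2) - sign(Cn1 - C) + 2; indexes 0 and 1 get
    # positive offsets, 3 and 4 negative ones, and 2 means no filtering.
    # In a raster-scan implementation, sign(cn1 - c) can be reused from
    # the previous pixel's sign(c - cn2) instead of being recomputed.
    return sign(c - cn2) - sign(cn1 - c) + 2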
A description of the Band Offset type is now provided with reference to Figure 4.
Band Offset type in SAO also depends on the pixel value of the sample to be processed. A class in SAO Band offset is defined as a range of pixel values.
Conventionally, for all pixels within a range, the same offset is added to the pixel value.
In the HEVC specifications, the number of offsets for the Band Offset filter is four for each reconstructed block or frame area of pixels (CTB), as schematically illustrated in Figure 4.
One implementation of SAO Band Offset splits the full range of pixel values into 32 ranges of the same size. These 32 ranges are the classes of SAO Band Offset.
The minimum value of the range of pixel values is systematically 0 and the maximum value depends on the bit depth of the pixel values according to the following relationship: Max = 2^bitdepth - 1. Classifying the pixels into 32 ranges of the full interval requires checking only 5 bits, which enables a fast implementation: only the first 5 bits (the 5 most significant bits) are checked to classify a pixel into one of the 32 classes/ranges of the full range.
For example, when the bitdepth is 8 bits per pixel, the maximum value of a pixel can be 255. Hence, the range of pixel values is between 0 and 255. For this bitdepth of 8 bits, each class contains 8 pixel values.
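For illustration, this most-significant-bits classification is a one-line computation (Python sketch):

def band_class(p, bitdepth=8):
    # Keep only the 5 most significant bits of p, classifying it into one
    # of the 32 equal bands (8 pixel values per band at a bitdepth of 8).
    return p >> (bitdepth - 5)

print(band_class(200))  # 200 = 0b11001000 -> class 25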
In conventional Band Offset type filtering, the start of the band, represented by the grey area (40), that contains four ranges or classes, is signaled in the bitstream to identify the position of the first class of pixels or the first range of pixel values. The syntax element representative of this position is the "sao_band_position" field in the HEVC specifications. This corresponds to the start of class 41 in Figure 4. According to the HEVC specifications, 4 consecutive classes (41, 42, 43 and 44) of pixel values are used and 4 corresponding offsets are signaled in the bitstream.
Figure 5 is a flow chart illustrating steps of a process for decoding SAO parameters according to the HEVC specifications. The process of Figure 5 is applied for each CTB to generate a set of SAO parameters for the Y component and another set of SAO parameters common to the U and V components. In order to avoid encoding one set of SAO parameters per CTB (which is very costly), a predictive scheme is used for the CTB mode. This predictive mode involves checking if the CTB on the left of the current CTB uses the same SAO parameters (this is specified in the bitstream through a flag named "sao_merge_left_flag"). If not, a second check is performed with the CTB above the current CTB (this is specified in the bitstream through a flag named "sao_merge_up_flag"). This predictive technique enables the amount of data representing the SAO parameters for the CTB mode to be reduced.
Steps of the process are set out below.
In step 501, the process starts by selecting the colour component of the video sequence. In the current version of HEVC, SAO parameters are provided for the luma component Y and for both U and V components (together). In the example of a YUV sequence the process starts with the Y component. In step 503, the "sao_merge_left_flag" is read from the bitstream 502 and decoded. If its value is true, then the process proceeds to step 504 where the SAO parameters of left CTB are copied for the current CTB. This enables the type of the SAO filter for the current CTB to be determined in step 508.
If the outcome is negative in step 503 then the "sao_merge_up_flag" is read from the bitstream and decoded. If its value is true, then the process proceeds to step 505 where the SAO parameters of the above CTB are copied for the current CTB.
This enables the type of the SAO filter for the current CTB to be determined in step 508.
If the outcome is negative in step 505, then the SAO parameters for the current CTB are read and decoded from the bitstream in step 507. The details of this step are described later with reference to Figure 6. After this step, the parameters are obtained and the type of SAO filter is determined in step 508.
In subsequent step 509 a check is performed to determine if the three colour components (Y and U & V) for the current CTB have been processed. If the outcome is positive, the determination of the SAO parameters for the three components is complete and the next CTB can be processed in step 510. Otherwise (only Y was processed), U and V are processed together and the process restarts from initial step 501 previously described.
Figure 6 is a flow chart illustrating steps of a process of parsing of SAO parameters in the bitstream 601 at the decoder side. In initial step 602, the "sao_type_idx_X" syntax element is read and decoded. The code word representing this syntax element can use a fixed length code or could use any method of arithmetic coding. The syntax element "sao_type_idx_X" enables determination of the type of SAO applied for the frame area to be processed for the colour component Y or for both Chroma components U & V. For example, for a YUV 4:2:0 sequence, two components are considered: one for Y, and one for U and V. The "sao_type_idx_X" can take 3 values as follows depending on the SAO type encoded in the bitstream: "0" corresponds to no SAO, "1" corresponds to the Band Offset case illustrated in Figure 4 and "2" corresponds to the Edge Offset type filter illustrated in Figure 3.
In the same step 602, a test is performed to determine if "sao_type_idx_X" is strictly positive. If "sao_type_idx_X" is equal to "0", this signifies that there is no SAO for this frame area (CTB) for Y if X is set equal to Y, and that there is no SAO for this frame area for U and V if X is set equal to U and V. In that case the determination of the SAO parameters is complete and the process proceeds to step 608. Otherwise, if "sao_type_idx_X" is strictly positive, this signifies that SAO parameters exist for this CTB in the bitstream.
Then the process proceeds to step 606 where a loop is performed for four iterations. The four iterations are carried out in step 607 where the absolute value of offset j is read and decoded from the bitstream. These four offsets correspond either to the four absolute values of the offsets (O1, O2, O3, O4) of the four Edge indexes of SAO Edge Offset (see Figure 3B) or to the four absolute values of the offsets related to the four ranges of the SAO Band Offset (see Figure 4).
Note that for the coding of an SAO offset, a first part is transmitted in the bitstream corresponding to the absolute value of the offset. This absolute value is coded with a unary code. The maximum value for an absolute value is given by the following formula: MAX_abs_SAO_offset_value = (1 << (Min(bitDepth, 10) - 5)) - 1, where << is the left (bit) shift operator.
This formula means that the maximum absolute value of an offset is 7 for a pixel value bitdepth of 8 bits, and 31 for a pixel value bitdepth of 10 bits and beyond.
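For illustration, this bound is straightforward to compute (Python sketch mirroring the formula above):

def max_abs_sao_offset(bitdepth):
    # (1 << (min(bitDepth, 10) - 5)) - 1
    return (1 << (min(bitdepth, 10) - 5)) - 1

print(max_abs_sao_offset(8), max_abs_sao_offset(10))  # -> 7 31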
The current HEVC standard amendment addressing extended bitdepth video sequences provides a similar formula for a pixel value having a bitdepth of 12 bits and beyond. The absolute value decoded may be a quantized value which is dequantized before it is applied to pixel values at the decoder for SAO filtering. An indication of the use or not of this quantization is transmitted in the slice header.
For Edge Offset type, only the absolute value is transmitted because the sign can be inferred as explained previously.
For Band Offset type, the sign is signaled in the bitstream as a second part of the offset if the absolute value of the offset is not equal to 0. The bit of the sign is bypassed when CABAC is used.
After step 607, the process proceeds to step 603 where a test is performed to determine if the type of SAO corresponds to the Band Offset type (sao_type_idx_X == 1).
If the outcome is positive, the signs of the offsets for the Band Offset mode are decoded in steps 609 and 610, except for each offset that has a zero value, before the following step 604 is performed in order to read in the bitstream and to decode the position "sao_band_position_X" of the SAO band as illustrated in Figure 4.
If the outcome is negative in step 603 ("sao_type_idx_X" is set equal to 2), this signifies that the Edge Offset type is used. Consequently, the Edge Offset class (corresponding to the direction 0, 45, 90 and 135 degrees) is extracted from the bitstream 601 in step 605. If X is equal to Y, the read syntax element is "sao_eo_class_luma" and if X is set equal to U and V, the read syntax element is "sao_eo_class_chroma".
When the four offsets have been decoded, the reading of the SAO parameters is complete and the process proceeds to step 608.
Figure 7 is a flow chart illustrating a decoding process of the conventional SAO according to the HEVC specifications, for example during step 207. This decoding process is also applied in the decoding loop (step 115) at the encoder in order to produce the reference frames used for the motion estimation and compensation of the following frames. This process is related to the SAO filtering for one color component (thus the suffix "_X" in the syntax elements has been omitted below).
An initial step 701 comprises determining the SAO filtering parameters according to the processes depicted in Figures 5 and 6. Step 701 gives the sao_type_idx and, if it equals 1, the sao_band_position 702 and, if it equals 2, the sao_eo_class_luma or sao_eo_class_chroma (according to the colour component processed). It may be noted that if the element sao_type_idx is equal to 0 the SAO filtering is not applied.
Step 701 gives also the offsets table of the 4 offsets 703.
A variable i, used to successively consider each pixel Pi of the current block or frame area (CTB), is set to 0 in step 704. In step 706, pixel Pi is extracted from the frame area 705 (the current CTB in the HEVC standard) which contains N pixels. This pixel Pi is classified in step 707 according to the Edge Offset classification or Band Offset classification as described respectively in Figure 3 and Figure 4, or in accordance with embodiments of the invention to be described later. The decision module 708 tests if Pi is in a class that is to be filtered using the conventional SAO filtering.
If Pi is in a filtered class, the related class number j is identified and the related offset value Offset_j is extracted in step 710 from the offsets table 703. In the case of the conventional SAO filtering this Offset_j is then added to the pixel value Pi in step 711 in order to produce the filtered pixel value P'i 712. This filtered pixel P'i is inserted in step 713 into the filtered frame area 716. In embodiments of the invention, steps 710 and 711 are carried out differently, as will be explained later in the description of those embodiments.
If Pi is not in a class to be SAO filtered then Pi (709) is inserted in step 713 into the filtered frame area 716 without filtering.
After step 713, the variable i is incremented in step 714 in order to filter the subsequent pixels of the current frame area 705 (if any - test 715). After all the pixels have been processed (i >= N) in step 715, the filtered frame area 716 is reconstructed and can be added to the SAO reconstructed frame (see frame 208 of Figure 2 or 116 of Figure 1).
Figure 8 is a flow chart illustrating steps of an example of an offset selection process at the encoder side that can be applied for the Edge Offset type filter, in the case of the conventional SAO filtering. A similar approach may also be used for the Band Offset type filter.
Figure 8 illustrates the offset selection at the encoder side for the current frame area 803. This frame area contains N pixels. In an initial step 801, the variables Sum_j and SumNbPix_j are set to zero. j is the current range number, used to determine the four offsets (related to the four edge indexes shown in Figure 3B for the Edge Offset type or to the four ranges of pixel values shown in Figure 4 for the Band Offset type). Sum_j is the sum of the differences between the pixels in the range j and their original pixels.
SumNbPix_j is the number of pixels in the frame area, the pixel value of which belongs to the range j.
In step 802, a variable i, used to successively consider each pixel Pi of the current frame area, is set to zero. Then, the first pixel Pi of the frame area 803 is extracted in step 804. In step 805 the class of the current pixel is determined by checking the conditions defined in Figure 3B. Then a test is performed in step 806.
During step 806, a check is performed to determine if the class of the pixel value Pi corresponds to the value "none of the above" of Figure 3B.
If the outcome is positive, then the variable i is incremented in step 808 in order to consider the next pixels of the frame area 803.
Otherwise, if the outcome is negative in step 806, the next step is 807 where the related SumNbPixj (i.e. the number of pixels counted for the class determined in step 805) is incremented and the difference between Pi and its original value Porg is added to Sumj. In the next step 808, the variable i is incremented in order to consider the next pixels of the frame area 803.
Then a test is performed to determine if all pixels have been considered and classified. If the outcome is negative, the process loops back to step 804 described above. Otherwise, if the outcome is positive, the process proceeds to step 810 where the parameter Offsetj of each class j is computed in order to produce the offset table 811, which is the outcome of the offset selection algorithm.
This offset Offsetj may be the average of the differences between the pixels of class j and their original values. Thus, Offsetj is given by the following formula:

Offsetj = Sumj / SumNbPixj

Note that the offset Offsetj is an integer value. As a consequence, the ratio defined in this formula may be rounded, either to the closest value or using the ceiling or floor function.
Each offset Offsetj is an optimal offset Ooptj in terms of distortion.
Next, the encoding process illustrated in Figure 9 is applied in order to find the best offset in terms of a rate distortion criterion, an offset referred to as ORDj.
In an initial step 901 of the encoding process of Figure 9, the rate distortion value Jj is initialized to the maximum possible value. Then a loop on Oj from Ooptj to 0 is applied in step 902. Note that Oj is modified by 1 at each new iteration of the loop. If Ooptj is negative, the value Oj is incremented, and if Ooptj is positive, the value Oj is decremented. The rate distortion cost related to Oj is computed in step 903 according to the following formula:

J(Oj) = SumNbPixj x Oj x Oj - Sumj x Oj x 2 + lambda x R(Oj)

where lambda is the Lagrange parameter and R(Oj) is a function which provides the number of bits needed for the code word associated with Oj.
The term SumNbPixj x Oj x Oj - Sumj x Oj x 2 gives the improvement in terms of distortion provided by the use of the offset Oj. If J(Oj) is lower than Jj then Jj = J(Oj) and ORDj is set equal to Oj in step 904. If Oj is equal to 0 in step 905, the loop ends and the best ORDj for the class j is selected.
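By way of illustration only, this refinement loop of Figure 9 may be sketched in C as follows; rate_bits() is a stand-in for R(Oj), and its unary-style length is an assumption of this sketch, not taken from the specification.

#include <stdlib.h>

/* Illustrative code-word length: unary magnitude plus a sign bit when
   non-zero (assumption for this sketch). */
static int rate_bits(int o) { return abs(o) + 1 + (o != 0); }

/* Sketch of the rate-distortion offset refinement of Figure 9 for one
   class j, given Ooptj, Sumj and SumNbPixj from Figure 8. */
int select_ord(int o_opt, long sum_j, long sum_nb_pix_j, double lambda)
{
    double j_best = 1e300;                  /* step 901: Jj = max value */
    int ord = 0;
    int step = (o_opt > 0) ? -1 : 1;        /* move Oj from Ooptj to 0  */
    for (int o = o_opt; ; o += step) {
        double cost = (double)sum_nb_pix_j * o * o  /* distortion term  */
                    - (double)sum_j * o * 2
                    + lambda * rate_bits(o);        /* rate term R(Oj)  */
        if (cost < j_best) { j_best = cost; ord = o; }  /* step 904     */
        if (o == 0) break;                              /* step 905     */
    }
    return ord;
}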
This algorithm of Figures 8 and 9 provides a best ORDj for each class.
This algorithm is repeated for each of the four directions of Figure 3A. Then the direction that provides the best rate distortion cost (sum of the Jj of the classes for that direction) is selected as the direction to be used for the current CTB.
This algorithm (Figures 8 and 9) for selecting the offset values at the encoder side for the Edge Offset tool can easily be applied to the Band Offset filter to select the best position (sao_band_position), where j is in the interval [0,32] instead of the interval [1,4] in Figure 8. It involves changing the value 4 to 32 in modules 801, 810 and 811. More specifically, for the 32 classes of Figure 4, the parameter Sumj (j = [0,32]) is computed. This corresponds to computing, for each range j, the difference between the current pixel value (Pi) and its original value (Porg), each pixel of the image belonging to a single range j. Then the best offset in terms of rate distortion, ORDj, is computed for the 32 classes, with the same process as described in Figure 9.
The next step involves finding the best position of the SAO band of Figure 4. This is determined with the encoding process set out in Figure 10. The RD cost Jj for each range has been computed with the encoding process of Figure 9 with the optimal offset ORDj in terms of rate distortion. In Figure 10, in an initial step 1001 the rate distortion value J is initialized to the maximum possible value. Then a loop on the 28 positions i of 4 consecutive classes is run in step 1002. Next, the variable Ji corresponding to the RD cost of the band (of 4 consecutive classes) is initialized to 0 in step 1003. Then the loop on the four consecutive classes j is run in step 1004. Ji is incremented by the RD costs Jj of the four classes in step 1005 (j = i to i + 4).
If this cost Ji is lower than the best RD cost J, J is set to Ji and sao_band_position = i in step 1007, and the next step is step 1008.
Otherwise, the next step is step 1008.
Test 1008 checks whether or not the loop on the 28 positions has ended. If not, the process continues in step 1002; otherwise the encoding process returns the best band position as being the current value of sao_band_position 1009.
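A minimal sketch of this band position search, assuming the per-class RD costs Jj of Figure 9 are available in an array (names are illustrative):

/* Sketch of the sao_band_position search of Figure 10: slide a window
   of four consecutive classes over the 28 candidate positions. */
int select_band_position(const double rd_cost[32])
{
    double j_best = 1e300;                   /* step 1001       */
    int sao_band_position = 0;
    for (int i = 0; i < 28; i++) {           /* step 1002       */
        double ji = 0.0;                     /* step 1003       */
        for (int j = i; j < i + 4; j++)      /* steps 1004-1005 */
            ji += rd_cost[j];
        if (ji < j_best) {                   /* step 1007       */
            j_best = ji;
            sao_band_position = i;
        }
    }
    return sao_band_position;                /* step 1009       */
}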
Embodiments of the present invention adapt the conventional sample adaptive Band Offset method as described above. In the conventional Band Offset approach, the filtered pixel P'i of the filtered frame area is obtained using the following formula: P'i = Pi + Oj, where Pi is the ith pixel of the current frame area and Oj is the offset of the range (or class) j to which pixel Pi belongs, among the 32 ranges of contiguous pixel values. Note that only four consecutive ranges have an offset Oj.
According to the main idea of the invention, all the pixel values Pi of the frame area that belong to one and the same range, say range j, are set equal to one and the same filtered sample value: P'i = vj for each pixel Pi belonging to the range (or class) j. Thus the value vj that replaces each pixel belonging to range j is directly associated with the range or class j.
The value vj is therefore not an offset value per se. Rather, it is the common value to which all pixel values in the whole range j are set. In practice, as described later, it may still be useful in the decoder to compute the common value vj by using a coded value received in the bitstream from the encoder, but this is not necessary and the common value can be signaled directly. Because offsets are not required, the filtering approach of embodiments of the invention will be termed "sample adaptive filtering" (as distinct from band-type SAO filtering).
Embodiments of this particular sample adaptive filtering require some adaptations. In particular, the signaling of the parameters should be adapted to match their entropy, which differs from that of the conventional SAO Band Offset parameters. However, apart from these adaptations, the remainder of the mechanism relies on the conventional sample adaptive Band Offset method.
A first aspect of this approach to filtering relates to the value vj, which has to be the same at the encoder and the decoder. Several embodiments to share this value may be contemplated, as described below with reference to Figure 11.
A second aspect relates to which range or ranges of pixel values are subjected to filtering in accordance with this approach, i.e. for which the pixel values are replaced by value vj. Again, this information needs to be shared with the decoder.
Several embodiments may be contemplated as described below with reference to Figure 12.
In a first embodiment regarding the first aspect, as mentioned above, the encoder transmits to the decoder a coded value vcj associated with the reconstructed frame area. This coded value vcj represents the difference between the common value vj for the range j and a representative sample value of the range. In this case, the sample adaptive filtering performed by the decoder includes decoding the value vcj associated with the reconstructed frame area from the encoded data. This makes it possible to use exactly the same process as the offset syntax encoding/decoding in HEVC: the value vcj is coded exactly as an offset of the conventional SAO band filter.
For a bit depth of 8 bits, the offset can be equal to any integer values from -7 to +7 for the band offset.
In this embodiment, the one and the same filtered pixel value vj is obtained by adding the decoded value vcj to one of the pixel values that is representative of the one and the same range. This is illustrated in Figure 11a in which a current range 1109a is made of 8 pixel values (1101a, 1102a, 1103a, 1104a, 1105a, 1106a, 1107a, 1108a). This range represents one of the 32 ranges when the pixel values are 8-bit values (i.e. bitdepth = 8).
As an example, a value in the middle of the range, i.e. value 1104a or 1105a, can be used as the representative value, denoted mj, to be added to the decoded value to retrieve vj:

vj = min(max(0, mj + vcj), 2^bitdepth - 1)

Note that in this formula vj is limited to values from 0 to 2^bitdepth - 1.
The value mj can be seen as the reference value to which the value vcj, once decoded, will be added to compute the final vj value to be applied to pixel values belonging to range j. It may be noted that vcj can be a negative value.
More precisely, referring to Figure 11a, it may be considered, as an example, that in the range referenced 1109a, the pixel value 1101a is equal to 56 and the pixel value 1108a is equal to 63. In that case, the value mj can be represented by 1104a, which is equal to 59 (or by 1105a, which corresponds to the value 60). The mj value is fixed and known by default by both the encoder and the decoder.
There is therefore no need to transmit the value of mj in the bitstream. Now if one considers that the decoded value of vcj is equal to -4, the final value of vj is thus equal to 55 (or 56 if we consider 1105a for mj). Consequently, in that particular example, all pixel values in the interval 56 to 63 will be replaced by the pixel value 55, which is outside the range.
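A minimal sketch of this derivation (the function name is illustrative):

/* Sketch: recover vj from the fixed mid-range value mj and the decoded
   vcj, clipped to the valid sample range as in the formula above. */
int recover_vj(int mj, int vcj, int bitdepth)
{
    int v = mj + vcj;
    int max = (1 << bitdepth) - 1;   /* 255 for 8-bit samples */
    if (v < 0)   v = 0;
    if (v > max) v = max;
    return v;                        /* min(max(0, mj + vcj), 2^bitdepth - 1) */
}

With the example above, recover_vj(59, -4, 8) returns 55, the common value applied to all pixels of the range [56,63].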
This first embodiment is quite interesting because the same coding can be applied to code the offsets Oj of the conventional Band Offset mode and the values vj of the present invention. In that case vcj is coded with a unary code for the absolute value, and an additional flag is used to signal the sign of vcj if needed.
However, the value vj can be any integer from 0 to 2^bitdepth - 1. This means that the values vj are not systematically close to the middle of the ranges considered.
On the other hand, the offset in conventional Band Offset mode was designed for values close to 0 (due to unary code) and not for an equi-probable division of values.
To avoid restricting vj to around the middle of the current range j, the value vj may be directly coded, instead of encoding the value vcj as defined above, meaning that the sample adaptive filtering may include decoding a value vj associated with the reconstructed frame area from the encoded data, and the one and the same filtered pixel value is this decoded value vj.
In this case, vj may be represented by up to bitdepth bits. This means that, for a bitdepth of 8 bits, 8x4 = 32 bits may need to be transmitted for a Band of four ranges. In terms of memory storage for the prediction of SAO parameters for each CTB, 32 bits are required to store the values, as well. This is an increased amount of bits compared to the 16 bits required for the conventional offsets of the Band offset mode in HEVC for 8-bit pixel values.
To mitigate this increase of the amount of bits, one embodiment of the invention provides that the sample adaptive filtering includes decoding an n-bit value vcj associated with the reconstructed frame area from the encoded data, n being less than a bitdepth of the pixel values. Thus, the decoded n-bit value vcj is added to one (mj) of the pixel values that is representative of the one and the same range to obtain the one and the same filtered pixel value vj.
In one embodiment, the value vj may be restricted to be a value inside the range j considered, meaning that only 3 bits are required (for 8-bit pixel values) for defining this value relative to the first value of the range j considered. This substantially reduces the memory storage needs to 3 bits x 4 ranges = 12 bits, as opposed to the conventional 16 bits or the 32 bits above. In this embodiment, n is thus the base-2 logarithm (i.e. log2) of the number of pixel values forming the one and the same range, and the representative value mj is the first pixel value forming the one and the same range: vj = vcj + mj.
In addition, this scheme for obtaining vj simplifies the SAO band value type, which can simply be obtained with the following formula:

P'i = vj = (Pi & 11111000) + vcj

where & is the bitwise AND operator, which compares each bit of the first operand to the corresponding bit of the second operand.
The expression Pi & 11111000 provides the first or starting pixel value of the range considered, because Pi belongs to this range (thus the most significant bits of Pi are representative of the range as explained above). This expression is also used to classify each pixel of the current CTB into one of the 32 ranges.
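A minimal sketch of this in-range variant for 8-bit samples (0xF8 is the binary mask 11111000; names are illustrative):

/* Sketch: the five most significant bits of Pi identify its range, and
   the 3-bit vcj positions vj inside that range. */
int band_value_in_range(int pi, const int vcj[32])
{
    int start = pi & 0xF8;    /* Pi & 11111000: first value of the range */
    int j = start >> 3;       /* range index in [0,31]                   */
    return start + vcj[j];    /* P'i = vj = (Pi & 11111000) + vcj        */
}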
In another embodiment that relies on a similar principle (n less than the bitdepth of the pixel values), n is strictly higher than the base-2 logarithm of the number of possible values forming the one and the same range. In the example above (bitdepth equal to 8), this means that 3 < n < 8.
In this other embodiment, the value of a range j is coded (i.e. vcj) with for instance 4 bits in order to cover the current range j and neighboring pixel values adjacent to the range. An example is shown in Figure 11b in which the current range j 1109b (in grey) allows vj to take any pixel value from the pixel values in the dashed area (1119b, 1118b, 1117b, 1116b, 1101b, 1102b, 1103b, 1104b, 1105b, 1106b, 1107b, 1108b, 1112b, 1113b, 1114b, 1115b), depending on the value of the 4-bit vcj.
This scheme offers an efficient balance in terms of coding efficiency because the cost of 4 bits per range is not high and the value vj can exceed the current range j, which could be useful in some cases.
For this specific case, the value vj is obtained according to the following pseudo code:

if (pos == 0)            // first range of the full range
    vj = vcj
else if (pos == 31)      // last range of the full range
    vj = (pos - 1) x nbPelByRange + vcj
else
    vj = (pos x nbPelByRange) - nbPelByRange/2 + vcj

where pos is the position of the current range j among the 32 ranges of the band segmentation (pos belongs to [0; 31]) and nbPelByRange is the number of pixels in the range considered.
Figure 11c illustrates the situation for pos = 0 (first range [0,7]), while Figure 11d illustrates the situation for pos = 31 (last range [248,255]). This is because these two ranges have neighboring pixel values only on one of their two sides.
So if the first range is used, the value vj is set equal to the decoded value vcj (because the starting pixel value of the range is 0), as shown in Figure 11c by the dashed area. As can be seen, the value of the range exceeds the current range only on the right and overlaps the second range of the histogram.
In the same way, when the last range (pos = 31) of the full range is considered, the value can be taken over the last two ranges, as shown in Figure 11d.
Finally, for the other ranges (pos = 1 to 30), the value vj is equal to the position of the starting pixel of the range j (1101b in Figure 11b), i.e. (pos x nbPelByRange), minus half the number of pixels in a range (in order to obtain the pixel value of 1119b), plus the decoded value vcj. The pixel value (pos x nbPelByRange) - nbPelByRange/2 lies halfway into the range to the left of range j.
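The pseudo code above translates directly into C (a sketch; any clipping to the valid sample range is assumed to be applied separately, as in the earlier formula):

/* Sketch of the 4-bit vj reconstruction with its two edge cases. */
int reconstruct_vj(int pos, int vcj, int nbPelByRange)
{
    if (pos == 0)                    /* first range: right neighbors only */
        return vcj;
    else if (pos == 31)              /* last range: left neighbors only   */
        return (pos - 1) * nbPelByRange + vcj;
    else                             /* window centered on range j        */
        return pos * nbPelByRange - nbPelByRange / 2 + vcj;
}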
In yet another embodiment not shown in the Figures, the pixel value of the range that has the highest number of occurrences (i.e. the most frequent value) amongst the pixels of the reconstructed frame area is selected as the one and the same filtered sample value. This makes it possible to have no value to transmit, thus reducing coding costs.
To allow the use of the idea of the invention for specific ranges and not for other ranges, it is proposed to add flags in the bitstream, associated with respective ranges, to specify whether a replacement value vj is used as filtered pixel value or the conventional SAO Band Offset mode is used. To do so, the method of one embodiment of the invention also includes determining, based on at least one item of information (e.g. a flag) in the encoded data, whether or not the sample adaptive filtering is to be applied to pixel values belonging to a given range of possible values.
In some embodiments, when the item of information indicates that the operation of setting samples is to be applied, the one and the same filtered sample value for the obtained range is calculated based on said item of information. In examples, the item of information is the decoded value vj or vcj described above. In other words, among all possible values for vcj or vj, a specific value means that the value vj is not applied to the pixels of the range. Of course, the other values of vcj are used to calculate vj (or the other values of vj are used directly to set vj, as the case may be) according to any of the embodiments described above. For example, when a 4-bit value is used to code vcj, if the decoded value vcj is set equal to '1111' the method is not applied for the pixels of the range currently considered.
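A minimal sketch of this escape-value convention, assuming the 4-bit coding of vcj with '1111' reserved as in the example above:

/* Sketch: a reserved 4-bit code word of vcj disables the Band Value
   filtering for the range considered ('1111' is the text's example). */
int band_value_applied(int vcj)
{
    return vcj != 0xF;    /* 0xF == binary 1111 */
}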
The examples above use a bitdepth of 8 bits but a person skilled in the art will appreciate that it is possible to use other bitdepths in these various embodiments.
As far as the second aspect introduced above is concerned (which range or ranges of pixel values are affected by this new approach according to the invention), it is recalled that in the conventional Band Offset mode, a band of four consecutive ranges is signaled in the bitstream.
The inventors have observed that splitting this band into two or four signaled bands (each with fewer ranges) provides more efficient coding for screen content video sequences, in particular due to the high contrast of the screen content.
In a more general way, a plurality of ranges (not limited to four) can be signaled, possibly using the band approach as in the conventional SAO to signal a set of contiguous ranges. It is recalled that in conventional SAO Band offset, a band contains at least one range.
Figure 12a illustrates an embodiment where two sets of two contiguous ranges are signaled. This requires providing two position syntax elements.
In the same way, Figure 12b illustrates an embodiment where four sets of a single range are coded independently. This requires providing four position syntax elements coded as the conventional SAO_band_position.
It is possible to imagine other configurations with higher numbers of ranges and sets (each set made of several contiguous ranges). For example, a value N is transmitted indicating the number of ranges (or sets) that are defined. Thus N position syntax elements are also provided in the bitstream. Note that if the set considered includes only one range, the flag or item of information indicating that the method of the invention is not used for the range considered can be omitted. Indeed, this is signaled by the number N of sets.
As noted above, the number of "position" syntax elements to be transmitted increases with the number of ranges or sets to be considered independently. However, this syntax element is encoded/decoded with 5 fixed bits which are bypassed when CABAC is used. To improve coding efficiency, this number of bits may be modified by changing the coding of one or more position syntax elements.
In one embodiment the full range of pixel values may be subdivided into first and second sub-ranges, the first sub-range comprising the pixel values in the lower half of the full range and the second sub-range comprising the pixel values in the upper half of the full range. In this case, where only two sets of contiguous ranges are signaled in the encoded data, and the first set is restricted to being in the first sub-range and the second set is restricted to being in the second sub-range, the signaling of the position of each set within the sub-range concerned may be done using an m-bit identifier, where m is the base-2 logarithm of the number of ranges splitting all the pixel values, less one.
This saves one bit for each position syntax element. Only four bits are required for encoding a position syntax element.
This embodiment can be extended to the coding of more than two sets, for example when four sets or ranges (more generally 2^z sets/ranges) are each positioned in a different quarter (more generally in a different 1/2^z-th) of the full range of pixel values.
In another embodiment where the full range of pixel values is split into N sub-ranges each having the same number of pixel values (i.e. a regular splitting of the full range of pixel values), and where the plurality of bands or ranges as signaled in the encoded data are ordered according to an increasing range order among the sub-ranges, each time a set or range is identified in the last 1/2-th of the ordered ranges, the next set or range is signaled in the encoded data using an (n-1)-bit identifier, where n = log2(N).
For example by considering two sets, if the first set is in the second half of the full range, the second range is after the first one. So one bit can be saved because it is known that the second set is not in the first half of the full range of pixel values.
Note that this method does not save a bit if the first set is in the first half of the full range. This embodiment can be extended to the 1/4, the 1/8, the 1/16, and so on, of the full range in order to save more bits on the position signaling. This embodiment can also be extended to the coding of more than two sets as indicated above. Note that for this extended part, some bits can be saved only if the previous sets are in the last half of the current sub-range.
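A sketch of this position signaling with a shrinking bit budget; write_bits() is a hypothetical bitstream writer, positions are assumed to be signaled in increasing order, and the recursive halving is the extension described above:

/* Sketch: each time a position falls in the last half of the remaining
   ranges, the next position is written with one bit fewer. */
void write_set_positions(const int *pos, int num_sets,
                         void (*write_bits)(int value, int nbits))
{
    int bits = 5;       /* log2(32) bits for the first position     */
    int base = 0;       /* later positions are known to be >= base  */
    for (int s = 0; s < num_sets; s++) {
        write_bits(pos[s] - base, bits);
        if (bits > 1 && pos[s] >= base + (1 << (bits - 1))) {
            base += 1 << (bits - 1);   /* set lies in the last half */
            bits -= 1;                 /* so one bit can be saved   */
        }
    }
}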
In one embodiment, more than 32 ranges (as conventionally provided) may be defined, in particular ranges that may overlap. For example, the number of pixel values per range is not changed compared to the conventional SAO Band Offset mode (number of pixel values per range = 2^bitdepth/32), but additional overlapping ranges can be defined. This implies that the position syntax element is coded over more bits to enable coding more than 32 values. This embodiment improves the classification of pixels according to the method of the invention.
Of course, defining another number of ranges also means that this number may be increased or decreased using an item of information transmitted in the bitstream. This offers a better classification for specific content. As an example, the number of pixel values per range may be decreased when the number of ranges increases and, conversely, may be increased when the number of ranges decreases.
In one particular embodiment that makes it possible to omit the position syntax element, obtaining at least one range of contiguous pixel values includes determining the number of pixels belonging to each of a plurality of ranges splitting all the pixel values, and selecting the range or ranges having the highest number of sample values.
For example, the encoder and the decoder may look for the set or N sets (or ranges) which contain the highest number of pixels. The method according to the invention of using vj as the filtered pixel value for all the pixels of the range j may thus be applied to this band or these sets or ranges.
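A minimal sketch of this derivation, identical at encoder and decoder so that no position syntax element needs to be transmitted (the function name is illustrative):

/* Sketch: select the range with the highest pixel count among the 32
   ranges of the reconstructed frame area. */
int most_populated_range(const long count[32])
{
    int best = 0;
    for (int j = 1; j < 32; j++)
        if (count[j] > count[best])
            best = j;
    return best;
}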
The various embodiments to determine or transmit the value vj or vcj for each range j, and the various embodiments to control the plurality of independent ranges or sets to which such values vj or vcj apply, have been described. Compared to the conventional SAO Band Offset mode, additional items of information may need to be transmitted. This implies that the signaling is slightly modified for some embodiments. This is now described with reference to Figure 13.
Figure 13 is a flowchart illustrating steps of decoding the signaling of the proposed band value method. This Figure is based on Figure 6 already described above, where the bitstream (601) has been omitted to clarify the Figure.
Steps 1302 to 1310 are the same as 602 to 610.
After step 1302 comes step 1311, similar to step 1303.
A difference occurs when the SAO_type is set equal to 1 in step 1311. A Band_type flag is extracted from the bitstream to check whether the current CTB is of Band Offset type (i.e. to be processed according to the conventional SAO Band Offset mode, Band_type = 0) or of Band Value type/sample adaptive filtering (i.e. to be processed according to embodiments of the invention using vj, Band_type = 1). This is step 1312. This step implies that the encoder provides this Band_type flag in the bitstream for each CTB.
An alternative to band type coding is to add the band value type as value of the variable sao_type_idx_X.
Below, only the case where Band_type = 1, i.e. the Band Value type, is described. In that case, four 4-bit values vcj are extracted from the bitstream in steps 1313 and 1314. Then four (in this example, but more can be used) position syntax elements are extracted if the proposed method uses four (or more) separate sets or ranges as suggested above. These are steps 1315 and 1316.
Having the four vcj (and thus the four corresponding vj) and the four positions, the sample adaptive filtering according to the invention can be performed on the pixels of the current CTB, for example in a step similar to step 711.
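By way of illustration, the Band Value branch of Figure 13 may be parsed as sketched below; read_bits() is a hypothetical bitstream accessor, and the widths follow the embodiments above (4 bits per vcj, 5 fixed bits per position).

/* Sketch of parsing the Band Value parameters of Figure 13. */
typedef struct {
    int vcj[4];        /* steps 1313-1314: four 4-bit values vcj  */
    int position[4];   /* steps 1315-1316: four position elements */
} BandValueParams;

void parse_band_value_params(BandValueParams *p, int (*read_bits)(int nbits))
{
    for (int k = 0; k < 4; k++)
        p->vcj[k] = read_bits(4);
    for (int k = 0; k < 4; k++)
        p->position[k] = read_bits(5);
}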
Note that the proposed method does not affect the CTB mode signaling of conventional SAO as illustrated in Figure 5. Only new items of information, namely the Band_type and the specific values vcj, are added to the SAO parameters.
In some embodiments, the sample adaptive filtering approach of the invention may replace the conventional Band Offset method.
In some embodiments, the sample adaptive filtering method is turned on or off at sequence and/or frame and/or slice level. When it is turned off, the Band_type flag is not transmitted and the sample adaptive filtering is not applied.
Turning now to which values vj must be used, reference is made to Figure 14, which is a flowchart illustrating steps for selecting the parameters of the Band Value approach. These steps are performed at the encoder.
First the encoder sets the variable HistOrg[j][k] to 0 in step 1401. This variable is designed to store the histogram of the original pixels of the current frame area according to each range j. So j is between 0 and 31 (for 32 ranges splitting the full range of pixel values) and k is between 0 and 255 for 8-bit pixel values. So, this variable HistOrg contains for each range j of pixels to be filtered, the occurrence of each original pixel k.
A variable i, used to successively consider each pixel Pi of the CTB, is set equal to 0 in step 1402. Next, the CTB of the component X is set in memory with its original pixels and the number N of pixels in the CTB is also determined in step 1403.
Each pixel Pi of the current CTB is extracted in step 1404 and classified in step 1405 in order to know the range j associated with the pixel value.
The histogram HistOrg is updated based on this value j and the value Porg of the original pixel: HistOrg[j][Porg] is incremented in step 1407.
Then the variable i is incremented in step 1408 to consider the next pixel through the loop going back to step 1404.
When i reaches N, all the pixels of the current CTB have been processed and the loop ends.
At the end of the loop, the variable HistOrg contains all the pixel values sorted into their associated ranges j. The value vj associated with each range j is then computed in step 1410. For example, vj is the average, weighted by the occurrences, of the original values k in HistOrg[j]:

vj = ( sum over k = 0 to 255 of HistOrg[j][k] * k ) / ( sum over k = 0 to 255 of HistOrg[j][k] )

So it is the average value of the original pixels associated with the range j.
Note that if this computed value does not belong to the interval to which it must belong (see for instance the examples of Figures 11a-d), the encoder can change the value, for instance by forcing vj to take the value of said interval that is closest to the computed value.
Then a distortion value Dist[j] associated with each range j is computed in step 1411.
First the variable Dist[j] is set to 0 for all j. The distortion for each range j is set equal to the sum of the distortions associated with each pixel value of the histogram belonging to the range j. The distortion for a pixel value k is equal to the square difference between the computed (or forced) vj and k, weighted by the number of occurrences HistOrg[j][k]. So the distortion is obtained by the following pseudo code:

For all j = 0 to 31
    Dist[j] = 0
    For all k = 0 to 255
        Dist[j] = Dist[j] + HistOrg[j][k] * (vj - k)^2

At the end of the process of Figure 14, the encoder has the values vj for all the ranges j (0 to 31) and the associated distortion values Dist[j].
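A minimal sketch of the selection of Figure 14 for 8-bit samples, combining steps 1410 and 1411; the histogram indexing follows the description above, and names are illustrative.

/* Sketch: vj is the average original value observed in range j, and
   Dist[j] the squared error of replacing those originals by vj. */
void select_band_values(const long hist[32][256], int vj[32], long dist[32])
{
    for (int j = 0; j < 32; j++) {
        long sum = 0, count = 0;
        for (int k = 0; k < 256; k++) {        /* step 1410: average    */
            sum   += hist[j][k] * (long)k;
            count += hist[j][k];
        }
        vj[j] = count ? (int)(sum / count) : 0;
        dist[j] = 0;
        for (int k = 0; k < 256; k++) {        /* step 1411: distortion */
            long d = (long)(vj[j] - k);
            dist[j] += hist[j][k] * d * d;
        }
    }
}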
In a similar way as described above for the conventional SAO Band Offset mode (for instance Figure 10), the encoder then determines the best range or ranges (and thus band or bands) using a rate distortion approach.
The encoder needs to compute the reduction in terms of distortion obtained by the filtering in order to select the best SAO type. This can easily be obtained because it consists in subtracting Dist[j] from the distortion before filtering of the current range j, which is known from the classical Band Offset method.

Figure 15 schematically illustrates a data communication system in which one or more embodiments of the invention may be implemented. The data communication system comprises a transmission device, in this case a server 1501, which is operable to transmit data packets of a data stream to a receiving device, in this case a client terminal 1502, via a data communication network 1500. The data communication network 1500 may be a Wide Area Network (WAN) or a Local Area Network (LAN). Such a network may be for example a wireless network (WiFi / 802.11a or b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks. In a particular embodiment of the invention the data communication system may be a digital television broadcast system in which the server 1501 sends the same data content to multiple clients.
The data stream 1504 provided by the server 1501 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 1501 using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 1501 or received by the server 1501 from another data provider, or generated at the server 1501. The server 1501 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
In order to obtain a better ratio of the quality of transmitted data to quantity of transmitted data, the compression of the video data may be for example in accordance with the HEVC format and the present invention.
The client 1502 receives the transmitted bitstream and decodes it to reproduce video images on a display device and the audio data through a loudspeaker.
Although a streaming scenario is considered in the example of Figure 15, it will be appreciated that in some embodiments of the invention the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
In one or more embodiments of the invention a video image is transmitted with data signaling SAO Band Value mode of the invention for application to reconstructed pixels of the image to provide filtered pixels in a final image.
Figure 16 schematically illustrates a processing device 1600 configured to implement at least one embodiment of the present invention. The processing device 1600 may be a device such as a micro-computer, a workstation or a light portable device. The device 1600 comprises a communication bus 1613 connected to:
- a central processing unit 1611, such as a microprocessor, denoted CPU;
- a read only memory 1607, denoted ROM, for storing computer programs for implementing the invention;
- a random access memory 1612, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing a method of encoding a sequence of digital images and/or a method of decoding a bitstream according to embodiments of the invention; and
- a communication interface 1602 connected to a communication network 1603 over which digital data to be processed are transmitted or received.
Optionally, the apparatus 1600 may also include the following components:
- a data storage means 1604 such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
- a disk drive 1605 for a disk 1606, the disk drive being adapted to read data from the disk 1606 or to write data onto said disk;
- a screen 1609 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 1610 or any other pointing means.
The apparatus 1600 can be connected to various peripherals, such as for example a digital camera 1620 or a microphone 1608, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 1600.
The communication bus provides communication and interoperability between the various elements included in the apparatus 1600 or connected to it. The representation of the bus is not limiting and in particular the central processing unit is operable to communicate instructions to any element of the apparatus 1600 directly or by means of another element of the apparatus 1600.
The disk 1606 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables a method of encoding a sequence of digital images and/or a method of decoding a bitstream according to the invention to be implemented.
The executable code may be stored either in read only memory 1607, on the hard disk 1604 or on a removable digital medium such as for example a disk 1606 as described previously. According to a variant, the executable code of the programs can be received by means of the communication network 1603, via the interface 1602, in order to be stored in one of the storage means of the apparatus 1600 before being executed, such as the hard disk 1604.
The central processing unit 1611 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means. On powering up, the program or programs that are stored in a non-volatile memory, for example on the hard disk 1604 or in the read only memory 1607, are transferred into the random access memory 1612, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Although the present invention has been described hereinabove with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications will be apparent to a skilled person in the art which lie within the scope of the present invention.
For example, while the previous embodiments have been described in relation to pixels of an image and their corresponding pixel values, it will be appreciated that within the context of the invention a group of pixels may be considered together with a corresponding group pixel value. A sample may thus correspond to one or more pixels of an image.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (25)

1. A method of processing a set of reconstructed samples of a frame, the reconstructed samples having respective sample values, the method comprising: obtaining at least one range of sample values forming a subpart of a full range of all the possible sample values, and setting all reconstructed samples whose sample values are within such an obtained range to one and the same filtered sample value.
2. The method of Claim 1, wherein the sample values in a range are contiguous sample values.
3. The method of Claim 1, further including obtaining a value associated with the frame, and the one and the same filtered sample value is obtained by adding the obtained value to one of the sample values that is representative of the obtained range.
4. The method of Claim 3, wherein the representative value is one of the sample values that corresponds to the middle of the obtained range.
5. The method of Claim 3, wherein the representative value corresponds to a value of a sample value outside the obtained range.
6. The method of Claim 1, further including obtaining a value associated with the frame, and the one and the same filtered sample value is the obtained value.
7. The method of Claim 1, further including obtaining an n-bit value associated with the frame, n being less than a bitdepth of the sample values, and adding the decoded n-bit value to one of the sample values that is representative of the obtained range, to obtain the one and the same filtered sample value.
8. The method of Claim 1, wherein the one and the same filtered sample value has a fixed length.
9. The method of Claim 1, wherein the sample value of the obtained range that has the highest number of occurrences in the reconstructed samples of the frame is selected as the one and the same filtered sample value.
10. The method of Claim 1, further comprising determining, based on at least one item of information, whether or not the operation of setting samples to one and the same filtered sample value is to be applied to reconstructed samples whose sample values belong to a given range of sample values.
11. The method of Claim 1, wherein when the item of information indicates that the operation of setting samples is to be applied, the one and the same filtered sample value for the obtained range is calculated based on said item of information.
12. The method of Claim 1, further comprising determining the number of sample values belonging to each of a plurality of ranges splitting the full range of the possible sample values, and obtaining at least one range of sample values includes selecting the range or ranges having the highest number of sample values.
13. The method of Claim 1, comprising: obtaining a second set of reconstructed samples of the frame; obtaining, for the second set of reconstructed samples, at least one second range of sample values, forming part of the full range of the possible sample values; and setting all reconstructed samples of the other set whose sample values are within such obtained second range to one and the same second filtered sample value, wherein the ranges obtained for the two sets of reconstructed samples and/or the two filtered sample values are different.
14. A method of decoding a frame comprising: receiving encoded data comprising encoded sample values; decoding the encoded sample values to provide a set of reconstructed samples; and processing the set of reconstructed samples using the method of any one of Claims 1 to 13.
15. The method of Claim 14, wherein the or each obtained range is read from the encoded data received from an encoder.
16. The method of Claim 14, wherein the or each range is obtained based on an analysis of the reconstructed samples.
17. The method of Claim 14, wherein the one and the same filtered sample value for the obtained range is obtained from the encoded data.
18. A method of encoding a frame comprising: encoding a set of samples of the frame; decoding the encoded samples to provide a set of reconstructed samples, the reconstructed samples having respective sample values; processing the set of reconstructed samples using the method as described above; and transmitting encoded data comprising encoded sample values of the encoded samples.
19. The method of Claim 18, wherein transmitting encoded data comprises transmitting, to a decoder, information from which the decoder can obtain the or each range.
20. The method of Claim 18, wherein the or each range is selected using a rate-distortion criterion.
21. The method of Claim 18, wherein a plurality of sets including contiguous ranges of sample values, or a plurality of ranges of sample values, for which respective operations of setting samples to one and the same filtered sample value are applied, are signaled in the encoded data.
22. The method of Claim 21, wherein only two sets of contiguous ranges are signaled in the encoded data using an m-bit identifier, the ranges of a first set belonging to the first half of the ranges among all the ranges splitting the full range of the possible sample values and the ranges of the second set belonging to the second half.
23. The method of Claim 21, wherein the full range of the possible sample values is split into N ranges having the same number of sample values; and the plurality of ranges or of sets of contiguous ranges as signaled in the encoded data are ordered according to an increasing range order among all the ranges splitting the full range of the possible sample values; and each time a range or a set of contiguous ranges is identified in the last 1/2-th of the ordered ranges, the next set or range is signaled in the encoded data using an (n-1)-bit identifier, where n = log2(N).
24. A device for processing a set of reconstructed samples of a frame, the reconstructed samples having respective sample values, the device comprising: a range determining module configured for obtaining at least one range of sample values forming a subpart of a full range of all the possible sample values, and a filtering module configured for setting all reconstructed samples whose sample values are within such an obtained range to one and the same filtered sample value.
25. A non-transitory computer-readable medium storing a program which, when executed by a microprocessor or computer system in a device, causes the device to perform the steps of any of Claims 1 to 22.
GB1404666.8A 2014-03-14 2014-03-14 Method and device for decoding or encoding a frame Withdrawn GB2524477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1404666.8A GB2524477A (en) 2014-03-14 2014-03-14 Method and device for decoding or encoding a frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1404666.8A GB2524477A (en) 2014-03-14 2014-03-14 Method and device for decoding or encoding a frame

Publications (2)

Publication Number Publication Date
GB201404666D0 GB201404666D0 (en) 2014-04-30
GB2524477A true GB2524477A (en) 2015-09-30

Family

ID=50634849

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1404666.8A Withdrawn GB2524477A (en) 2014-03-14 2014-03-14 Method and device for decoding or encoding a frame

Country Status (1)

Country Link
GB (1) GB2524477A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120177111A1 (en) * 2011-01-12 2012-07-12 Matthias Narroschke Efficient clipping

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170111660A1 (en) * 2015-10-15 2017-04-20 Cisco Technology, Inc. Efficient loop filter for video codec
WO2017066236A1 (en) * 2015-10-15 2017-04-20 Cisco Technology, Inc. Efficient loop filter for video codec
US10110926B2 (en) * 2015-10-15 2018-10-23 Cisco Technology, Inc. Efficient loop filter for video codec
US10477248B2 (en) * 2015-10-15 2019-11-12 Cisco Technology, Inc. Efficient loop filter for video codec

Also Published As

Publication number Publication date
GB201404666D0 (en) 2014-04-30

Similar Documents

Publication Publication Date Title
US11601687B2 (en) Method and device for providing compensation offsets for a set of reconstructed samples of an image
US11758135B2 (en) Adaptive color space transform coding
US9225991B2 (en) Adaptive color space transform coding
US9641847B2 (en) Method and device for classifying samples of an image
GB2509563A (en) Encoding or decoding a scalable video sequence using inferred SAO parameters
GB2524477A (en) Method and device for decoding or encoding a frame
WO2019233997A1 (en) Prediction of sao parameters
WO2019233999A1 (en) Video coding and decoding
WO2019234000A1 (en) Prediction of sao parameters
WO2019234001A1 (en) Video coding and decoding
WO2019234002A1 (en) Video coding and decoding
WO2019233998A1 (en) Video coding and decoding

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)