WO2019233998A1 - Video coding and decoding - Google Patents

Video coding and decoding Download PDF

Info

Publication number
WO2019233998A1
Authority
WO
WIPO (PCT)
Prior art keywords
sao
temporal
image
derivation
ctu
Prior art date
Application number
PCT/EP2019/064455
Other languages
French (fr)
Inventor
Guillaume Laroche
Patrice Onno
Christophe Gisquet
Jonathan Taquet
Original Assignee
Canon Kabushiki Kaisha
Canon Europe Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Kabushiki Kaisha and Canon Europe Limited
Publication of WO2019233998A1 publication Critical patent/WO2019233998A1/en

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/86 Using pre-processing or post-processing specially adapted for video compression, involving reduction of coding artifacts, e.g. of blockiness
    • H04N 13/161 Stereoscopic or multi-view video systems; processing of image signals; encoding, multiplexing or demultiplexing different image signal components
    • H04N 19/117 Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding; filters, e.g. for pre-processing or post-processing
    • H04N 19/147 Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding; data rate or code amount at the encoder output according to rate distortion criteria
    • H04N 19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N 19/17 Adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N 19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/463 Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
    • H04N 19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop

Definitions

  • the present invention relates to video coding and decoding.
  • VVC Versatile Video Coding
  • the goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020.
  • the main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos.
  • HDR high-dynamic-range
  • JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs.
  • Some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well-beyond the targeted 50% for the final standard.
  • UHD ultra-high definition
  • JEM JVET exploration model
  • SAO sample adaptive offset
  • US 9769450 discloses an SAO filter for three-dimensional video coding (3DVC), such as implemented by the HEVC standard.
  • the filter directly re-uses SAO filter parameters of an independent view or a coded dependent view to encode another dependent view, or re-uses only part of the SAO filter parameters of the independent view or a coded dependent view to encode another dependent view.
  • the SAO parameters are re-used by copying them from the independent view or coded dependent view.
  • US 2014/0192860 Al relates to the scalable extension of HEVC.
  • HEVC scalable extension aims at allowing coding/decoding of a video having multiple scalability layers, each layer being made up of a series of frames. Coding efficiency is improved by inferring, or deriving, SAO parameters to be used at an upper layer (e.g. an enhancement layer) from the SAO parameters actually used at a lower (e.g. base) layer. This is because inferring some SAO parameters makes it possible to avoid transmitting them.
  • a device for performing sample adaptive offset (SAO) filtering as defined in claim 25.
  • an encoder as defined by claim 26.
  • a decoder as defined by claim 27.
  • the program may be provided on its own or may be carried on, by or in a carrier medium.
  • the carrier medium may be non-transitory, for example a storage medium, in particular a computer-readable storage medium.
  • the carrier medium may also be transitory, for example a signal or other transmission medium.
  • the signal may be transmitted via any suitable network, including the Internet.
  • Such a signal may be in transitory form or in non-transitory form.
  • the signal may be stored in a media storage device such as a Blu-ray disk.
  • the signal may then be converted from non-transitory form to transitory form by reproducing it from the media storage device.
  • Figure 1 is a diagram for use in explaining a coding structure used in HEVC
  • Figure 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the invention may be implemented;
  • Figure 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented;
  • Figure 4 is a flow chart illustrating steps of an encoding method according to embodiments of the invention;
  • Figure 5 is a flow chart illustrating steps of a loop filtering process in accordance with one or more embodiments of the invention.
  • Figure 6 is a flow chart illustrating steps of a decoding method according to embodiments of the invention.
  • Figures 7A and 7B are diagrams for use in explaining edge-type SAO filtering in HEVC;
  • Figure 8 is a diagram for use in explaining band-type SAO filtering in HEVC.
  • Figure 9 is a flow chart illustrating the steps of a process to decode SAO parameters according to the HEVC specifications
  • Figure 10 is a flow chart illustrating in more detail one of the steps of the Figure 9 process
  • Figure 11 is a flow chart illustrating how SAO filtering is performed on an image part according to the HEVC specifications
  • Figure 12 is a schematic view for use in explaining a temporal derivation of SAO parameters in a first embodiment of the present invention
  • Figure 13 is a flow chart for use in explaining a method of decoding an image in the first embodiment
  • Figure 14 is a flow chart for use in explaining a method of decoding an image in a third embodiment of the present invention.
  • Figure 15 is a flow chart for use in explaining a method of decoding an image in a sixth embodiment of the present invention
  • Figure 16 is a flow chart illustrating a process to build a list of reference frames for SAO temporal derivation in a seventh embodiment of the present invention
  • Figure 17 is a flow chart illustrating steps carried out by an encoder to determine SAO parameters for the CTUs of a group (frame or slice) in a CTU-level non-temporal derivation of SAO parameters in an eighth embodiment of the present invention;
  • Figure 18 shows one of the steps of Figure 17 in more detail
  • Figure 19 shows another one of the steps of Figure 17 in more detail
  • Figure 20 shows yet another one of the steps of Figure 17 in more detail
  • Figure 21 is a flow chart for use in explaining how to evaluate a cost of a temporal derivation in the eighth embodiment
  • Figure 22 is a flow chart for use in explaining how to compare the costs of the temporal derivation and a further, non-temporal derivation, in the eighth embodiment;
  • Figure 23 shows various different groupings of CTUs in a slice;
  • Figure 24 is a diagram showing image parts of a frame in a non-temporal derivation of SAO parameters in which a first method of sharing SAO parameters is used;
  • Figure 25 is a flowchart of an example of a process for setting SAO parameters in the non-temporal derivation of Figure 24;
  • Figure 26 is a flowchart of an example of a process for setting of SAO parameters in another non-temporal derivation using the first sharing method to share SAO parameters among a column of CTUs;
  • Figure 27 is a flowchart of an example of a process for setting of SAO parameters in yet another non-temporal derivation using the first sharing method to share SAO parameters among a group of NxN CTUs;
  • Figure 28 is a diagram showing image parts of one NxN group in the non-temporal derivation of Figure 27;
  • Figure 29 illustrates an example of how to select the SAO parameter derivation in an eleventh embodiment of the present invention;
  • Figure 30 is a flow chart illustrating a decoding process suitable for a second method of sharing SAO parameters among image parts of a group
  • Figure 31 is a diagram showing image parts of multiple 2x2 groups in a sixteenth embodiment of the present invention.
  • Figure 32 is a schematic view for use in explaining a process of deriving SAO parameters in a temporal rotation derivation of SAO parameters in accordance with a twentieth embodiment of the present invention
  • Figure 33 is a schematic view of the temporal rotation derivation of Figure 32;
  • Figure 34 is a schematic view for use in explaining a process of deriving SAO parameters in which different temporal derivations are available;
  • Figure 35 is a flowchart for use in explaining a decoding process in a twenty-fifth embodiment of the present invention.
  • Figure 36 is a schematic view for use in explaining a process of deriving SAO parameters in a spatial rotation derivation of SAO parameters in accordance with a twentieth embodiment of the present invention
  • Figure 37 is a flowchart for use in explaining a decoding process in the twenty-seventh embodiment
  • Figure 38 is a diagram showing a system comprising an encoder or a decoder and a communication network according to embodiments of the present invention.
  • Figure 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video standard.
  • a video sequence 1 is made up of a succession of digital images i. Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.
  • HEVC High Efficiency Video Coding
  • An image 2 of the sequence may be divided into slices 3.
  • a slice may in some instances constitute an entire image.
  • These slices are divided into non-overlapping Coding Tree Units (CTUs).
  • a Coding Tree Unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to macroblock units that were used in several previous video standards.
  • a CTU is also sometimes referred to as a Largest Coding Unit (LCU).
  • LCU Largest Coding Unit
  • a CTU has luma and chroma component parts, each of which component parts is called a Coding Tree Block (CTB). These different color components are not shown in Figure 1.
  • CTB Coding Tree Block
  • a CTU is generally of size 64 pixels x 64 pixels.
  • Each CTU may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 5 using a quadtree decomposition.
  • CUs variable-size Coding Units
  • Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU).
  • the maximum size of a PU or TU is equal to the CU size.
  • a Prediction Unit corresponds to the partition of the CU for prediction of pixels values.
  • Various different partitions of a CU into PUs are possible as shown by 606 including a partition into 4 square PUs and two different partitions into 2 rectangular PUs.
  • a Transform Unit is an elementary unit that is subjected to spatial transformation using DCT.
  • a CU can be partitioned into TUs based on a quadtree representation 607.
  • NAL Network Abstraction Layer
  • coding parameters of the video sequence are stored in dedicated NAL units called parameter sets.
  • SPS Sequence Parameter Set
  • PPS Picture Parameter Set
  • HEVC also includes a Video Parameter Set (VPS) NAL unit which contains parameters describing the overall structure of the bitstream.
  • the VPS is a new type of parameter set defined in HEVC, and applies to all of the layers of a bitstream.
  • a layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer.
  • HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer.
  • Figure 2 illustrates a data communication system in which one or more embodiments of the invention may be implemented.
  • the data communication system comprises a transmission device, in this case a server 201, which is operable to transmit data packets of a data stream to a receiving device, in this case a client terminal 202, via a data communication network 200.
  • the data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN). Such a network may be for example a wireless network (WiFi / 802.11a, b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks.
  • the data communication system may be a digital television broadcast system in which the server 201 sends the same data content to multiple clients.
  • the data stream 204 provided by the server 201 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 201 or received by the server 201 from another data provider, or generated at the server 201.
  • the server 201 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
  • the compression of the video data may be for example in accordance with the HEVC format or H.264/AVC format.
  • the client 202 receives the transmitted bitstream and decodes the reconstructed bitstream to reproduce video images on a display device and the audio data by a loud speaker.
  • the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
  • a video image is transmitted with data representative of compensation offsets for application to reconstructed pixels of the image to provide filtered pixels in a final image.
  • FIG. 3 schematically illustrates a processing device 300 configured to implement at least one embodiment of the present invention.
  • the processing device 300 may be a device such as a micro-computer, a workstation or a light portable device.
  • the device 300 comprises a communication bus 313 connected to: -a central processing unit 311, such as a microprocessor, denoted CPU;
  • ROM read only memory
  • RAM random access memory 312, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to embodiments of the invention;
  • the apparatus 300 may also include the following components:
  • -a data storage means 304 such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
  • -a disk drive for a removable disk 306, the disk drive being adapted to read data from the disk 306 or to write data onto said disk;
  • -a screen 309 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 310 or any other pointing means.
  • the apparatus 300 can be connected to various peripherals, such as for example a digital camera 320 or a microphone 308, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 300.
  • the communication bus provides communication and interoperability between the various elements included in the apparatus 300 or connected to it.
  • the representation of the bus is not limiting and in particular the central processing unit is operable to communicate instructions to any element of the apparatus 300 directly or by means of another element of the apparatus 300.
  • the disk 306 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.
  • CD-ROM compact disk
  • ZIP disk or a memory card
  • the executable code may be stored either in read only memory 306, on the hard disk 304 or on a removable digital medium such as for example a disk 306 as described previously.
  • the executable code of the programs can be received by means of the communication network 303, via the interface 302, in order to be stored in one of the storage means of the apparatus 300 before being executed, such as the hard disk 304.
  • the central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means.
  • the program or programs that are stored in a non-volatile memory for example on the hard disk 304 or in the read only memory 306, are transferred into the random access memory 312, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
  • the apparatus is a programmable apparatus which uses software to implement the invention.
  • the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
  • Figure 4 illustrates a block diagram of an encoder according to at least one embodiment of the invention.
  • the encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, at least one corresponding step of a method implementing at least one embodiment of encoding an image of a sequence of images according to one or more embodiments of the invention.
  • An original sequence of digital images i0 to in 401 is received as an input by the encoder 400.
  • Each digital image is represented by a set of samples, known as pixels.
  • a bitstream 410 is output by the encoder 400 after implementation of the encoding process.
  • the bitstream 410 comprises a plurality of encoding units or slices, each slice comprising a slice header for transmitting encoding values of encoding parameters used to encode the slice and a slice body, comprising encoded video data.
  • the input digital images i0 to in 401 are divided into blocks of pixels by module 402.
  • the blocks correspond to image portions and may be of variable sizes (e.g. 4x4, 8x8, 16x16, 32x32, 64x64, 128x128 pixels; several rectangular block sizes can also be considered).
  • a coding mode is selected for each input block. Two families of coding modes are provided: coding modes based on spatial prediction coding (Intra prediction), and coding modes based on temporal prediction (Inter coding, Merge, SKIP). The possible coding modes are tested.
  • Module 403 implements an Intra prediction process, in which the given block to be encoded is predicted by a predictor computed from pixels of the neighbourhood of said block to be encoded. An indication of the selected Intra predictor and the difference between the given block and its predictor is encoded to provide a residual if the Intra coding is selected.
  • Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405.
  • a reference image from among a set of reference images 416 is selected, and a portion of the reference image, also called reference area or image portion, which is the closest area to the given block to be encoded, is selected by the motion estimation module 404.
  • Motion compensation module 405 then predicts the block to be encoded using the selected area.
  • the difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module 405.
  • the selected reference area is indicated by a motion vector.
  • a residual is computed by subtracting the prediction from the original block.
  • a prediction direction is encoded.
  • at least one motion vector is encoded.
  • Motion vector predictors from a set of motion information predictors are obtained from the motion vector field 418 by a motion vector prediction and coding module 417.
  • the encoder 400 further comprises a selection module 406 for selection of the coding mode by applying an encoding cost criterion, such as a rate-distortion criterion.
  • a transform such as DCT
  • the transformed data obtained is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409.
  • the encoded residual block of the current block being encoded is inserted into the bitstream 410.
  • the encoder 400 also performs decoding of the encoded image in order to produce a reference image for the motion estimation of the subsequent images. This enables the encoder and the decoder receiving the bitstream to have the same reference frames.
  • the inverse quantization module 411 performs inverse quantization of the quantized data, followed by an inverse transform by reverse transform module 412.
  • the reverse intra prediction module 413 uses the prediction information to determine which predictor to use for a given block and the reverse motion compensation module 414 actually adds the residual obtained by module 412 to the reference area obtained from the set of reference images 416.
  • Post filtering is then applied by module 415 to filter the reconstructed frame of pixels.
  • an SAO loop filter is used in which compensation offsets are added to the pixel values of the reconstructed pixels of the reconstructed image
  • Figure 5 is a flow chart illustrating steps of loop filtering process according to at least one embodiment of the invention.
  • the encoder generates the reconstruction of the full frame.
  • a deblocking filter is applied on this first reconstruction in order to generate a deblocked reconstruction 53.
  • the aim of the deblocking filter is to remove block artifacts generated by residual quantization and block motion compensation or block Intra prediction. These artifacts are visually important at low bitrates.
  • the deblocking filter operates to smooth the block boundaries according to the characteristics of two neighboring blocks. The encoding mode of each block, the quantization parameters used for the residual coding, and the neighboring pixel differences in the boundary are taken into account.
  • the deblocking filter improves the visual quality of the current frame by removing blocking artifacts and it also improves the motion estimation and motion compensation for subsequent frames. Indeed, high frequencies of the block artifact are removed, and so these high frequencies do not need to be compensated for with the texture residual of the following frames.
  • the deblocked reconstruction is filtered by a sample adaptive offset (SAO) loop filter in step 54 using SAO parameters determined in accordance with embodiments of the invention.
  • the resulting frame 55 may then be filtered with an adaptive loop filter (ALF) in step 56 to generate the reconstructed frame 57 which will be displayed and used as a reference frame for the following Inter frames.
  • SAO sample adaptive offset
  • ALF adaptive loop filter
  • step 54 each pixel of the frame region is classified into a class or group.
  • the same offset value is added to every pixel value which belongs to a certain class or group.
  • FIG. 6 illustrates a block diagram of a decoder 60 which may be used to receive data from an encoder according an embodiment of the invention.
  • the decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, a corresponding step of a method implemented by the decoder 60.
  • the decoder 60 receives a bitstream 61 comprising encoding units, each one being composed of a header containing information on encoding parameters and a body containing the encoded video data.
  • the encoded video data is entropy encoded, and the motion vector predictors’ indexes are encoded, for a given block, on a predetermined number of bits.
  • the received encoded video data is entropy decoded by module 62.
  • the residual data are then dequantized by module 63 and then a reverse transform is applied by module 64 to obtain pixel values.
  • the mode data indicating the coding mode are also entropy decoded and based on the mode, an INTRA type decoding or an INTER type decoding is performed on the encoded blocks of image data.
  • an INTRA predictor is determined by intra reverse prediction module 65 based on the intra prediction mode specified in the bitstream.
  • the motion prediction information is extracted from the bitstream so as to find the reference area used by the encoder.
  • the motion prediction information is composed of the reference frame index and the motion vector residual.
  • the motion vector predictor is added to the motion vector residual in order to obtain the motion vector by motion vector decoding module 70.
  • Motion vector decoding module 70 applies motion vector decoding for each current block encoded by motion prediction. Once an index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the current block can be decoded and used to apply reverse motion compensation by module 66. The reference image portion indicated by the decoded motion vector is extracted from a reference image 68 to apply the reverse motion compensation 66. The motion vector field data 71 is updated with the decoded motion vector in order to be used for the inverse prediction of subsequent decoded motion vectors.
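As a minimal illustration of the motion vector decoding step just described (module 70), the decoded motion vector is simply the sum of the selected predictor and the transmitted residual. The struct and function names below are illustrative assumptions, not taken from the patent or from the HEVC reference software:

```cpp
#include <cstdint>

struct MotionVector {
    int16_t x;
    int16_t y;
};

// Module 70 in Figure 6 (sketch): the decoded motion vector is the sum of
// the selected predictor and the motion vector residual read from the bitstream.
MotionVector decodeMotionVector(const MotionVector& predictor,
                                const MotionVector& residual) {
    return MotionVector{static_cast<int16_t>(predictor.x + residual.x),
                        static_cast<int16_t>(predictor.y + residual.y)};
}
```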
  • Post filtering is applied by post filtering module 67 similarly to the post filtering module 415 applied at the encoder, as described with reference to Figures 4 and 5.
  • a decoded video signal 69 is finally provided by the decoder 60.
  • The purpose of SAO filtering is to improve the quality of the reconstructed frame by sending additional data in the bitstream, in contrast to the deblocking filter for which no information is transmitted.
  • each pixel is classified into a predetermined class or group and the same offset value is added to every pixel sample of the same class/group.
  • One offset is encoded in the bitstream for each class.
  • SAO loop filtering has two SAO types: an Edge Offset (EO) type and a Band Offset (BO) type.
  • EO Edge Offset
  • BO Band Offset
  • An example of Edge Offset type is schematically illustrated in Figures 7A and 7B
  • an example of Band Offset type is schematically illustrated in Figure 8.
  • SAO filtering is applied CTU by CTU.
  • the parameters needed to perform the SAO filtering are selected for each CTU at the encoder side and the necessary parameters are decoded and/or derived for each CTU at the decoder side.
  • This offers the possibility of easily encoding and decoding the video sequence by processing each CTU at once without introducing delays in the processing of the whole frame.
  • When SAO filtering is enabled, only one SAO type is used: either the Edge Offset type filter or the Band Offset type filter, according to the related parameters transmitted in the bitstream for each classification.
  • One of the SAO parameters in HEVC is an SAO type parameter sao_type_idx which indicates for the CTU whether EO type, BO type or no SAO filtering is selected for the CTU concerned.
  • the SAO parameters for a given CTU can be copied from the upper or left CTU, for example, instead of transmitting all the SAO data.
  • One of the SAO parameters in HEVC is a sao_merge_up_flag, which when set indicates that the SAO parameters for the subject CTU should be copied from the upper CTU.
  • Another of the SAO parameters in HEVC is a sao_merge_left_flag, which when set indicates that the SAO parameters for the subject CTU should be copied from the left CTU.
  • SAO filtering may be applied independently for different color components (e.g. YUV) of the frame.
  • one set of SAO parameters may be provided for the luma component Y and another set of SAO parameters may be provided for both chroma components U and V in common.
  • one or more SAO parameters may be used as common filtering parameters for two or more color components, while other SAO parameters are dedicated (per-component) filtering parameters for the color components.
  • the SAO type parameter sao_type_idx is common to U and V, and so is an EO class parameter which indicates a class for EO filtering (see below), whereas a BO class parameter which indicates a group of classes for BO filtering has dedicated (per-component) SAO parameters for U and V.
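For orientation, the per-CTU SAO parameters mentioned above can be pictured with a structure like the following. This is only an illustrative model of the HEVC syntax elements (sao_type_idx, EO class, band position, four offsets); the field and type names are assumptions made for this sketch, reflecting the sharing described above where the SAO type and EO class are common to U and V while the band position and offsets are per-component:

```cpp
#include <array>
#include <cstdint>

enum class SaoType : uint8_t { None = 0, BandOffset = 1, EdgeOffset = 2 };

// Per-component SAO parameters for one CTU (illustrative layout).
struct SaoComponentParams {
    SaoType type = SaoType::None;     // sao_type_idx (shared by U and V in HEVC)
    uint8_t eoClass = 0;              // EO direction 0/45/90/135 degrees (shared by U and V)
    uint8_t bandPosition = 0;         // sao_band_position (dedicated per component)
    std::array<int8_t, 4> offsets{};  // four offsets O1..O4 (dedicated per component)
};

// SAO parameters of one CTU for the three color components.
struct CtuSaoParams {
    SaoComponentParams luma;  // Y
    SaoComponentParams cb;    // U
    SaoComponentParams cr;    // V
};
```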
  • Edge Offset type involves determining an edge index for each pixel by comparing its pixel value to the values of two neighboring pixels. Moreover, these two neighboring pixels depend on a parameter which indicates the direction of these two neighboring pixels with respect to the current pixel. These directions are the 0-degree (horizontal direction), 45-degree (diagonal direction), 90-degree (vertical direction) and 135-degree (second diagonal direction). These four directions are schematically illustrated in Figure 7A.
  • the table of Figure 7B gives the offset value to be applied to the pixel value of a particular pixel "C" according to the values of the two neighboring pixels Cn1 and Cn2 at the decoder side.
  • if the value of C is less than the two values of its neighboring pixels Cn1 and Cn2, the offset to be added to the pixel value of the pixel C is "+ O1".
  • if the value of C is less than one of the values of Cn1 or Cn2 and equal to the other, the offset to be added to this pixel sample value is "+ O2".
  • if the value of C is greater than one of the values of Cn1 or Cn2 and equal to the other, the offset to be applied to this pixel sample is "- O3".
  • if the value of C is greater than the two values of Cn1 and Cn2, the offset to be applied to this pixel sample is "- O4".
  • each offset (O1, O2, O3, O4) is encoded in the bitstream.
  • the sign to be applied to each offset depends on the edge index (or Edge Index in the HEVC specifications) to which the current pixel belongs. According to the table represented in Figure 7B, for Edge Index 0 and for Edge Index 1 (O1, O2) a positive offset is applied. For Edge Index 3 and Edge Index 4 (O3, O4), a negative offset is applied to the current pixel.
  • the direction for the Edge Offset amongst the four directions of Figure 7A is specified in the bitstream by a "sao_eo_class_luma" field for the luma component and a "sao_eo_class_chroma" field for both chroma components U and V.
  • the SAO Edge Index corresponding to the index value is obtained by the following formula:
  • EdgeIndex = sign(C - Cn2) - sign(Cn1 - C) + 2
  • the difference between the pixel value of C and the pixel values of its neighboring pixels Cn1 and Cn2 can be shared between the current pixel C and its neighbors.
  • the term sign(Cn1 - C) has already been computed for the previous pixel (to be precise, it was computed as sign(C' - Cn2') at a time when the then-current pixel C' was the present neighboring pixel Cn1 and its neighboring pixel Cn2' was what is now the current pixel C).
  • this sign(Cn1 - C) therefore does not need to be computed again.
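A minimal sketch of the Edge Offset classification described above, assuming the sign() convention of the EdgeIndex formula; the helper names are illustrative. EdgeIndex 2 (the flat/monotonic case) receives no offset, indices 0 and 1 receive positive offsets O1 and O2, and indices 3 and 4 receive negative offsets O3 and O4, as in the table of Figure 7B:

```cpp
// sign() as used in the EdgeIndex formula: returns -1, 0 or +1.
static int sign(int v) { return (v > 0) - (v < 0); }

// EdgeIndex = sign(C - Cn2) - sign(Cn1 - C) + 2, giving a value in 0..4.
// Cn1 and Cn2 are the two neighbours along the signalled EO direction.
int edgeIndex(int c, int cn1, int cn2) {
    return sign(c - cn2) - sign(cn1 - c) + 2;
}

// Offset applied to pixel C given the four decoded absolute offsets O1..O4.
// Indices 0 and 1 get +O1/+O2, index 2 gets no offset, indices 3 and 4 get -O3/-O4.
int edgeOffsetFor(int idx, const int o[4]) {
    switch (idx) {
        case 0: return +o[0];
        case 1: return +o[1];
        case 3: return -o[2];
        case 4: return -o[3];
        default: return 0;  // EdgeIndex 2: pixel not modified
    }
}
```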
  • Band Offset type in SAO also depends on the pixel value of the sample to be processed.
  • a class in SAO Band offset is defined as a range of pixel values. Conventionally, for all pixels within a range, the same offset is added to the pixel value. In the HEVC specifications, the number of offsets for the Band Offset filter is four for each reconstructed block or frame area of pixels (CTU), as schematically illustrated in Figure 8.
  • SAO Band offset splits the full range of pixel values into 32 ranges of the same size. These 32 ranges are the bands (or classes) of SAO Band offset.
  • Classifying the pixels into 32 ranges of the full interval requires only a 5-bit check for a fast implementation, i.e. only the first 5 bits (the 5 most significant bits) of a pixel value are checked to classify the pixel into one of the 32 classes/ranges of the full range.
  • each band or class contains 8 pixel values.
  • a group 40 of bands, represented by the grey area in Figure 8, is used, the group having four successive bands 41, 42, 43 and 44, and information is signaled in the bitstream to identify the position of the group, for example the position of the first of the 4 bands.
  • the syntax element representative of this position is the "sao_band_position" field in the HEVC specifications. This corresponds to the start of the first band 41 of the group.
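A corresponding sketch for the Band Offset classification: only the 5 most significant bits of a sample select one of the 32 bands, and an offset is applied only when the band falls within the group of four consecutive bands starting at sao_band_position. The function and variable names are illustrative:

```cpp
// Band index 0..31: the 5 most significant bits of the sample value.
int bandIndex(int sample, int bitDepth) {
    return sample >> (bitDepth - 5);
}

// Returns the offset to add to 'sample', or 0 if its band is outside the
// signalled group of 4 consecutive bands starting at sao_band_position.
int bandOffsetFor(int sample, int bitDepth, int saoBandPosition, const int offsets[4]) {
    const int band = bandIndex(sample, bitDepth);
    const int rel = band - saoBandPosition;
    return (rel >= 0 && rel < 4) ? offsets[rel] : 0;
}
```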
  • FIG. 9 is a flow chart illustrating the steps of a process to decode SAO parameters according to the HEVC specifications.
  • the process of Figure 9 is applied for each CTU to generate a set of SAO parameters for all components.
  • a predictive scheme is used for the CTU mode. This predictive mode involves checking if the CTU on the left of the current CTU uses the same SAO parameters (this is specified in the bitstream through a flag named "sao_merge_left_flag"). If not, a second check is performed with the CTU above the current CTU (this is specified in the bitstream through a flag named "sao_merge_up_flag"). This predictive technique enables the amount of data representing the SAO parameters for the CTU mode to be reduced. Steps of the process are set out below.
  • In step 503 the "sao_merge_left_flag" is read from the bitstream 502 and decoded. If its value is true, then the process proceeds to step 504 where the SAO parameters of the left CTU are copied for the current CTU. This enables the types for YUV of the SAO filter for the current CTU to be determined in step 508.
  • If the outcome is negative in step 503, then the "sao_merge_up_flag" is read from the bitstream and decoded. If its value is true, then the process proceeds to step 505 where the SAO parameters of the above CTU are copied for the current CTU. This enables the types of the SAO filter for the current CTU to be determined in step 508.
  • If the outcome is negative in step 505, then the SAO parameters for the current CTU are read and decoded from the bitstream in step 507, for the Luma Y component and for both U and V components (501, 551).
  • the offsets for Chroma are independent.
  • Once the parameters are obtained, the type of SAO filter is determined in step 508.
  • In step 511 a check is performed to determine if the three colour components (Y and U & V) for the current CTU have been processed. If the outcome is positive, the determination of the SAO parameters for the three components is complete and the next CTU can be processed in step 510. Otherwise (only Y was processed), U and V are processed together and the process restarts from the initial step 512 previously described.
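The merge-based decoding of Figure 9 can be summarised by the following sketch. The minimal SaoParams structure and the bitstream-reading helpers (readFlag, parseSaoParams) are placeholders for the real syntax and entropy-decoding machinery, not reference-software APIs:

```cpp
#include <array>
#include <functional>

// Minimal per-CTU SAO parameter set (illustrative).
struct SaoParams {
    int type = 0;                  // sao_type_idx: 0 none, 1 band, 2 edge
    int classOrPosition = 0;       // EO class or sao_band_position
    std::array<int, 4> offsets{};  // O1..O4
};

// Decodes the SAO parameters of one CTU following Figure 9: try the
// sao_merge_left_flag, then the sao_merge_up_flag, otherwise read explicitly.
SaoParams decodeCtuSao(const std::function<bool()>& readFlag,
                       const std::function<SaoParams()>& parseSaoParams,
                       const SaoParams* leftCtu,   // nullptr at the left border
                       const SaoParams* upCtu) {   // nullptr at the top border
    if (leftCtu && readFlag()) {   // sao_merge_left_flag true: copy from the left CTU (steps 503/504)
        return *leftCtu;
    }
    if (upCtu && readFlag()) {     // sao_merge_up_flag true: copy from the above CTU (step 505)
        return *upCtu;
    }
    return parseSaoParams();       // otherwise read the SAO parameters explicitly (step 507)
}
```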
  • Figure 10 is a flow chart illustrating steps of a process of parsing of SAO parameters in the bitstream 601 at the decoder side.
  • the "sao_type_idx_X" syntax element is read and decoded.
  • the code word representing this syntax element can use a fixed length code or could use any method of arithmetic coding.
  • the syntax element sao_type_idx_X enables determination of the type of SAO applied for the frame area to be processed for the colour component Y or for both Chroma components U & V. For example, for a YUV 4:2:0 sequence, two components are considered: one for Y, and one for U and V.
  • the sao_type_idx can take 3 values as follows, depending on the SAO type encoded in the bitstream: "0" corresponds to no SAO, "1" corresponds to the Band Offset case illustrated in Figure 8 and "2" corresponds to the Edge Offset type filter illustrated in Figures 7A and 7B.
  • YUV color components are used in HEVC (sometimes called Y, Cr and Cb components), but it will be appreciated that in other video coding schemes other color components may be used, for example RGB color components.
  • the techniques of the present invention are not limited to use with YUV color components and can be used with RGB color components or any other color components.
  • a test is performed to determine if the“sao_type_idx_X” is strictly positive.
  • If "sao_type_idx_X" is equal to "0", this signifies that there is no SAO for this frame area (CTU) for Y if X is set equal to Y, and that there is no SAO for this frame area for U and V if X is set equal to U and V.
  • In that case the determination of the SAO parameters is complete and the process proceeds to step 608. Otherwise, if the sao_type_idx is strictly positive, this signifies that SAO parameters exist for this CTU in the bitstream.
  • step 606 a loop is performed for four iterations.
  • step 607 the absolute value of offset j is read and decoded from the bitstream.
  • These four offsets correspond either to the four absolute values of the offsets (O1, O2, O3, O4) of the four Edge indexes of SAO Edge Offset (see Figure 7B) or to the four absolute values of the offsets related to the four ranges of the SAO Band Offset (see Figure 8).
  • MAX_abs_SAO_offset_value = (1 << (Min(bitDepth, 10) - 5)) - 1
  • << is the left (bit) shift operator.
  • This formula means that the maximum absolute value of an offset is 7 for a pixel value bitdepth of 8 bits, and 31 for a pixel value bitdepth of 10 bits and beyond.
  • the current HEVC standard amendment addressing extended-bitdepth video sequences provides a similar formula for a pixel value having a bitdepth of 12 bits and beyond.
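The maximum absolute offset formula above can be transcribed directly as a small helper (purely illustrative), which indeed gives 7 for 8-bit video and 31 for 10-bit video and beyond:

```cpp
#include <algorithm>

// (1 << (Min(bitDepth, 10) - 5)) - 1: 7 for 8-bit video, 31 for 10-bit and beyond.
int maxAbsSaoOffset(int bitDepth) {
    return (1 << (std::min(bitDepth, 10) - 5)) - 1;
}
```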
  • the absolute value decoded may be a quantized value which is dequantized before it is applied to pixel values at the decoder for SAO filtering. An indication of use or not of this quantification is transmitted in the slice header.
  • the sign is signaled in the bitstream as a second part of the offset if the absolute value of the offset is not equal to 0.
  • the bit of the sign is bypassed when CABAC is used.
  • If X is set equal to Y, the read syntax element is "sao_eo_class_luma", and if X is set equal to U and V, the read syntax element is "sao_eo_class_chroma".
  • FIG 11 is a flow chart illustrating how SAO filtering is performed on an image part according to the HEVC specifications, for example during the step 67 in Figure 6.
  • this image part is a CTU.
  • This same process 700 is also applied in the decoding loop (step 415 in Figure 4) at the encoder in order to produce the reference frames used for the motion estimation and compensation of the following frames.
  • This process is related to the SAO filtering for one color component (thus suffix“_X” in the syntax elements has been omitted below).
  • An initial step 701 comprises determining the SAO filtering parameters according to processes depicted in Figures 9 and 10.
  • the SAO filtering parameters are determined by the encoder and the encoded SAO parameters are included in the bitstream. Accordingly, on the decoder side in step 701 the decoder reads and decodes the parameters from the bitstream.
  • Step 701 obtains the sao_type_idx and, if it equals 1, also obtains the sao_band_position 702 and, if it equals 2, also obtains the sao_eo_class_luma or sao_eo_class_chroma (according to the color component processed). If the element sao_type_idx is equal to 0, the SAO filtering is not applied.
  • Step 701 obtains also an offsets table 703 of the 4 offsets.
  • a variable i used to successively consider each pixel Pi of the current block or frame area (CTU), is set to 0 in step 704.
  • “frame area” and“image area” are used interchangeably in the present specification.
  • a frame area in this example is a CTU.
  • In step 706 pixel Pi is extracted from the frame area 705, which contains N pixels.
  • This pixel Pi is classified in step 707 according to the Edge Offset classification described with reference to Figures 7A & 7B or the Band Offset classification described with reference to Figure 8.
  • The decision module 708 tests if Pi is in a class that is to be filtered using the conventional SAO filtering.
  • If so, the related offset value Oj is extracted in step 710 from the offsets table 703.
  • After the filtered pixel value is produced in step 713, the variable i is incremented in step 714 in order to filter the subsequent pixels of the current frame area 705 (if any - test 715).
  • When all pixels have been processed (step 715), the filtered frame area 716 is reconstructed and can be added to the SAO reconstructed frame (see frame 68 of Figure 6 or 416 of Figure 4).
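Putting the pieces together, the per-pixel loop of Figure 11 for one color component of one CTU can be sketched as follows. The classification callback stands in for steps 707-710 (Edge Offset or Band Offset classification plus the offset-table lookup), and clipping to the valid sample range is shown explicitly; all names are illustrative:

```cpp
#include <algorithm>
#include <vector>

// Classification result for one pixel: whether it is filtered and, if so,
// the signed offset to add (decoded offset with its sign already applied).
struct Classification {
    bool filtered;
    int offset;
};

// classify() stands in for steps 707-710 of Figure 11: EO or BO classification
// of pixel Pi followed by the lookup of offset Oj in the offsets table.
using ClassifyFn = Classification (*)(int pixel);

// Steps 704-715 of Figure 11 for one color component of one CTU (frame area).
void saoFilterCtu(std::vector<int>& ctuPixels, int bitDepth, ClassifyFn classify) {
    const int maxVal = (1 << bitDepth) - 1;
    for (int i = 0; i < static_cast<int>(ctuPixels.size()); ++i) {  // loop over the N pixels
        const Classification c = classify(ctuPixels[i]);
        if (c.filtered) {
            // Add the offset and clip to the valid sample range.
            ctuPixels[i] = std::clamp(ctuPixels[i] + c.offset, 0, maxVal);
        }
    }
}
```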
  • JEM JVET exploration model
  • SAO sample adaptive offset
  • Embodiments of the present invention described below are intended to improve the coding efficiency of SAO by using various techniques for deriving one or more SAO parameters of an image part in a current image from one or more SAO parameters of a collocated image part in a reference image. These techniques may be referred to as temporal derivation techniques for SAO parameters. Further embodiments described below are intended to improve the coding efficiency of SAO by using various techniques for deriving one or more SAO parameters of an image part in an image from one or more SAO parameters of another image part of the same image. These techniques may be referred to as spatial derivation techniques for SAO parameters.
  • a first group of embodiments focusses on improving the signalling efficiency.
  • SAO filtering is performed CTU by CTU.
  • Temporal derivation of SAO parameters is not used in HEVC.
  • temporal derivation is introduced.
  • a group of image parts is formed and the use of temporal prediction is signalled for this group of image parts, rather than for each image part individually.
  • temporal derivation is used to derive at least one of the SAO parameters of the image part from at least one SAO parameter of a collocated image part in a reference image.
  • the collocated image part in the reference image therefore serves as a source image part for the image part to be derived.
  • image parts of the group can have different SAO parameters depending on the SAO parameters of the respective collocated image parts. Accordingly, with very light signalling, image parts belonging to a given group of image parts can use temporal derivation and benefit from different (and efficient) SAO parameters.
  • a group can be any two or more CTUs, for example a whole image, a slice, a line of CTUs, a column of CTUs, N lines of CTUs, or N columns of CTUs, where N is an integer greater than 1.
  • a group could also be NxN CTUs, where N is an integer greater than 1, or MxN CTUs or NxM CTUs, where M > 1 and N > 1.
  • a group of image parts can be a CTU, and each constituent block of the CTU can be an image part.
  • each block of a CTU may have its own SAO parameters, but the signalling to use temporal derivation of the SAO parameters can be made for the CTU as a whole.
  • a temporal merge flag can be used to signal the use of temporal derivation for all image parts of the group.
  • the manner in which the SAO parameters are derived in the temporal derivation is not particularly limited except that at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the collocated image part in a reference image.
  • the at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the collocated image part by copying the SAO parameter of the collocated image part.
  • One, more than one, or all SAO parameters may be copied.
  • one, more than one, or all SAO parameters may be copied only when the SAO filtering is of a particular type (edge or band).
  • the group of image parts is a whole image.
  • each CTU of a current image 2001 derives its SAO parameters temporally from a collocated CTU in a reference image 2002.
  • the SAO parameters for the CTU 2003 in the current image 2001 are obtained by copying the SAO parameters from its collocated CTU 2005 in the reference image 2002.
  • the SAO parameters for the CTU 2004 in the current image 2001 are obtained by copying the SAO parameters from its collocated CTU 2006 in the reference image 2002.
  • CTU 2005 uses EO filtering with a direction of 0 degrees
  • CTU 2006 uses BO filtering.
  • the CTU 2003 also uses EO filtering with a direction of 0 degrees.
  • the CTU 2004 also uses BO filtering.
  • SAO parameters are copied in this embodiment, including the SAO type parameter sao_type_idx, parameters such as the EO class (specifying a direction of EO filtering) and the BO group sao_band_position (specifying a first class of a group of classes), and the offsets.
  • the current image 2001 and its reference image 2002 may have different partitionings.
  • this is not a problem, as long as there is a suitable mechanism (known to the encoder and decoder) to associate a CTU in the current image 2001 with a“collocated” CTU in the reference image.
  • the mapping may identify the CTU in the reference image closest in position to the CTU in the current image. The closest position may be based on any suitable reference position in the CTUs concerned, for example the top-left position of each CTU.
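One simple way to realise such a "closest top-left position" mapping, when the current and reference images use different CTU sizes, is sketched below. The raster-order grid model and the function name are assumptions made for this illustration, not taken from the patent:

```cpp
// Index (in raster order) of the reference-image CTU that contains the
// top-left corner of the current CTU, i.e. the closest collocated CTU.
// Assumes both images cover the same pixel area but may use different CTU sizes.
int collocatedCtuIndex(int curCtuIndex, int curCtuSize, int refCtuSize, int picWidth) {
    const int curCtusPerRow = (picWidth + curCtuSize - 1) / curCtuSize;
    const int refCtusPerRow = (picWidth + refCtuSize - 1) / refCtuSize;

    // Top-left pixel position of the current CTU.
    const int x = (curCtuIndex % curCtusPerRow) * curCtuSize;
    const int y = (curCtuIndex / curCtusPerRow) * curCtuSize;

    // Reference CTU containing that pixel position.
    return (y / refCtuSize) * refCtusPerRow + (x / refCtuSize);
}
```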
  • Figure 13 is a flow chart for use in explaining a method of decoding an image in the first embodiment.
  • a first syntax element is read from the bitstream 2103 and decoded.
  • This first syntax element in this example is a simple temporal merge flag which indicates for the whole image whether or not temporal derivation of SAO parameters is to be used.
  • a reference image means another image of a sequence of images (previous or future image) which is used to perform temporal prediction for an image to be encoded.
  • here, a reference image means another image of the sequence (previous or future image) which is used to perform the temporal derivation of SAO parameters.
  • the reference images for the temporal derivation of SAO parameters may be the same as the reference images for the temporal prediction, or may be different.
  • the HEVC specification uses the term“reference frame” instead of “reference image” and refidx is usually referred to as a reference frame index accordingly.
  • the terms “reference image” and “reference frame” are used interchangeably in the present specification.
  • the decoder has a storage unit 2106, which may be called a Decoded Picture Buffer (DPB), which stores the SAO parameters for each CTU of the reference image.
  • DPB Decoded Picture Buffer
  • the DPB 2106 stores the SAO parameters for each CTU explicitly, without relying on merge flags such as merge_up and merge_left, because reading merge flags as part of the SAO parameters temporal derivation increases the complexity and slows down the derivation.
  • step 2107 the SAO parameters stored in the DPB 2106 for the collocated CTU in the reference image identified by refidx or by Ly and refidx are obtained. These are then set as the SAO parameters 2108 for the current CTU.
  • the SAO parameters comprise dedicated SAO parameters for each color component X (Y, U, V) and in steps 2109-2111 SAO filtering is performed for each color component in turn using the dedicated SAO parameters for the color component concerned.
  • the SAO parameters may be common to two or more components, for example U and V, or a mixture of common and dedicated (per-component) SAO parameters may be used.
  • If temporal derivation is not used, the SAO parameters for the CTUs of the group are read from the bitstream, for example using the process of Figure 9.
  • Figure 13 relates to the steps carried out on the decoder side.
  • the steps involve reading and decoding the syntax elements for the group of image parts (whole image in this case) from the bitstream and then performing SAO filtering on the image parts of the group.
  • the same SAO filtering as on the decoder side is performed on the image parts of the group to ensure that the encoder has the same reference images as the decoder.
  • the syntax elements do not need to be read and decoded from the bitstream, as the related information is available in the encoder already.
  • the determination of whether or not to use the temporal derivation of SAO parameters for the group is made on the encoder side in this embodiment.
  • the choice of reference image for the temporal derivation is made on the encoder side.
  • In a variant, the reference image is simply the first reference image of the first list L0. In that case, no syntax elements are necessary to identify refidx and Ly and step 2104 can be omitted. This removes some signalling and simplifies the decoder design.
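The decoder-side flow of Figure 13 for a whole-image group might be sketched as follows. The DPB lookup, the per-CTU bitstream reader and the SaoParams layout are placeholders for this sketch; in the variant just mentioned, the reference index would not be read but fixed to the first entry of list L0:

```cpp
#include <vector>

struct SaoParams {
    int type = 0;                  // sao_type_idx
    int classOrPosition = 0;       // EO class or band position
    int offsets[4] = {0, 0, 0, 0}; // O1..O4
};

// SAO parameters stored per CTU for one reference image in the DPB (2106).
using RefImageSao = std::vector<SaoParams>;

// Steps 2101-2111 of Figure 13 (sketch): if the temporal merge flag is set,
// every CTU of the current image copies the SAO parameters of its collocated
// CTU in the selected reference image; otherwise they are read per CTU.
void decodeImageSao(bool temporalMergeFlag,              // first syntax element
                    const RefImageSao& referenceSao,     // DPB entry for refidx / Ly
                    std::vector<SaoParams>& currentSao,  // one entry per CTU, output
                    SaoParams (*readCtuSaoFromBitstream)(int ctuIdx)) {
    const int numCtus = static_cast<int>(currentSao.size());
    for (int ctu = 0; ctu < numCtus; ++ctu) {
        currentSao[ctu] = temporalMergeFlag
                              ? referenceSao[ctu]              // temporal derivation (copy)
                              : readCtuSaoFromBitstream(ctu);  // explicit signalling
        // SAO filtering of the Y, U and V components of this CTU would follow here
        // (steps 2109-2111), using the parameters just obtained.
    }
}
```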
  • the group of image parts was a whole image.
  • the group of image parts is a slice.
  • the decoding is the same as described in connection with the first embodiment except that the first syntax element indicates for the slice (as opposed to for the whole image) whether or not temporal derivation of SAO parameters is to be used and the second syntax element (or the second and third syntax elements in combination) identifies one reference image for the temporal derivation of SAO parameters of the CTUs of the slice.
  • a single reference frame is signalled for the frame or slice (or, in the variant, inferred without signalling) and the SAO parameters for CTUs of the frame or slice must come from that reference frame.
  • the number of CTUs for which the collocated CTU in that one reference frame uses SAO may be limited, resulting in a limitation on the number of CTUs subjected to SAO filtering.
  • the group is a slice. No reference frame is signalled in the slice header but the SAO parameters for CTUs of the frame may come from different reference frames.
  • Figure 14 is a flow chart for use in explaining a method of decoding an image in the third embodiment.
  • step 2201 a first syntax element is read from the bitstream 2103 and decoded. This first syntax element indicates for the slice or for the whole image whether or not temporal derivation of SAO parameters is to be used.
  • step 2202 it is checked if the syntax element indicates temporal derivation is to be used.
• If the outcome of the test in step 2202 is that the SAO parameters derivation is not the temporal derivation, the SAO parameters for the CTUs of the group (slice in this case) are read from the bitstream, for example using the process of Figure 5.
• A first, outer loop through all the CTUs of the current image is then started in step 2205.
• a second, inner loop through all the possible reference frames from 0 to MAXrefidx is also started in step 2203.
  • the Decoded Picture Buffer stores the SAO parameters for each CTU of all the reference frames.
  • step 2207 the SAO parameters stored in the DPB 2106 are obtained for the collocated CTU in the reference image under consideration identified by refidx or by Ly and refidx.
• the collocated CTU may not have used SAO (sao_type_idx is “no SAO”), in which case no SAO parameters may be obtainable in step 2207.
• Step 2204 tests for this outcome (“yes”) and if so the second loop moves on to the next reference frame. If SAO parameters are obtained in step 2207 (outcome “no”) the second loop ends and the obtained SAO parameters are used to perform SAO filtering on the three color components (independent filtering in this embodiment but other variants are possible as described in connection with the first embodiment).
• the test in step 2204 may be based on the luma component alone (i.e. whether sao_type_idx is “no SAO” for the luma color component alone), in which case the SAO parameters for all three color components are considered unobtainable from the reference frame under consideration if the luma component SAO parameters are unobtainable.
  • each color component may be treated separately, or luma may be treated separately from the two chroma components.
  • the first loop continues CTU by CTU through all the CTUs of the slice or frame.
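• The per-CTU search loop of Figure 14 (steps 2203 to 2207 and test 2204) can be sketched as below; MAXrefidx, the DPB accessor and the “no SAO” test are assumptions based on the description above:

    # Hypothetical sketch: for one CTU, scan the reference frames in order and
    # return the SAO parameters of the first collocated CTU that actually uses SAO.
    def find_temporal_sao_params(ctu, dpb, max_refidx):
        for refidx in range(max_refidx + 1):            # inner loop over candidate reference frames
            params = dpb.sao_parameters_of_collocated_ctu(refidx, ctu)
            if params is not None and params.sao_type_idx != "no SAO":
                return params                           # first usable reference frame wins
        return None                                     # no collocated CTU uses SAO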
  • This embodiment improves the coding efficiency compared to the first and second embodiments by increasing the number of CTUs subjected to SAO filtering.
• each of the lists L0 and L1 is ordered.
• a first reference frame, more likely to have useful SAO parameters than a second reference frame, is placed ahead of the second reference frame.
  • the first reference frame may be the closest reference frame to the current frame.
  • the first reference frame may be the frame with the best quality in the list of reference frames. Any suitable measure of quality may be used, for example the quantisation parameter (QP) could be used as a measure of quality.
  • the frame with the lowest QP could be chosen as the first reference frame.
  • the first reference frame could also be chosen based on how often it is used for temporal prediction in the current slice or frame. This could be a good criterion at the encoder side but not for a decoder as it involves building statistics for the whole frame before applying SAO.
• the first reference frame in the first list of reference frames is the most selected reference.
  • the first reference frame in each list is processed before the second reference frame and SAO parameters are picked up preferentially from the first reference frame compared to the second reference frame.
  • the reference frames are ordered from best to worst in terms of coding efficiency.
  • different reference images may be used for the temporal derivation of SAO parameters for different image parts of the group.
  • a particular reference image is identified by searching through a plurality of available reference images and selecting a reference image whose collocated image part satisfies at least one search condition.
  • the search condition may be that said collocated image part uses SAO filtering.
  • the reference images may be searched in order from highest coding efficiency to lowest coding efficiency.
  • SAO parameters are derived temporally whether the SAO type is BO or EO.
• SAO parameters are derived temporally only when the SAO type is EO. This is because the EO type is generally more efficient than the BO type. This can be implemented by modifying step 2204 in Figure 14 to test for “no SAO” or “BO”, instead of just “no SAO”.
• the search condition in the fourth embodiment is that the collocated image part uses edge-type SAO filtering.
  • a secondary search may be performed through the reference frames to find if there is a collocated CTU in one of those reference frames that uses BO, in which case BO may still be used for the temporal derivation of the subject CTU.
  • This variant results in performing a first search through the available reference images using a first search condition and if none of the available reference images satisfies the first search condition performing a second search through the available reference images using a second search condition different from the first search condition.
  • the first search condition may be that the collocated image part uses edge-type SAO filtering and the second search condition may be that said collocated image part uses band-type SAO filtering.
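• Under the same assumptions as the sketch above, this two-pass search (first condition EO, fallback condition BO) might look like:

    # Hypothetical sketch: search first for a collocated CTU using edge-type SAO;
    # only if none of the reference images satisfies that condition, search again
    # for a collocated CTU using band-type SAO.
    def search_with_fallback(ctu, dpb, max_refidx):
        for wanted_type in ("EO", "BO"):                # first search condition, then second
            for refidx in range(max_refidx + 1):
                params = dpb.sao_parameters_of_collocated_ctu(refidx, ctu)
                if params is not None and params.sao_type_idx == wanted_type:
                    return params
        return None                                     # a default SAO parameter set may then be used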
• If the collocated CTU in the reference frame has “no SAO”, or none of the collocated CTUs in any of the plural reference frames uses SAO, no SAO parameters can be obtained.
  • a default set of SAO parameters is used. This default set may be determined by the encoder and transmitted to the decoder, for example in the sequence parameter set or per slice. This is efficient because the default set may be optimised for the sequence or for the slice by the encoder.
  • the temporal derivation is determined for a slice and the second syntax element identifies a single reference frame for SAO parameter derivation in CTUs of the slice.
  • the second syntax element is at the slice level.
  • the temporal derivation is determined for the slice it is possible instead to use a syntax element at the CTU level to identify a reference frame for SAO parameter derivation of the CTU concerned. This approach is taken in the sixth embodiment.
  • Figure 15 is a flow chart for use in explaining a method of decoding an image in the sixth embodiment.
  • a first syntax element is read from the bitstream 2403 and decoded. This first syntax element indicates for the slice whether or not temporal derivation of SAO parameters is to be used.
  • step 2402 it is checked if the syntax element indicates temporal derivation is to be used. If the outcome is“NO” the process of Figure 5 may be used, as already described in connection with the first embodiment. If the outcome is“YES” a loop through all the CTUs of the current slice is then started in step 2405.
  • a second syntax element is extracted from the bitstream in step 2404. This second syntax element is a reference frame index refidx which identifies a reference image to be used for the temporal derivation.
  • a third syntax element is extracted from the bitstream 2403 in step 2404 as well. This is a list index Ly indicating whether the reference index is from List 0 (L0) or List 1 (Ll).
  • step 2407 the SAO parameters stored in a DPB 2406 for the collocated CTU in the reference image identified by refidx or by Ly and refidx are obtained. These are then set as the SAO parameters 2408 for the current CTU.
  • the SAO parameters comprise dedicated SAO parameters for each color component X (Y, U, V) and in steps 2409-2411 SAO filtering is performed for each color component in turn using the dedicated SAO parameters for the color component concerned.
  • the SAO parameters may be common to two or more components, for example U and V, or a mixture of common and dedicated (per-component) SAO parameters may be used.
  • different reference images may be used for the temporal derivation of SAO parameters for different image parts of the group, as in the third embodiment. Unlike in the third embodiment, it is not necessary for the decoder to search for the particular reference image and the particular reference image is identified by an item of information included in the bitstream. This simplifies the decoder design.
  • the per-CTU reference frames for SAO temporal derivation are signalled using refidx or using Ly and refidx.
  • This has the advantage that the signalling is the same as the traditional signalling of reference frames for temporal prediction.
• this way of signalling the SAO-temporal-derivation reference frames is very costly because generally the lists for temporal prediction contain redundant frames. Indeed, a reference frame can be in both list L0 and list L1.
  • a list may contain the same reference frame several times. It is not necessary to signal each occurrence of the same reference frame and removing redundant frames can save the signalling rate associated with them. These redundancies can be removed by checking that each reference frame in the list has a POC (Picture Order Count) different from all other reference frames.
• a specific list of reference frames for SAO temporal derivation is created, distinct from the lists L0 and L1 used for temporal prediction, and the reference frame for SAO temporal derivation for each CTU is signalled based on this specific list, for example using a syntax element SAO reference frame index representing a reference frame.
  • the specific list contains non-redundant reference frames.
• This is in order to reduce the rate dedicated to the syntax element obtained in step 2404 in Figure 15. It corresponds to a merge between the two lists L0 and L1.
  • Figure 16 is a flow chart illustrating a process to build a non-redundant list SAO Ref List of reference frames for SAO temporal derivation. In a first variant this process is carried out only in the encoder. In a second variant both the encoder and the decoder carry out this process.
• the lists of reference frames (list L0, 2501, and list L1, 2502) are the input of this process.
  • SAO Ref List is empty at the beginning of the process.
• the step 2504 tests whether the reference frame number i of list L0, Ref_i_L0, is already in the list of reference frames for SAO (SAO Ref List) (2508). If Ref_i_L0 is not in SAO Ref List, Ref_i_L0 is added (2505) to SAO Ref List. In the same way, the reference frame number i in list L1, Ref_i_L1, is added (2507) to SAO Ref List if it is not already present (2506).
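• The list construction of Figure 16 can be sketched as below; the poc() accessor reflects the POC-based redundancy check mentioned above and is an assumption:

    # Hypothetical sketch of steps 2503 to 2508: merge lists L0 and L1 into a
    # single non-redundant list of reference frames for SAO temporal derivation.
    def build_sao_ref_list(list_l0, list_l1):
        sao_ref_list = []
        seen_pocs = set()
        for i in range(max(len(list_l0), len(list_l1))):
            for lst in (list_l0, list_l1):              # Ref_i_L0 then Ref_i_L1
                if i < len(lst) and lst[i].poc() not in seen_pocs:  # not already in SAO Ref List
                    seen_pocs.add(lst[i].poc())
                    sao_ref_list.append(lst[i])         # add the new reference frame
        return sao_ref_list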
  • SAO Ref List is signalled explicitly in the bitstream, for example in the slice header.
  • SAO Ref List is not signalled by the encoder and instead the decoder creates the same list by following the same list creation process as the encoder.
  • the advantage of using the second variant is to avoid the explicit signalling of the reference frame list for SAO at slice level in order to reduce the rate of the slice header. Yet, for some applications, it is preferable to explicitly signal this list. This can be particularly efficient when SAO is disabled for some reference frames for a given slice.
• Both variants have the advantage that when a reference frame is signalled per CTU (as in step 2404 in Figure 15) the syntax element representing the reference frame can be more compact and efficient since there are fewer frames in SAO Ref List than in L0 and L1 together.
  • the maximum number of reference frames in the list SAO Ref List is explicitly signalled in the slice header.
  • the advantage of this variant is to reduce the rate dedicated to the syntax element representing the reference frame in step 2404.
  • a reference frame from L0 or Ll is added to the list SAO Ref List in step 2504 or in step 2506 if it is new (not already in the list).
• a new reference frame is added to SAO Ref List only if SAO is enabled for at least one color component, i.e. the conditions in steps 2504 and 2506 are modified.
• SAO may be disabled for each slice independently. Indeed, there may be a first flag for disabling SAO at the slice level for the luma component, and a second flag for disabling SAO at the slice level for both chroma components (common flag), or separate flags for all three components.
  • the advantage is that some unnecessary reference frames are omitted from SAO Ref List, enabling a reduction of the signalling of the syntax element representing a reference frame for SAO obtained in step 2404.
  • a first option is to code the syntax element with a fixed length code.
• a second option is to code the syntax element with a unary max code, where the “max” is the number of reference frames in SAO Ref List.
• arithmetic coding can be used. The arithmetic coding can be applied on top of the unary max code or of the fixed length code.
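• As an illustration of the second option, a unary max binarisation of the index can be sketched as below; the bit-writer interface is an assumption and this is not a normative binarisation:

    # Hypothetical unary max code: an index v in [0, max] is written as v '1' bits
    # followed by a terminating '0', the '0' being omitted when v equals max.
    def write_unary_max(bitwriter, index, max_value):
        for _ in range(index):
            bitwriter.put_bit(1)
        if index < max_value:
            bitwriter.put_bit(0)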
• the list of possible SAO parameter sets collocated with the current CTU in the reference frames of SAO Ref List is reduced by checking whether SAO is enabled for each collocated CTU and whether the SAO parameter set is redundant.
• the reduction can be achieved by comparing the SAO parameters in the different SAO parameter sets, including the classification result (e.g. the edge direction sao_eo_class) and the related offsets.
  • the syntax element representing the reference frame for SAO derivation is a syntax element representing the position of the SAO parameters set in the reduced list.
• Another way is to use the exact number of bits needed to signal which SAO parameter set has been selected at the encoder side among the possible temporal SAO parameter sets. In that case, however, the bitstream cannot be parsed without performing this SAO checking, which is not recommended for many video applications using networks.
  • the size of SAO Ref List will vary from one slice to the next. If the number of bits of the index is allowed to vary too it could be efficient (saving some bits when the size is below the maximum) but parsing in the decoder then requires the decoder to reconstruct SAO Ref List.
  • To simplify the parsing of the bitstream when list reduction is used it is possible for the encoder to signal in the bitstream the number of elements in the reduced SAO Ref List (that is the number of possible temporal SAO parameters sets). This enables the decoder to know the amount of bits dedicated to the signalling of the index without having to do any list reduction before parsing.
  • the seventh embodiment creates a list of reference images for the temporal derivation of SAO parameters based on one or more lists of reference images used for temporal prediction of the image parts of the group, wherein one or more reference images among the one or more temporal-prediction lists are excluded from the list of reference images for the temporal derivation of SAO parameters.
  • a reference image for the temporal derivation of SAO parameters is then selected from the list of reference images for the temporal derivation of SAO parameters. Redundant reference images among the one or more temporal-prediction lists may be excluded from the list of reference images for the temporal derivation of SAO parameters.
  • reference images whose respective collocated image parts do not use SAO filtering or whose respective collocated parts do not use edge-type SAO filtering may be excluded from the list of reference images for the temporal derivation of SAO parameters. It may be effective to impose a maximum on a number of reference images includable in the list of reference images for the temporal derivation of SAO parameters.
  • the decoder may create the same list of reference images for the temporal derivation of SAO parameters as the encoder. In this case, the list does not need to be signalled in the bitstream, which can reduce the rate of the slice header.
  • the encoder it is possible for only the encoder to create the list of reference images for the temporal derivation of SAO parameters based on one or more lists of reference images used for temporal prediction of the image parts of the group, the list of reference images for the temporal derivation of SAO parameters being signalled explicitly. This can still be effective when SAO is disabled for some reference frames used for temporal prediction as the list of reference images for the temporal derivation of SAO parameters may then be suitably compact.
  • the eighth embodiment relates to an encoding process.
  • a temporal derivation of SAO parameters is applied to a group of image parts.
  • a temporal derivation is applied to a whole image.
  • a temporal derivation is applied to a slice.
  • a non-temporal derivation of the SAO parameters is used in which SAO parameters are determined by the encoder for each image part (CTU) and signalled in the bitstream.
  • This may be referred to as a CTU-level non-temporal derivation of SAO parameters.
  • the decoder reads from the bitstream the first syntax element (e.g. temporal merge flag) and when it indicates temporal derivation is not applied to the group the decoder reads the per-CTU SAO parameters from the bitstream and filters each CTU according to the SAO parameters for the CTU concerned, for example using the decoding process of Figure 5.
  • the temporal derivation and the CTU-level non-temporal derivation are available derivations and the encoder selects one of them to apply to the group (e.g. frame or slice).
• FIG 17 is a flow chart illustrating steps carried out by an encoder to determine SAO parameters for the CTUs of a group (frame or slice) in the CTU-level non-temporal derivation of SAO parameters.
  • the process starts with a current CTU (1101).
  • First the statistics for all possible SAO types and classes are accumulated in the variable CTUStats (1102).
  • the process of Step 1102 is described below with reference to Figure 18.
• the RD cost for the SAO Merge Left is evaluated if the left CTU is in the current slice (1103), as is the RD cost of the SAO Merge Up (1104).
  • RD costs are also compared to disable SAO independently for the Luma and the Chroma components (1113, 1114).
• the use of a new SAO parameters set (1115) is compared to the SAO parameters set “merging” or sharing (1116) from the left and up CTU.
• Figure 18 is a flow chart illustrating an example of the statistics computation performed at the encoder side, as applied for the Edge Offset type filter in the case of the conventional SAO filtering. A similar approach may also be used for the Band Offset type filter.
  • Figure 18 illustrates the setting of the variable CTUStats containing all information needed to derive each best rate distortion offsets for each class. Moreover, it illustrates the selection of the best SAO parameters set for the current CTU.
  • each SAO type is evaluated.
• the variables Sum_j and SumNbPix_j are set to zero in an initial step 801.
  • the current frame area 803 contains N pixels.
  • j is the current range number to determine the four offsets (related to the four edge indexes shown in Figure 7B for Edge Offset type or to the 32 ranges of pixel values shown in Figure 8 for Band Offset type).
• Sum_j is the sum of the differences between the pixels in the range j and their original pixels.
• SumNbPix_j is the number of pixels in the frame area, the pixel value of which belongs to the range j.
• step 802 a variable i, used to successively consider each pixel P_i of the current frame area, is set to zero. Then, the first pixel P_i of the frame area 803 is extracted in step 804.
• step 805 the class of the current pixel P_i is determined by checking the conditions defined in Figure 7B. Then a test is performed in step 806: a check is performed to determine whether the class of the pixel value P_i corresponds to the value “none of the above” of Figure 7B.
• If the outcome is positive, the value “i” is incremented in step 808 in order to consider the next pixels of the frame area 803.
• If the outcome of step 806 is negative, the next step is 807, where the related SumNbPix_j (i.e. the sum of the number of pixels for the class determined in step 805) is incremented and the difference between P_i and its original value P_i^org is added to Sum_j.
  • the variable i is incremented in order to consider the next pixels of the frame area 803.
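• The statistics accumulation of Figure 18 can be sketched as below; classify_edge() stands for the classification of Figure 7B and the sign convention of the difference is an assumption:

    # Hypothetical sketch of steps 801 to 808: accumulate, for each class j, the
    # number of pixels (SumNbPix_j) and the sum of differences to the original
    # samples (Sum_j).
    def accumulate_edge_stats(recon_pixels, orig_pixels, classify_edge, num_classes=4):
        sum_j = [0] * (num_classes + 1)        # index 0 reserved for "none of the above"
        count_j = [0] * (num_classes + 1)
        for i in range(len(recon_pixels)):
            j = classify_edge(recon_pixels, i) # edge index per Figure 7B (0 = none of the above)
            if j == 0:
                continue                       # skip and consider the next pixel
            count_j[j] += 1                    # SumNbPix_j
            sum_j[j] += orig_pixels[i] - recon_pixels[i]  # Sum_j (original minus reconstructed, assumed sign)
        return sum_j, count_j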
• Each offset Offset_j is an optimal offset Oopt_j in terms of distortion.
• the encoder uses the statistics set in table CTUStats.
  • the distortion can be obtained by the following formula:
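• The formula itself has not survived extraction here; a plausible reconstruction, consistent with the variables Sum_j, SumNbPix_j, the offset O_j and the Shift adjustment described below (the usual HM-style delta-distortion estimate), is:

    \Delta D_j = \left( \mathit{SumNbPix}_j \times O_j \times O_j \;-\; 2 \times \mathit{Sum}_j \times O_j \right) \gg \mathit{Shift}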
  • variable Shift is designed for a distortion adjustment.
  • the distortion should be negative as SAO is a post filtering.
  • the same computing is applied for Chroma components.
  • the Lambda of the Rate distortion cost is fixed for the three components.
  • the rate is only 1 flag which is CABAC coded.
• J_j is initialized to the maximum possible value. Then a loop on O_j from Oopt_j to 0 is applied in step 902. Note that O_j is modified by 1 at each new iteration of the loop. If Oopt_j is negative, the value O_j is incremented and if Oopt_j is positive, the value O_j is decremented.
• This algorithm of Figures 18 and 19 provides a best ORD_j for each class j. This algorithm is repeated for each of the four directions of Figure 7A. Then the direction that provides the best rate distortion cost (the sum of the J_j for that direction) is selected as the direction to be used for the current CTU.
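• The refinement of Figure 19 can be sketched as below; the delta-distortion expression above and a one-flag rate per offset are assumptions:

    # Hypothetical sketch: starting from the distortion-optimal offset Oopt_j,
    # step the offset towards 0 and keep the offset with the best RD cost J_j.
    def refine_offset_rd(oopt_j, sum_j, count_j, lam, shift, rate_per_offset=1.0):
        best_cost, best_offset = float("inf"), 0     # J_j initialised to the maximum value
        step = -1 if oopt_j > 0 else 1               # move the offset towards zero
        o = oopt_j
        while True:
            delta_d = (count_j * o * o - 2 * sum_j * o) >> shift  # distortion change
            cost = delta_d + lam * rate_per_offset   # J = D + lambda * R
            if cost < best_cost:
                best_cost, best_offset = cost, o     # best ORD_j so far
            if o == 0:
                break
            o += step
        return best_offset, best_cost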
• the next step involves finding the best SAO band position of Figure 8. This is determined with the encoding process set out in Figure 20.
• the RD cost J_j for each range has been computed with the encoding process of Figure 19 with the optimal offset ORD_j in terms of rate distortion.
  • the rate distortion value J is initialized to the maximum possible value.
  • a loop on the 28 positions j of 4 consecutive classes is run in step 1002.
• the variable J_j corresponding to the RD cost of the band (of 4 consecutive classes) is initialized to 0 in step 1003.
• the loop on the four consecutive offsets j is run in step 1004.
• Test 1008 checks whether or not the loop on the 28 positions has ended. If not, the process continues in step 1002; otherwise the encoding process returns the best band position as being the current value of sao_band_position 1009.
• Thus, the CTUStats table in the case of determining the SAO parameters at the CTU level is created by the process of Figure 17. This corresponds to evaluating the CTU level in terms of the rate-distortion compromise. The evaluation may be performed for the whole image or for just the current slice.
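• The band position selection of Figure 20 can be sketched as below; rd_cost_j stands for the per-class costs J_j computed as above and is an assumption:

    # Hypothetical sketch of steps 1001 to 1009: among the candidate starting
    # positions of a band of 4 consecutive classes, pick the band with the lowest
    # total RD cost.
    def select_band_position(rd_cost_j, num_positions=28):
        best_cost, best_position = float("inf"), 0   # J initialised to the maximum value
        for pos in range(num_positions):             # loop on the band positions
            band_cost = sum(rd_cost_j[pos:pos + 4])  # RD cost of 4 consecutive classes
            if band_cost < best_cost:
                best_cost, best_position = band_cost, pos
        return best_position                         # sao_band_position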
  • FIG. 21 shows the RD cost evaluation of temporal derivation at Slice level.
  • First the distortion for the current colour component X is set equal to 0 (1601).
  • the temporal SAO parameters set of the collocated CTU in a reference frame (Ly, refidx) (1605) is extracted (1604) from the DPB (1603). If the SAO parameters set (1605) is equal to OFF (No SAO), the next CTU is processed (1610).
• the distortion Distortion_TEMPORAL_X is incremented by an amount equal to the associated distortion of the offset Oi (1609). This is the same process as the RD cost evaluation for a merge of SAO parameters as described previously. Please note that sao_band_position is set equal to 0 when the SAO type is equal to an Edge type. When the distortion of all offsets has been added to Distortion_TEMPORAL_X (1608), the next CTU is processed (1610).
• the RD cost for the temporal mode at slice level, for component X, is set equal to the sum of this computed distortion Distortion_TEMPORAL_X and λ multiplied by the rate for this temporal mode at slice level (1611). This rate is equal to the rate of the signalling of the temporal mode plus, if needed, the rate of the reference frame index refidx and, if needed, the rate of the list Ly.
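• Expressed as a formula, the slice-level cost of the temporal mode for component X is therefore:

    J_{\mathit{TEMPORAL},X} = \mathit{Distortion}_{\mathit{TEMPORAL},X} + \lambda \times R_{\mathit{temporal}}

  where R_temporal is the rate of signalling the temporal mode plus, where present, the rates of refidx and Ly.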
• FIG 22 illustrates the competition, at the encoder side, between the CTU-level derivation of SAO parameters and the temporal derivation.
  • the current slice/frame 1901 is used to set the CTUStats table (1903) for each CTU (1902).
  • This table (1903) is used to evaluate the CTU level derivation (1904) and the temporal derivation for the whole slice (1915) as described previously in Figure 21.
  • This table (1903) is also used to evaluate several reference frames for temporal derivation.
  • the best derivation for the slice is selected according to the rate distortion criterion computed for each available derivation (1910).
  • the SAO parameters sets for each CTU are set (1911) according to the derivation selected in step 1910. These SAO parameters are then used to apply the SAO filtering (1913) in order to obtain the filtered frame/slice.
  • the selected derivation may be signalled in the slice header, for example using a syntax element indicating temporal derivation (which the decoder reads, see 2101 and 2201 in Figures 13 and 14).
  • the temporal derivation was put into competition with one alternative non-temporal method of deriving the SAO parameters.
  • two alternative methods are in competition with the temporal derivation.
  • Figure 23 shows various different groupings 1201-1206 of CTUs in a slice.
  • a first grouping 1201 has individual CTUs. This first grouping requires one set of SAO parameters per CTU. It corresponds to the CTU-level derivation in the eighth embodiment.
  • a second grouping 1202 makes all CTUs of the entire image one group.
• all CTUs of the frame (and hence the slice, which is either the entire frame or a part thereof) share the same SAO parameters.
  • the encoder first computes a set of SAO parameters to be shared by all CTUs of the image. Then, in the first method, these SAO parameters are set for the first CTU of the slice. For each remaining CTU from the second CTU to the last CTU of the slice, the sao_merge_left flag is set equal to 1 if the flag exists (that is, if the current CTU has a left CTU). Otherwise, the sao_merge_up flag is set equal to 1.
  • Figure 24 shows an example of CTUs with SAO parameters set according to the first method. This method has the advantage that no signalling of the grouping to the decoder is required.
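• The first method of sharing can be sketched as below; the per-CTU parameter structure and the neighbour test are assumptions:

    # Hypothetical sketch: give the shared SAO parameters to the first CTU of the
    # slice and make every other CTU merge from its left (or, failing that, its up)
    # neighbour, so that all CTUs end up with the same parameters.
    def share_sao_params_over_slice(ctus, shared_params):
        for idx, ctu in enumerate(ctus):             # CTUs in raster-scan order
            if idx == 0:
                ctu.sao_params = shared_params       # explicit parameters for the first CTU
            elif ctu.has_left_neighbour_in_slice:
                ctu.sao_merge_left_flag = 1          # copy from the left CTU
            else:
                ctu.sao_merge_up_flag = 1            # first CTU of a row: copy from above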
• groupings do not increase the rate too much. This is because the merge flags are generally CABAC coded in the same context. Since for the second grouping (entire image) these flags all have the same value (1), the rate consumed by these flags is very low: they always have the same value and so their probability approaches 1.
  • the grouping is signalled to the decoder in the bitstream.
  • the SAO parameters are also signalled as SAO parameters for the group (whole image), for example in the slice header.
  • the signalling of the grouping consumes bandwidth.
  • the merge flags can be dispensed with, saving the rate related to the merge flags, so that overall the rate is reduced.
  • the first and second groupings 1201 and 1202 provide very different rate-distortion compromises.
  • the first grouping 1201 is at one extreme, giving very fine control of the SAO parameters (CTU by CTU), which should lower distortion, but at the expense of a lot of signalling.
  • the second grouping is at the other extreme, giving very coarse control of the SAO parameters (one set for the whole image), which raises distortion but has very light signalling.
  • the determination is done for a whole image and all CTUs of the slice/frame share the same SAO parameters.
• FIG 25 is an example of the setting of SAO parameters at the frame/slice level using the first method of sharing SAO parameters (i.e. without new SAO classifications at the encoder side). This figure is based on Figure 17.
  • the CTUStats table is set for each CTU (in the same way as the CTU level encoding choice). This CTUStats can be used for the traditional CTU level (1302).
  • the table FrameStats is set by adding each value for all CTUs of the table CTUStats (1303). Then the same process as for CTU level is applied to find the best SAO parameters (1305 to 1315).
• the selected SAO parameters set at step 1315 is set for the first CTU of the slice/frame. Then, for each CTU from the second CTU to the last CTU of the slice/frame, the sao_merge_left_flag is set equal to 1 if it exists, otherwise the sao_merge_up_flag is set equal to 1 (indeed, for the second CTU to the last CTU, a merge Left or Up or both exist) (1317).
  • the syntax of the SAO parameters set is unchanged from that presented in Figure 9. At the end of the process the SAO parameters are set for the whole slice/frame.
• The CTUStats table in the case of determining the SAO parameters for the whole image (frame level) is created by the process of Figure 25. This corresponds to evaluating the frame level in terms of the rate-distortion compromise.
  • the encoder also evaluates the CTU level non-temporal derivation and the temporal derivation in terms of their respective rate-distortion compromises. Each evaluation is performed for the whole image in this case.
  • the three evaluations are then compared and the one with the best performance is selected.
  • the selected derivation (temporal or CTU level or frame level) is then signalled to the decoder in the bitstream.
  • the signalling of the selected derivation can be made in many different ways.
  • a grouping index can be signalled.
  • the first syntax element can then still be used to signal whether the SAO parameters for all CTUs of the slice are derived temporally or not (e.g. temporal merge flag), supplemented by the grouping index in the case when temporal derivation is not used.
  • the CTU level may have grouping index 0 and the frame level may have grouping index 1.
  • the first syntax element may be adapted to signal everything, for example CTU level and frame level may have index 0 and index 1 respectively and temporal derivation may have another index such as 2. In this case, in Figures 21, 22 and 24 the first syntax element is changed accordingly.
  • the example of determining the SAO parameters in Figure 25 corresponds to the first method of sharing SAO parameters as it uses the merge flags to share the SAO parameters among all CTUs of the image (see steps 1316 and 1317). These steps can be omitted if the second method of sharing SAO parameters is used.
  • the CTU-level non-temporal derivation is in competition with the temporal derivation.
  • the CTU-level non-temporal derivation is not available and instead the frame-level non-temporal derivation is in competition with the temporal derivation.
  • the CTU and Frame levels used in the ninth embodiment offer extreme rate-distortion compromises. It is also possible to include other groupings intermediate between the CTU and frame levels which can offer other rate-distortion compromises.
  • a third grouping 1203 makes a column of CTUs a group.
• FIG 26 is an example of the setting of SAO parameters sets for the third grouping 1203 at the encoder side. This figure is based on Figure 17. To reduce the number of steps in the figure, the modules 1105 to 1115 have been merged into one step 1405 in this Figure 26.
  • the CTUStats table is set for each CTU. This CTUStats can be used for the traditional CTU level (1302) encoding choice.
• the table ColumnStats is set by adding each value (1405) from CTUStats (1402), for each CTU of the current column (1404). Then the new SAO parameters are determined as for the CTU level (1406) encoding choice (cf. Figure 17).
• the RD cost to share the SAO parameters with the previous left column is also evaluated (1407), in the same way as the sharing of the SAO parameters set between left and up CTUs (1103, 1104) is evaluated. If the sharing of SAO parameters gives a better RD cost (1408) than the RD cost for the new SAO parameters set, the sao_merge_left_flag is set equal to 1 for the first CTU of the column. This CTU has the address number equal to the value “Column”. Otherwise, the SAO parameters set for this first CTU of the column is set equal (1409) to the new SAO parameters obtained in step 1406.
  • step 1412 can be processed once per frame.
• The column CTU grouping is another RD compromise between the CTU level encoding choice and the frame level, which can be useful for some conditions.
  • merge flags are used within the group, which means that the third grouping can be introduced without modifying the decoder (i.e. the grouping can be HEVC-compliant).
  • the second method of sharing SAO parameters described in the third embodiment can be used instead. In that case, merge flags are not used in the group (CTU column) and steps 1411 and 1412 are omitted.
• the Merge between columns doesn't need to be checked. It means that steps 1407, 1408 and 1410 are removed from the process of Figure 26.
  • the advantage of removing this possibility is a simplification of the implementation and the ability to parallelize the process. This has a small impact on coding efficiency.
  • Another possible compromise intermediate between the CTU level and the frame level can be offered by a fourth grouping 1204 in Figure 23 which makes a line of CTUs a group.
• a similar process to that of Figure 26 can be applied. In that case, the variable ColumnStats is replaced by LineStats.
• the step 1404 is replaced by a loop over the CTUs of the current line.
• the new SAO parameters and the merge with the up CTU are evaluated based on this LineStats table (steps 1406, 1407).
• the step 1410 is replaced by setting sao_merge_up_flag to 1 for the first CTU of the line. For all CTUs of the slice/frame except the first CTU of each line, sao_merge_left_flag is set equal to 1.
  • the advantage of the line is another RD compromise between the CTU level and Frame level. Please note that the frame or slice are most of the time rectangles and their width is larger than their height. So the line CTUs grouping 1204 is expected to be an RD compromise closer to the frame CTU grouping 1202 than the column CTU grouping 1203.
• the line CTU grouping can be HEVC compliant if the merge flags are used within the groups.
  • RD compromises can be offered by putting two or more columns of CTUs or two or more lines of CTUs together as a group.
• the process of Figure 25 can be adapted to determine SAO parameters for such groups.
  • the number N of columns or lines in a group may depend on the number of groups that are targeted.
  • the merge between these groups containing two or more columns or two or more lines doesn’t need to be evaluated.
  • Another possible grouping includes split columns or split lines, where the split is tailored to the current slice/frame.
• Another possible compromise between the CTU level and the frame level can be offered by square CTU groupings 1205 and 1206 as illustrated in Figure 23.
  • the grouping 1205 makes 2x2 CTUs a group.
  • the grouping 1206 makes 3x3 CTUs a group.
  • Figure 27 shows an example of how to determine the SAO parameters for such groupings.
• NxNStats (1507) is set (1504, 1505, 1506) based on CTUStats. This table is used to determine the new SAO parameters (1508) and its RD cost, in addition to the RD cost for a Left (1510) sharing or Up (1509) sharing of SAO parameters.
• If the best RD cost is that of the new SAO parameters set, the SAO parameters of the first CTU (top left CTU) of the NxN group are set equal to these new SAO parameters (1514). If the best RD cost is the sharing of SAO parameters with the up NxN group (1512), the sao_merge_up_flag of the first CTU (top left CTU) of the NxN group is set equal to 1 and the sao_merge_left_flag to 0 (1515). If the best RD cost is the sharing of SAO parameters with the left NxN group (1513), the sao_merge_left_flag of the first CTU (top left CTU) of the NxN group is set equal to 1 (1516).
  • FIG. 28 illustrates this setting for a 3x3 SAO group.
  • the top left CTU is set equal to the SAO parameters determined in step 1508 to 1516.
  • the sao_merge_left_flag is set equal to 1.
• the sao_merge_left_flag is the first flag encoded or decoded and, as it is set to 1, there is no need to set the sao_merge_up_flag to 0.
• the sao_merge_left_flag is set equal to 0 and the sao_merge_up_flag is set equal to 1.
• the sao_merge_left_flag is set equal to 1.
• The advantage of the NxN CTU groupings is to create several RD compromises for SAO. As for the other groupings, these groupings can be HEVC compliant if merge flags within the groups are used. As for the other groupings, the test of Merge left and Merge up between groups can be dispensed with in Figure 27. So steps 1509, 1510, 1512, 1513, 1515 and 1516 can be removed, especially when N is high.
  • the value N depends on the size of the frame/slice.
• N = 2 and N = 3 are evaluated. This offers an efficient compromise.
  • Figure 29 illustrates an example of how to select the SAO parameter derivation using a rate-distortion compromise comparison.
  • the first method of sharing SAO parameters among the CTUs of a group is used. Accordingly, merge flags are used within groups. If applied to HEVC, the resulting bitstream can be decoded by an HEVC-compliant decoder.
  • the current slice/frame 1701 is used to set the CTUStats table (1703) for each CTU (1702).
• This table (1703) is used to evaluate the CTU level (1704), the temporal derivation, and the other available CTU groupings.
  • the best derivation (a non-temporal derivation with a CTU grouping or the temporal derivation) is selected according to the rate distortion criterion computed for each available derivation (1710).
  • the SAO parameters sets for each CTU are set (1711) according to the derivation selected in step 1710. These SAO parameters are then used to apply the SAO filtering (1713) in order to obtain the filtered frame/slice.
  • the second method of sharing SAO parameters among the CTUs of the CTU grouping may be used instead of the first method. Both methods have the advantage of offering a coding efficiency increase.
  • a second advantage, obtained when the first method is used but not when the second method is used, is that this competition method doesn’t require any additional SAO filtering or classification. Indeed, the main impacts on encoder complexity are the step 1702 which needs SAO classification for all possible SAO type and the step 1713 which filters the samples. All other CTU groupings evaluations are only some additions of values already obtained during the CTU level encoding choice (set in the table CTUStats).
  • the encoder signals in the bitstream which derivation of the SAO parameters is selected (CTU level, frame level, column, line, 2x2 CTUs, 3x3 CTUs, temporal derivation).
  • a possible indexing scheme is shown in Table 1 below:
  • the derivation index is also referred to as a grouping index hereinafter.
  • Figure 30 is a flow chart illustrating a decoding process when the CTU grouping is signaled in the slice header according to the second method of sharing SAO parameters among the CTUs of the group.
  • the corresponding CTUs grouping index (1804) is used to select the CTUs grouping method (1805).
  • This grouping method will be applied to extract the SAO syntax and to determine the SAO parameters set for each CTU (1806). Then the next slice header syntax element is decoded. If the CTU grouping index (1804) corresponds to the temporal derivation, other parameters can be extracted from the bitstream such as the reference frame index and/or other parameters necessary for the temporal derivation.
  • the CTUs grouping index uses a unary max code in the slice header. In that case, the CTUs groupings are ordered according to their probabilities of occurrences (highest to lowest).
  • At least one non-temporal derivation is an intermediate level derivation (SAO parameters not at CTU level or at group level).
• When applied to a group, it causes the group (e.g. frame or slice) to be subdivided into subdivided parts (CTU groupings 1203-1206, e.g. columns of CTUs, lines of CTUs, NxN CTUs, etc.) and derives SAO parameters for each of the subdivided parts.
  • Each subdivided part is made up of two or more said image parts (CTUs).
  • the advantage of the intermediate level derivation(s) is introduction of one or more effective rate-distortion compromises.
  • the intermediate level derivation(s) can be used without the CTU-level derivation or without the frame-level derivation or without either of those two derivations.
  • the temporal derivation is in competition with CTU level derivation and the frame level derivation.
  • the twelfth embodiment builds on this and adds one or more of the intermediate groupings set out in the sixth embodiment, so that the competition includes CTU level, frame level, one or more groupings intermediate between the CTU and frame levels, and the temporal derivation.
  • the temporal derivation is in competition with CTU level derivation but not the frame level derivation.
  • the thirteenth embodiment builds on this and adds one or more NxN CTU groups so that the competition includes CTU level, one or more NxN CTU groups, and the temporal derivation.
  • the temporal derivation is in competition with CTU level derivation but not the frame level derivation.
  • the eighth embodiment builds on this and adds the third grouping 1203 (column of CTUs) or the fourth grouping 1204 (line of CTUs) or both the third and fourth groupings 1203 and 1204.
  • the competition therefore includes CTU level, the third and/or fourth grouping, and the temporal derivation.
  • the ninth and eleventh to fourteenth embodiments each promote diversity for the SAO parameter derivation to be applied to a group by making at least first and second said non temporal derivations available, the first non-temporal derivation when applied to a group causing the group to have SAO parameters at a first level, and the second non-temporal derivation when applied to a group causing the group to have SAO parameters at a second level different from the first level.
• the levels may be any two levels from the frame level to the CTU level.
  • the levels may correspond to the groupings 1201-1206 in Figure 23.
• the smallest grouping is the first grouping 1201 in which each CTU is a group and there is one set of SAO parameters per CTU.
  • a set of SAO parameters can be applied to a smaller block than the CTU.
  • the non-temporal derivation is not at the CTU level, frame level or an intermediate level between the CTU and frame levels but at a sub-CTU level (a level smaller than an image part).
  • index 0 means that each CTU is divided into 16 blocks and each may have its own SAO parameters.
  • Index 1 means that each CTU is divided into 4 blocks, again each having its own SAO parameters.
  • the selected derivation is then signalled to the decoder in the bitstream.
  • the signalling may comprise a temporal/non-temporal syntax element plus a depth syntax element (e.g. using the indexing scheme above).
  • a combined syntax element may be used to signal temporal/non-temporal and the depth.
• Temporal derivation could be assigned index 6, for example, with the non-temporal derivations having indices 0-5.
  • At least one non-temporal derivation when applied to a group causes the group to be subdivided into subdivided parts and derives SAO parameters for each of the subdivided parts, and each image part is made up of two or more said sub-divided parts.
  • At least first and second said non-temporal derivations are available, the first non-temporal derivation when applied to a group causing the group to have SAO parameters at a first level, and the second non-temporal derivation when applied to a group causing the group to have SAO parameters at a second level different from the first level.
• the levels may be any two levels from the frame level to a sub-CTU level.
  • the levels may correspond to the groupings 1201- 1206 in Figure 23.
  • the selected derivation of the SAO parameters is signalled for a slice, which means that the temporal derivation (when selected) is used for all CTUs of the slice. It is not possible to determine at the CTU level whether to use temporal derivation or not.
• although the available non-temporal derivations include derivations having SAO parameters at different levels (depths) lower than the slice or frame level, it is not possible to determine at the chosen level of the SAO parameters whether to use temporal prediction or not.
  • the SAO parameters derivation is modified so that a temporal derivation at the CTU level is available, rather than only a temporal derivation at the group level.
  • the temporal derivation at the CTU level is not applied to a group of image parts as in the previous embodiments. However, this temporal derivation is in competition with a temporal derivation applied to a group of image parts.
  • a level of the SAO parameters is selected for a slice or frame, which may include the CTU level. Then, when the CTU level is selected it is selected whether to use a temporal derivation or non-temporal derivation for each CTU of the slice or frame.
  • a temporal derivation or non-temporal derivation may be selected per CTU group (e.g. each column of CTUs) of the slice or frame.
  • the temporal derivation does still apply to a group of two or more CTUs (image parts).
  • One or more CTU groups within the slice may then use temporal derivation (with each CTU deriving an SAO parameter from a collocated CTU of a reference image), whilst other CTU groups use a non-temporal derivation.
• the SAO merge flags are usable between groups of the CTUs grouping. As depicted in Figure 31, for the 2x2 CTU grouping, the SAO merge Left and SAO merge Up flags are kept for each group of 2x2 CTUs. But they are removed for CTUs inside the group. Please note that only the sao_merge_left_flag is used for the grouping 1203 of a column of CTUs, and only the sao_merge_up_flag is used for the grouping 1204 of a line of CTUs.
  • a flag signals if the current CTU group shares its SAO parameters or not. If it is true, a syntax element representing one of the previous groups is signalled. So each group of a slice can be predicted by a previous group except the first one. This improves the coding efficiency by adding several new possible predictors.
  • a depth of the SAO parameters was selected for a slice, including depths smaller than a CTU, making it possible to have a set of SAO parameters per block in a CTU.
  • no depth could be selected and all CTUs of the slice had to use temporal derivation.
  • the SAO parameters derivation is modified so that a depth is selected for the slice and then it is selected for an image part at the selected depth whether or not to use temporal derivation.
  • the depths may be the ones in Table 2.
  • the SAO parameters derivation is modified so that a temporal derivation at the sub-CTU level is available, rather than only a temporal derivation at the group level.
  • the temporal derivation at the sub-CTU level is not applied to a group of image parts as in the previous embodiments. However, this temporal derivation is in competition with a temporal derivation applied to a group of image parts.
  • a level of the SAO parameters is selected for a slice or frame, which may include the sub-CTU level. Then, when the sub-CTU level is selected it is selected whether to use a temporal derivation or non-temporal derivation for each block of the slice or frame.
  • a temporal derivation or non-temporal derivation may be selected per CTU or per CTU group (e.g. each column of CTUs) of the slice or frame.
  • the temporal derivation does still apply to a group of two or more blocks (image parts).
  • One or more CTUs or CTU groups within the slice may then use temporal derivation (with each block deriving an SAO parameter from a collocated block of a reference image), whilst other CTUs or CTU groups use a non-temporal derivation.
  • the benefit of selecting between temporal and non-temporal SAO parameter derivation per CTU or CTU group is achieved in addition to the benefit of applying the temporal derivation on a CTU or CTU group basis.
• one possibility is to remove the SAO merge flags for all levels. It means that steps 503, 504, 505 and 506 of Figure 9 are removed.
  • the advantage is that it reduces significantly the signalling of SAO and consequently it reduces the bitrate. Moreover, it simplifies the design by removing 2 syntax elements at CTU level.
  • the merge flags are kept for CTU level but removed for all other CTU groupings.
  • the advantage is a flexibility of the CTU level.
• the merge flags are used for a CTU when the SAO signalling is at a level lower than or equal to the CTU level (1/16 CTU or 1/4 CTU) and removed for other CTUs groupings having larger groups.
  • the merge flags are important for small block sizes because a SAO parameters set is costly compared to the amount of samples that it can improve. In that case, these syntax elements reduce the cost of SAO parameters signalling. For large groups, the SAO parameters set is less costly so the usage of merge flags is not efficient. So the advantage of these embodiments is a coding efficiency increase.
  • the level where the SAO merge flags are enabled is explicitly signalled in the bitstream.
  • a flag indicates if the SAO merge flags are used or not.
  • the flag may be included after the index of the CTUs grouping (or the depth) in the slice header.
  • the competition between the different permitted derivations is modified so that only one derivation is permitted in the encoder for any given slice or frame.
  • the permitted derivation may be determined in dependence upon one or more characteristics of the slice or frame. For example, the permitted derivation may be selected based on the slice type (Intra, Inter P, Inter B), quantization level (QP) of the slice, or position in the hierarchy of a Group of Pictures (GOP).
• the Intra frames and the Inter frames at the highest position in the hierarchy of the GOP structure, or with a low QP, may be permitted only to use the CTU level.
  • the other frames which have lower positions in the GOP hierarchy or a high QP may be permitted only to use temporal derivation.
  • the different parameters can be set depending on the rate distortion compromise.
  • the advantage of this embodiment is a complexity reduction. Instead of evaluating two or more competing derivations just one derivation is selected, which can be useful for a hardware encoder.
  • a first derivation is associated with first groups of the image (e.g. Intra slices) and a second derivation is associated with second groups of the image (e.g. Inter P slices). It is determined whether a group to be filtered is a first group or a second group. If it is determined that the group to be filtered is a first group, the first derivation is used to filter the image parts of the group, and if it is determined that the group to be filtered is a second group, the second derivation is used to filter the image parts of the group. Evaluation of the two derivations is not required.
  • Whether a group to be filtered is determined to be a first group or a second group may depend on one or more of:
• a slice type or a frame type of the image to which the group to be filtered belongs;
  • the first derivation may have fewer image parts per group than the second derivation.
  • the competition for a given slice or frame is still permitted but the set of competing derivations is adapted to the slice or frame.
  • the set of competing derivations may depend on the slice type (Intra, Inter P, Inter B), quantization level (QP) of the slice, or position in the hierarchy of a Group of Pictures (GOP).
  • the set of competing derivations may depend on the slice type.
• the set preferably contains groupings with groups containing small numbers of CTUs (e.g. CTU level, 2x2 CTU, 3x3 CTU, and Column). Also, if depths lower than a CTU are available (as in the tenth embodiment), these depths are preferably also included. Of course, the temporal derivation is not used.
• the set of derivations preferably contains groupings with groups containing large numbers of CTUs, such as Line, Frame level, and the temporal derivation. However, smaller groupings can also be considered down to the CTU level.
  • the advantage of this embodiment is a coding efficiency increase thanks to the use of derivations adapted for a slice or frame.
  • the set of derivations can be different for an Inter B slice from that for an Inter P slice.
  • the set of competing derivations depends on the characteristics of the frame in the GOP. This is especially beneficial for frames which vary in quality (QP) based on a quality hierarchy. For the frames with the highest quality or highest position in the hierarchy, the set of competing derivations should include groups containing few CTUs or even sub-CTU depths (same as for Intra slices above). For frames with a lower quality or lower position in the hierarchy, the set of competing derivations should include groups with more CTUs.
  • the set of competing derivations can be defined in the sequence parameters set.
  • a first set of derivations is associated with first groups of the image (e.g. Intra slices) and a second set of derivations is associated with second groups of the image (e.g. Inter P slices). It is determined whether a group to be filtered is a first group or a second group. If it is determined that the group to be filtered is a first group, a derivation is selected from the first set of derivations and used to filter the image parts of the group, and if it is determined that the group to be filtered is a second group, a derivation is selected from the second set of derivations and used to filter the image parts of the group. Evaluation of derivations not in the associated set of derivations is not required.
  • Whether a group to be filtered is a first group or a second group may be determined in the preceding embodiment. For example, when the first groups have a higher quality or higher position in the quality hierarchy than the second groups, the first set of derivations may have at least one derivation with fewer image parts per group than the derivations of the second set of derivations.
  • the temporal derivation involves simply copying SAO parameters from a collocated CTU (or from a collocated block within a CTU if SAO parameters at the block level are used).
• In a video, there are generally background and moving objects.
  • a large part can be static.
• if the SAO temporal derivation is applied on this static part for several consecutive frames, the SAO filtering should filter nothing, especially for edge offset. As a result, the temporal derivation will not be selected.
  • the set of SAO parameters from the previous frame is changed according to some defined rules.
  • Figure 32 is an example of an algorithm to produce such a modification of the set of SAO parameters.
  • a 90° rotation is applied to the edge classification. If sao_eo_class_Luma or sao_eo_class_Chroma (2301) from the collocated CTU is equal to 0, which corresponds to edge type 0° (2302), the edge type for the current frame (2310) is set equal to 1 (2303) corresponding to SAO edge type 90°.
• if sao_eo_class_X from the collocated CTU is equal to 1, which corresponds to edge type 90°, sao_eo_class_X for the current frame is set equal to 0, corresponding to SAO edge type 0°.
• the edge offset type 135° of sao_eo_class_X is rotated to edge offset type 45° (2307).
• the edge offset type 45° of sao_eo_class_X is rotated to edge offset type 135° (2309).
• the offset values have not been changed.
• the effect of the algorithm of Figure 32 is to apply a 90° rotation to the edge offset classification directions.
  • the changes to the edge classification parameters may be effected by using a mapping table.
• in the mapping table there is an entry for each existing edge index which maps to a corresponding “new” edge index.
  • the mapping table implements the required rotation.
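• A sketch of such a mapping table for the 90° rotation is given below; the exact class numbering (0 = 0°, 1 = 90°, 2 = 135°, 3 = 45°, as in HEVC) is an assumption:

    # Hypothetical 90-degree rotation of the edge classification:
    # 0° <-> 90° and 135° <-> 45°; the offset values themselves are unchanged.
    EO_CLASS_ROTATE_90 = {0: 1, 1: 0, 2: 3, 3: 2}

    def rotate_sao_eo_class(collocated_eo_class):
        return EO_CLASS_ROTATE_90[collocated_eo_class]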
  • Figure 33 illustrates this temporal rotation by 90°.
  • the temporal derivation with 90° rotation is applied to a whole frame or slice as in the first and second embodiments.
  • the 45° and the 135° rotations can be considered instead of 90°.
  • the rotation of temporal SAO parameters sets is the 90° rotation. This gives the best coding efficiency.
• if the collocated CTU uses the Band Offset type, the band offsets are not copied and SAO is not applied on this CTU.
  • a default SAO parameter set can be used for the CTUs concerned as described in connection with the fifth embodiment.
• the “rotation” temporal derivation is introduced.
• the “rotation” temporal derivation is put in competition with the “copying” temporal derivation as shown in Figure 34.
  • the competition is applied to each slice or each frame.
  • the best temporal derivation may be selected based on a rate-distortion criterion.
  • the “copying” temporal derivation was in competition with one or more non-temporal derivations (different groupings or different depths).
• the “rotation” temporal derivation is put into competition with the same non-temporal derivation(s) instead of the “copying” temporal derivation.
  • the “copying” temporal derivation was in competition with one or more non-temporal derivations (different groupings or different depths).
  • both the “copying” and “rotation” temporal derivations are put into competition with the same non-temporal derivation(s) instead of just the “copying” temporal derivation.
  • Table 3 below shows the competing derivations when the eleventh embodiment is modified in this way:
  • a first frame F0 is followed by second, third, fourth and fifth frames F1-F4.
  • the first frame F0 does not use temporal derivation of SAO parameters.
  • the “copying” temporal derivation is applied (i.e. copying the SAO parameters from F0).
  • the temporal derivation is a 90° rotation of SAO parameters of F0.
  • the temporal derivation is a 135° rotation of SAO parameters of F0.
  • the temporal derivation is a 45° rotation of SAO parameters of F0.
  • F0 is a reference image for each of F1 to F4.
  • Frame F2 (temporal ‘90°’ Frame 1)
  • Frame F3 (temporal ‘45°’ Frame 2)
  • Frame F4 (temporal ‘90°’ Frame 3)
  • the direction of edge filtering of an image part may be switched successively through all possible edge filtering directions.
  • SAO filtering is performed CTU by CTU.
  • temporal derivation is introduced, and to improve the signalling efficiency, a group of image parts is formed and the use of temporal prediction is signalled for this group of image parts, rather than for each image part individually.
  • the “rotation” temporal derivation is applied to all CTUs of a slice or frame.
  • a rotation temporal derivation is signalled for a group (slice, frame, column, line, NxN CTUs, etc.) composed of two or more image parts (CTUs).
  • the image parts (CTUs) may still have different SAO parameters depending on the SAO parameters of the respective collocated image parts.
  • Signalling the temporal derivation at the slice or frame level is useful for compatibility with the embodiments described previously in which a grouping of CTUs is selectable for the slice or frame from among plural groupings (e.g. the groupings 1201-1206 in Figure 23), the selected grouping also being signalled at the slice or frame level.
  • the signalling of the use of temporal derivation can be at the CTU level or at the block level (i.e. sub-CTU).
  • a syntax element may be provided per CTU to indicate whether or not rotation temporal derivation is used for the CTU concerned.
  • a syntax element may be provided per block (i.e. sub-CTU) to indicate whether or not rotation temporal derivation is used for the block concerned.
  • the process of Figure 35 is performed CTU by CTU.
  • the sao_merge_temporal_flag_X is extracted from the bitstream if other merge flags are off (2613). If sao_merge_temporal_flag_X is equal to 1, a syntax element representing a reference frame is extracted from the bitstream (2614). Please note that this step is not needed if only one reference frame is used for the derivation. Then a syntax element representing a rotation of the parameters is decoded (2615). Please note that this step is not needed if no “rotation” option is available. This would be the case if the only type of temporal derivation is the basic “copy” type.
  • step 2615 is not performed if the collocated CTU in the reference frame is not EO type. Then the respective sets of SAO parameters for the 3 color components are copied from the collocated CTU to the current CTU. Processing then moves to the next CTU in step 2610.
  • An advantage of temporal merge flag signalling, compared to temporal/CTU grouping signalling at slice level, is a simplification of the encoder design for some implementations. Indeed, there is no need to wait for the encoding of the whole frame before starting SAO selection, unlike in the slice level approach. But the extra signalling at the CTU level can have a significant, non-negligible impact on the coding efficiency.
  • the syntax element per CTU extracted in step 2615 may indicate the selected temporal derivation, e.g. using an index.
  • the syntax element could also specify the angle of rotation. In this way, in the same slice or frame, some CTUs may have no temporal derivation, other CTUs may use “copy”, still others may use “rotate by 90°”, and so on.
  • Signalling a grouping for a slice or frame and then signalling for each group of two or more CTUs whether to use temporal derivation or not or, if two or more temporal derivations are in competition with one another, which one of them is selected, is an effective way to achieve adaptability without having per-CTU syntax elements. For example, if the selected grouping for a slice is 3x3 CTUs, some groups may have no temporal derivation, other groups may use “copy”, still others may use “rotate by 90°”, and so on.
  • the number of groups is only 1/9th of the number of CTUs, so the number of syntax elements is correspondingly smaller compared to per-CTU signalling, yet the different CTUs in each group may still have different SAO parameters depending on the collocated CTUs.
  • rotation temporal derivations are introduced. These rotation temporal derivations are preferred examples from a wider class of transformations that can be applied to change the direction of EO filtering in a CTU of the current frame compared to the direction of EO filtering in a collocated CTU of a reference frame.
  • the direction-changing transformation could be a reflection about the x-axis or y-axis. Such a reflection has the effect of swapping two directions and leaving the other two directions unchanged. It could also be a reflection about a diagonal line at 45° or 135°.
  • the effect of the algorithm of Figure 32 is to apply a transformation to the edge classification direction.
  • the changes to the edge classification parameters may be effected by using a mapping table.
  • In the mapping table there is an entry for each existing edge index which maps to a corresponding “new” edge index.
  • the mapping table implements the required transformation.
  • This embodiment is applicable to the first group of embodiments (which use a group-wise derivation) or to the second group of embodiments (which do not use a group-wise derivation).
  • temporal derivation of SAO parameters was introduced, either as a group-wise derivation (applied to a group of two or more image parts) or for individual image parts.
  • new spatial derivations of SAO parameters are introduced. These may be group-wise derivations or for individual image parts.
  • a group can be any two or more CTUs, for example a whole image, a slice, a line of CTUs, a column of CTUs, N lines of CTUs, N columns of CTUs, where N is an integer greater than 1.
  • a group could also be NxN CTUs, where N is an integer greater than 1, or MxN CTUs or NxM CTUs, where M > 1 and N > 1.
  • a group of image parts can be a CTU, and each constituent block of the CTU can be an image part.
  • each block of a CTU may have its own SAO parameters, but the signalling to use spatial derivation of the SAO parameters can be made for the CTU as a whole.
  • a temporal merge flag can be used to signal the use of temporal derivation for all image parts of the group.
  • the manner in which the SAO parameters are derived in the spatial derivation is not particularly limited except that the source image part belongs to another group of image parts in the same image as the subject group.
  • the source image part and the image part to be derived are at the same positions in their respective groups. For example, in a 3x3 CTU grouping, there are 9 positions from the top left to the bottom right.
  • the other group is, for example, the left group of the subject group
  • at least one SAO parameter of an image part at position 1 (the top left position, say) in the subject group is derived from an SAO parameter of the image part at the same position (position 1 or top left position) in the left group.
  • This image part in the left group serves as a source image part for the image part to be derived in the subject group. The same is true for each other position in the subject group.
  • the at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the source image part by copying the SAO parameter of the source image part.
  • One, more than one, or all SAO parameters may be copied.
  • one, more than one, or all SAO parameters may be copied only when the SAO filtering is of a particular type (edge or band).
  • For example, in Figure 31, several 2x2 CTU groups are illustrated; a code sketch of the per-position copying described in the following items is given at the end of this list.
  • when sao_merge_left_flag is set in a current 2x2 group:
  • one or more SAO parameters of the CTU in the top-left position of the left 2x2 group are copied to the CTU in the top-left position of the current 2x2 group
  • one or more SAO parameters of the CTU in the top-right position of the left 2x2 group are copied to the CTU in the top-right position of the current 2x2 group
  • one or more SAO parameters of the CTU in the bottom-left position of the left 2x2 group are copied to the CTU in the bottom-left position of the current 2x2 group
  • one or more SAO parameters of the CTU in the bottom-right position of the left 2x2 group are copied to the CTU in the bottom-right position of the current 2x2 group.
  • each CTU has its own “new” SAO parameters, which may be at the group level (one set per group) or at the CTU level.
  • spatial and temporal group-wise derivations are both “group-wise sourcing derivations”. Each involves applying a group-wise sourcing derivation of SAO parameters to a group of two or more image parts, the group-wise sourcing derivation permitting different image parts belonging to the group to have different SAO parameters and comprising deriving at least one said SAO parameter of an image part belonging to the group from an SAO parameter of another image part serving as a source image part for the image part to be derived.
  • the source image part is a collocated image part in a reference image having a position in the reference image collocated with a position of the image part to be derived in its image.
  • the source image part belongs to another group of image parts in the same image as the image part to be derived, said source image part and said image part to be derived being at the same positions in their respective groups.
  • the rotation derivation was a non-group-wise temporal derivation.
  • a spatial rotation derivation is used as a derivation, i.e. where the SAO parameters of a CTU in a current image are derived by rotation from the SAO parameters of another CTU of the same image (as opposed to being derived by rotation from the SAO parameters of a collocated CTU of a reference image).
  • the other CTU in the “rotation” spatial derivation may be a left CTU or an upper CTU, in which case a sao_merge_rotation_left flag or sao_merge_rotation_up flag may be used to signal when the rotation spatial derivation is selected.
  • Figure 36 shows two examples where the other CTU is the left CTU and the rotation from the left CTU to the current CTU is 90 degrees.
  • the rotation spatial derivation may be in competition with the temporal copy derivation and/or the rotation temporal derivation.
  • the “rotation” derivation is applied on a spatial basis to generate additional SAO merge parameter set candidates to predict the SAO parameter set of the current CTU. Accordingly, the “rotation” can be applied to increase the list of SAO Merge candidates or to find new SAO Merge candidates for empty positions.
  • the advantage of using the twenty-seventh embodiment, instead of using several SAO parameter sets from previously decoded SAO parameter sets, is an increase in coding efficiency. Moreover it offers additional flexibility for encoder implementation by accessing a limited number of already encoded SAO parameter sets.
  • Figure 37 is a flow chart representing one example of the possible usage of the rotation derivation of SAO parameters.
  • the sao_merge_rotation_Left_X flag is extracted from the bitstream if other merge flags are off (3613). If sao_merge_rotation_Left_X is equal to 1, for each color component YUV of the current CTU the set of SAO parameters is derived from the set of SAO parameters for the same component of the left CTU by applying rotation to the edge classification as described in the twenty-fifth embodiment. The SAO parameters other than the direction may be simply copied.
  • the rotation spatial derivation was applied to one CTU.
  • In the twenty-eighth embodiment, a group-based rotation spatial derivation is applied. Then, each CTU of a current group derives its SAO parameters by rotation from the CTU at the same position in another group of the same image.
  • the group may be 3x3 CTUs.
  • the other group may be a group above or on the left.
  • group-based spatial derivation may be in competition with a group-based temporal derivation (either copy or rotation or both).
  • a rotation spatial derivation was introduced.
  • the rotation temporal derivation is one of a wider class of possible direction-transforming temporal derivations.
  • the rotation spatial derivation is one of a wider class of possible direction-changing spatial derivations.
  • the direction-changing spatial derivation may be applied to an individual CTU or to a group of CTUs. It may be in competition with other spatial and/or temporal derivations.
  • In a thirtieth embodiment, Figure 38 shows a system 191, 195 comprising at least one of an encoder 150 or a decoder 100 and a communication network 199 according to embodiments of the present invention.
  • the system 195 is for processing and providing a content (for example, a video and audio content for displaying/outputting or streaming video/audio content) to a user, who has access to the decoder 100, for example through a user interface of a user terminal comprising the decoder 100 or a user terminal that is communicable with the decoder 100.
  • a user terminal may be a computer, a mobile phone, a tablet or any other type of a device capable of providing/displaying the (provided/streamed) content to the user.
  • the system 195 obtains/receives a bitstream 101 (in the form of a continuous stream or a signal - e.g. while earlier video/audio are being displayed/output) via the communication network 199.
  • the system 191 is for processing a content and storing the processed content, for example a video and audio content processed for displaying/outputting/streaming at a later time.
  • the system 191 obtains/receives a content comprising an original sequence of images 151, which is received and processed (including filtering with a deblocking filter according to the present invention) by the encoder 150, and the encoder 150 generates a bitstream 101 that is to be communicated to the decoder 100 via the communication network 199.
  • the bitstream 101 is then communicated to the decoder 100 in a number of ways, for example it may be generated in advance by the encoder 150 and stored as data in a storage apparatus in the communication network 199 (e.g. on a server or a cloud storage) until a user requests the content (i.e. the bitstream data) from the storage apparatus, at which point the data is communicated/streamed to the decoder 100 from the storage apparatus.
  • the system 191 may also comprise a content providing apparatus for providing/streaming, to the user (e.g. by communicating data for a user interface to be displayed on a user terminal), content information for the content stored in the storage apparatus (e.g.
  • the encoder 150 generates the bitstream 101 and communicates/streams it directly to the decoder 100 as and when the user requests the content.
  • the decoder 100 then receives the bitstream 101 (or a signal) and performs filtering with a deblocking filter according to the invention to obtain/generate a video signal 109 and/or audio signal, which is then used by a user terminal to provide the requested content to the user.
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer- readable medium and executed by a hardware-based processing unit.
  • Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.
  • computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
  • a computer program product may include a computer- readable medium.
  • such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • any connection is properly termed a computer-readable medium.
  • For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • Disk and disc includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
  • the term “processor”, as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
  • the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set).
  • Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
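As an illustration of the 90° rotation temporal derivation and of the mapping-table formulation referred to in the items above, the following C++ sketch shows one possible way to express the class mapping. It is only a sketch under stated assumptions: the SaoCtuParams structure, the function name and the encoding of the type index are illustrative and are not taken from the HEVC reference software or from the claims.

```cpp
#include <array>
#include <cstdint>

// Illustrative only. HEVC edge-offset classes: 0 = 0 degrees, 1 = 90 degrees,
// 2 = 135 degrees, 3 = 45 degrees. A 90-degree rotation therefore swaps
// 0 <-> 1 and 2 <-> 3, which the mapping table below expresses directly.
constexpr std::array<std::uint8_t, 4> kRotate90 = {1, 0, 3, 2};

struct SaoCtuParams {      // hypothetical per-CTU SAO parameter set
    std::uint8_t typeIdx;  // 0: no SAO, 1: band offset, 2: edge offset
    std::uint8_t eoClass;  // edge-offset class (direction), valid when typeIdx == 2
    int offsets[4];        // transmitted offsets, left unchanged by the rotation
};

// "Rotation" temporal derivation: copy the collocated CTU's parameters from
// the reference frame, then rotate the edge direction by 90 degrees.
SaoCtuParams deriveTemporalRotation(const SaoCtuParams& collocated) {
    SaoCtuParams out = collocated;           // offset values are not changed
    if (out.typeIdx == 2) {                  // edge offset only
        out.eoClass = kRotate90[out.eoClass];
    }
    return out;
}
```

A 45° or 135° rotation would be expressed the same way, simply by changing the entries of the mapping table.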
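Similarly, the per-position copying used by the group-wise spatial derivation (the 2x2 CTU groups of Figure 31) can be sketched as follows. The 4-element array layout (top-left, top-right, bottom-left, bottom-right) and the template parameter are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>

// Illustrative only: group-wise "merge left" spatial derivation for 2x2 CTU
// groups. Each CTU of the current group copies the SAO parameters of the CTU
// at the same position (top-left, top-right, bottom-left, bottom-right) in the
// left group, so different CTUs of the group may still have different
// parameters. Params stands for whatever per-CTU SAO parameter set is used.
template <typename Params>
std::array<Params, 4> mergeLeftGroup2x2(const std::array<Params, 4>& leftGroup) {
    std::array<Params, 4> current{};
    for (std::size_t pos = 0; pos < leftGroup.size(); ++pos) {
        current[pos] = leftGroup[pos];       // copy per position, left group as source
    }
    return current;
}
```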

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An image comprising a plurality of image parts is subjected to sample adaptive offset (SAO) filtering. The SAO filtering comprises selecting, from among two or more available temporal derivations of SAO parameters, a temporal derivation of SAO parameters to apply to an image part, the available temporal derivations comprising different ways of deriving at least one SAO parameter of said image part from an SAO parameter of a collocated image part of a reference image. SAO filtering is performed on the image part using the derived SAO parameters. The two temporal derivations may be a temporal copy (merge) derivation and a temporal derivation involving modifying at least one SAO parameter of the collocated image part, for example a direction-changing or rotation modification.

Description

VIDEO CODING AND DECODING
The present invention relates to video coding and decoding.
Recently, the Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16’s VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. The main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well beyond the targeted 50% for the final standard.
The JVET exploration model (JEM) uses all the HEVC tools. One of these tools is sample adaptive offset (SAO) filtering. However, SAO is less efficient in the JEM reference software than in the HEVC reference software. This arises from fewer evaluations and from signalling inefficiencies compared to other loop filters.
US 9769450 discloses an SAO filter for three dimensional or 3D Video Coding or 3DVC such as implemented by the HEVC standard. The filter directly re-uses SAO filter parameters of an independent view or a coded dependent view to encode another dependent view, or re-uses only part of the SAO filter parameters of the independent view or a coded dependent view to encode another dependent view. The SAO parameters are re-used by copying them from the independent view or coded dependent view.
US 2014/0192860 Al relates to the scalable extension of HEVC. HEVC scalable extension aims at allowing coding/decoding of a video having multiple scalability layers, each layer being made up of a series of frames. Coding efficiency is improved by inferring, or deriving, SAO parameters to be used at an upper layer (e.g. an enhancement layer) from the SAO parameters actually used at a lower (e.g. base) layer. This is because inferring some SAO parameters makes it possible to avoid transmitting them.
It is desirable to improve the coding efficiency of images subjected to the SAO filtering.
According to a first aspect of the present invention there is provided a method of performing sample adaptive offset (SAO) filtering as defined by any one of claims 1 to 20.
According to a second aspect of the present invention there is provided a method of encoding an image as defined by any one of claims 21 to 23. According to a third aspect of the present invention there is provided a method of decoding an image as defined in claim 24.
According to a fourth aspect of the present invention there is provided a device for performing sample adaptive offset (SAO) filtering as defined in claim 25.
According to a fifth aspect of the present invention there is provided an encoder as defined by claim 26.
According to a fifth aspect of the present invention there is provided a decoder as defined by claim 27.
According to sixth to eighth aspects of the present invention there is provided a program which, when executed by a computer or processor, causes the computer or processor to carry out the method of the first to third aspects respectively.
The program may be provided on its own or may be carried on, by or in a carrier medium. The carrier medium may be non-transitory, for example a storage medium, in particular a computer-readable storage medium. The carrier medium may also be transitory, for example a signal or other transmission medium. The signal may be transmitted via any suitable network, including the Internet.
According to a ninth aspect of the present invention there is provided a signal as defined by claim 29.
Such a signal may be in transitory form or in non-transitory form. For example, the signal may be stored in a media storage device such as a Blu-ray disk. The signal may then be converted from non-transitory form to transitory form by reproducing it from the media storage device.
Reference will now be made, by way of example, to the accompanying drawings, in which:
Figure 1 is a diagram for use in explaining a coding structure used in HEVC;
Figure 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the invention may be implemented;
Figure 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented; Figure 4 is a flow chart illustrating steps of an encoding method according to embodiments of the invention;
Figure 5 is a flow chart illustrating steps of a loop filtering process in accordance with one or more embodiments of the invention;
Figure 6 is a flow chart illustrating steps of a decoding method according to embodiments of the invention;
Figures 7A and 7B are diagrams for use in explaining edge-type SAO filtering in HEVC;
Figure 8 is a diagram for use in explaining band-type SAO filtering in HEVC;
Figure 9 is a flow chart illustrating the steps of a process to decode SAO parameters according to the HEVC specifications;
Figure 10 is a flow chart illustrating in more detail one of the steps of the Figure 9 process;
Figure 11 is a flow chart illustrating how SAO filtering is performed on an image part according to the HEVC specifications;
Figure 12 is a schematic view for use in explaining a temporal derivation of SAO parameters in a first embodiment of the present invention; Figure 13 is a flow chart for use in explaining a method of decoding an image in the first embodiment;
Figure 14 is a flow chart for use in explaining a method of decoding an image in a third embodiment of the present invention;
Figure 15 is a flow chart for use in explaining a method of decoding an image in a sixth embodiment of the present invention; Figure 16 is a flow chart illustrating a process to build a list of reference frames for SAO temporal derivation in a seventh embodiment of the present invention;
Figure 17 is a flow chart illustrating steps carried out by an encoder to determine SAO parameters for the CTUs of a group (frame or slice) in a CTU-level non-temporal derivation of SAO parameters in an eighth embodiment of the present invention;
Figure 18 shows one of the steps of Figure 17 in more detail; Figure 19 shows another one of the steps of Figure 17 in more detail;
Figure 20 shows yet another one of the steps of Figure 17 in more detail;
Figure 21 is a flow chart for use in explaining how to evaluate a cost of a temporal derivation in the eighth embodiment;
Figure 22 is a flow chart for use in explaining how to compare the costs of the temporal derivation and a further, non-temporal derivation, in the eighth embodiment; Figure 23 shows various different groupings of CTUs in a slice;
Figure 24 is a diagram showing image parts of a frame in a non-temporal derivation of SAO parameters in which a first method of sharing SAO parameters is used; Figure 25 is a flowchart of an example of a process for setting SAO parameters in the non-temporal derivation of Figure 24;
Figure 26 is a flowchart of an example of a process for setting of SAO parameters in another non-temporal derivation using the first sharing method to share SAO parameters among a column of CTUs;
Figure 27 is a flowchart of an example of a process for setting of SAO parameters in yet another non-temporal derivation using the first sharing method to share SAO parameters among a group of NxN CTUs; Figure 28 is a diagram showing image parts of one NxN group in the non-temporal derivation of Figure 27; Figure 29 illustrates an example of how to select the SAO parameter derivation in an eleventh embodiment of the present invention;
Figure 30 is a flow chart illustrating a decoding process suitable for a second method of sharing SAO parameters among image parts of a group;
Figure 31 is a diagram showing image parts of multiple 2x2 groups in a sixteenth embodiment of the present invention;
Figure 32 is a schematic view for use in explaining a process of deriving SAO parameters in a temporal rotation derivation of SAO parameters in accordance with a twentieth embodiment of the present invention;
Figure 33 is a schematic view of the temporal rotation derivation of Figure 32; Figure 34 is a schematic view for use in explaining a process of deriving SAO parameters in which different temporal derivations are available;
Figure 35 is a flowchart for use in explaining a decoding process in a twenty-fifth embodiment of the present invention;
Figure 36 is a schematic view for use in explaining a process of deriving SAO parameters in a spatial rotation derivation of SAO parameters in accordance with a twentieth embodiment of the present invention; Figure 37 is a flowchart for use in explaining a decoding process in the twenty-seventh embodiment; and
Figure 38 is a diagram showing a system comprising an encoder or a decoder and a communication network according to embodiments of the present invention. Figure 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video standard. A video sequence 1 is made up of a succession of digital images i. Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.
An image 2 of the sequence may be divided into slices 3. A slice may in some instances constitute an entire image. These slices are divided into non-overlapping Coding Tree Units (CTUs). A Coding Tree Unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to macroblock units that were used in several previous video standards. A CTU is also sometimes referred to as a Largest Coding Unit (LCU). A CTU has luma and chroma component parts, each of which component parts is called a Coding Tree Block (CTB). These different color components are not shown in Figure 1.
A CTU is generally of size 64 pixels x 64 pixels. Each CTU may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 5 using a quadtree decomposition.
Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU). The maximum size of a PU or TU is equal to the CU size. A Prediction Unit corresponds to the partition of the CU for prediction of pixels values. Various different partitions of a CU into PUs are possible as shown by 606 including a partition into 4 square PUs and two different partitions into 2 rectangular PUs. A Transform Unit is an elementary unit that is subjected to spatial transformation using DCT. A CU can be partitioned into TUs based on a quadtree representation 607.
Each slice is embedded in one Network Abstraction Layer (NAL) unit. In addition, the coding parameters of the video sequence are stored in dedicated NAL units called parameter sets. In HEVC and H.264/AVC two kinds of parameter sets NAL units are employed: first, a Sequence Parameter Set (SPS) NAL unit that gathers all parameters that are unchanged during the whole video sequence. Typically, it handles the coding profile, the size of the video frames and other parameters. Secondly, a Picture Parameter Set (PPS) NAL unit includes parameters that may change from one image (or frame) to another of a sequence. HEVC also includes a Video Parameter Set (VPS) NAL unit which contains parameters describing the overall structure of the bitstream. The VPS is a new type of parameter set defined in HEVC, and applies to all of the layers of a bitstream. A layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer. HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer. Figure 2 illustrates a data communication system in which one or more embodiments of the invention may be implemented. The data communication system comprises a transmission device, in this case a server 201, which is operable to transmit data packets of a data stream to a receiving device, in this case a client terminal 202, via a data communication network 200. The data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN). Such a network may be for example a wireless network (Wifi / 802.1 la or b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks. In a particular embodiment of the invention the data communication system may be a digital television broadcast system in which the server 201 sends the same data content to multiple clients.
The data stream 204 provided by the server 201 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 201 or received by the server 201 from another data provider, or generated at the server 201. The server 201 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
In order to obtain a better ratio of the quality of transmitted data to quantity of transmitted data, the compression of the video data may be for example in accordance with the HEVC format or H.264/AVC format.
The client 202 receives the transmitted bitstream and decodes the reconstructed bitstream to reproduce video images on a display device and the audio data by a loud speaker.
Although a streaming scenario is considered in the example of Figure 2, it will be appreciated that in some embodiments of the invention the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
In one or more embodiments of the invention a video image is transmitted with data representative of compensation offsets for application to reconstructed pixels of the image to provide filtered pixels in a final image.
Figure 3 schematically illustrates a processing device 300 configured to implement at least one embodiment of the present invention. The processing device 300 may be a device such as a micro-computer, a workstation or a light portable device. The device 300 comprises a communication bus 313 connected to: -a central processing unit 311, such as a microprocessor, denoted CPU;
-a read only memory 306, denoted ROM, for storing computer programs for implementing the invention;
-a random access memory 312, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to embodiments of the invention; and
-a communication interface 302 connected to a communication network 303 over which digital data to be processed are transmitted or received
Optionally, the apparatus 300 may also include the following components:
-a data storage means 304 such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
-a disk drive 305 for a disk 306, the disk drive being adapted to read data from the disk 306 or to write data onto said disk;
-a screen 309 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 310 or any other pointing means.
The apparatus 300 can be connected to various peripherals, such as for example a digital camera 320 or a microphone 308, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 300.
The communication bus provides communication and interoperability between the various elements included in the apparatus 300 or connected to it. The representation of the bus is not limiting and in particular the central processing unit is operable to communicate instructions to any element of the apparatus 300 directly or by means of another element of the apparatus 300.
The disk 306 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.
The executable code may be stored either in read only memory 306, on the hard disk 304 or on a removable digital medium such as for example a disk 306 as described previously. According to a variant, the executable code of the programs can be received by means of the communication network 303, via the interface 302, in order to be stored in one of the storage means of the apparatus 300 before being executed, such as the hard disk 304.
The central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means. On powering up, the program or programs that are stored in a non-volatile memory, for example on the hard disk 304 or in the read only memory 306, are transferred into the random access memory 312, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Figure 4 illustrates a block diagram of an encoder according to at least one embodiment of the invention. The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, at least one corresponding step of a method implementing at least one embodiment of encoding an image of a sequence of images according to one or more embodiments of the invention.
An original sequence of digital images i0 to in 401 is received as an input by the encoder 400. Each digital image is represented by a set of samples, known as pixels.
A bitstream 410 is output by the encoder 400 after implementation of the encoding process. The bitstream 410 comprises a plurality of encoding units or slices, each slice comprising a slice header for transmitting encoding values of encoding parameters used to encode the slice and a slice body, comprising encoded video data.
The input digital images i0 to in 401 are divided into blocks of pixels by module 402. The blocks correspond to image portions and may be of variable sizes (e.g. 4x4, 8x8, 16x16, 32x32, 64x64, 128x128 pixels and several rectangular block sizes can be also considered). A coding mode is selected for each input block. Two families of coding modes are provided: coding modes based on spatial prediction coding (Intra prediction), and coding modes based on temporal prediction (Inter coding, Merge, SKIP). The possible coding modes are tested.
Module 403 implements an Intra prediction process, in which the given block to be encoded is predicted by a predictor computed from pixels of the neighbourhood of said block to be encoded. An indication of the selected Intra predictor and the difference between the given block and its predictor is encoded to provide a residual if the Intra coding is selected.
Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405. Firstly a reference image from among a set of reference images 416 is selected, and a portion of the reference image, also called reference area or image portion, which is the closest area to the given block to be encoded, is selected by the motion estimation module 404. Motion compensation module 405 then predicts the block to be encoded using the selected area. The difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module 405. The selected reference area is indicated by a motion vector.
Thus, in both cases (spatial and temporal prediction), a residual is computed by subtracting the prediction from the original block.
In the INTRA prediction implemented by module 403, a prediction direction is encoded. In the temporal prediction, at least one motion vector is encoded.
Information relative to the motion vector and the residual block is encoded if the Inter prediction is selected. To further reduce the bitrate, assuming that motion is homogeneous, the motion vector is encoded by difference with respect to a motion vector predictor. Motion vector predictors of a set of motion information predictors are obtained from the motion vectors field 418 by a motion vector prediction and coding module 417.
The encoder 400 further comprises a selection module 406 for selection of the coding mode by applying an encoding cost criterion, such as a rate-distortion criterion. In order to further reduce redundancies a transform (such as DCT) is applied by transform module 407 to the residual block, the transformed data obtained is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409. Finally, the encoded residual block of the current block being encoded is inserted into the bitstream 410.
The encoder 400 also performs decoding of the encoded image in order to produce a reference image for the motion estimation of the subsequent images. This enables the encoder and the decoder receiving the bitstream to have the same reference frames. The inverse quantization module 411 performs inverse quantization of the quantized data, followed by an inverse transform by reverse transform module 412. The reverse intra prediction module 413 uses the prediction information to determine which predictor to use for a given block and the reverse motion compensation module 414 actually adds the residual obtained by module 412 to the reference area obtained from the set of reference images 416. Post filtering is then applied by module 415 to filter the reconstructed frame of pixels. In the embodiments of the invention an SAO loop filter is used in which compensation offsets are added to the pixel values of the reconstructed pixels of the reconstructed image
Figure 5 is a flow chart illustrating steps of loop filtering process according to at least one embodiment of the invention. In an initial step 51 , the encoder generates the reconstruction of the full frame. Next, in step 52 a deblocking filter is applied on this first reconstruction in order to generate a deblocked reconstruction 53. The aim of the deblocking filter is to remove block artifacts generated by residual quantization and block motion compensation or block Intra prediction. These artifacts are visually important at low bitrates. The deblocking filter operates to smooth the block boundaries according to the characteristics of two neighboring blocks. The encoding mode of each block, the quantization parameters used for the residual coding, and the neighboring pixel differences in the boundary are taken into account. The same criterion/classification is applied for all frames and no additional data is transmitted. The deblocking filter improves the visual quality of the current frame by removing blocking artifacts and it also improves the motion estimation and motion compensation for subsequent frames. Indeed, high frequencies of the block artifact are removed, and so these high frequencies do not need to be compensated for with the texture residual of the following frames.
After the deblocking filter, the deblocked reconstruction is filtered by a sample adaptive offset (SAO) loop filter in step 54 using SAO parameters determined in accordance with embodiments of the invention. The resulting frame 55 may then be filtered with an adaptive loop filter (ALF) in step 56 to generate the reconstructed frame 57 which will be displayed and used as a reference frame for the following Inter frames.
In step 54 each pixel of the frame region is classified into a class or group. The same offset value is added to every pixel value which belongs to a certain class or group.
The derivation of the SAO parameters for the sample adaptive offset filtering in different embodiments of the present invention will be explained in more detail hereafter with reference to Figures 12 to 38.
Figure 6 illustrates a block diagram of a decoder 60 which may be used to receive data from an encoder according to an embodiment of the invention. The decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, a corresponding step of a method implemented by the decoder 60.
The decoder 60 receives a bitstream 61 comprising encoding units, each one being composed of a header containing information on encoding parameters and a body containing the encoded video data. As explained with respect to Figure 4, the encoded video data is entropy encoded, and the motion vector predictors’ indexes are encoded, for a given block, on a predetermined number of bits. The received encoded video data is entropy decoded by module 62. The residual data are then dequantized by module 63 and then a reverse transform is applied by module 64 to obtain pixel values.
The mode data indicating the coding mode are also entropy decoded and based on the mode, an INTRA type decoding or an INTER type decoding is performed on the encoded blocks of image data.
In the case of INTRA mode, an INTRA predictor is determined by intra reverse prediction module 65 based on the intra prediction mode specified in the bitstream.
If the mode is INTER, the motion prediction information is extracted from the bitstream so as to find the reference area used by the encoder. The motion prediction information is composed of the reference frame index and the motion vector residual. The motion vector predictor is added to the motion vector residual in order to obtain the motion vector by motion vector decoding module 70.
Motion vector decoding module 70 applies motion vector decoding for each current block encoded by motion prediction. Once an index of the motion vector predictor, for the current block has been obtained the actual value of the motion vector associated with the current block can be decoded and used to apply reverse motion compensation by module 66. The reference image portion indicated by the decoded motion vector is extracted from a reference image 68 to apply the reverse motion compensation 66. The motion vector field data 71 is updated with the decoded motion vector in order to be used for the inverse prediction of subsequent decoded motion vectors.
Finally, a decoded block is obtained. Post filtering is applied by post filtering module 67 similarly to post filtering module 415 applied at the encoder as described with reference to
Figure 5. A decoded video signal 69 is finally provided by the decoder 60.
The aim of SAO filtering is to improve the quality of the reconstructed frame by sending additional data in the bitstream in contrast to the deblocking filter where no information is transmitted. As mentioned above, each pixel is classified into a predetermined class or group and the same offset value is added to every pixel sample of the same class/group. One offset is encoded in the bitstream for each class. SAO loop filtering has two SAO types: an Edge Offset (EO) type and a Band Offset (BO) type. An example of Edge Offset type is schematically illustrated in Figures 7A and 7B, and an example of Band Offset type is schematically illustrated in Figure 8. In HEVC, SAO filtering is applied CTU by CTU. In this case the parameters needed to perform the SAO filtering (set of SAO parameters) are selected for each CTU at the encoder side and the necessary parameters are decoded and/or derived for each CTU at the decoder side. This offers the possibility of easily encoding and decoding the video sequence by processing each CTU at once without introducing delays in the processing of the whole frame. Moreover, when SAO filtering is enabled, only one SAO type is used: either the Edge Offset type filter or the Band Offset type filter according to the related parameters transmitted in the bitstream for each classification. One of the SAO parameters in HEVC is an SAO type parameter sao_type_idx which indicates for the CTU whether EO type, BO type or no SAO filtering is selected for the CTU concerned.
The SAO parameters for a given CTU can be copied from the upper or left CTU, for example, instead of transmitting all the SAO data. One of the SAO parameters in HEVC is a sao_merge_up flag, which when set indicates that the SAO parameters for the subject CTU should be copied from the upper CTU. Another of the SAO parameters in HEVC is a sao_merge_left flag, which when set indicates that the SAO parameters for the subject CTU should be copied from the left CTU.
SAO filtering may be applied independently for different color components (e.g. YUV) of the frame. For example, one set of SAO parameters may be provided for the luma component Y and another set of SAO parameters may be provided for both chroma components U and V in common. Also, within the set of SAO parameters one or more SAO parameters may be used as common filtering parameters for two or more color components, while other SAO parameters are dedicated (per-component) filtering parameters for the color components. For example, in HEVC, the SAO type parameter sao_type_idx is common to U and V, and so is an EO class parameter which indicates a class for EO filtering (see below), whereas a BO class parameter which indicates a group of classes for BO filtering has dedicated (per-component) SAO parameters for U and V.
A description of the Edge Offset type in HEVC is now provided with reference to
Figures 7A and 7B.
Edge Offset type involves determining an edge index for each pixel by comparing its pixel value to the values of two neighboring pixels. Moreover, these two neighboring pixels depend on a parameter which indicates the direction of these two neighboring pixels with respect to the current pixel. These directions are the 0-degree (horizontal direction), 45-degree (diagonal direction), 90-degree (vertical direction) and 135-degree (second diagonal direction). These four directions are schematically illustrated in Figure 7A. The table of Figure 7B gives the offset value to be applied to the pixel value of a particular pixel “C” according to the value of the two neighboring pixels Cn1 and Cn2 at the decoder side.
When the value of C is less than the two values of neighboring pixels Cn1 and Cn2, the offset to be added to the pixel value of the pixel C is “+O1”. When the pixel value of C is less than one pixel value of its neighboring pixels (either Cn1 or Cn2) and C is equal to one value of its neighbors, the offset to be added to this pixel sample value is “+O2”.
When the pixel value of C is greater than one of the pixel values of its neighbors (Cn1 or Cn2) and the pixel value of C is equal to one value of its neighbors, the offset to be applied to this pixel sample is “-O3”. When the value of C is greater than the two values of Cn1 and Cn2, the offset to be applied to this pixel sample is “-O4”.
When none of the above conditions is met on the current sample and its neighbors, no offset value is added to the current pixel C as depicted by the Edge Index value “2” of the table.
It is important to note that for the particular case of the Edge Offset type, the absolute value of each offset (O1, O2, O3, O4) is encoded in the bitstream. The sign to be applied to each offset depends on the edge index (or the Edge Index in the HEVC specifications) to which the current pixel belongs. According to the table represented in Figure 7B, for Edge Index 0 and for Edge Index 1 (O1, O2) a positive offset is applied. For Edge Index 3 and Edge Index 4 (O3, O4), a negative offset is applied to the current pixel.
In the HEVC specifications, the direction for the Edge Offset amongst the four directions of Figure 7A is specified in the bitstream by a “sao_eo_class_luma” field for the luma component and a “sao_eo_class_chroma” field for both chroma components U and V.
The SAO Edge Index corresponding to the index value is obtained by the following formula:
EdgeIndex = sign(C - Cn2) - sign(Cn1 - C) + 2
where the definition of the function sign(.) is given by the following relationships:
sign(x) = 1, when x > 0
sign(x) = -1, when x < 0
sign(x) = 0, when x = 0.
In order to simplify the Edge Offset determination for each pixel, the difference between the pixel value of C and the pixel value of both its neighboring pixels Cn1 and Cn2 can be shared between the current pixel C and its neighbors. Indeed, when SAO Edge Offset filtering is applied using a raster scan order of pixels of the current CTU or frame, the term sign(Cn1 - C) has already been computed for the previous pixels (to be precise, it was computed as C’ - Cn2’ at a time when the current pixel C’ at that time was the present neighboring pixel Cn1 and the neighboring pixel Cn2’ was what is now the current pixel C). As a consequence this sign(Cn1 - C) does not need to be computed again.
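To make the classification concrete, the following C++ sketch combines the formula above with the sign convention of the Figure 7B table. It is a minimal illustration, not code from the HEVC reference software: absOffsets holds the four transmitted absolute values O1 to O4, and the final clipping to the sample range is an assumption added for completeness.

```cpp
#include <algorithm>

// Sign function as defined above: 1 for x > 0, -1 for x < 0, 0 for x == 0.
static int sign(int x) { return (x > 0) - (x < 0); }

// Classify one pixel C against its neighbours Cn1 and Cn2 along the selected
// direction and apply the offset whose sign is implied by the edge index.
static int applyEdgeOffset(int C, int Cn1, int Cn2,
                           const int absOffsets[4], int bitDepth) {
    const int edgeIndex = sign(C - Cn2) - sign(Cn1 - C) + 2;  // formula above

    int offset = 0;
    switch (edgeIndex) {
        case 0: offset = +absOffsets[0]; break;  // local minimum      -> +O1
        case 1: offset = +absOffsets[1]; break;  // concave transition -> +O2
        case 3: offset = -absOffsets[2]; break;  // convex transition  -> -O3
        case 4: offset = -absOffsets[3]; break;  // local maximum      -> -O4
        default: break;                          // edge index 2: no offset
    }
    const int maxVal = (1 << bitDepth) - 1;
    return std::clamp(C + offset, 0, maxVal);    // keep the sample in range
}
```

In a raster-scan implementation, sign(Cn1 - C) can be reused from the previous pixel as noted above rather than recomputed.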
A description of the Band Offset type is now provided with reference to Figure 8.
Band Offset type in SAO also depends on the pixel value of the sample to be processed. A class in SAO Band offset is defined as a range of pixel values. Conventionally, for all pixels within a range, the same offset is added to the pixel value. In the HEVC specifications, the number of offsets for the Band Offset filter is four for each reconstructed block or frame area of pixels (CTU), as schematically illustrated in Figure 8.
One implementation of SAO Band offset splits the full range of pixel values into 32 ranges of the same size. These 32 ranges are the bands (or classes) of SAO Band offset. The minimum value of the range of pixel values is systematically 0 and the maximum value depends on the bit depth of the pixel values according to the relationship Max = 2^Bitdepth - 1. Classifying the pixels into the 32 ranges of the full interval requires checking only 5 bits, which allows a fast implementation: only the first 5 bits (the 5 most significant bits) are checked to classify a pixel into one of the 32 classes/ranges of the full range.
For example, when the bitdepth is 8 bits per pixel, the maximum value of a pixel can be 255. Hence, the range of pixel values is between 0 and 255. For this bitdepth of 8 bits, each band or class contains 8 pixel values.
According to the HEVC specifications, a group 40 of bands, represented by the grey area (40), is used, the group having four successive bands 41, 42, 43 and 44, and information is signaled in the bitstream to identify the position of the group, for example the position of the first of the 4 bands. The syntax element representative of this position is the “sao_band_position” field in the HEVC specifications. This corresponds to the start of band
41 in Figure 8. According to the HEVC specifications, 4 offsets corresponding respectively to the 4 bands are signaled in the bitstream.
Figure 9 is a flow chart illustrating the steps of a process to decode SAO parameters according to the HEVC specifications. The process of Figure 9 is applied for each CTU to generate a set of SAO parameters for all components. In order to avoid encoding one set of SAO parameters per CTU (which is very costly), a predictive scheme is used for the CTU mode. This predictive mode involves checking if the CTU on the left of the current CTU uses the same SAO parameters (this is specified in the bitstream through a flag named “sao_merge_left_flag”). If not, a second check is performed with the CTU above the current CTU (this is specified in the bitstream through a flag named “sao_merge_up_flag”). This predictive technique enables the amount of data representing the SAO parameters for the CTU mode to be reduced. Steps of the process are set out below.
In step 503, the “sao_merge_left_flag” is read from the bitstream 502 and decoded. If its value is true, then the process proceeds to step 504 where the SAO parameters of the left CTU are copied for the current CTU. This enables the types for YUV of the SAO filter for the current CTU to be determined in step 508.
If the outcome is negative in step 503 then the “sao_merge_up_flag” is read from the bitstream and decoded. If its value is true, then the process proceeds to step 505 where the SAO parameters of the above CTU are copied for the current CTU. This enables the types of the SAO filter for the current CTU to be determined in step 508.
If the outcome is negative in step 505, then the SAO parameters for the current CTU are read and decoded from the bitstream in step 507 for the Luma Y component and both U and V components (501) (551) for the type. The offsets for Chroma are independent.
The details of this step are described later with reference to Figure 10. After this step, the parameters are obtained and the type of SAO filter is determined in step 508.
In subsequent step 511 a check is performed to determine if the three colour components (Y and U & V) for the current CTU have been processed. If the outcome is positive, the determination of the SAO parameters for the three components is complete and the next CTU can be processed in step 510. Otherwise (only Y was processed), U and V are processed together and the process restarts from initial step 512 previously described.
Figure 10 is a flow chart illustrating steps of a process of parsing of SAO parameters in the bitstream 601 at the decoder side. In an initial step 602, the “sao_type_idx_X” syntax element is read and decoded. The code word representing this syntax element can use a fixed length code or could use any method of arithmetic coding. The syntax element sao_type_idx_X enables determination of the type of SAO applied for the frame area to be processed for the colour component Y or for both Chroma components U & V. For example, for a YUV 4:2:0 sequence, two components are considered: one for Y, and one for U and V. The “sao_type_idx” can take 3 values as follows depending on the SAO type encoded in the bitstream: ‘0’ corresponds to no SAO, ‘1’ corresponds to the Band Offset case illustrated in Figure 8 and ‘2’ corresponds to the Edge Offset type filter illustrated in Figures 7A and 7B.
Incidentally, although YUV color components are used in HEVC (sometimes called Y, Cr and Cb components), it will be appreciated that in other video coding schemes other color components may be used, for example RGB color components. The techniques of the present invention are not limited to use with YUV color components and can be used with RGB color components or any other color components.
In the same step 602, a test is performed to determine if the “sao_type_idx_X” is strictly positive. If “sao_type_idx_X” is equal to “0”, this signifies that there is no SAO for this frame area (CTU) for Y if X is set equal to Y, and that there is no SAO for this frame area for U and V if X is set equal to U and V. In this case the determination of the SAO parameters is complete and the process proceeds to step 608. Otherwise, if the “sao_type_idx_X” is strictly positive, this signifies that SAO parameters exist for this CTU in the bitstream.
Then the process proceeds to step 606 where a loop is performed for four iterations.
The four iterations are carried out in step 607 where the absolute value of offset j is read and decoded from the bitstream. These four offsets correspond either to the four absolute values of the offsets (O1, O2, O3, O4) of the four Edge indexes of SAO Edge Offset (see Figure 7B) or to the four absolute values of the offsets related to the four ranges of the SAO Band Offset (see Figure 8).
Note that for the coding of an SAO offset, a first part is transmitted in the bitstream corresponding to the absolute value of the offset. This absolute value is coded with a unary code. The maximum value for an absolute value is given by the following formula:
MAX abs SAO offset value = (1 << (Min(bitDepth, 10) - 5)) - 1

where << is the left (bit) shift operator.

This formula means that the maximum absolute value of an offset is 7 for a pixel value bitdepth of 8 bits, and 31 for a pixel value bitdepth of 10 bits and beyond.
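By way of illustration only, this formula may be expressed as the following C++ fragment (illustrative names):

#include <algorithm>

// Illustrative sketch: maximum absolute SAO offset value as a function of the bit depth.
// Returns 7 for 8-bit pixel values and 31 for 10-bit pixel values and beyond.
int maxAbsSaoOffset(int bitDepth)
{
    return (1 << (std::min(bitDepth, 10) - 5)) - 1;
}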
The current HEVC standard amendment addressing extended bitdepth video sequences provides a similar formula for a pixel value having a bitdepth of 12 bits and beyond. The absolute value decoded may be a quantized value which is dequantized before it is applied to pixel values at the decoder for SAO filtering. An indication of use or not of this quantization is transmitted in the slice header.
For Edge Offset type, only the absolute value is transmitted because the sign can be inferred as explained previously.
For Band Offset type, the sign is signaled in the bitstream as a second part of the offset if the absolute value of the offset is not equal to 0. The bit of the sign is bypassed when CABAC is used.
After step 607, the process proceeds to step 603 where a test is performed to determine if the type of SAO corresponds to the Band Offset type (sao_type_idx_X == 1). If the outcome is positive, the signs of the offsets for the Band Offset mode are decoded in steps 609 and 610, except for each offset that has a zero value, before the following step 604 is performed in order to read in the bitstream and to decode the position “sao_band_position_X” of the SAO band as illustrated in Figure 8.
If the outcome is negative in step 603 (“sao_type_idx_X” is set equal to 2), this signifies that the Edge Offset type is used. Consequently, the Edge Offset class (corresponding to the direction 0, 45, 90 and 135 degrees) is extracted from the bitstream 601 in step 605. If X is equal to Y, the read syntax element is “sao_eo_class_luma” and if X is set equal to U and V, the read syntax element is “sao_eo_class_chroma”.
When the four offsets have been decoded, the reading of the SAO parameters is complete and the process proceeds to step 608.
Figure 11 is a flow chart illustrating how SAO filtering is performed on an image part according to the HEVC specifications, for example during the step 67 in Figure 6. In HEVC, this image part is a CTU. This same process 700 is also applied in the decoding loop (step 415 in Figure 4) at the encoder in order to produce the reference frames used for the motion estimation and compensation of the following frames. This process is related to the SAO filtering for one color component (thus the suffix “_X” in the syntax elements has been omitted below).
An initial step 701 comprises determining the SAO filtering parameters according to the processes depicted in Figures 9 and 10. The SAO filtering parameters are determined by the encoder and the encoded SAO parameters are included in the bitstream. Accordingly, on the decoder side in step 701 the decoder reads and decodes the parameters from the bitstream. Step 701 obtains the sao_type_idx and, if it equals 1, also obtains the sao_band_position 702 and, if it equals 2, also obtains the sao_eo_class_luma or sao_eo_class_chroma (according to the color component processed). If the element sao_type_idx is equal to 0 the SAO filtering is not applied. Step 701 also obtains an offsets table 703 of the 4 offsets.
A variable i, used to successively consider each pixel Pi of the current block or frame area (CTU), is set to 0 in step 704. Incidentally, “frame area” and “image area” are used interchangeably in the present specification. A frame area in this example is a CTU in the HEVC standard. In step 706, pixel Pi is extracted from the frame area 705 which contains N pixels. This pixel Pi is classified in step 707 according to the Edge Offset classification described with reference to Figures 7A & 7B or the Band Offset classification described with reference to Figure 8. The decision module 708 tests if Pi is in a class that is to be filtered using the conventional SAO filtering.

If Pi is in a filtered class, the related class number j is identified and the related offset value Offsetj is extracted in step 710 from the offsets table 703. In the case of the conventional SAO filtering this Offsetj is then added to the pixel value Pi in step 711 in order to produce the filtered pixel value Pi' 712. This filtered pixel Pi' is inserted in step 713 into the filtered frame area 716.

If Pi is not in a class to be SAO filtered, then Pi (709) is inserted in step 713 into the filtered frame area 716 without filtering.
After step 713, the variable i is incremented in step 714 in order to filter the subsequent pixels of the current frame area 705 (if any - test 715). After all the pixels have been processed (i>=N) in step 715, the filtered frame area 716 is reconstructed and can be added to the SAO reconstructed frame (see frame 68 of Figure 6 or 416 of Figure 4).
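By way of illustration only, the following C++ fragment sketches the per-pixel loop of steps 704 to 716. The classification of steps 707-708 is abstracted behind a function pointer, the clipping of the filtered value to the valid sample range is an added assumption, and all names are illustrative.

#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative sketch of the SAO filtering loop over a frame area of N pixels.
// classify() returns the class j (0..3) of a pixel, or -1 when the pixel is not
// in a class to be filtered.
std::vector<int> saoFilterArea(const std::vector<int>& area,      // frame area 705
                               const int offsets[4],              // offsets table 703
                               int bitDepth,
                               int (*classify)(int pixelValue))
{
    const int maxVal = (1 << bitDepth) - 1;
    std::vector<int> filtered(area.size());                       // filtered frame area 716
    for (std::size_t i = 0; i < area.size(); ++i) {               // steps 704, 714, 715
        int p = area[i];                                          // step 706
        int j = classify(p);                                      // steps 707, 708
        if (j >= 0)
            p = std::min(std::max(p + offsets[j], 0), maxVal);    // steps 710, 711 (clipping assumed)
        filtered[i] = p;                                          // step 713
    }
    return filtered;
}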
As noted above, the JVET exploration model (JEM) for the future VVC standard uses all the HEVC tools. One of these tools is sample adaptive offset (SAO) filtering. However, SAO is less efficient in the JEM reference software than in the HEVC reference software. This arises from fewer evaluations and from signalling inefficiencies compared to other loop filters.
Embodiments of the present invention described below are intended to improve the coding efficiency of SAO by using various techniques for deriving one or more SAO parameters of an image part in a current image from one or more SAO parameters of a collocated image part in a reference image. These techniques may be referred to as temporal derivation techniques for SAO parameters. Further embodiments described below are intended to improve the coding efficiency of SAO by using various techniques for deriving one or more SAO parameters of an image part in an image from one or more SAO parameters of another image part of the same image. These techniques may be referred to as spatial derivation techniques for SAO parameters.
First group of embodiments
A first group of embodiments focusses on improving the signalling efficiency. In HEVC, SAO filtering is performed CTU by CTU. Temporal derivation of SAO parameters is not used in HEVC. In the first group of embodiments, temporal derivation is introduced. However, to improve the signalling efficiency, a group of image parts is formed and the use of temporal prediction is signalled for this group of image parts, rather than for each image part individually. For each image part of the group temporal derivation is used to derive at least one of the SAO parameters of the image part from at least one SAO parameter of a collocated image part in a reference image. The collocated image part in the reference image therefore serves as a source image part for the image part to be derived. As a result, different image parts of the group can have different SAO parameters depending on the SAO parameters of the respective collocated image parts. Accordingly, with very light signalling, image parts belonging to a given group of image parts can use temporal derivation and benefit from different (and efficient) SAO parameters.
Here, a group can be any two or more CTUs, for example a whole image, a slice, a line of CTUs, a column of CTUs, N lines of CTUs, or N columns of CTUs, where N is an integer greater than 1. A group could also be NxN CTUs, where N is an integer greater than 1, or MxN CTUs or NxM CTUs, where M > 1 and N > 1.
Alternatively, a group of image parts can be a CTU, and each constituent block of the CTU can be an image part. In such a case, each block of a CTU may have its own SAO parameters, but the signalling to use temporal derivation of the SAO parameters can be made for the CTU as a whole.
In the simplest case, where there is only one type of temporal derivation, a flag temporal merge can be used to signal the use of temporal derivation for all image parts of the group.
The manner in which the SAO parameters are derived in the temporal derivation is not particularly limited except that at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the collocated image part in a reference image. In the simplest case, the at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the collocated image part by copying the SAO parameter of the collocated image part. One, more than one, or all SAO parameters may be copied. Alternatively, one, more than one, or all SAO parameters may be copied only when the SAO filtering is of a particular type (edge or band).
Instead of copying, a temporal derivation of SAO parameters which involves modifying one, more than one, or all SAO parameters of the collocated image part (source image part) may also be used, as described later.
First Embodiment

In the first embodiment, the group of image parts is a whole image. Referring now to Figure 12, each CTU of a current image 2001 derives its SAO parameters temporally from a collocated CTU in a reference image 2002. For example, the SAO parameters for the CTU 2003 in the current image 2001 are obtained by copying the SAO parameters from its collocated CTU 2005 in the reference image 2002. Similarly, the SAO parameters for the CTU 2004 in the current image 2001 are obtained by copying the SAO parameters from its collocated CTU 2006 in the reference image 2002. In this example, CTU 2005 uses EO filtering with a direction of 0 degrees, and CTU 2006 uses BO filtering. As a result of copying the SAO parameters from CTU 2005, the CTU 2003 also uses EO filtering with a direction of 0 degrees. As a result of copying the SAO parameters from CTU 2006, the CTU 2004 also uses BO filtering. Although not shown in Figure 12, all the SAO parameters are copied in this embodiment, including the SAO type parameter sao_type_idx, parameters such as the EO class (specifying a direction of EO filtering) and the BO group sao_band_position (specifying a first class of a group of classes), and the offsets.
It can be seen that different CTUs such as CTU 2003 and CTU 2004 within the same CTU group can have different SAO parameters, even though the use of temporal derivation is signalled once for the whole CTU group (whole image in this embodiment).
Incidentally, in some video coding systems, including HEVC, different images may be subjected to different partitioning into image parts (LCUs, CTUs, CTBs etc.). As a result, the current image 2001 and its reference image 2002 may have different partitionings. In that case, there may not be an exact match in size and/or position between, say, the CTU 2003 in the current image 2001 and its“collocated” CTU 2005 in the reference image 2002. However, this is not a problem, as long as there is a suitable mechanism (known to the encoder and decoder) to associate a CTU in the current image 2001 with a“collocated” CTU in the reference image. For example, the mapping may identify the CTU in the reference image closest in position to the CTU in the current image. The closest position may be based on any suitable reference position in the CTUs concerned, for example the top-left position of each CTU.
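By way of illustration only, the following C++ fragment sketches one such mapping based on the top-left position of the current CTU; this particular mapping and all names are assumptions made for the example.

// Illustrative sketch: index of the "collocated" CTU in the reference image when
// the current and reference images use different CTU sizes. The reference CTU
// containing the top-left sample of the current CTU is selected.
int collocatedCtuIndex(int curCtuX, int curCtuY, int curCtuSize,
                       int refCtuSize, int refPicWidthInCtus)
{
    int topLeftX = curCtuX * curCtuSize;              // top-left sample of the current CTU
    int topLeftY = curCtuY * curCtuSize;
    int refCtuX  = topLeftX / refCtuSize;             // reference CTU covering that sample
    int refCtuY  = topLeftY / refCtuSize;
    return refCtuY * refPicWidthInCtus + refCtuX;     // raster-scan index in the reference image
}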
Figure 13 is a flow chart for use in explaining a method of decoding an image in the first embodiment.
In step 2101 a first syntax element is read from the bitstream 2103 and decoded. This first syntax element in this example is a simple temporal merge flag which indicates for the whole image whether or not temporal derivation of SAO parameters is to be used. In step 2102 it is checked if the syntax element indicates temporal derivation is to be used. If the outcome is “YES”, at least a second syntax element is extracted from the bitstream. This second syntax element is a reference frame index refidx which identifies a reference image to be used for the temporal derivation. If bidirectional temporal prediction is used, a third syntax element is extracted from the bitstream 2103. This is a list index Ly indicating whether the reference frame index is from List 0 (L0) or List 1 (L1). In this embodiment, the same reference frame is used for the temporal derivation of SAO parameters in all CTUs of the group (whole image).
In the context of temporal prediction, a reference image means another image of a sequence of images (previous or future image) which is used to perform temporal prediction for an image to be encoded. In the context of SAO parameter derivation, a reference image means another image of the sequence (previous or future image) which is used to perform temporal derivation of SAO parameters. The reference images for the temporal derivation of SAO parameters may be the same as the reference images for the temporal prediction, or may be different.
Incidentally, the HEVC specification uses the term“reference frame” instead of “reference image” and refidx is usually referred to as a reference frame index accordingly. The terms “reference image” and “reference frame” are used interchangeably in the present specification.
A loop through all the CTUs of the image is then started in step 2105. In this embodiment, the decoder has a storage unit 2106, which may be called a Decoded Picture Buffer (DPB), which stores the SAO parameters for each CTU of the reference image. Preferably, the DPB 2106 stores the SAO parameters for each CTU explicitly, without relying on merge flags such as merge_up and mergejeft because reading merge flags as part of the SAO parameters temporal derivation increases the complexity and slows down the derivation.
In step 2107 the SAO parameters stored in the DPB 2106 for the collocated CTU in the reference image identified by refidx or by Ly and refidx are obtained. These are then set as the SAO parameters 2108 for the current CTU. In this embodiment it is assumed that the SAO parameters comprise dedicated SAO parameters for each color component X (Y, U, V) and in steps 2109-2111 SAO filtering is performed for each color component in turn using the dedicated SAO parameters for the color component concerned. As noted above, in other embodiments, the SAO parameters may be common to two or more components, for example U and V, or a mixture of common and dedicated (per-component) SAO parameters may be used. After finishing the processing of the current CTU, processing moves to the next CTU, or the processing ends if the current CTU is the last CTU of the image. It will be appreciated that in step 2107 no SAO parameters will be obtainable from the collocated CTU in the reference image if the collocated CTU did not use SAO filtering (sao_type_idx = 0). This situation is addressed in subsequent embodiments. It means that even when temporal derivation of SAO parameters is applied to a group of image parts in the present invention, there may be some image parts of the group that do not derive an SAO parameter from an SAO parameter of the respective collocated image part. For example, in the simplest case, no SAO filtering may be performed on an image part for which the SAO parameters are unobtainable using the temporal derivation.
Although not shown in Figure 13, if the outcome of the test in step 2102 is that the SAO parameters derivation is not the temporal derivation, the SAO parameters for the CTUs of the group (whole image in this case) are read from the bitstream, for example using the process of Figure 5.
Figure 13 relates to the steps carried out on the decoder side. The steps involve reading and decoding the syntax elements for the group of image parts (whole image in this case) from the bitstream and then performing SAO filtering on the image parts of the group. On the encoder side, the same SAO filtering as on the decoder side is performed on the image parts of the group to ensure that the encoder has the same reference images as the decoder. This means using the same derivation of SAO parameters as the decoder and using the same reference image for the temporal derivation as the decoder. However, on the encoder side the syntax elements do not need to be read and decoded from the bitstream, as the related information is available in the encoder already. The determination of whether or not to use the temporal derivation of SAO parameters for the group (whole image in this case) is made on the encoder side in this embodiment. Similarly, the choice of reference image for the temporal derivation is made on the encoder side.
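By way of illustration only, the following C++ fragment sketches the temporal derivation by copying of this embodiment. The SAO parameter structure and the layout of the stored per-CTU parameters are assumptions made for the example and do not correspond to any particular implementation.

#include <cstddef>
#include <vector>

// Illustrative SAO parameter structures (assumed layout, one set per color component).
struct SaoParams {
    int typeIdx;        // 0 = no SAO, 1 = Band Offset, 2 = Edge Offset
    int classOrBand;    // EO class (direction) or band position
    int offsets[4];
};
struct CtuSao { SaoParams comp[3]; };   // Y, U, V

// Temporal derivation for a whole image: every CTU copies the SAO parameters of its
// collocated CTU in the reference image identified by refIdx (steps 2105 to 2107).
void deriveSaoTemporally(std::vector<CtuSao>& currentPicSao,
                         const std::vector<std::vector<CtuSao>>& dpbSaoPerRefPic,
                         int refIdx)
{
    const std::vector<CtuSao>& refSao = dpbSaoPerRefPic[refIdx];
    for (std::size_t ctu = 0; ctu < currentPicSao.size(); ++ctu)
        currentPicSao[ctu] = refSao[ctu];             // copy all SAO parameters
}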
In a variant of the first embodiment, the reference image is simply the first reference image of the first list L0. In that case, no syntax elements are necessary to identify refidx and Ly and step 2104 can be omitted. This removes some signalling and simplifies the decoder design.

Second Embodiment
In the first embodiment, the group of image parts was a whole image. In the second embodiment, the group of image parts is a slice. As a result, it is possible to use temporal derivation of SAO parameters in one slice of an image but not in another slice of the same image. This can lead to a better adaptation of the SAO parameters to the image characteristics in different parts of an image at the expense of a small increase in the signalling.
The decoding is the same as described in connection with the first embodiment except that the first syntax element indicates for the slice (as opposed to for the whole image) whether or not temporal derivation of SAO parameters is to be used and the second syntax element (or the second and third syntax elements in combination) identifies one reference image for the temporal derivation of SAO parameters of the CTUs of the slice.
Third Embodiment
In the first and second embodiments, a single reference frame is signalled for the frame or slice (or, in the variant, inferred without signalling) and the SAO parameters for CTUs of the frame or slice must come from that reference frame. As only one reference frame is used, the number of CTUs for which the collocated CTU in that one reference frame uses SAO may be limited, resulting in a limitation on the number of CTUs subjected to SAO filtering.
In the third embodiment, the group is a slice. No reference frame is signalled in the slice header but the SAO parameters for CTUs of the frame may come from different reference frames.
Figure 14 is a flow chart for use in explaining a method of decoding an image in the third embodiment.
In step 2201 a first syntax element is read from the bitstream 2103 and decoded. This first syntax element indicates for the slice or for the whole image whether or not temporal derivation of SAO parameters is to be used. In step 2202 it is checked if the syntax element indicates temporal derivation is to be used.
Although not shown in Figure 14, if the outcome of the test in step 2202 is that the
SAO parameters derivation is not the temporal derivation, the SAO parameters for the CTUs of the group (slice in this case) are read from the bitstream, for example using the process of
Figure 5.
Unlike in the first embodiment, if the outcome is “YES”, no syntax element (reference frame index refidx) is extracted from the bitstream. No list index Ly is extracted either. A first, outer loop through all the CTUs of the current image is then started in step 2205. A second, inner loop through all the possible reference frames from 0 to MAXrefidx is also started in step 2203. The Decoded Picture Buffer (DPB) stores the SAO parameters for each CTU of all the reference frames. In step 2207 the SAO parameters stored in the DPB 2106 are obtained for the collocated CTU in the reference image under consideration identified by refidx or by Ly and refidx. In the reference frame under consideration the collocated CTU may not have used SAO (sao_type_idx is “no SAO”), in which case no SAO parameters may be obtainable in step 2207. Step 2204 tests for this outcome (“yes”) and if so the second loop moves on to the next reference frame. If SAO parameters are obtained in step 2207 (outcome “no”), the second loop ends and the obtained SAO parameters are used to perform SAO filtering on the three color components (independent filtering in this embodiment, but other variants are possible as described in connection with the first embodiment).
The test in step 2204 may be based on the luma component alone (i.e. whether sao_type_idx is “no SAO” for the luma color component alone), in which case the SAO parameters for all three color components are considered unobtainable from the reference frame under consideration if the luma component SAO parameters are unobtainable. Alternatively, in step 2204 each color component may be treated separately, or luma may be treated separately from the two chroma components.
The first loop continues CTU by CTU through all the CTUs of the slice or frame.
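By way of illustration only, the following C++ fragment sketches the inner loop of Figure 14, reusing the illustrative structures of the previous sketch and testing the luma component only, as in step 2204.

#include <cstddef>
#include <vector>

// Same illustrative structures as in the previous sketch.
struct SaoParams { int typeIdx; int classOrBand; int offsets[4]; };
struct CtuSao { SaoParams comp[3]; };

// Searches the ordered reference frames for the first one whose collocated CTU uses
// SAO (luma component test of step 2204). Returns false when no reference frame
// provides SAO parameters for this CTU.
bool searchCollocatedSao(const std::vector<std::vector<CtuSao>>& dpbSaoPerRefPic,
                         std::size_t ctuIdx, CtuSao& out)
{
    for (const std::vector<CtuSao>& refSao : dpbSaoPerRefPic) {
        const CtuSao& candidate = refSao[ctuIdx];
        if (candidate.comp[0].typeIdx != 0) {          // collocated CTU uses SAO for luma
            out = candidate;
            return true;
        }
    }
    return false;                                      // SAO parameters unobtainable
}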
This embodiment improves the coding efficiency compared to the first and second embodiments by increasing the number of CTUs subjected to SAO filtering.
In a variant, the reference frames in each list L0 and L1 are ordered. In each ordered list, a first reference frame, more likely to have useful SAO parameters than a second reference frame, is placed ahead of the second reference frame.
In this variant, for example, the first reference frame may be the closest reference frame to the current frame. Alternatively, the first reference frame may be the frame with the best quality in the list of reference frames. Any suitable measure of quality may be used, for example the quantisation parameter (QP) could be used as a measure of quality. In this case, the frame with the lowest QP could be chosen as the first reference frame. The first reference frame could also be chosen based on how often it is used for temporal prediction in the current slice or frame. This could be a good criterion at the encoder side but not for a decoder as it involves building statistics for the whole frame before applying SAO. Incidentally, the first reference frame in the first list of reference frames is generally the most selected reference. That means the first reference frame in each list is processed before the second reference frame and SAO parameters are picked up preferentially from the first reference frame compared to the second reference frame. Preferably, in each list the reference frames are ordered from best to worst in terms of coding efficiency. As described above, in the third embodiment, different reference images may be used for the temporal derivation of SAO parameters for different image parts of the group. A particular reference image is identified by searching through a plurality of available reference images and selecting a reference image whose collocated image part satisfies at least one search condition. The search condition may be that said collocated image part uses SAO filtering. The reference images may be searched in order from highest coding efficiency to lowest coding efficiency.
No item of information identifying the particular reference image is included in the bitstream in this embodiment. This saves signalling and simplifies the design. The encoder and decoder can both identify the same particular reference image by performing the same search through the available reference images.
Fourth Embodiment

In the third embodiment, SAO parameters are derived temporally whether the SAO type is BO or EO. In the fourth embodiment, SAO parameters are derived temporally only when the SAO type is EO. This is because the EO type is generally more efficient than the BO type. This can be implemented by modifying step 2204 in Figure 14 to test for “no SAO” or “BO”, instead of just “no SAO”. The search condition in the fourth embodiment is that the collocated image part uses edge-type SAO filtering.
In a variant, if for a subject CTU it is found that none of the reference frames uses EO at the collocated CTU as a result of the search in the second loop started in step 2203, then a secondary search may be performed through the reference frames to find if there is a collocated CTU in one of those reference frames that uses BO, in which case BO may still be used for the temporal derivation of the subject CTU.
This variant results in performing a first search through the available reference images using a first search condition and if none of the available reference images satisfies the first search condition performing a second search through the available reference images using a second search condition different from the first search condition. The first search condition may be that the collocated image part uses edge-type SAO filtering and the second search condition may be that said collocated image part uses band-type SAO filtering.
Fifth Embodiment

In the preceding embodiments, if the collocated CTU in the reference frame has “no SAO”, or none of the collocated CTUs in any of the plural reference frames uses SAO, no SAO parameters can be obtained. In the fifth embodiment, in this situation a default set of SAO parameters is used. This default set may be determined by the encoder and transmitted to the decoder, for example in the sequence parameter set or per slice. This is efficient because the default set may be optimised for the sequence or for the slice by the encoder.
Sixth Embodiment

In the second embodiment the temporal derivation is determined for a slice and the second syntax element identifies a single reference frame for SAO parameter derivation in CTUs of the slice. The second syntax element is at the slice level. However, when the temporal derivation is determined for the slice it is possible instead to use a syntax element at the CTU level to identify a reference frame for SAO parameter derivation of the CTU concerned. This approach is taken in the sixth embodiment.
Figure 15 is a flow chart for use in explaining a method of decoding an image in the sixth embodiment.
In step 2401 a first syntax element is read from the bitstream 2403 and decoded. This first syntax element indicates for the slice whether or not temporal derivation of SAO parameters is to be used. In step 2402 it is checked if the syntax element indicates temporal derivation is to be used. If the outcome is “NO”, the process of Figure 5 may be used, as already described in connection with the first embodiment. If the outcome is “YES”, a loop through all the CTUs of the current slice is then started in step 2405. For the current CTU at least a second syntax element is extracted from the bitstream in step 2404. This second syntax element is a reference frame index refidx which identifies a reference image to be used for the temporal derivation. If bidirectional temporal prediction is used, a third syntax element is extracted from the bitstream 2403 in step 2404 as well. This is a list index Ly indicating whether the reference index is from List 0 (L0) or List 1 (L1).
In step 2407 the SAO parameters stored in a DPB 2406 for the collocated CTU in the reference image identified by refidx or by Ly and refidx are obtained. These are then set as the SAO parameters 2408 for the current CTU. In this embodiment it is assumed that the SAO parameters comprise dedicated SAO parameters for each color component X (Y, U, V) and in steps 2409-2411 SAO filtering is performed for each color component in turn using the dedicated SAO parameters for the color component concerned. As noted above, in other embodiments, the SAO parameters may be common to two or more components, for example U and V, or a mixture of common and dedicated (per-component) SAO parameters may be used. After finishing the processing of the current CTU processing moves to the next CTU in step 2412, or the processing ends if the current CTU is the last CTU of the image.
In the sixth embodiment, different reference images may be used for the temporal derivation of SAO parameters for different image parts of the group, as in the third embodiment. Unlike in the third embodiment, it is not necessary for the decoder to search for the particular reference image and the particular reference image is identified by an item of information included in the bitstream. This simplifies the decoder design.
Seventh Embodiment
In the sixth embodiment the per-CTU reference frames for SAO temporal derivation are signalled using refidx or using Ly and refidx. This has the advantage that the signalling is the same as the traditional signalling of reference frames for temporal prediction. However, this way of signalling the SAO-temporal-derivation reference frames is very costly because generally the lists for temporal prediction contain redundant frames. Indeed, a reference frame can be in both lists L0 and L1. Moreover, when the weighted reference tools are used, a list may contain the same reference frame several times. It is not necessary to signal each occurrence of the same reference frame and removing redundant frames can save the signalling rate associated with them. These redundancies can be removed by checking that each reference frame in the list has a POC (Picture Order Count) different from all other reference frames.
In the seventh embodiment a specific list of reference frames for SAO temporal derivation is created, distinct from the lists L0 and L1 used for temporal prediction, and the reference frame for SAO temporal derivation for each CTU is signalled based on this specific list, for example using a syntax element SAO reference frame index representing a reference frame.
Preferably the specific list contains non-redundant reference frames. This is in order to reduce the rate dedicated to the syntax element obtained in step 2404 in Figure 15. It corresponds to a merge between the two lists L0 and L1. Figure 16 is a flow chart illustrating a process to build a non-redundant list SAO Ref List of reference frames for SAO temporal derivation. In a first variant this process is carried out only in the encoder. In a second variant both the encoder and the decoder carry out this process. The lists of reference frames (list L0, 2501, and list L1, 2502) are the input of this process. SAO Ref List is empty at the beginning of the process. For each reference frame number i from 0 to the maximum number in both lists (2503), step 2504 tests if the reference frame number i of list L0, Ref_i_L0, is already in the list of reference frames for SAO (SAO Ref List) (2508). If Ref_i_L0 is not in SAO Ref List, Ref_i_L0 is added (2505) to SAO Ref List. In the same way, the reference frame number i in the list L1, Ref_i_L1, is added (2507) to SAO Ref List if it is not already present (2506).
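By way of illustration only, the following C++ fragment sketches the list construction of Figure 16, with reference frames represented by their POC values so that duplicates can be detected; these choices are assumptions made for the example.

#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative sketch of Figure 16: build a non-redundant list of reference frames
// for SAO temporal derivation from the temporal prediction lists L0 and L1.
std::vector<int> buildSaoRefList(const std::vector<int>& l0Pocs,
                                 const std::vector<int>& l1Pocs)
{
    std::vector<int> saoRefList;
    auto notInList = [&saoRefList](int poc) {
        return std::find(saoRefList.begin(), saoRefList.end(), poc) == saoRefList.end();
    };
    std::size_t maxSize = std::max(l0Pocs.size(), l1Pocs.size());
    for (std::size_t i = 0; i < maxSize; ++i) {                  // loop 2503
        if (i < l0Pocs.size() && notInList(l0Pocs[i]))
            saoRefList.push_back(l0Pocs[i]);                     // steps 2504, 2505
        if (i < l1Pocs.size() && notInList(l1Pocs[i]))
            saoRefList.push_back(l1Pocs[i]);                     // steps 2506, 2507
    }
    return saoRefList;
}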
In the first variant SAO Ref List is signalled explicitly in the bitstream, for example in the slice header. In the second variant SAO Ref List is not signalled by the encoder and instead the decoder creates the same list by following the same list creation process as the encoder. The advantage of using the second variant is to avoid the explicit signalling of the reference frame list for SAO at slice level in order to reduce the rate of the slice header. Yet, for some applications, it is preferable to explicitly signal this list. This can be particularly efficient when SAO is disabled for some reference frames for a given slice.
Both variants have the advantage that when a reference frame is signalled per CTU (as in step 2404 in Figure 15) the syntax element representing the reference frame can be more compact and efficient since there are fewer frames in SAO Ref List than in L0 and L1 together.
In a variant, the maximum number of reference frames in the list SAO Ref List is explicitly signalled in the slice header. The advantage of this variant is to reduce the rate dedicated to the syntax element representing the reference frame in step 2404.
In the process of Figure 16, a reference frame from L0 or L1 is added to the list SAO Ref List in step 2504 or in step 2506 if it is new (not already in the list). In another variant, a new reference frame is added to SAO Ref List only if SAO is enabled for at least one color component, i.e. the conditions in steps 2504 and 2506 are modified. For example, SAO may be disabled for each slice independently. Indeed, there may be a first flag for disabling SAO at the slice level for the luma component, and a second flag for disabling SAO at the slice level for both chroma components (common flag) or separate flags for all three components. The advantage is that some unnecessary reference frames are omitted from SAO Ref List, enabling a reduction of the signalling of the syntax element representing a reference frame for SAO obtained in step 2404.
There are several options for coding the syntax element representing the reference frame index in the SAO Ref List for SAO derivation.
A first option is to code the syntax element with a fixed length code.
A second option is to code the syntax element with a unary max code, where the “max” is the number of reference frames in SAO Ref List. In a third option, an arithmetic coding can be used. The arithmetic coding can be applied on top of the unary max code or of the fixed length code.
For the unary code or arithmetic coding options, the list of possible SAO parameters sets collocated with the current CTU in the previous reference frames of the SAO Ref List is reduced by checking if SAO is enabled for each collocated CTU or if the SAO parameters set is redundant. The reduction can be achieved by comparing the SAO parameters in the different SAO parameter sets, including the classification result (e.g. the edge direction sao_eo_class) and the related offsets. In that case, the syntax element representing the reference frame for SAO derivation is a syntax element representing the position of the SAO parameters set in the reduced list. The advantage of this embodiment is that the set of possible SAO parameters sets is reduced and the rate dedicated to its signalling is reduced, especially when a unary max code is used.
Another way is to use the exact needed number of bits to signal which SAO parameters set has been selected at the encoder side among the possible temporal SAO parameters sets. But in that case, the bitstream is not parseable without this SAO checking, which is not recommended for many video applications using a network. The size of SAO Ref List will vary from one slice to the next. If the number of bits of the index is allowed to vary too, it could be efficient (saving some bits when the size is below the maximum) but parsing in the decoder then requires the decoder to reconstruct SAO Ref List. To simplify the parsing of the bitstream when list reduction is used, it is possible for the encoder to signal in the bitstream the number of elements in the reduced SAO Ref List (that is, the number of possible temporal SAO parameters sets). This enables the decoder to know the number of bits dedicated to the signalling of the index without having to do any list reduction before parsing.
The possibilities described in connection with this embodiment are also applicable when the same reference image is used for all CTUs of the slice as in the second embodiment (step 2104 in Figure 13). In this case, as the list (SAO Ref List) of reference images for SAO temporal derivation contains fewer reference images than L0 and L1 in combination, signalling a reference image from the list SAO Ref List is more concise than signalling a reference image from L0 and L1.
As described above, the seventh embodiment creates a list of reference images for the temporal derivation of SAO parameters based on one or more lists of reference images used for temporal prediction of the image parts of the group, wherein one or more reference images among the one or more temporal-prediction lists are excluded from the list of reference images for the temporal derivation of SAO parameters. A reference image for the temporal derivation of SAO parameters is then selected from the list of reference images for the temporal derivation of SAO parameters. Redundant reference images among the one or more temporal-prediction lists may be excluded from the list of reference images for the temporal derivation of SAO parameters. Alternatively, or in addition, reference images whose respective collocated image parts do not use SAO filtering or whose respective collocated parts do not use edge-type SAO filtering may be excluded from the list of reference images for the temporal derivation of SAO parameters. It may be effective to impose a maximum on a number of reference images includable in the list of reference images for the temporal derivation of SAO parameters.
The decoder may create the same list of reference images for the temporal derivation of SAO parameters as the encoder. In this case, the list does not need to be signalled in the bitstream, which can reduce the rate of the slice header. However, it is possible for only the encoder to create the list of reference images for the temporal derivation of SAO parameters based on one or more lists of reference images used for temporal prediction of the image parts of the group, the list of reference images for the temporal derivation of SAO parameters being signalled explicitly. This can still be effective when SAO is disabled for some reference frames used for temporal prediction as the list of reference images for the temporal derivation of SAO parameters may then be suitably compact.
Eighth Embodiment
The eighth embodiment relates to an encoding process. In the preceding embodiments a temporal derivation of SAO parameters is applied to a group of image parts. For example, in the first embodiment a temporal derivation is applied to a whole image. In the second embodiment a temporal derivation is applied to a slice.
In the eighth embodiment, when the temporal derivation is not applied a non-temporal derivation of the SAO parameters is used in which SAO parameters are determined by the encoder for each image part (CTU) and signalled in the bitstream. This may be referred to as a CTU-level non-temporal derivation of SAO parameters. The decoder reads from the bitstream the first syntax element (e.g. temporal merge flag) and when it indicates temporal derivation is not applied to the group the decoder reads the per-CTU SAO parameters from the bitstream and filters each CTU according to the SAO parameters for the CTU concerned, for example using the decoding process of Figure 5. In the eighth embodiment the temporal derivation and the CTU-level non-temporal derivation are available derivations and the encoder selects one of them to apply to the group (e.g. frame or slice).
Figure 17 is a flow chart illustrating steps carried out by an encoder to determine SAO parameters for the CTUs of a group (frame or slice) in the CTU-level non-temporal derivation of SAO parameters. The process starts with a current CTU (1101). First the statistics for all possible SAO types and classes are accumulated in the variable CTUStats (1102). The process of step 1102 is described below with reference to Figure 18. According to the values set in the variable CTUStats, the RD cost for the SAO Merge Left is evaluated if the Left CTU is in the current Slice (1103), as is the RD cost of the SAO Merge Up (1104). Thanks to the statistics in CTUStats (1102), new SAO parameters are evaluated for Luma (1105) and for both Chroma components (1109). (Both Chroma components because the Chroma components share the same SAO type in the HEVC standard.) For each SAO type (1106), the best RD offsets and other parameters for Band offset classification are obtained (1107). Steps 1107 and 1110 are explained below for Edge and Band classification with reference to Figure 19 and Figure 20 respectively. All RD costs are computed thanks to their respective SAO parameters (1108). In the same way for both Chroma components, the optimal RD offsets and parameters are selected (1111). All these RD costs are compared in order to select the best SAO parameters set (1115). These RD costs are also compared to disable SAO independently for the Luma and the Chroma components (1113, 1114). The use of a new SAO parameters set (1115) is compared to the SAO parameters set “Merging” or sharing (1116) from the left and up CTU.
Figure 18 is a flow chart illustrating steps of an example of statistics computed at the encoder side that can be applied for the Edge Offset type filter, in the case of the conventional SAO filtering. A similar approach may also be used for the Band Offset type filter.
Figure 18 illustrates the setting of the variable CTUStats containing all information needed to derive the best rate distortion offsets for each class. Moreover, it illustrates the selection of the best SAO parameters set for the current CTU. For each colour component Y, U, V (or RGB) (811) each SAO type is evaluated. For each SAO type (812) the variables Sumj and SumNbPixj are set to zero in an initial step 801. The current frame area 803 contains N pixels. j is the current range number to determine the four offsets (related to the four edge indexes shown in Figure 7B for Edge Offset type or to the 32 ranges of pixel values shown in Figure 8 for Band Offset type). Sumj is the sum of the differences between the pixels in the range j and their original pixels. SumNbPixj is the number of pixels in the frame area, the pixel value of which belongs to the range j.
In step 802, a variable i, used to successively consider each pixel Pi of the current frame area, is set to zero. Then, the first pixel Pi of the frame area 803 is extracted in step 804.
In step 805, the class of the current pixel is determined by checking the conditions defined in Figure 7B. Then a test is performed in step 806. During step 806, a check is performed to determine if the class of the pixel value Pi corresponds to the value “none of the above” of Figure 7B.
If the outcome is positive, then the value“i” is incremented in step 808 in order to consider the next pixels of the frame area 803.
Otherwise, if the outcome is negative in step 806, the next step is 807 where the related SumNbPixj (i.e. the number of pixels for the class determined in step 805) is incremented and the difference between Pi and its original value Porgi is added to Sumj. In the next step 808, the variable i is incremented in order to consider the next pixels of the frame area 803.
Then a test is performed to determine if all pixels have been considered and classified. If the outcome is negative, the process loops back to step 804 described above. Otherwise, if the outcome is positive, the process proceeds to step 810 where the variable CTUStats for the current colour component X and the SAO type SAO_type and the current class j are set equal to Sumj for the first value and SumNbPixj for the second value. These variables can be used to compute for example the optimal offset parameter Offsetj of each class j. This offset Offsetj may be the average of the differences between the pixels of class j and their original values. Thus, Offsetj is given by the following formula:

Offsetj = Sumj / SumNbPixj

Note that the offset Offsetj is an integer value. As a consequence, the ratio defined in this formula may be rounded, either to the closest value or using the ceiling or floor function.

Each offset Offsetj is an optimal offset Ooptj in terms of distortion.
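By way of illustration only, the following C++ fragment sketches this computation of an optimal offset from the accumulated statistics; the rounding to the nearest integer and the omission of any clipping to the allowed offset range are assumptions made for the example.

#include <cmath>

// Illustrative sketch: optimal distortion offset of class j from the accumulated
// statistics Sumj (sum of differences) and SumNbPixj (number of pixels of the class).
int optimalOffset(long long sumJ, long long sumNbPixJ)
{
    if (sumNbPixJ == 0)
        return 0;                                     // empty class: no offset
    double ratio = static_cast<double>(sumJ) / static_cast<double>(sumNbPixJ);
    return static_cast<int>(std::lround(ratio));      // Ooptj (clipping to the allowed range omitted)
}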
To evaluate an RD cost for a merge of SAO parameters, the encoder uses the statistics set in the table CTUStats. According to the following example for the SAO Merge Left, and by considering the type for Luma, Left_Type_Y, and the four related offsets O_Left_0, O_Left_1, O_Left_2, O_Left_3, the distortion can be obtained by the following formula:
Distortion Left Y =
(CTUStats[Y][Left_Type_Y][0][1] x O_Left_0 x O_Left_0 - CTUStats[Y][Left_Type_Y][0][0] x O_Left_0 x 2) >> Shift
+ (CTUStats[Y][Left_Type_Y][1][1] x O_Left_1 x O_Left_1 - CTUStats[Y][Left_Type_Y][1][0] x O_Left_1 x 2) >> Shift
+ (CTUStats[Y][Left_Type_Y][2][1] x O_Left_2 x O_Left_2 - CTUStats[Y][Left_Type_Y][2][0] x O_Left_2 x 2) >> Shift
+ (CTUStats[Y][Left_Type_Y][3][1] x O_Left_3 x O_Left_3 - CTUStats[Y][Left_Type_Y][3][0] x O_Left_3 x 2) >> Shift
The variable Shift is designed for a distortion adjustment. The distortion should be negative as SAO is a post filtering.
The same computation is applied for the Chroma components. The Lambda of the Rate distortion cost is fixed for the three components. For SAO parameters merged with the left CTU, the rate is only 1 flag which is CABAC coded.
The encoding process illustrated in Figure 19 is applied in order to find the best offset in terms of a rate distortion criterion, an offset referred to as ORDj. This process is applied in steps 1109 to 1112. In an initial step 901 of the encoding process of Figure 19, the rate distortion value Jj is initialized to the maximum possible value. Then a loop on Oj from Ooptj to 0 is applied in step 902. Note that Oj is modified by 1 at each new iteration of the loop. If Ooptj is negative, the value Oj is incremented and if Ooptj is positive, the value Oj is decremented. The rate distortion cost related to Oj is computed in step 903 according to the following formula:

J(Oj) = SumNbPixj x Oj x Oj - Sumj x Oj x 2 + λ R(Oj)

where λ is the Lagrange parameter and R(Oj) is a function which provides the number of bits needed for the code word associated with Oj.

The formula ‘SumNbPixj x Oj x Oj - Sumj x Oj x 2’ gives the improvement in terms of the distortion provided by the use of the offset Oj. If J(Oj) is inferior to Jj then Jj = J(Oj) and ORDj is set equal to Oj in step 904. If Oj is equal to 0 in step 905, the loop ends and the best ORDj for the class j is selected.
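By way of illustration only, the following C++ fragment sketches the search of Figure 19; the rate function R(Oj) is abstracted behind a function pointer and all names are illustrative.

// Illustrative sketch of Figure 19: starting from the optimal distortion offset Ooptj,
// move the offset towards 0 and keep the value with the lowest rate-distortion cost.
int bestRdOffset(long long sumJ, long long sumNbPixJ, int ooptJ,
                 double lambda, int (*rateOfOffset)(int))
{
    double bestJ = 1.0e300;                            // step 901
    int ordJ = 0;
    int step = (ooptJ > 0) ? -1 : 1;                   // Oj is moved towards 0
    for (int oj = ooptJ; ; oj += step) {               // loop of step 902
        double cost = static_cast<double>(sumNbPixJ) * oj * oj
                    - static_cast<double>(sumJ) * oj * 2.0
                    + lambda * rateOfOffset(oj);       // J(Oj), step 903
        if (cost < bestJ) { bestJ = cost; ordJ = oj; } // step 904
        if (oj == 0) break;                            // step 905
    }
    return ordJ;
}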
This algorithm of Figures 18 and 19 provides a best ORDj for each class j. This algorithm is repeated for each of the four directions of Figure 7A. Then the direction that provides the best rate distortion cost (sum of Jj for each direction) is selected as the direction to be used for the current CTU.
This algorithm (Figures 18 and 19) for selecting the offset values at the encoder side for the Edge Offset tool can be easily applied to the Band Offset filter to select the best position (sao_band_position), where j is in the interval [0,32] instead of the interval [1,4] in Figure 18. It involves changing the value 4 to 32 in modules 801, 810, 811. More specifically, for the 32 classes of Figure 8, the parameter Sumj (j = [0,32]) is computed. This corresponds to computing, for each range j, the difference between the current pixel value (Pi) and its original value (Porgi), each pixel of the image belonging to a single range j. Then the best offset in terms of rate distortion ORDj is computed for the 32 classes, with the same process as described in Figure 19.
The next step involves finding the best position of the SAO band position of Figure 8. This is determined with the encoding process set out in Figure 20. The RD cost Jj for each range has been computed with the encoding process of Figure 19 with the optimal offset ORDj in terms of rate distortion. In Figure 20, in an initial step 1001 the rate distortion value J is initialized to the maximum possible value. Then a loop on the 28 positions i of 4 consecutive classes is run in step 1002. Next, the variable Ji corresponding to the RD cost of the band (of 4 consecutive classes) is initialized to 0 in step 1003. Then the loop on the four consecutive classes j is run in step 1004. Ji is incremented by the RD costs of the four classes Jj in step 1005 (j = i to i+4).
If this cost Ji is inferior to the best RD cost J, J is set to Ji, and sao_band_position = i in step 1007, and the next step is step 1008. Otherwise, the next step is step 1008.
Test 1008 checks whether or not the loop on the 28 positions has ended. If not, the process continues in step 1002, otherwise the encoding process returns the best band position as being the current value of sao_band_position 1009.

Thus, the CTUStats table in the case of determining the SAO parameters at the CTU level is created by the process of Figure 17. This corresponds to evaluating the CTU level in terms of the rate-distortion compromise. The evaluation may be performed for the whole image or for just the current slice.
A further evaluation is carried out for the temporal derivation. Again, this temporal derivation may apply to the whole image or just to the current slice. Figure 21 shows the RD cost evaluation of the temporal derivation at Slice level. First the distortion for the current colour component X is set equal to 0 (1601). For each CTU number nbCTU from 0 to LastCTU (1602), the temporal SAO parameters set of the collocated CTU in a reference frame (Ly, refidx) (1605) is extracted (1604) from the DPB (1603). If the SAO parameters set (1605) is equal to OFF (No SAO), the next CTU is processed (1610). Otherwise, for each in turn of the four offsets (1607), the distortion Distortion TEMPORAL X is incremented by an amount equal to the associated distortion of the offset Oi (1609). This is the same process as the RD cost evaluation for a merge of SAO parameters as described previously. Please note that sao_band_position is set equal to 0 when the SAO type is equal to an Edge type. When the distortion of all offsets has been added to Distortion TEMPORAL X (1608), the next CTU is processed (1610). When the number of CTU nbCTU is equal to the lastCTU (1610), the RDCost for the temporal mode at Slice level, for component X, is set equal to the sum of this computed distortion Distortion TEMPORAL X and λ multiplied by the rate for this temporal mode at Slice level (1611). This rate is equal to the rate of the signalling of the temporal mode plus, if needed, the rate of the reference frame index refidx and, if needed, plus the rate of the list Ly.
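By way of illustration only, the following C++ fragment sketches the distortion accumulation of Figure 21 for one color component; the flattened statistics layout (one Sum/SumNbPix pair per class, already organised for the SAO type of the collocated CTU) and the variable names are assumptions made for the example.

#include <array>
#include <cstddef>
#include <vector>

// Illustrative sketch of the slice-level distortion of the temporal derivation
// (Figure 21) for one color component X. stats[ctu][j] = {Sumj, SumNbPixj} for the
// SAO type of the collocated CTU; offsets[ctu] holds its four offsets, or is empty
// when the collocated CTU has no SAO.
long long temporalDistortionX(const std::vector<std::array<std::array<long long, 2>, 4>>& stats,
                              const std::vector<std::vector<int>>& offsets,
                              int shift)
{
    long long distortion = 0;                                   // step 1601
    for (std::size_t ctu = 0; ctu < stats.size(); ++ctu) {      // loop of step 1602
        if (offsets[ctu].empty())
            continue;                                           // collocated CTU is OFF (No SAO)
        for (int j = 0; j < 4; ++j) {                           // loop of step 1607
            long long o = offsets[ctu][j];
            distortion += (stats[ctu][j][1] * o * o
                         - stats[ctu][j][0] * o * 2) >> shift;  // step 1609
        }
    }
    return distortion;                                          // Distortion TEMPORAL X
}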
The two evaluations are then compared and the one with the best performance is selected. The selected derivation (temporal or CTU level) is then signalled to the decoder in the bitstream, for example using the first syntax element as described in connection with the second embodiment. Figure 22 illustrates the competition between the CTU level for SAO and for temporal derivation at encoder side. The current slice/frame 1901 is used to set the CTUStats table (1903) for each CTU (1902). This table (1903) is used to evaluate the CTU level derivation (1904) and the temporal derivation for the whole slice (1915) as described previously in Figure 21. This table (1903) is also used to evaluate several reference frames for temporal derivation. The best derivation for the slice is selected according to the rate distortion criterion computed for each available derivation (1910). The SAO parameters sets for each CTU are set (1911) according to the derivation selected in step 1910. These SAO parameters are then used to apply the SAO filtering (1913) in order to obtain the filtered frame/slice.
The selected derivation may be signalled in the slice header, for example using a syntax element indicating temporal derivation (which the decoder reads, see 2101 and 2201 in Figures 13 and 14).
Ninth Embodiment
In the eighth embodiment the temporal derivation was put into competition with one alternative non-temporal method of deriving the SAO parameters. In the ninth embodiment two alternative methods are in competition with the temporal derivation.
Figure 23 shows various different groupings 1201-1206 of CTUs in a slice.
A first grouping 1201 has individual CTUs. This first grouping requires one set of SAO parameters per CTU. It corresponds to the CTU-level derivation in the eighth embodiment.
A second grouping 1202 makes all CTUs of the entire image one group. Thus, in contrast to the CTU-level derivation, all CTUs of the frame (and hence the slice which is either the entire frame or a part thereof) share the same SAO parameters.
To make all CTUs of the image share the same SAO parameters one of two methods can be used. In both methods, the encoder first computes a set of SAO parameters to be shared by all CTUs of the image. Then, in the first method, these SAO parameters are set for the first CTU of the slice. For each remaining CTU from the second CTU to the last CTU of the slice, the sao_merge_left_flag is set equal to 1 if the flag exists (that is, if the current CTU has a left CTU). Otherwise, the sao_merge_up_flag is set equal to 1. Figure 24 shows an example of CTUs with SAO parameters set according to the first method. This method has the advantage that no signalling of the grouping to the decoder is required. Also, no changes to the decoder are required to introduce the groupings and only the encoder is changed. The groupings could therefore be introduced in an encoder based on HEVC without modifying the HEVC decoder. Surprisingly, the groupings do not increase the rate too much. This is because the merge flags are generally CABAC coded in the same context. Since for the second grouping (entire image) these flags all have the same value (1), the rate consumed by these flags is very low. This follows because they always have the same value and the probability is 1.
In the second method of making all CTUs of the image share the same SAO parameters, the grouping is signalled to the decoder in the bitstream. The SAO parameters are also signalled as SAO parameters for the group (whole image), for example in the slice header. In this case, the signalling of the grouping consumes bandwidth. However, the merge flags can be dispensed with, saving the rate related to the merge flags, so that overall the rate is reduced.
The first and second groupings 1201 and 1202 provide very different rate-distortion compromises. The first grouping 1201 is at one extreme, giving very fine control of the SAO parameters (CTU by CTU), which should lower distortion, but at the expense of a lot of signalling. The second grouping is at the other extreme, giving very coarse control of the SAO parameters (one set for the whole image), which raises distortion but has very light signalling.
Next, a description will be given of how to determine in the encoder the SAO parameters for the second grouping 1202. In the second grouping 1202 the determination is done for a whole image and all CTUs of the slice/frame share the same SAO parameters.
Figure 25 is an example of the setting of SAO parameters at the frame/slice level using the first method of sharing SAO parameters (i.e. without new SAO classifications at the encoder side). This figure is based on Figure 17. At the beginning of the process, the CTUStats table is set for each CTU (in the same way as for the CTU-level encoding choice). This CTUStats table can also be used for the traditional CTU level (1302). Then the table FrameStats is set by adding each value, for all CTUs, of the table CTUStats (1303). Then the same process as for the CTU level is applied to find the best SAO parameters (1305 to 1315). To set the SAO parameters for all CTUs of the frame, the SAO parameters set selected at step 1315 is assigned to the first CTU of the slice/frame. Then, for each CTU from the second CTU to the last CTU of the slice/frame, the sao_merge_left_flag is set equal to 1 if it exists, otherwise the sao_merge_up_flag is set equal to 1 (indeed, for the second CTU to the last CTU, a merge left or merge up or both exist) (1317). The syntax of the SAO parameters set is unchanged from that presented in Figure 9. At the end of the process the SAO parameters are set for the whole slice/frame.
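A minimal sketch of step 1303, assuming the same hypothetical CtuStats structure as in the earlier sketch: the per-CTU statistics are simply summed into a frame-level table, which is then fed to the unchanged CTU-level parameter selection routine.

#include <cstdint>
#include <vector>

// Same hypothetical statistics structure as in the earlier sketch.
struct CtuStats {
    int64_t count[5][32];    // samples per SAO type and class
    int64_t sumDiff[5][32];  // sum of (original - reconstructed) per type and class
};

// Step 1303: add up the per-CTU statistics into a single FrameStats entry.
// The frame-level SAO parameters are then selected with the same routine as
// at CTU level, but fed with FrameStats instead of one CTUStats entry.
CtuStats accumulateFrameStats(const std::vector<CtuStats>& ctuStats) {
    CtuStats frameStats{};
    for (const CtuStats& s : ctuStats)
        for (int type = 0; type < 5; ++type)
            for (int cls = 0; cls < 32; ++cls) {
                frameStats.count[type][cls]   += s.count[type][cls];
                frameStats.sumDiff[type][cls] += s.sumDiff[type][cls];
            }
    return frameStats;
}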
Thus, the CTUStats table in the case of determining the SAO parameters for the whole image (frame level) is created by the process of Figure 25. This corresponds to evaluating the frame level in terms of the rate-distortion compromise.
As described previously in connection with the eighth embodiment, the encoder also evaluates the CTU level non-temporal derivation and the temporal derivation in terms of their respective rate-distortion compromises. Each evaluation is performed for the whole image in this case.
The three evaluations are then compared and the one with the best performance is selected. The selected derivation (temporal or CTU level or frame level) is then signalled to the decoder in the bitstream.
The signalling of the selected derivation can be made in many different ways. For example, a grouping index can be signalled. The first syntax element can then still be used to signal whether the SAO parameters for all CTUs of the slice are derived temporally or not (e.g. temporal merge flag), supplemented by the grouping index in the case when temporal derivation is not used. For example, the CTU level may have grouping index 0 and the frame level may have grouping index 1. Alternatively, the first syntax element may be adapted to signal everything, for example CTU level and frame level may have index 0 and index 1 respectively and temporal derivation may have another index such as 2. In this case, in Figures 21, 22 and 24 the first syntax element is changed accordingly.
The example of determining the SAO parameters in Figure 25 corresponds to the first method of sharing SAO parameters as it uses the merge flags to share the SAO parameters among all CTUs of the image (see steps 1316 and 1317). These steps can be omitted if the second method of sharing SAO parameters is used.
Incidentally, if the second method is used, and merge flags are not used within the group (image), the process of Figure 17 should be modified appropriately, in particular to not evaluate the RD costs in 1103 and 1104.
Tenth Embodiment
In the eighth embodiment the CTU-level non-temporal derivation is in competition with the temporal derivation. In the tenth embodiment the CTU-level non-temporal derivation is not available and instead the frame-level non-temporal derivation is in competition with the temporal derivation.
Eleventh Embodiment
The CTU and frame levels used in the ninth embodiment offer extreme rate-distortion compromises. It is also possible to include other groupings, intermediate between the CTU and frame levels, which can offer other rate-distortion compromises. Referring again to Figure 23, a third grouping 1203 makes a column of CTUs a group.
Figure 26 is an example of the setting of SAO parameters sets for the third grouping 1203 at the encoder side. This figure is based on Figure 17. To reduce the number of steps in the figure, the modules 1105 to 1115 have been merged into a single step 1405 in Figure 26. At the beginning of the process, the CTUStats table is set for each CTU. This CTUStats table can be used for the traditional CTU-level encoding choice (1302). For each column (1403) of the current slice/frame, the table ColumnStats is set by adding each value (1405) from CTUStats (1402), for each CTU of the current column (1404). Then the new SAO parameters are determined as for the CTU-level encoding choice (1406) (cf. Figure 17). If it is not the first column, the RD cost of sharing the SAO parameters with the previous (left) column is also evaluated (1407), in the same way as the sharing of the SAO parameters set between left and up CTUs (1103, 1104) is evaluated. If the sharing of SAO parameters gives a better RD cost (1408) than the RD cost for the new SAO parameters set, the sao_merge_left_flag is set equal to 1 for the first CTU of the column. This CTU has an address number equal to the value "Column". Otherwise, the SAO parameters set for this first CTU of the column is set equal (1409) to the new SAO parameters obtained in step 1406.
For all other CTUs of the column (1411), their sao_merge_left_flag is set equal to 0 if it exists and their sao_merge_up_flag is set equal to 1. Then the SAO parameters set for the next column can be processed (1403). Please note that, except for the first line of CTUs, all other CTUs of the frame have the sao_merge_left_flag equal to 0 (if it exists) and the sao_merge_up_flag equal to 1. So, step 1412 can be processed once per frame.
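The following sketch illustrates the column processing just described, again with hypothetical CtuStats/CtuSao structures: one routine accumulates the column statistics (steps 1404 and 1405) and another sets the merge flags of the column once the best parameters, or a merge with the left column, has been chosen.

#include <cstdint>
#include <vector>

struct CtuStats {                   // same hypothetical structure as before
    int64_t count[5][32];
    int64_t sumDiff[5][32];
};
struct CtuSao {                     // per-CTU SAO signalling state (sketch)
    bool mergeLeft = false;         // sao_merge_left_flag
    bool mergeUp   = false;         // sao_merge_up_flag
    int  paramsIdx = -1;            // explicitly coded SAO parameter set, if any
};

// Steps 1404-1405: accumulate the statistics of one column of CTUs.
CtuStats columnStats(const std::vector<CtuStats>& ctuStats,
                     int column, int ctusPerRow, int ctusPerColumn) {
    CtuStats stats{};
    for (int row = 0; row < ctusPerColumn; ++row) {
        const CtuStats& s = ctuStats[row * ctusPerRow + column];
        for (int type = 0; type < 5; ++type)
            for (int cls = 0; cls < 32; ++cls) {
                stats.count[type][cls]   += s.count[type][cls];
                stats.sumDiff[type][cls] += s.sumDiff[type][cls];
            }
    }
    return stats;
}

// Signal the chosen parameters on the first CTU of the column (either new
// parameters or a merge with the previous column), and make every other CTU
// of the column inherit them through sao_merge_up_flag.
void setColumnMergeFlags(std::vector<CtuSao>& ctus, int column, int ctusPerRow,
                         int ctusPerColumn, bool mergeWithLeftColumn, int newParamsIdx) {
    CtuSao& first = ctus[column];                 // the CTU address equals "Column"
    if (mergeWithLeftColumn) first.mergeLeft = true;
    else                     first.paramsIdx = newParamsIdx;
    for (int row = 1; row < ctusPerColumn; ++row) {
        CtuSao& c = ctus[row * ctusPerRow + column];
        c.mergeLeft = false;                      // if the flag exists, it is 0
        c.mergeUp   = true;                       // inherit from the CTU above
    }
}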
The advantage of this CTU grouping is that it provides another RD compromise, between the CTU-level encoding choice and the frame level, which can be useful under some conditions. Also, in this example, merge flags are used within the group, which means that the third grouping can be introduced without modifying the decoder (i.e. the grouping can be HEVC-compliant). Of course, the second method of sharing SAO parameters described previously can be used instead. In that case, merge flags are not used within the group (CTU column) and steps 1411 and 1412 are omitted.
In one variant, the merge between columns does not need to be checked. This means that steps 1407, 1408 and 1410 are removed from the process of Figure 26. The advantage of removing this possibility is a simplification of the implementation and the ability to parallelize the process. It has only a small impact on coding efficiency. Another possible compromise intermediate between the CTU level and the frame level can be offered by a fourth grouping 1204 in Figure 23 which makes a line of CTUs a group. To determine the SAO parameters for this fourth grouping, a similar process to that of Figure 26 can be applied. In that case, the variable ColumnStats is replaced by LineStats. Step 1403 is replaced by "For Line = 0 to Num_CTU_in_Height". Step 1404 is replaced by "For CTU_in_line = 0 to Num_CTU_in_Width". Step 1405 is replaced by LineStats[][][][] += CTUStats[Line * Num_CTU_in_Width + CTU_in_line][][][][]. The new SAO parameters and the merge with the up CTU are evaluated based on this LineStats table (steps 1406, 1407). Step 1410 is replaced by setting the sao_merge_up_flag to 1 for the first CTU of the line. And for all CTUs of the slice/frame except each first CTU of each line, the sao_merge_left_flag is set equal to 1.
The advantage of the line grouping is that it provides another RD compromise between the CTU level and the frame level. Please note that the frame or slice is most of the time a rectangle whose width is larger than its height. So the line CTU grouping 1204 is expected to be an RD compromise closer to the frame CTU grouping 1202 than to the column CTU grouping 1203.
As for the other CTU groupings 1202 and 1203, the line CTU grouping can be HEVC-compliant if the merge flags are used within the groups.
As for the column CTU grouping 1203, the evaluation of merging two lines can be removed.
Further RD compromises can be offered by putting two or more columns of CTUs or two or more lines of CTUs together as a group. The process of Figure 26 can be adapted to determine SAO parameters for such groups.
In one embodiment, the number N of columns or lines in a group may depend on the number of groups that are targeted.
The use of several columns or lines for the CTU groupings may be particularly advantageous when the slices or frames are large (for HD, 4K or beyond).
As described previously, in one variant, the merge between these groups containing two or more columns or two or more lines doesn’t need to be evaluated.
Another possible grouping includes split columns or split lines, where the split is tailored to the current slice/frame.
Another possible compromise between the CTU level and the frame level can be offered by square CTU groupings 1205 and 1206 as illustrated in Figure 23. The grouping 1205 makes 2x2 CTUs a group. The grouping 1206 makes 3x3 CTUs a group. Figure 27 shows an example of how to determine the SAO parameters for such groupings. For each NxN group (1503), the table NxNStats (1507) is set (1504, 1505, 1506) based on CTUStats. This table is used to determine the new SAO parameters (1508) and their RD cost, in addition to the RD costs for a left (1510) or up (1509) sharing of SAO parameters. If the best RD cost is that of the new SAO parameters (1511), the SAO parameters of the first CTU (top-left CTU) of the NxN group are set equal to these new SAO parameters (1514). If the best RD cost is that of sharing the SAO parameters with the up NxN group (1512), the sao_merge_up_flag of the first CTU (top-left CTU) of the NxN group is set equal to 1 and the sao_merge_left_flag to 0 (1515). If the best RD cost is that of sharing the SAO parameters with the left NxN group (1513), the sao_merge_left_flag of the first CTU (top-left CTU) of the NxN group is set equal to 1 (1516). Then the sao_merge_left_flag and sao_merge_up_flag are set appropriately for the other CTUs of the NxN group in order to form the SAO parameters for the current NxN group (1517). Figure 28 illustrates this setting for a 3x3 SAO group. The top-left CTU is set equal to the SAO parameters determined in steps 1508 to 1516. For the two other CTUs of the top row, the sao_merge_left_flag is set equal to 1. As the sao_merge_left_flag is the first flag encoded or decoded and as it is set to 1, there is no need to set the sao_merge_up_flag to 0. For the two other CTUs in the first column, the sao_merge_left_flag is set equal to 0 and the sao_merge_up_flag is set equal to 1. For the other CTUs, the sao_merge_left_flag is set equal to 1.
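A sketch of step 1517 and Figure 28, assuming a hypothetical CtuSao structure: once the top-left CTU of the NxN group has been set (steps 1514 to 1516), the remaining CTUs of the group are filled using only merge flags.

#include <vector>

struct CtuSao {                  // per-CTU SAO signalling state (sketch)
    bool mergeLeft = false;      // sao_merge_left_flag
    bool mergeUp   = false;      // sao_merge_up_flag
    int  paramsIdx = -1;         // explicitly coded SAO parameter set, if any
};

// Step 1517 / Figure 28: propagate the parameters of the top-left CTU of an
// NxN group to the other CTUs of the group using only merge flags.
void fillNxNGroup(std::vector<CtuSao>& ctus, int ctusPerRow,
                  int topLeftX, int topLeftY, int n) {
    for (int dy = 0; dy < n; ++dy) {
        for (int dx = 0; dx < n; ++dx) {
            if (dx == 0 && dy == 0) continue;                 // set in steps 1514-1516
            CtuSao& c = ctus[(topLeftY + dy) * ctusPerRow + (topLeftX + dx)];
            if (dy == 0) {
                c.mergeLeft = true;          // rest of the first row: merge left
            } else if (dx == 0) {
                c.mergeLeft = false;         // rest of the first column: merge up
                c.mergeUp   = true;
            } else {
                c.mergeLeft = true;          // interior CTUs: merge left
            }
        }
    }
}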
The advantage of the NxN CTU groupings is to create several further RD compromises for SAO. As for the other groupings, these groupings can be HEVC-compliant if merge flags are used within the groups. As for the other groupings, the test of merge left and merge up between groups can be dispensed with in Figure 27; steps 1509, 1510, 1512, 1513, 1515 and 1516 can then be removed, especially when N is large.
In one variant, the value N depends on the size of the frame/slice. The advantage of this embodiment is to obtain an efficient RD compromise.
In a preferred variant, only N equal to 2 and 3 are evaluated. This offers an efficient compromise.
The possible groupings are in competition with one another and with the temporal derivation as the SAO parameter derivation to be selected for the current slice. Figure 29 illustrates an example of how to select the SAO parameter derivation using a rate-distortion compromise comparison. In this example, the first method of sharing SAO parameters among the CTUs of a group is used. Accordingly, merge flags are used within groups. If applied to HEVC, the resulting bitstream can be decoded by an HEVC-compliant decoder.
The current slice/frame 1701 is used to set the CTUStats table (1703) for each CTU (1702). This table (1703) is used to evaluate the CTU level (1704), the temporal derivation (1715), the frame/slice grouping (1705), the column grouping (1706), the line grouping (1707), the 2x2 CTU grouping (1708), the 3x3 CTU grouping (1709), or any of the other CTU groupings described previously. The best derivation (a non-temporal derivation with a CTU grouping or the temporal derivation) is selected according to the rate-distortion criterion computed for each available derivation (1710). The SAO parameters sets for each CTU are set (1711) according to the derivation selected in step 1710. These SAO parameters are then used to apply the SAO filtering (1713) in order to obtain the filtered frame/slice.
The second method of sharing SAO parameters among the CTUs of the CTU grouping may be used instead of the first method. Both methods have the advantage of offering a coding efficiency increase. A second advantage, obtained when the first method is used but not when the second method is used, is that this competition method does not require any additional SAO filtering or classification. Indeed, the main impacts on encoder complexity are step 1702, which needs SAO classification for all possible SAO types, and step 1713, which filters the samples. All the other CTU grouping evaluations only require additions of values already obtained during the CTU-level encoding choice (stored in the table CTUStats).
Another possibility to increase the coding efficiency at the encoder side is to test all possible SAO groupings, but this increases the encoding time compared to the example of Figure 29, where only a small subset of groupings is evaluated.
As mentioned just now, it is also possible to use the second method of sharing SAO parameters among the CTUs of a group. In this case, the encoder signals in the bitstream which derivation of the SAO parameters is selected (CTU level, frame level, column, line, 2x2 CTUs, 3x3 CTUs, temporal derivation). A possible indexing scheme is shown in Table 1 below:
Derivation index value    Selected derivation of the SAO parameters
0                         CTU level
1                         Frame level
2                         Column of CTUs
3                         Line of CTUs
4                         2x2 CTUs
5                         3x3 CTUs
6                         Temporal derivation
Table 1
Because the majority of the derivation index values (values 0 to 5) signal groupings, the derivation index is also referred to as a grouping index hereinafter.
Figure 30 is a flow chart illustrating a decoding process when the CTU grouping is signalled in the slice header according to the second method of sharing SAO parameters among the CTUs of the group. First the flag SaoEnabledFlag is extracted from the bitstream (1801). If SAO is not enabled, the next slice header syntax element is decoded (1807) and SAO will not be applied to the current slice. Otherwise the decoder extracts N bits from the slice header (1803). N depends on the number of available CTU groupings. Ideally the number of CTU groupings should be equal to 2 to the power of N. The corresponding CTU grouping index (1804) is used to select the CTU grouping method (1805). This grouping method is applied to extract the SAO syntax and to determine the SAO parameters set for each CTU (1806). Then the next slice header syntax element is decoded. If the CTU grouping index (1804) corresponds to the temporal derivation, other parameters can be extracted from the bitstream, such as the reference frame index and/or other parameters necessary for the temporal derivation.
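A possible decoder-side parsing routine corresponding to Figure 30 is sketched below in C++. The BitReader interface, the enumeration values and the 4-bit refidx field are assumptions made only for illustration; only the order of the parsed elements follows the description above.

#include <cstdint>

// Minimal bit-reader interface assumed for this sketch.
struct BitReader {
    virtual uint32_t readFlag() = 0;          // single bit
    virtual uint32_t readBits(int n) = 0;     // fixed-length read
    virtual ~BitReader() = default;
};

enum class SaoGrouping { CtuLevel, FrameLevel, Column, Line, Ctu2x2, Ctu3x3, Temporal };

// Figure 30 (sketch): returns false when SAO is disabled for the slice; otherwise
// parses an N-bit grouping index and, for the temporal derivation, a reference
// frame index (field length hypothetical).
bool parseSliceSaoGrouping(BitReader& br, int numGroupings,
                           SaoGrouping& grouping, int& refIdx) {
    refIdx = -1;
    if (!br.readFlag())                              // SaoEnabledFlag (1801)
        return false;                                // SAO not applied to this slice
    int n = 0;
    while ((1 << n) < numGroupings) ++n;             // ideally numGroupings = 2^N
    grouping = static_cast<SaoGrouping>(br.readBits(n));   // steps 1803-1804
    if (grouping == SaoGrouping::Temporal)
        refIdx = int(br.readBits(4));                // hypothetical refidx field
    return true;
}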
The advantage of signalling the CTU grouping in the slice header is its low impact on the bitrate.
But when the number of slices per frame is significant, it may be desirable to reduce this signalling. So, in one variant, the CTU grouping index uses a unary max code in the slice header. In that case, the CTU groupings are ordered according to their probabilities of occurrence (highest to lowest).
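A sketch of this truncated-unary ("unary max") variant: index i is written as i one-bits followed by a terminating zero, the terminating zero being omitted for the largest index, so that groupings placed early in the probability-ordered list cost fewer bits.

#include <cstddef>
#include <vector>

// Write index (0..maxIndex) with a truncated unary code.
void writeUnaryMax(std::vector<int>& bits, int index, int maxIndex) {
    for (int i = 0; i < index; ++i) bits.push_back(1);
    if (index < maxIndex) bits.push_back(0);   // terminator omitted for the last index
}

// Read the matching code; 'pos' is advanced past the consumed bits.
int readUnaryMax(const std::vector<int>& bits, std::size_t& pos, int maxIndex) {
    int index = 0;
    while (index < maxIndex && bits[pos++] == 1) ++index;
    return index;
}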
In the eleventh embodiment, at least one non-temporal derivation is an intermediate-level derivation (SAO parameters neither at the CTU level nor at the group level). When applied to a group it causes the group (e.g. frame or slice) to be subdivided into subdivided parts (CTU groupings 1203-1206, e.g. columns of CTUs, lines of CTUs, NxN CTUs, etc.) and derives SAO parameters for each of the subdivided parts. Each subdivided part is made up of two or more said image parts (CTUs). The advantage of the intermediate-level derivation(s) is the introduction of one or more effective rate-distortion compromises. The intermediate-level derivation(s) can be used without the CTU-level derivation or without the frame-level derivation or without either of those two derivations.
Twelfth Embodiment
In the ninth embodiment the temporal derivation is in competition with the CTU-level derivation and the frame-level derivation. The twelfth embodiment builds on this and adds one or more of the intermediate groupings set out in the eleventh embodiment, so that the competition includes the CTU level, the frame level, one or more groupings intermediate between the CTU and frame levels, and the temporal derivation.
Thirteenth Embodiment
In the eighth embodiment the temporal derivation is in competition with CTU level derivation but not the frame level derivation. The thirteenth embodiment builds on this and adds one or more NxN CTU groups so that the competition includes CTU level, one or more NxN CTU groups, and the temporal derivation.
Fourteenth Embodiment
In the eighth embodiment the temporal derivation is in competition with the CTU-level derivation but not the frame-level derivation. The fourteenth embodiment builds on this and adds the third grouping 1203 (column of CTUs) or the fourth grouping 1204 (line of CTUs) or both the third and fourth groupings 1203 and 1204. The competition therefore includes the CTU level, the third and/or fourth grouping, and the temporal derivation.
The ninth and eleventh to fourteenth embodiments each promote diversity for the SAO parameter derivation to be applied to a group by making at least first and second said non-temporal derivations available, the first non-temporal derivation when applied to a group causing the group to have SAO parameters at a first level, and the second non-temporal derivation when applied to a group causing the group to have SAO parameters at a second level different from the first level. The levels may be any two levels from the frame level to the CTU level. The levels may correspond to the groupings 1201-1206 in Figure 23.
Fifteenth Embodiment
In the eighth to fourteenth embodiments, the smallest grouping is the first grouping 1201 in which each CTU is a group and there is one set of SAO parameters per CTU. However, in the fifteenth embodiment, a set of SAO parameters can be applied to a smaller block than the CTU. In this case, the non-temporal derivation is not at the CTU level, frame level or an intermediate level between the CTU and frame levels but at a sub-CTU level (a level smaller than an image part).
In this case, instead of signalling a grouping it is effective to signal an index representing a depth of the SAO parameters. Table 2 below shows one example of a possible indexing scheme:
[Table 2 not reproduced: an example indexing scheme for the depth of the SAO parameters, in which index 0 corresponds to a 1/16 CTU depth, index 1 to a 1/4 CTU depth, and higher index values to larger depths.]
Table 2
The index 0 means that each CTU is divided into 16 blocks and each may have its own SAO parameters. Index 1 means that each CTU is divided into 4 blocks, again each having its own SAO parameters.
These different depths of SAO parameters are put in competition with the temporal derivation and the encoder selects one derivation (either the temporal derivation or a non-temporal derivation at one of the available depths). The selection may be based on an RD comparison.
The selected derivation is then signalled to the decoder in the bitstream. The signalling may comprise a temporal/non-temporal syntax element plus a depth syntax element (e.g. using the indexing scheme above). Alternatively, a combined syntax element may be used to signal temporal/non-temporal and the depth. Temporal derivation could be assigned index 6, for example, with the non-temporal derivations having indices 0 to 5.
In the fifteenth embodiment, at least one non-temporal derivation when applied to a group causes the group to be subdivided into subdivided parts and derives SAO parameters for each of the subdivided parts, and each image part is made up of two or more said sub-divided parts.
In the fifteenth embodiment, as in the ninth and eleventh to fourteenth embodiments, at least first and second said non-temporal derivations are available, the first non-temporal derivation when applied to a group causing the group to have SAO parameters at a first level, and the second non-temporal derivation when applied to a group causing the group to have SAO parameters at a second level different from the first level. The levels may be any two levels from the frame level to a sub-CTU level. The levels may correspond to the groupings 1201-1206 in Figure 23.
Sixteenth Embodiment
In the second embodiment the selected derivation of the SAO parameters is signalled for a slice, which means that the temporal derivation (when selected) is used for all CTUs of the slice. It is not possible to determine at the CTU level whether to use temporal derivation or not. The same is true in the fifteenth embodiment in which, even though the available non-temporal derivations include derivations having SAO parameters at different levels (depths) lower than the slice or frame level, it is not possible to determine at the chosen level of the SAO parameters whether to use temporal prediction or not.
In the sixteenth embodiment, the SAO parameters derivation is modified so that a temporal derivation at the CTU level is available, rather than only a temporal derivation at the group level. The temporal derivation at the CTU level is not applied to a group of image parts as in the previous embodiments. However, this temporal derivation is in competition with a temporal derivation applied to a group of image parts.
For example, in one implementation, first a level of the SAO parameters is selected for a slice or frame, which may include the CTU level. Then, when the CTU level is selected it is selected whether to use a temporal derivation or non-temporal derivation for each CTU of the slice or frame.
Also, when the selected level of the SAO parameters for a slice is an intermediate level between the slice level and the CTU level, a temporal derivation or non-temporal derivation may be selected per CTU group (e.g. each column of CTUs) of the slice or frame. In this case, the temporal derivation does still apply to a group of two or more CTUs (image parts). One or more CTU groups within the slice may then use temporal derivation (with each CTU deriving an SAO parameter from a collocated CTU of a reference image), whilst other CTU groups use a non-temporal derivation. In this case, the benefit of selecting between temporal and non-temporal SAO parameter derivation per CTU group is achieved in addition to the benefit of applying the temporal derivation on a group basis. This is illustrated in Figure 31 for a 2x2 CTU grouping (grouping 1205 in Figure 23).
In Figure 31 the SAO merge flags are usable between groups of the CTU grouping. As depicted in Figure 31, for the 2x2 CTU grouping, the SAO merge left and SAO merge up flags are kept for each group of 2x2 CTUs. But they are removed for CTUs inside the group. Please note that only the sao_merge_left_flag is used for the grouping 1203 of a column of CTUs, and only the sao_merge_up_flag is used for the grouping 1204 of a line of CTUs.
In a variant, a flag signals if the current CTU group shares its SAO parameters or not. If it is true, a syntax element representing one of the previous groups is signalled. So each group of a slice can be predicted by a previous group except the first one. This improves the coding efficiency by adding several new possible predictors.
Seventeenth Embodiment
In the sixteenth embodiment a depth of the SAO parameters was selected for a slice, including depths smaller than a CTU, making it possible to have a set of SAO parameters per block in a CTU. However, when the use of temporal derivation was selected no depth could be selected and all CTUs of the slice had to use temporal derivation.
In the seventeenth embodiment, the SAO parameters derivation is modified so that a depth is selected for the slice and then it is selected for an image part at the selected depth whether or not to use temporal derivation. The depths may be the ones in Table 2.
In the seventeenth embodiment, the SAO parameters derivation is also modified so that a temporal derivation at the sub-CTU level is available, rather than only a temporal derivation at the group level. The temporal derivation at the sub-CTU level is not applied to a group of image parts as in the previous embodiments. However, this temporal derivation is in competition with a temporal derivation applied to a group of image parts.
For example, in one implementation, first a level of the SAO parameters is selected for a slice or frame, which may include the sub-CTU level. Then, when the sub-CTU level is selected it is selected whether to use a temporal derivation or non-temporal derivation for each block of the slice or frame.
Also, when the selected level of the SAO parameters for a slice is an intermediate level between the slice level and the block level, a temporal derivation or non-temporal derivation may be selected per CTU or per CTU group (e.g. each column of CTUs) of the slice or frame. In this case, the temporal derivation does still apply to a group of two or more blocks (image parts). One or more CTUs or CTU groups within the slice may then use temporal derivation (with each block deriving an SAO parameter from a collocated block of a reference image), whilst other CTUs or CTU groups use a non-temporal derivation. In this case, the benefit of selecting between temporal and non-temporal SAO parameter derivation per CTU or CTU group is achieved in addition to the benefit of applying the temporal derivation on a CTU or CTU group basis.
In the seventeenth embodiment and in the sixteenth embodiment, one possibility is to remove the SAO merge flags for all levels. This means that steps 503, 504, 505 and 506 of Figure 9 are removed. The advantage is that this significantly reduces the SAO signalling and consequently reduces the bitrate. Moreover, it simplifies the design by removing two syntax elements at the CTU level.
In one variant, the merge flags are kept for CTU level but removed for all other CTU groupings. The advantage is a flexibility of the CTU level.
In another variant, the merge flags are used for CTUs when the SAO signalling level is lower than or equal to the CTU level (1/16 CTU or 1/4 CTU) and removed for other CTU groupings having larger groups.
The merge flags are important for small block sizes because an SAO parameters set is costly compared to the number of samples that it can improve. In that case, these syntax elements reduce the cost of SAO parameters signalling. For large groups, the SAO parameters set is less costly, so the usage of merge flags is not efficient. So the advantage of these embodiments is a coding efficiency increase.
In another variant, the level where the SAO merge flags are enabled is explicitly signalled in the bitstream. For example, a flag indicates if the SAO merge flags are used or not. The flag may be included after the index of the CTUs grouping (or the depth) in the slice header.
This allows the encoder to efficiently select whether or not to use the SAO merge flags.
Eighteenth Embodiment
In the eighth to fifteenth embodiments there is competition between the temporal derivation and at least one alternative derivation method not using temporal derivation. Similarly, in the sixteenth and seventeenth embodiments there is competition between groupings or depths, with temporal derivation being possible for each grouping or depth. Whilst such competition is useful in identifying an efficient SAO parameters derivation for the slice or frame, it can place quite a big burden on the encoder, which has to perform an evaluation for each candidate derivation. This burden may be undesirable, especially for a hardware encoder.
Accordingly, in the eighteenth embodiment, the competition between the different permitted derivations (e.g. in the eighth embodiment the competition between the non-temporal derivation at the CTU level and the temporal derivation) is modified so that only one derivation is permitted in the encoder for any given slice or frame. The permitted derivation may be determined in dependence upon one or more characteristics of the slice or frame. For example, the permitted derivation may be selected based on the slice type (Intra, Inter P, Inter B), the quantization level (QP) of the slice, or the position in the hierarchy of a Group of Pictures (GOP). As a result, for certain slices or frames, only temporal derivation is permitted, while for other slices or frames only non-temporal derivation is permitted, for example non-temporal derivation at the CTU level. For example, the Intra frames and the Inter frames at the highest position in the hierarchy of the GOP structure or with a low QP may be permitted only to use the CTU level, and the other frames which have lower positions in the GOP hierarchy or a high QP may be permitted only to use temporal derivation. The different parameters can be set depending on the rate-distortion compromise. The advantage of this embodiment is a complexity reduction: instead of evaluating two or more competing derivations just one derivation is selected, which can be useful for a hardware encoder.
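For illustration, a rule of the kind described above might look like the following C++ sketch; the slice-type values, the QP threshold and the layer test are purely illustrative assumptions, not values taken from the description.

enum class SliceType { Intra, InterP, InterB };
enum class Derivation { CtuLevel, Temporal };

// Hypothetical rule: high-quality slices (Intra, top GOP layer, or low QP) are
// restricted to the CTU-level derivation; the others to the temporal derivation.
Derivation permittedDerivation(SliceType type, int qp, int gopLayer) {
    const int lowQpThreshold = 27;   // illustrative threshold
    const int topLayer       = 0;    // illustrative: 0 = highest position in the GOP hierarchy
    if (type == SliceType::Intra || gopLayer == topLayer || qp <= lowQpThreshold)
        return Derivation::CtuLevel;
    return Derivation::Temporal;
}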
Thus, in the eighteenth embodiment a first derivation is associated with first groups of the image (e.g. Intra slices) and a second derivation is associated with second groups of the image (e.g. Inter P slices). It is determined whether a group to be filtered is a first group or a second group. If it is determined that the group to be filtered is a first group, the first derivation is used to filter the image parts of the group, and if it is determined that the group to be filtered is a second group, the second derivation is used to filter the image parts of the group. Evaluation of the two derivations is not required.
Whether a group to be filtered is determined to be a first group or a second group may depend on one or more of:
a slice type; a frame type of the image to which the group to be filtered belongs;
a position in a quality hierarchy of a Group of Pictures of the image to which the group to be filtered belongs;
a quality of the image to which the group to be filtered belongs; and
a quantisation parameter applicable to the group to be filtered.
For example, when the first groups have a higher quality or higher position in the quality hierarchy than the second groups, the first derivation may have fewer image parts per group than the second derivation.
Nineteenth Embodiment
In the eighteenth embodiment a particular derivation of the SAO parameters was selected for a given slice or frame. However, if the encoder has the capacity to evaluate a limited number of competing derivations, it is unnecessary to eliminate the competition altogether. In the nineteenth embodiment, the competition for a given slice or frame is still permitted but the set of competing derivations is adapted to the slice or frame. For example, the set of competing derivations may depend on the slice type (Intra, Inter P, Inter B), the quantization level (QP) of the slice, or the position in the hierarchy of a Group of Pictures (GOP).
The set of competing derivations may depend on the slice type.
For Intra slices, the set preferably contains groupings with groups containing small numbers of CTUs (e.g. CTU level, 2x2 CTUs, 3x3 CTUs, and column). Also, if depths lower than a CTU are available (as in the fifteenth embodiment), these depths are preferably also included. Of course, the temporal derivation is not used.
For Inter slices, the set of derivations preferably contains groupings with groups containing large numbers of CTUs, such as line, frame level, and the temporal derivation. However, smaller groupings can also be considered, down to the CTU level.
The advantage of this embodiment is a coding efficiency increase thanks to the use of derivations adapted for a slice or frame.
In one variant, the set of derivations can be different for an Inter B slice from that for an Inter P slice.
In another variant, the set of competing derivations depends on the characteristics of the frame in the GOP. This is especially beneficial for frames which vary in quality (QP) based on a quality hierarchy. For the frames with the highest quality or highest position in the hierarchy, the set of competing derivations should include groups containing few CTUs or even sub-CTU depths (same as for Intra slices above). For frames with a lower quality or lower position in the hierarchy, the set of competing derivations should include groups with more CTUs.
The set of competing derivations can be defined in the sequence parameters set.
Thus, in the nineteenth embodiment a first set of derivations is associated with first groups of the image (e.g. Intra slices) and a second set of derivations is associated with second groups of the image (e.g. Inter P slices). It is determined whether a group to be filtered is a first group or a second group. If it is determined that the group to be filtered is a first group, a derivation is selected from the first set of derivations and used to filter the image parts of the group, and if it is determined that the group to be filtered is a second group, a derivation is selected from the second set of derivations and used to filter the image parts of the group. Evaluation of derivations not in the associated set of derivations is not required.
Whether a group to be filtered is a first group or a second group may be determined as in the preceding embodiment. For example, when the first groups have a higher quality or higher position in the quality hierarchy than the second groups, the first set of derivations may have at least one derivation with fewer image parts per group than the derivations of the second set of derivations.
Twentieth Embodiment
In the preceding embodiments, the temporal derivation involves simply copying SAO parameters from a collocated CTU (or from a collocated block within a CTU if SAO parameters at the block level are used). In a video, there are generally background and moving objects. When comparing a frame to its following frames, a large part can be static. When the SAO temporal derivation is applied on this static part for several consecutive frames, the SAO filtering should filter nothing, especially for edge offset. As a result, the temporal derivation will not be selected.
To solve this problem and increase the coding efficiency of the temporal derivation, in the twentieth embodiment the set of SAO parameters from the previous frame is changed according to some defined rules. Figure 32 is an example of an algorithm to produce such a modification of the set of SAO parameters. In this example, a 90° rotation is applied to the edge classification. If sao_eo_class_Luma or sao_eo_class_Chroma (2301) from the collocated CTU is equal to 0, which corresponds to edge type 0° (2302), the edge type for the current frame (2310) is set equal to 1 (2303), corresponding to SAO edge type 90°. And if sao_eo_class_X is equal to 1 (2304), sao_eo_class_X (2305) is set equal to 0. In the same way, the edge offset type 135° (sao_eo_class_X equal to 2 (2306)) is rotated to edge offset type 45° (2307). And the edge offset type 45° (sao_eo_class_X equal to 3 (2308)) is rotated to edge offset type 135° (2309). The offset values are not changed.
It will be appreciated that although the effect of the algorithm of Figure 32 is to apply a rotation, in practice the changes to the edge classification parameters (sao_eo_class_Luma or sao_eo_class_Chroma) may be effected by using a mapping table. In the mapping table there is an entry for each existing edge index which maps to a corresponding "new" edge index. Thus, the mapping table implements the required rotation.
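In C++, such a mapping table for the 90° rotation could be as simple as the following sketch; the class numbering (0 = 0°, 1 = 90°, 2 = 135°, 3 = 45°) follows the description of Figure 32 above.

#include <array>

// sao_eo_class values: 0 = 0°, 1 = 90°, 2 = 135°, 3 = 45°.
// 90° rotation: 0 <-> 1 and 2 <-> 3; the offset values themselves are unchanged.
constexpr std::array<int, 4> kEoClassRotate90 = {1, 0, 3, 2};

inline int rotateEoClass90(int saoEoClass) {
    return kEoClassRotate90[saoEoClass];
}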
Figure 33 illustrates this temporal rotation by 90°. In this example, it is assumed that the temporal derivation with 90° rotation is applied to a whole frame or slice as in the first and second embodiments.
Of course, as variants, the 45° and the 135° rotations can be considered instead of 90°. Yet, in a preferred embodiment the rotation of temporal SAO parameters sets is the 90° rotation. This gives the best coding efficiency.
In one variant, when the temporal rotation is applied, band offsets are not copied and SAO is not applied on this CTU.
In another variant, as for the basic "copying" temporal derivation, for all CTUs for which an SAO parameter set is unobtainable (this means a CTU whose collocated CTU uses "no SAO" or all of whose collocated CTUs use "no SAO"), a default SAO parameter set can be used for the CTUs concerned as described in connection with the fifth embodiment.
Twenty-first Embodiment
In the twentieth embodiment, the "rotation" temporal derivation is introduced. In the twenty-first embodiment, the "rotation" temporal derivation is put in competition with the "copying" temporal derivation as shown in Figure 34. In this example the competition is applied to each slice or each frame. The best temporal derivation may be selected based on a rate-distortion criterion.
Twenty-second Embodiment
In several preceding embodiments, the "copying" temporal derivation was in competition with one or more non-temporal derivations (different groupings or different depths). In the twenty-second embodiment the "rotation" temporal derivation is put into competition with the same non-temporal derivation(s) instead of the "copying" temporal derivation.
Twenty-third Embodiment
In several preceding embodiments, the "copying" temporal derivation was in competition with one or more non-temporal derivations (different groupings or different depths). In the twenty-third embodiment both the "copying" and "rotation" temporal derivations are put into competition with the same non-temporal derivation(s) instead of just the "copying" temporal derivation. For example, Table 3 below shows the competing derivations when the eleventh embodiment is modified in this way:
[Table 3 not reproduced: the indexing scheme of Table 1 extended so that both the "copying" temporal derivation and the "rotation" temporal derivation are available alongside the CTU groupings.]
Table 3
As a variant, further temporal derivations with 135° and 45° rotations respectively or with other rotation angles are possible.
Twenty-fourth Embodiment
In the twenty-first embodiment the "copying" and "rotation" temporal derivations are in competition with one another. In the twenty-fourth embodiment these two temporal derivations and further "rotation" temporal derivations are used cyclically.
In one exemplary cycle, a first frame F0 is followed by second, third, fourth and fifth frames F1-F4. The first frame F0 does not use temporal derivation of SAO parameters. For F1, the "copying" temporal derivation is applied (i.e. copying the SAO parameters from F0). For F2, the temporal derivation is a 90° rotation of the SAO parameters of F0. For F3, the temporal derivation is a 135° rotation of the SAO parameters of F0. For F4, the temporal derivation is a 45° rotation of the SAO parameters of F0. In this case, F0 is a reference image for each of F1 to F4.
The same effect can be achieved by using the previous frame only as the reference frame:
Frame F0: (SAO parameters not derived temporally)
Frame F1: (temporal 'copy' of Frame F0)
Frame F2: (temporal '90°' from Frame F1)
Frame F3: (temporal '45°' from Frame F2)
Frame F4: (temporal '90°' from Frame F3)
By filtering an image part in a first image using the "copying" temporal derivation and filtering the same image part in two or more further images following the first image using different ones of the two or more temporal rotation derivations in a predetermined sequence, the direction of edge filtering of an image part may be switched successively through all possible edge filtering directions.
Second group of embodiments
In HEVC, SAO filtering is performed CTU by CTU. In the first group of embodiments, temporal derivation is introduced, and to improve the signalling efficiency, a group of image parts is formed and the use of temporal prediction is signalled for this group of image parts, rather than for each image part individually.
In the second group of embodiments, it is not necessary for any of the derivations to be a group-wise derivation.
Twenty-fifth Embodiment
In the twentieth embodiment, the "rotation" temporal derivation is applied to all CTUs of a slice or frame. In other words, a rotation temporal derivation is signalled for a group (slice, frame, column, line, NxN CTUs, etc.) composed of two or more image parts (CTUs). The image parts (CTUs) may still have different SAO parameters depending on the SAO parameters of the respective collocated image parts.
Signalling the temporal derivation at the slice or frame level is useful for compatibility with the embodiments described previously in which a grouping of CTUs is selectable for the slice or frame from among plural groupings (e.g. the groupings 1201-1206 in Figure 23), the selected grouping also being signalled at the slice or frame level. However, it is not essential to signal the use of temporal derivation at the slice or frame level. This applies whether there is just one type of temporal derivation, e.g. "copy" or "rotation", or the type can be selected from plural different types. Instead, the signalling of the use of temporal derivation can be at the CTU level or at the block level (i.e. sub-CTU). In this case, a syntax element may be provided per CTU to indicate whether or not rotation temporal derivation is used for the CTU concerned. Equally, a syntax element may be provided per block (i.e. sub-CTU) to indicate whether or not rotation temporal derivation is used for the block concerned.
In the twenty-fifth embodiment neither temporal derivation nor a grouping is signalled at the slice level and all the SAO signalling is at the CTU level. Figure 35 shows an example decoding process in this embodiment. In this example, one or more merge flags are used at the CTU level to signal the SAO derivation including usage of the temporal derivation. A new SAO temporal merge flag is introduced compared to Figure 9.
The process of Figure 35 is performed CTU by CTU. For a current CTU, the sao_merge_temporal_flag_X is extracted from the bitstream if the other merge flags are off (2613). If sao_merge_temporal_flag_X is equal to 1, a syntax element representing a reference frame is extracted from the bitstream (2614). Please note that this step is not needed if only one reference frame is used for the derivation. Then a syntax element representing a rotation of the parameters is decoded (2615). Please note that this step is not needed if no "rotation" option is available; this would be the case if the only type of temporal derivation is the basic "copy" type. Also, even if there is a rotation option, step 2615 is not performed if the collocated CTU in the reference frame is not of the EO type. Then the respective sets of SAO parameters for the 3 colour components are copied from the collocated CTU to the current CTU. Processing then moves to the next CTU in step 2610.
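A sketch of this CTU-level parsing is given below; the BitReader interface and the bit lengths of the reference-frame and rotation syntax elements are assumptions made only for illustration.

#include <cstdint>

// Minimal bit-reader interface assumed for this sketch.
struct BitReader {
    virtual uint32_t readFlag() = 0;
    virtual uint32_t readBits(int n) = 0;
    virtual ~BitReader() = default;
};

struct CtuSaoDecision {
    bool temporalMerge = false;   // sao_merge_temporal_flag (new syntax element)
    int  refIdx        = 0;       // reference frame used for the derivation
    int  rotation      = 0;       // 0 = plain copy; 1..3 = hypothetical rotation index
};

// Figure 35 (sketch): parsed only when the ordinary merge-left/up flags are off.
// The reference-frame syntax element is skipped when a single reference frame is
// used, and the rotation syntax element is skipped when no rotation option exists
// or the collocated CTU is not of the edge-offset type.
CtuSaoDecision parseTemporalMerge(BitReader& br, bool singleRefFrame,
                                  bool collocatedIsEdgeType) {
    CtuSaoDecision d;
    d.temporalMerge = br.readFlag() != 0;                        // step 2613
    if (!d.temporalMerge) return d;
    if (!singleRefFrame) d.refIdx = int(br.readBits(2));         // step 2614 (hypothetical length)
    if (collocatedIsEdgeType) d.rotation = int(br.readBits(2));  // step 2615 (hypothetical length)
    return d;
}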
The advantage of the temporal merge flag signalling compared to temporal/CTU grouping signalling at the slice level is a simplification of the encoder design for some implementations. Indeed, there is no need to wait for the encoding of the whole frame before starting the SAO selection, unlike in the slice-level approach. But the extra signalling at the CTU level can have a non-negligible impact on the coding efficiency.
When two or more temporal derivations are in competition with one another, for example "no temporal", "copy", "rotate by 90°", "rotate by 135°", "rotate by 45°", the syntax element per CTU extracted in step 2615 may indicate the selected temporal derivation, e.g. using an index. The syntax element could also specify the angle of rotation. In this way, in the same slice or frame, some CTUs may have no temporal derivation, other CTUs may use "copy", still others may use "rotate by 90°", and so on. These solutions lead to an extremely fine adaptation of the SAO parameter derivation to the CTUs of a slice or frame.
Signalling a grouping for a slice or frame and then signalling, for each group of two or more CTUs, whether to use temporal derivation or not or, if two or more temporal derivations are in competition with one another, which one of them is selected, is an effective way to achieve adaptability without having per-CTU syntax elements. For example, if the selected grouping for a slice is 3x3 CTUs, some groups may have no temporal derivation, other groups may use "copy", still others may use "rotate by 90°", and so on. As the number of groups is only 1/9th of the number of CTUs, the number of syntax elements is correspondingly smaller compared to per-CTU signalling, yet the different CTUs in each group may still have different SAO parameters depending on the collocated CTUs.
Twenty-sixth Embodiment
In the twentieth to twenty-fifth embodiments rotation temporal derivations are introduced. These rotation temporal derivations are preferred examples from a wider class of transformations that can be applied to change the direction of EO filtering in a CTU of the current frame compared to the direction of EO filtering in a collocated CTU of a reference frame. For example, the direction-changing transformation could be a reflection about the x-axis or y-axis. Such a reflection has the effect of swapping two directions and leaving the other two directions unchanged. It could also be a reflection about a diagonal line at 45° or 135°.
As in the twentieth embodiment, it will be appreciated that although the effect of the algorithm of Figure 32 is to apply a transformation, in practice the changes to the edge classification parameters (sao_eo_class_Luma or sao_eo_class_Chroma) may be effected by using a mapping table. In the mapping table there is an entry for each existing edge index which maps to a corresponding "new" edge index. Thus, the mapping table implements the required transformation.
This embodiment is applicable to the first group of embodiments (which use a group- wise derivation) or to the second group of embodiments (which do not use a group-wise derivation).
Third group of embodiments
In the first and second groups of embodiments, temporal derivation of SAO parameters was introduced, either as a group-wise derivation (applied to a group of two or more image parts) or for individual image parts.
In a third group of embodiments, new spatial derivations of SAO parameters are introduced. These may be group-wise derivations or for individual image parts.
In the case of a group-wise spatial derivation, as in the first group of embodiments, a group can be any two or more CTUs, for example a whole image, a slice, a line of CTUs, a column of CTUs, N lines of CTUs, or N columns of CTUs, where N is an integer greater than 1. A group could also be NxN CTUs, where N is an integer greater than 1, or MxN CTUs or NxM CTUs, where M > 1 and N > 1.
Alternatively, a group of image parts can be a CTU, and each constituent block of the CTU can be an image part. In such a case, each block of a CTU may have its own SAO parameters, but the signalling to use spatial derivation of the SAO parameters can be made for the CTU as a whole.
In the simplest case, where there is only one type of spatial derivation, a flag (e.g. a spatial merge flag) can be used to signal the use of the spatial derivation for all image parts of the group.
In the case of a group-wise spatial derivation the manner in which the SAO parameters are derived in the spatial derivation is not particularly limited except that the source image part belongs to another group of image parts in the same image as the subject group. The source image part and the image part to be derived are at the same positions in their respective groups. For example, in a 3x3 CTU grouping, there are 9 positions from the top left to the bottom right. If the other group is, for example, the left group of the subject group, then at least one SAO parameter of an image part at position 1 (the top left position, say) in the subject group is derived from an SAO parameter of the image part at the same position (position 1 or top left position) in the left group. This image part in the left group serves as a source image part for the image part to be derived in the subject group. The same is true for each other position in the subject group.
In the simplest case, the at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the source image part by copying the SAO parameter of the source image part. One, more than one, or all SAO parameters may be copied. Alternatively, one, more than one, or all SAO parameters may be copied only when the SAO filtering is of a particular type (edge or band).
For example, in Figure 31, several 2x2 CTU groups are illustrated. When sao_merge_left_flag is set in a current 2x2 group, one or more SAO parameters of the CTU in the top-left position of the left 2x2 group are copied to the CTU in the top-left position of the current 2x2 group; one or more SAO parameters of the CTU in the top-right position of the left 2x2 group are copied to the CTU in the top-right position of the current 2x2 group; one or more SAO parameters of the CTU in the bottom-left position of the left 2x2 group are copied to the CTU in the bottom-left position of the current 2x2 group; and one or more SAO parameters of the CTU in the bottom-right position of the left 2x2 group are copied to the CTU in the bottom-right position of the current 2x2 group. The sao_merge_up_flag does the same for a 2x2 group above the current 2x2 group. When neither flag is set, each CTU has its own "new" SAO parameters, which may be at the group level (one set per group) or at the CTU level.
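The position-wise copy for a 2x2 group can be sketched as follows; the SaoParams structure and the indexing by CTU address are illustrative assumptions.

#include <vector>

struct SaoParams {        // simplified SAO parameter set (sketch)
    bool enabled = false;
    int  type = 0, bandPosition = 0;
    int  offsets[4] = {0, 0, 0, 0};
};

// Group-wise spatial "copy" derivation for a 2x2 CTU group (sketch): each CTU of
// the current group copies the parameters of the CTU at the same position inside
// the left (or upper) group, so different CTUs of the group may end up with
// different SAO parameters. The caller must ensure that the source group exists.
void copyGroupFromNeighbour(std::vector<SaoParams>& ctuParams, int ctusPerRow,
                            int groupX, int groupY, bool fromLeft) {
    const int n = 2;                                   // 2x2 group
    const int srcX = fromLeft ? groupX - n : groupX;   // left or upper source group
    const int srcY = fromLeft ? groupY : groupY - n;
    for (int dy = 0; dy < n; ++dy)
        for (int dx = 0; dx < n; ++dx)
            ctuParams[(groupY + dy) * ctusPerRow + (groupX + dx)] =
                ctuParams[(srcY + dy) * ctusPerRow + (srcX + dx)];
}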
Instead of copying, a spatial derivation of SAO parameters which involves modifying one, more than one, or all SAO parameters of the source image part may be used.
It will be appreciated that spatial and temporal group-wise derivations are both "group-wise sourcing derivations". Each involves applying a group-wise sourcing derivation of SAO parameters to a group of two or more image parts, the group-wise sourcing derivation permitting different image parts belonging to the group to have different SAO parameters and comprising deriving at least one said SAO parameter of an image part belonging to the group from an SAO parameter of another image part serving as a source image part for the image part to be derived. In the case of a temporal derivation the source image part is a collocated image part in a reference image having a position in the reference image collocated with a position of the image part to be derived in its image. In the case of a spatial derivation, the source image part belongs to another group of image parts in the same image as the image part to be derived, said source image part and said image part to be derived being at the same positions in their respective groups.
Twenty-seventh Embodiment
In the twenty-fifth embodiment the rotation derivation was a non-group-wise temporal derivation. However, in the twenty-seventh embodiment a spatial rotation derivation is used as a derivation, i.e. where the SAO parameters of a CTU in a current image are derived by rotation from the SAO parameters of another CTU of the same image (as opposed to being derived by rotation from the SAO parameters of a collocated CTU of a reference image).
Similarly to the "copy" spatial derivation, the other CTU in the "rotation" spatial derivation may be a left CTU or an upper CTU, in which case a sao_merge_rotation_left flag or sao_merge_rotation_up flag may be used to signal when the rotation spatial derivation is selected. Figure 36 shows two examples where the other CTU is the left CTU and the rotation from the left CTU to the current CTU is 90 degrees.
In one variant, the rotation spatial derivation may be in competition with the temporal copy derivation and/or the rotation temporal derivation.
In another variant, there are no temporal derivations and the rotation spatial derivation is in competition with the copy spatial derivation (which of course may be copy-left and/or copy-up).
In these cases, the "rotation" derivation is applied on a spatial basis to generate additional SAO merge parameters set candidates to predict the SAO parameters set of the current CTU. Accordingly, the "rotation" can be applied to increase the list of SAO merge candidates or to find new SAO merge candidates for empty positions.
The advantage of using the twenty-seventh embodiment instead of using several SAO parameters sets from previously decoded SAO parameters sets is an increase in coding efficiency. Moreover, it offers additional flexibility for encoder implementations by accessing only a limited number of already encoded SAO parameters sets.
Figure 37 is a flow chart representing one example of the possible usage of the rotation derivation of SAO parameters.
In the process for decoding a set of SAO parameters for a current CTU, the sao_merge_rotation_Left_X flag is extracted from the bitstream if the other merge flags are off (3613). If sao_merge_rotation_Left_X is equal to 1, for each colour component YUV of the current CTU, the set of SAO parameters is derived from the set of SAO parameters for the same component of the left CTU by applying a rotation to the edge classification as described previously. The SAO parameters other than the direction may be simply copied.
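A sketch of the derivation performed once the flag has been decoded is given below; the SaoParams structure and its type encoding are assumptions made only for illustration.

#include <array>

struct SaoParams {                 // simplified SAO parameter set (sketch)
    bool enabled = false;
    int  type = 0;                 // 0 = off, 1 = band offset, 2 = edge offset (assumption)
    int  eoClass = 0;              // sao_eo_class (0°, 90°, 135°, 45°)
    int  bandPosition = 0;
    int  offsets[4] = {0, 0, 0, 0};
};

// Spatial "rotation" derivation (sketch): when sao_merge_rotation_left_flag is 1,
// the current CTU takes the left CTU's parameters with the edge class rotated by
// 90°; all other parameters are simply copied.
SaoParams deriveByRotationFromLeft(const SaoParams& left) {
    static constexpr std::array<int, 4> rotate90 = {1, 0, 3, 2};
    SaoParams cur = left;                       // copy everything first
    if (cur.type == 2)                          // edge offset: rotate the direction
        cur.eoClass = rotate90[cur.eoClass];
    return cur;
}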
Twenty-eighth Embodiment
In the twenty-seventh embodiment, the rotation spatial derivation was applied to one CTU. In the twenty-eighth embodiment a group-based rotation spatial derivation is applied. Then, each CTU of a current group derives its SAO parameters by rotation from the CTU at the same position in another group of the same image. For example, the group may be 3x3 CTUs. The other group may be a group above or on the left.
Again, the group-based spatial derivation may be in competition with a group-based temporal derivation (either copy or rotation or both).
Similarly, the group-based spatial derivation may be in competition with a group-based "copy" spatial derivation (which may be copy-left and/or copy-up).
Twenty-ninth Embodiment
In the twenty-seventh and twenty-eighth embodiments a rotation spatial derivation was introduced. Just as the rotation temporal derivation is one of a wider class of possible direction-transforming temporal derivations, so the rotation spatial derivation is one of a wider class of possible direction-changing spatial derivations. The direction-changing spatial derivation may be applied to an individual CTU or to a group of CTUs. It may be in competition with other spatial and/or temporal derivations.
Thirtieth Embodiment
Figure 38 shows a system 191, 195 comprising at least one of an encoder 150 or a decoder 100 and a communication network 199 according to embodiments of the present invention. According to an embodiment, the system 195 is for processing and providing a content (for example, a video and audio content for displaying/outputting or streaming video/audio content) to a user, who has access to the decoder 100, for example through a user interface of a user terminal comprising the decoder 100 or a user terminal that is communicable with the decoder 100. Such a user terminal may be a computer, a mobile phone, a tablet or any other type of a device capable of providing/displaying the (provided/streamed) content to the user. The system 195 obtains/receives a bitstream 101 (in the form of a continuous stream or a signal - e.g. while earlier video/audio are being displayed/output) via the communication network 199. According to an embodiment, the system 191 is for processing a content and storing the processed content, for example a video and audio content processed for displaying/outputting/streaming at a later time. The system 191 obtains/receives a content comprising an original sequence of images 151, which is received and processed (including filtering with a deblocking filter according to the present invention) by the encoder 150, and the encoder 150 generates a bitstream 101 that is to be communicated to the decoder 100 via the communication network 199. The bitstream 101 is then communicated to the decoder 100 in a number of ways; for example, it may be generated in advance by the encoder 150 and stored as data in a storage apparatus in the communication network 199 (e.g. on a server or a cloud storage) until a user requests the content (i.e. the bitstream data) from the storage apparatus, at which point the data is communicated/streamed to the decoder 100 from the storage apparatus. The system 191 may also comprise a content providing apparatus for providing/streaming, to the user (e.g. by communicating data for a user interface to be displayed on a user terminal), content information for the content stored in the storage apparatus (e.g. the title of the content and other meta/storage location data for identifying, selecting and requesting the content), and for receiving and processing a user request for a content so that the requested content can be delivered/streamed from the storage apparatus to the user terminal. Alternatively, the encoder 150 generates the bitstream 101 and communicates/streams it directly to the decoder 100 as and when the user requests the content. The decoder 100 then receives the bitstream 101 (or a signal) and performs filtering with a deblocking filter according to the invention to obtain/generate a video signal 109 and/or audio signal, which is then used by a user terminal to provide the requested content to the user.
In the preceding embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit.
Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
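By way of illustration only, and without limiting the claims that follow, the sketch below shows one conceivable software realisation of the temporal derivation of SAO parameters described in the foregoing embodiments: a CTU either copies the SAO parameters of the collocated CTU of a reference image or applies a direction-changing ("rotation") variant, the choice being made, for example, according to a syntax element in the bitstream. The function and field names are assumptions made for this sketch.

    def temporal_copy(collocated_params):
        # Temporal copy derivation: reuse the SAO parameters of the
        # collocated CTU of the reference image unchanged.
        return dict(collocated_params)

    def temporal_rotation(collocated_params, mapping):
        # Temporal rotation derivation: copy the parameters but change the
        # edge-filtering direction of edge-type SAO according to "mapping".
        derived = dict(collocated_params)
        if derived.get("type") == "edge":
            derived["eo_class"] = mapping[derived["eo_class"]]
        return derived

    def derive_temporal_sao(derivation_index, collocated_params, mapping):
        # Select one of the available temporal derivations (e.g. according
        # to a decoded syntax element) and apply it to obtain the SAO
        # parameters of the current CTU before SAO filtering is performed.
        if derivation_index == 0:
            return temporal_copy(collocated_params)
        return temporal_rotation(collocated_params, mapping)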

Claims

1. A method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the method comprising:
selecting, from among two or more available temporal derivations of SAO parameters, a temporal derivation of SAO parameters to apply to an image part, the available temporal derivations comprising different ways of deriving at least one said SAO parameter of said image part from an SAO parameter of a collocated image part of a reference image; and
performing SAO filtering on the image part using the derived SAO parameters.
2. A method as claimed in claim 1, wherein one said available temporal derivation is a temporal copy derivation which derives said at least one said SAO parameter of a subject image part from an SAO parameter of a collocated image part of a reference image by copying the SAO parameter of the collocated image part.
3. A method as claimed in claim 1 or 2, wherein one said available temporal derivation is a temporal rotation derivation which derives an SAO parameter representing a direction of edge filtering of a subject image part from an SAO parameter representing a direction of edge filtering of the collocated image part so that the direction of edge filtering of the subject image part is rotated relative to the direction of edge filtering of the collocated image part.
4. A method as claimed in claim 3, wherein the direction of edge filtering of the subject image part is rotated by an angle equal to 90 degrees relative to the direction of edge filtering of the collocated image part.
5. A method as claimed in claim 3 or 4, having two or more said temporal rotation derivations which rotate the direction of edge filtering of the subject image part relative to the direction of edge filtering of the collocated image part by different angles.
6. A method as claimed in claim 5, further comprising using different ones of the two or more temporal rotation derivations in a predetermined sequence to filter the same image part in two or more successive images.
7. A method as claimed in claim 6 when read as appended to claim 2, further comprising filtering an image part in a first image using the temporal copy derivation and filtering the same image part in two or more further images following the first image using different ones of the two or more temporal rotation derivations in a predetermined sequence.
8. A method as claimed in claim 7, wherein the direction of edge filtering of an image part is switched successively through all possible edge-filtering directions by the use of the temporal copy derivation and the two or more temporal rotation derivations in said predetermined sequence.
9. A method as claimed in any one of claims 1 to 8, wherein the or one said temporal derivation is a temporal direction-changing derivation which derives an SAO parameter representing a direction of edge filtering of an image part from an SAO parameter representing a direction of edge filtering of the collocated image part so that the direction of edge filtering of the image part is changed relative to the direction of edge filtering of the collocated image part.
10. A method as claimed in any one of claims 1 to 9, wherein different reference images may be used for the temporal derivation of SAO parameters for the subject image part.
11. A method as claimed in claim 10, comprising searching through a plurality of available reference images and selecting a reference image whose said collocated image part satisfies at least one search condition.
12. A method as claimed in claim 11, wherein the or one said search condition is that said collocated image part uses SAO filtering.
13. A method as claimed in claim 11, wherein the or one said search condition is that said collocated image part uses edge-type SAO filtering.
14. A method as claimed in claim 11, 12 or 13, comprising performing a first search through the available reference images using a first search condition and if none of the available reference images satisfies the first search condition performing a second search through the available reference images using a second search condition different from the first search condition.
15. A method as claimed in claim 14, wherein the first search condition is that said collocated image part uses edge-type SAO filtering and the second search condition is that said collocated image part uses band-type SAO filtering.
16. A method as claimed in any one of claims 11 to 15, wherein the reference images are searched in order from highest coding efficiency to lowest coding efficiency.
17. A method as claimed in one of claims 10 to 16, comprising creating a list of reference images for the temporal derivation of SAO parameters based on one or more lists of reference images used for temporal prediction of the image parts of the group, wherein at least one reference image among the one or more temporal-prediction lists is excluded from the list of reference images for the temporal derivation of SAO parameters, and selecting a reference image for the temporal derivation of SAO parameters from the list of reference images for the temporal derivation of SAO parameters.
18. A method as claimed in claim 17, wherein redundant reference images among the one or more temporal-prediction lists are excluded from the list of reference images for the temporal derivation of SAO parameters.
19. A method as claimed in claim 17 or 18, wherein reference images whose respective collocated image parts do not use SAO filtering or whose respective collocated image parts do not use edge-type SAO filtering are excluded from the list of reference images for the temporal derivation of SAO parameters.
20. A method as claimed in claim 17, comprising imposing a maximum on a number of reference images includable in the list of reference images for the temporal derivation of SAO parameters.
21. A method of encoding an image comprising performing sample adaptive offset (SAO) filtering using the method of any one of claims 1 to 20.
22. A method of encoding an image as claimed in claim 21, comprising:
evaluating different said derivations of SAO parameters for an image part;
comparing the evaluation results for the different derivations; and
selecting one of the derivations to apply to the image part based on the comparison results.
23. A method as claimed in claim 21 or 22, comprising generating one or more items of information representing the selected derivation and including the item(s) in a bitstream for use by a decoder.
24. A method of decoding an image comprising performing sample adaptive offset (SAO) filtering using the method of any one of claims 1 to 20.
25. A device for performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the device comprising:
means for selecting, from among two or more available temporal derivations of SAO parameters, a temporal derivation of SAO parameters to apply to an image part, the available temporal derivations comprising different ways of deriving at least one said SAO parameter of said image part from an SAO parameter of a collocated image part of a reference image; and
means for performing SAO filtering on the image part using the derived SAO parameters.
26. An encoder comprising the device of claim 25.
27. A decoder comprising the device of claim 25.
28. A program which, when executed by a computer or processor, causes the computer or processor to carry out the method of any one of claims 1 to 24.
29. A signal carrying an information dataset for an image represented by a video bitstream, the image comprising a set of reconstructable samples, each reconstructable sample having a sample value, the information dataset comprising a syntax element indicating which one of two or more available temporal derivations of SAO parameters has been selected to apply to an image part, the available temporal derivations comprising different ways of deriving at least one said SAO parameter of said image part from an SAO parameter of a collocated image part of a reference image.
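For illustration only, and forming no part of the claims, the following sketch indicates how the reference-image handling recited in claims 11 to 20 might be implemented in software: a two-pass search over the available reference images, and the construction of a de-duplicated, length-limited reference list derived from the temporal-prediction lists. The helper names, the poc field and the sao_params accessor are assumptions made for this sketch.

    def find_reference_for_sao(reference_images, ctu_index):
        # Two-pass search: prefer a reference image whose collocated CTU
        # uses edge-type SAO, otherwise accept band-type SAO. The images
        # are assumed to be ordered from highest to lowest expected coding
        # efficiency.
        for wanted_type in ("edge", "band"):
            for ref in reference_images:
                if ref.sao_params(ctu_index).get("type") == wanted_type:
                    return ref
        return None

    def build_sao_reference_list(prediction_lists, ctu_index, max_refs):
        # List construction: start from the lists used for temporal
        # prediction, drop redundant reference images, exclude images whose
        # collocated CTU does not use SAO, and cap the list length.
        sao_refs, seen = [], set()
        for ref_list in prediction_lists:
            for ref in ref_list:
                if ref.poc in seen:  # redundant reference image
                    continue
                seen.add(ref.poc)
                if ref.sao_params(ctu_index).get("type") in ("edge", "band"):
                    sao_refs.append(ref)
                    if len(sao_refs) >= max_refs:
                        return sao_refs
        return sao_refs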
PCT/EP2019/064455 2018-06-05 2019-06-04 Video coding and decoding WO2019233998A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1809233.8 2018-06-05
GB1809233.8A GB2574422A (en) 2018-06-05 2018-06-05 Video coding and decoding

Publications (1)

Publication Number Publication Date
WO2019233998A1 (en)

Family

ID=62975623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/064455 WO2019233998A1 (en) 2018-06-05 2019-06-04 Video coding and decoding

Country Status (3)

Country Link
GB (1) GB2574422A (en)
TW (1) TW202005371A (en)
WO (1) WO2019233998A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130051454A1 (en) * 2011-08-24 2013-02-28 Vivienne Sze Sample Adaptive Offset (SAO) Parameter Signaling
US20140192860A1 (en) 2013-01-04 2014-07-10 Canon Kabushiki Kaisha Method, device, computer program, and information storage means for encoding or decoding a scalable video sequence
US20140314141A1 (en) * 2013-04-19 2014-10-23 Samsung Electronics Co., Ltd. Video encoding method and apparatus, and video decoding method and apparatus based on signaling of sample adaptive offset parameters
US20140328389A1 (en) * 2011-12-22 2014-11-06 Mediatek Inc. Method and apparatus of texture image compression in 3d video coding
US9769450B2 (en) 2012-07-04 2017-09-19 Intel Corporation Inter-view filter parameters re-use for three dimensional video coding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10382766B2 (en) * 2016-05-09 2019-08-13 Qualcomm Incorporated Signalling of filtering information

Also Published As

Publication number Publication date
GB2574422A (en) 2019-12-11
TW202005371A (en) 2020-01-16
GB201809233D0 (en) 2018-07-25

Similar Documents

Publication Publication Date Title
US11601687B2 (en) Method and device for providing compensation offsets for a set of reconstructed samples of an image
KR102408765B1 (en) Video coding and decoding
US20150341638A1 (en) Method and device for processing prediction information for encoding or decoding an image
CN115066898A (en) Cross-layer reference constraints
WO2020002117A2 (en) Methods and devices for performing sample adaptive offset (sao) filtering
WO2019234000A1 (en) Prediction of sao parameters
WO2019233997A1 (en) Prediction of sao parameters
WO2019233999A1 (en) Video coding and decoding
CN115088265A (en) Image encoding apparatus and method for controlling loop filtering
WO2019233998A1 (en) Video coding and decoding
WO2019234002A1 (en) Video coding and decoding
WO2019234001A1 (en) Video coding and decoding
WO2024213516A1 (en) Image and video coding and decoding
WO2021055640A1 (en) Methods and apparatuses for lossless coding modes in video coding
GB2629031A (en) Image and video coding and decoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19728939

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19728939

Country of ref document: EP

Kind code of ref document: A1