WO2019234002A1 - Video coding and decoding - Google Patents
- Publication number
- WO2019234002A1 (PCT/EP2019/064459)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- sao
- grouping
- ctu
- group
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/174—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/182—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
- H04N19/198—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters including smoothing of a sequence of encoding parameters, e.g. by averaging, by choice of the maximum, minimum or median value
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/86—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
Definitions
- the present invention relates to video coding and decoding.
- VVC Versatile Video Coding
- the goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020.
- the main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos.
- HDR high-dynamic-range
- JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs.
- Some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well beyond the targeted 50% for the final standard.
- UHD ultra-high definition
- JEM JVET exploration model
- SAO sample adaptive offset
- US 9769450 discloses an SAO filter for three dimensional or 3D Video Coding or 3DVC such as implemented by the HEVC standard.
- the filter directly re-uses SAO filter parameters of an independent view or a coded dependent view to encode another dependent view, or re-uses only part of the SAO filter parameters of the independent view or a coded dependent view to encode another dependent view.
- the SAO parameters are re-used by copying them from the independent view or coded dependent view.
- US 2014/0192860 A1 relates to the scalable extension of HEVC.
- HEVC scalable extension aims at allowing coding/decoding of a video having multiple scalability layers, each layer being made up of a series of frames. Coding efficiency is improved by inferring, or deriving, SAO parameters to be used at an upper layer (e.g. an enhancement layer) from the SAO parameters actually used at a lower (e.g. base) layer. This is because inferring some SAO parameters makes it possible to avoid transmitting them.
- a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts comprising: performing SAO filtering on a group made up of N x N image parts of the image using SAO parameters associated with the group, wherein N is three or more.
- two or more different groupings of said image parts are available, and the group made up of the N x N image parts is formed by one of said available groupings.
- the method further comprises: comparing SAO filtering using two or more of the available groupings; and selecting one grouping based on the comparison.
- one available grouping forms another group made up of M image parts in a column of the image, and M is three or more.
- one available grouping forms another group made up of image parts in a complete column of the image (e.g. M is the height of the image).
- a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts comprising: performing SAO filtering on a group made up of M image parts in a column of the image, wherein M is three or more.
- a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts comprising: performing SAO filtering on a group made up of image parts in a complete column of the image.
- the method further comprises: comparing SAO filtering using two or more of the available groupings; and selecting one grouping based on the comparison.
- a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, two or more different groupings of said image parts being available comprising: determining a grouping; and performing the SAO filtering using SAO parameters associated with the determined grouping, wherein the two or more different groupings comprise one or more of: a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more; another grouping for forming another group made up of M image parts in a column of the image, wherein M is three or more; and another grouping for forming another group made up of image parts in a complete column of the image.
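The three groupings named above can be pictured as a mapping from a CTU's position to the group whose SAO parameters it shares. The following Python sketch is illustrative only: the function name `group_key`, the grouping labels, and the assumption that groups are aligned to the top-left corner of the image are hypothetical and not taken from the source.

```python
from typing import Tuple


def group_key(ctu_x: int, ctu_y: int, grouping: str, size: int = 3) -> Tuple[int, int]:
    """Return a key identifying the group that CTU (ctu_x, ctu_y) belongs to.

    ctu_x and ctu_y are CTU column and row indices (hypothetical convention).
    CTUs that map to the same key would share one set of SAO parameters.
    """
    if ctu_x < 0 or ctu_y < 0:
        raise ValueError("CTU indices must be non-negative")
    if grouping in ("nxn", "column_m") and size < 3:
        # The source states that N (and M) is three or more.
        raise ValueError("group size must be three or more")
    if grouping == "nxn":
        # Square group of N x N CTUs.
        return (ctu_x // size, ctu_y // size)
    if grouping == "column_m":
        # M vertically adjacent CTUs within one CTU column.
        return (ctu_x, ctu_y // size)
    if grouping == "full_column":
        # Every CTU of a column belongs to the same group.
        return (ctu_x, 0)
    raise ValueError(f"unknown grouping: {grouping}")
```

For example, with a 3 x 3 grouping the CTUs at (0, 0) and (2, 2) fall in the same group, whereas with the column groupings only CTUs in the same column can share parameters.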
- the determining comprises: comparing SAO filtering using two or more of the available groupings; and selecting one grouping based on the comparison.
- the determining comprises obtaining, from a bitstream: the SAO parameters associated with the grouping; data indicating a grouping, and determining the grouping using the obtained data; and/or data indicating inferring of the SAO parameters for the SAO filtering from another image part of the image or of another image, and inferring the SAO parameters using the obtained data.
- the method further comprises: obtaining, from a bitstream, data indicating which one of the data indicating a grouping or the data indicating inferring of the SAO parameters is used; and when the data indicates either or both data is available, obtaining, from the bitstream, the available data and using the obtained available data to determine the grouping or inferring the SAO parameters for the SAO filtering.
- the method further comprises, when the data indicates either of the data is not available for use, obtaining, from the bitstream, the SAO parameters associated with the grouping.
- the method further comprises providing, in a bitstream: the SAO parameters associated with the grouping; data indicating a grouping; or data indicating SAO parameters for the SAO filtering are inferred from another image part of the image or of another image.
- the method further comprises providing, in a bitstream, data indicating which one of the data indicating a grouping or the data indicating inferring of the SAO parameters is used.
- the method further comprises, when said data indicates either of the data is not available for use, not including the unavailable data in the bitstream.
- the data indicating the grouping and/or the data indicating inferring of the SAO parameters is explicitly signalled in the bitstream.
- the data indicating the grouping and/or the data indicating inferring of the SAO parameters is implicitly signalled by the bitstream (i.e. without explicitly signalling in the bitstream).
- a method of encoding an image or a sequence of images comprising performing sample adaptive offset (SAO) filtering according to the method of any one of the aforegoing first to fourth aspects.
- a method of decoding an image or a sequence of images comprising performing sample adaptive offset (SAO) filtering according to the method of any one of the aforegoing first to fourth aspects.
- a device for performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts
- the device comprising a means for performing sample adaptive offset (SAO) filtering according to the method of any one of the aforegoing first to fourth aspects.
- a device for encoding an image or a sequence of images comprising a device of the seventh aspect.
- a device for decoding an image or a sequence of images comprising a device of the seventh aspect.
- a program which, when executed, causes the method of any one of the aforegoing first to sixth aspects to be performed.
- the comparison is based on rate-distortion evaluation for the two or more groupings.
- at least one available grouping is excluded from the comparison and/or evaluation.
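As a rough illustration of selecting a grouping by rate-distortion evaluation, the sketch below picks the candidate with the lowest cost. The Lagrangian form J = D + lambda * R is a common convention assumed here, not something the source specifies; the function name `select_grouping`, the candidate labels, and the numbers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def select_grouping(
    candidates: Dict[str, Tuple[float, float]],
    lam: float,
    excluded: Iterable[str] = (),
) -> str:
    """Return the name of the grouping with the lowest rate-distortion cost.

    candidates maps a grouping name to (distortion, rate), both assumed to
    have been measured beforehand by trial SAO filtering with that grouping.
    Groupings listed in `excluded` are left out of the comparison.
    """
    skip = set(excluded)
    best_name = None
    best_cost = float("inf")
    for name, (distortion, rate) in candidates.items():
        if name in skip:
            continue
        cost = distortion + lam * rate  # assumed Lagrangian cost J = D + lambda * R
        if cost < best_cost:
            best_name, best_cost = name, cost
    if best_name is None:
        raise ValueError("no grouping left to compare")
    return best_name


# Hypothetical measurements: (distortion, rate in bits) per grouping.
measured = {"per_ctu": (100.0, 50.0), "3x3": (120.0, 10.0), "column": (110.0, 30.0)}
chosen = select_grouping(measured, lam=1.0)
```

Larger groups tend to spend fewer bits on SAO parameters at the price of a less locally adapted filter, which is the trade-off the cost comparison is meant to arbitrate.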
- the two or more different (and/or available) groupings further comprise one or more of a grouping(s) for forming: a group made up of p x q image parts of the image, wherein p and q are one or larger; a group made up of j x j image parts of the image, wherein j is two; a group made up of an image part of the image; a group made up of all the image parts of the image; a group made up of image parts in a line of the image; a group made up of k image parts in a row of the image, wherein k is three or more/the width of the image; a group made up of image part(s) which use(s) temporal derivation for at least one SAO parameter; a group made up of image part(s) which use(s) temporal derivation with a modified image or image part for at least one SAO parameter; and a group made up of image part(s)
- an image part is one of: a block; a unit; a partition; a portion; a coding tree block; a largest coding unit; or a coding tree unit, for processing or coding the image.
- an image part is a coding tree unit or a coding tree block.
- the data indicating a grouping is an index or an identifier for the grouping (e.g. an element index of an array/list/table of groupings).
- the data indicating inferring of the SAO parameters is a flag(s) indicating at least one SAO parameter is to be copied from another image part of the image or of another image.
- the data indicating which one of the data indicating a grouping or the data indicating inferring of the SAO parameters is used is a flag(s) indicating enabling (e.g. used) or disabling (e.g. not used) of the data.
- the SAO parameters are parameters used in SAO filtering, for example control data for controlling an SAO filter.
- the SAO parameters comprise one or more of: an SAO (filter) type parameter indicating whether it is an Edge Offset (EO) or a Band Offset (BO) type (or whether there is no SAO filtering at all); a direction for the Edge Offset; an SAO band (range); an SAO band position; and an SAO offset to be applied with the SAO filter.
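The parameter set listed above could be held in a small record type. The following Python sketch is one possible layout under stated assumptions: the class and field names (`SaoType`, `SaoParams`, `eo_direction`, `band_position`, `offsets`) and the consistency check are hypothetical and do not reproduce any codec syntax.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SaoType(Enum):
    NONE = 0         # no SAO filtering at all
    BAND_OFFSET = 1  # BO
    EDGE_OFFSET = 2  # EO


@dataclass
class SaoParams:
    """SAO parameters shared by all image parts of one group (illustrative)."""
    sao_type: SaoType = SaoType.NONE
    eo_direction: Optional[int] = None   # only meaningful for Edge Offset
    band_position: Optional[int] = None  # only meaningful for Band Offset
    offsets: List[int] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """Check that the fields required by the selected type are present."""
        if self.sao_type is SaoType.NONE:
            return True
        if not self.offsets:
            return False
        if self.sao_type is SaoType.EDGE_OFFSET:
            return self.eo_direction is not None
        return self.band_position is not None


edge = SaoParams(SaoType.EDGE_OFFSET, eo_direction=0, offsets=[1, 0, 0, -1])
```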
- a signal carrying an information dataset for an image or a sequence of images represented by a bitstream the image comprising a plurality of image parts
- the information dataset comprises data for performing SAO filtering using SAO parameters associated with a group made up of : N x N image parts of the image, wherein N is three or more; M image parts in a column of the image, wherein M is three or more; or image parts in a complete column of the image.
- Figure 1 is a diagram for use in explaining a coding structure used in HEVC.
- Figure 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the invention may be implemented;
- FIG. 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented;
- Figure 4 is a flow chart illustrating steps of an encoding method according to embodiments of the invention.
- Figure 5 is a flow chart illustrating steps of a loop filtering process in accordance with one or more embodiments of the invention.
- Figure 6 is a flow chart illustrating steps of a decoding method according to embodiments of the invention.
- Figures 7A and 7B are diagrams for use in explaining edge-type SAO filtering in HEVC.
- Figure 8 is a diagram for use in explaining band-type SAO filtering in HEVC.
- Figure 9 is a flow chart illustrating the steps of a process to decode SAO parameters according to the HEVC specifications
- Figure 10 is a flow chart illustrating in more detail one of the steps of the Figure 9 process
- Figure 11 is a flow chart illustrating how SAO filtering is performed on an image part according to the HEVC specifications
- Figure 12A is a flow chart illustrating how SAO filtering is performed on an image part according to a first embodiment of the present invention
- Figures 12B-12C are flow charts illustrating how SAO filtering is performed on an image part according to a fourth embodiment of the present invention
- Figures 13A and 13B are flow charts illustrating how a grouping is determined using a rate-distortion compromise comparison according to the fourth embodiment of the present invention
- Figures 14A-14B are flow charts illustrating how a determined grouping and/or SAO parameters are communicated according to a fifth embodiment of the present invention.
- Figures 14C-14D are flow charts illustrating how a determined grouping and/or SAO parameters are communicated according to a sixth embodiment of the present invention.
- Figure 15 is a schematic view for use in explaining a temporal derivation of SAO parameters in a seventh embodiment of the present invention
- Figure 16 is a flow chart for use in explaining a method of decoding an image in the seventh embodiment
- Figure 17 is a flow chart illustrating steps carried out by an encoder to determine SAO parameters for the CTUs of a group (frame or slice) in a CTU-level non-temporal derivation of SAO parameters in an eighth embodiment of the present invention
- Figure 18 shows one of the steps of Figure 17 in more detail
- Figure 19 shows another one of the steps of Figure 17 in more detail
- Figure 20 shows yet another one of the steps of Figure 17 in more detail
- Figure 21 is a flow chart for use in explaining how to evaluate a cost of a temporal derivation in the eighth embodiment
- Figure 22 is a flow chart for use in explaining how to compare the costs of the temporal derivation and a further, non-temporal derivation, in the eighth embodiment
- Figure 23 shows various different groupings 1201-1206 of CTUs in a slice
- Figure 24 is a diagram showing image parts of a frame in a non-temporal derivation of SAO parameters in which a first method of sharing SAO parameters is used
- Figure 25 is a flowchart of an example of a process for setting SAO parameters in the non-temporal derivation of Figure 24;
- Figure 26 is a flowchart of an example of a process for setting of SAO parameters in another non-temporal derivation using the first sharing method to share SAO parameters among a column of CTUs;
- Figure 27 is a flowchart of an example of a process for setting of SAO parameters in yet another non-temporal derivation using the first sharing method to share SAO parameters among a group of NxN CTUs;
- Figure 28 is a diagram showing image parts of one NxN group in the non-temporal derivation of Figure 27;
- Figure 29 illustrates an example of how to select the SAO parameter derivation in an eleventh embodiment of the present invention
- Figure 30 is a flow chart illustrating a decoding process suitable for a second method of sharing SAO parameters among image parts of a group
- Figure 31 is a diagram showing image parts of multiple 2x2 groups in a sixteenth embodiment of the present invention
- Figure 32 is a schematic view for use in explaining a process of deriving SAO parameters in a temporal rotation derivation of SAO parameters in accordance with a twentieth embodiment of the present invention
- Figure 33 is a schematic view of the temporal rotation derivation of Figure 32;
- Figure 34 is a schematic view for use in explaining a process of deriving SAO parameters in which different temporal derivations are available;
- Figure 35 is a flowchart for use in explaining a decoding process in a twenty-fifth embodiment of the present invention.
- Figure 36 is a schematic view for use in explaining a process of deriving SAO parameters in a spatial rotation derivation of SAO parameters in accordance with a twentieth embodiment of the present invention.
- Figure 37 is a flowchart for use in explaining a decoding process in the twenty-seventh embodiment.
- Figure 38 is a diagram showing a system comprising an encoder or a decoder and a communication network according to embodiments of the present invention.
- HEVC High Efficiency Video Coding
- the present invention is not limited thereto. It is understood that other embodiments of the present invention may be based on any process or device that involves SAO filtering being performed on an image or an image part.
- an SAO filter according to an embodiment of the present invention may be used in any image/video encoding or decoding process or device, such as a future video coding standard compliant device.
- a HEVC compliant method/process or device e.g. an encoder, a decoder, a SAO filter of HEVC
- a decoder according to a later described embodiment is provided in a user terminal such as a computer, a mobile phone (a cellular phone), a tablet or any other type of a device (e.g. a display apparatus) capable of providing/displaying a content to a user.
- an encoder according to a later described embodiment is provided in an image capturing apparatus which also comprises a camera, a video camera or a network camera (e.g. a closed-circuit television or video surveillance camera) which captures and provides the content for the encoder to encode (and communicate/transmit thereafter).
- Figure 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video standard.
- a video sequence 1 is made up of a succession of digital images i. Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.
- An image 2 of the sequence may be divided into slices 3.
- a slice may in some instances constitute an entire image.
- These slices are divided into non-overlapping Coding Tree Units (CTUs).
- a Coding Tree Unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to macroblock units that were used in several previous video standards.
- a CTU is also sometimes referred to as a Largest Coding Unit (LCU).
- LCU Largest Coding Unit
- a CTU has luma and chroma component parts, each of which component parts is called a Coding Tree Block (CTB). These different color components are not shown in Figure 1.
- CTB Coding Tree Block
- a CTU is generally of size 64 pixels x 64 pixels.
- Each CTU may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 5 using a quadtree decomposition.
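The quadtree decomposition of a CTU into CUs can be sketched as a recursive four-way split. The code below is a minimal illustration, not an encoder's actual decision process: the `should_split` callback stands in for whatever criterion an encoder uses, and the function name `split_ctu` and the default minimum CU size of 8 are assumptions.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in pixels


def split_ctu(
    x: int,
    y: int,
    size: int,
    should_split: Callable[[int, int, int], bool],
    min_size: int = 8,
) -> List[Block]:
    """Recursively split a square block into four quadrants.

    Returns the leaf blocks (the CUs) in raster order of the quadrants:
    top-left, top-right, bottom-left, bottom-right.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(split_ctu(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


# Hypothetical criterion: keep refining only the top-left corner down to 16 x 16.
cus = split_ctu(0, 0, 64, lambda bx, by, s: bx == 0 and by == 0 and s > 16)
```

With this criterion the 64 x 64 CTU yields seven CUs: four of size 16 in the top-left quadrant and three of size 32 elsewhere, which together still cover the whole CTU without overlap.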
- CUs variable-size Coding Units
- Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU).
- the maximum size of a PU or TU is equal to the CU size.
- a Prediction Unit corresponds to the partition of the CU for prediction of pixels values.
- Various different partitions of a CU into PUs are possible as shown by 606 including a partition into 4 square PUs and two different partitions into 2 rectangular PUs.
- a Transform Unit is an elementary unit that is subjected to spatial transformation using DCT.
- a CU can be partitioned into TUs based on a quadtree representation 607.
- NAL Network Abstraction Layer
- coding parameters of the video sequence are stored in dedicated NAL units called parameter sets.
- SPS Sequence Parameter Set
- PPS Picture Parameter Set
- HEVC also includes a Video Parameter Set (VPS) NAL unit which contains parameters describing the overall structure of the bitstream.
- the VPS is a new type of parameter set defined in HEVC, and applies to all of the layers of a bitstream.
- a layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer.
- HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer.
- FIG. 2 illustrates a data communication system in which one or more embodiments of the invention may be implemented.
- the data communication system comprises a transmission device, in this case a server 201, which is operable to transmit data packets of a data stream to a receiving device, in this case a client terminal 202, via a data communication network 200.
- the data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN).
- WAN Wide Area Network
- LAN Local Area Network
- Such a network may be for example a wireless network (WiFi / 802.11a or b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks.
- the data communication system may be a digital television broadcast system in which the server 201 sends the same data content to multiple clients.
- the data stream 204 provided by the server 201 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 201 or received by the server 201 from another data provider, or generated at the server 201.
- the server 201 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
- the compression of the video data may be for example in accordance with the HEVC format or H.264/AVC format.
- the client 202 receives the transmitted bitstream and decodes the reconstructed bitstream to reproduce video images on a display device and the audio data by a loudspeaker.
- the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
- a video image is transmitted with data representative of compensation offsets for application to reconstructed pixels of the image to provide filtered pixels in a final image.
- FIG. 3 schematically illustrates a processing device 300 configured to implement at least one embodiment of the present invention.
- the processing device 300 may be a device such as a micro-computer, a workstation or a light portable device.
- the device 300 comprises a communication bus 313 connected to:
- a central processing unit 311, such as a microprocessor, denoted CPU;
- ROM read only memory
- a random access memory 312, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to embodiments of the invention;
- the apparatus 300 may also include the following components:
- a data storage means 304 such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
- the disk drive being adapted to read data from the disk 306 or to write data onto said disk;
- a screen 309 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 310 or any other pointing means.
- the apparatus 300 can be connected to various peripherals, such as for example a digital camera 320 or a microphone 308, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 300.
- the communication bus provides communication and interoperability between the various elements included in the apparatus 300 or connected to it.
- the representation of the bus is not limiting and in particular the central processing unit is operable to communicate instructions to any element of the apparatus 300 directly or by means of another element of the apparatus 300.
- the disk 306 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.
- CD-ROM compact disk
- the executable code may be stored either in read only memory 306, on the hard disk 304 or on a removable digital medium such as for example a disk 306 as described previously.
- the executable code of the programs can be received by means of the communication network 303, via the interface 302, in order to be stored in one of the storage means of the apparatus 300 before being executed, such as the hard disk 304.
- the central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means.
- the program or programs that are stored in a non-volatile memory for example on the hard disk 304 or in the read only memory 306, are transferred into the random access memory 312, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
- the apparatus is a programmable apparatus which uses software to implement the invention.
- the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
- Figure 4 illustrates a block diagram of an encoder according to at least one embodiment of the invention.
- the encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, at least one corresponding step of a method implementing at least one embodiment of encoding an image of a sequence of images according to one or more embodiments of the invention.
- An original sequence of digital images i0 to in 401 is received as an input by the encoder 400.
- Each digital image is represented by a set of samples, known as pixels.
- a bitstream 410 is output by the encoder 400 after implementation of the encoding process.
- the bitstream 410 comprises a plurality of encoding units or slices, each slice comprising a slice header for transmitting encoding values of encoding parameters used to encode the slice and a slice body, comprising encoded video data.
- the input digital images i0 to in 401 are divided into blocks of pixels by module 402.
- the blocks correspond to image portions and may be of variable sizes (e.g. 4x4, 8x8, 16x16, 32x32, 64x64 or 128x128 pixels; several rectangular block sizes can also be considered).
- a coding mode is selected for each input block. Two families of coding modes are provided: coding modes based on spatial prediction coding (Intra prediction), and coding modes based on temporal prediction (Inter coding, Merge, SKIP). The possible coding modes are tested.
- Module 403 implements an Intra prediction process, in which the given block to be encoded is predicted by a predictor computed from pixels of the neighbourhood of said block to be encoded. An indication of the selected Intra predictor and the difference between the given block and its predictor is encoded to provide a residual if the Intra coding is selected.
- Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405.
- a reference image/picture from among a set of reference images/pictures 416 is selected, and a portion of the reference image/picture, also called reference area or image portion, which is the closest area to the given block to be encoded, is selected by the motion estimation module 404.
- Motion compensation module 405 then predicts the block to be encoded using the selected area.
- the difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module 405.
- the selected reference area is indicated by a motion vector.
- a prediction direction is encoded.
- at least one motion vector is encoded.
- Motion vector predictors of a set of motion information predictors are obtained from the motion vectors field 418 by a motion vector prediction and coding module 417.
- the encoder 400 further comprises a selection module 406 for selection of the coding mode by applying an encoding cost criterion, such as a rate-distortion criterion.
- a transform such as DCT
- the transformed data obtained is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409.
- the encoded residual block of the current block being encoded is inserted into the bitstream 410.
- the encoder 400 also performs decoding of the encoded image in order to produce a reference image for the motion estimation of the subsequent images. This enables the encoder and the decoder receiving the bitstream to have the same reference frames.
- the dequantization module 411 performs dequantization of the quantized data, followed by an inverse transform by inverse transform module 412.
- the intra prediction module 413 uses the prediction information to determine which predictor to use for a given block and the motion compensation module 414 actually adds the residual obtained by module 412 to the reference area obtained from the set of reference images/pictures 416.
- Post filtering is then applied by module 415 to filter the reconstructed frame of pixels.
- an SAO loop filter is used in which compensation offsets are added to the pixel values of the reconstructed pixels of the reconstructed image
- Figure 5 is a flow chart illustrating steps of loop filtering process according to at least one embodiment of the invention.
- the encoder generates the reconstruction of the full frame.
- a deblocking filter is applied on this first reconstruction in order to generate a deblocked reconstruction 53.
- the aim of the deblocking filter is to remove block artifacts generated by residual quantization and block motion compensation or block Intra prediction. These artifacts are visually important at low bitrates.
- the deblocking filter operates to smooth the block boundaries according to the characteristics of two neighboring blocks. The encoding mode of each block, the quantization parameters used for the residual coding, and the neighboring pixel differences in the boundary are taken into account.
- the deblocking filter improves the visual quality of the current frame by removing blocking artifacts and it also improves the motion estimation and motion compensation for subsequent frames. Indeed, high frequencies of the block artifact are removed, and so these high frequencies do not need to be compensated for with the texture residual of the following frames.
- the deblocked reconstruction is filtered by a sample adaptive offset (SAO) loop filter in step 54 using SAO parameters determined in accordance with embodiments of the invention.
- the resulting frame 55 may then be filtered with an adaptive loop filter (ALF) in step 56 to generate the reconstructed frame 57 which will be displayed and used as a reference frame for the following Inter frames.
- In step 54, each pixel of the frame region is classified into a class or group.
- the same offset value is added to every pixel value which belongs to a certain class or group.
- FIG. 6 illustrates a block diagram of a decoder 60 which may be used to receive data from an encoder according to an embodiment of the invention.
- the decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, a corresponding step of a method implemented by the decoder 60.
- the decoder 60 receives a bitstream 61 comprising encoding units, each one being composed of a header containing information on encoding parameters and a body containing the encoded video data.
- the encoded video data is entropy encoded, and the motion vector predictors’ indexes are encoded, for a given block, on a predetermined number of bits.
- the received encoded video data is entropy decoded by module 62.
- the residual data are then dequantized by module 63 and then an inverse transform is performed by module 64 to obtain pixel values.
- the mode data indicating the coding mode are also entropy decoded and based on the mode, an INTRA type decoding or an INTER type decoding is performed on the encoded blocks of image data.
- an INTRA predictor is determined by intra prediction module 65 based on the intra prediction mode specified in the bitstream.
- the motion prediction information is extracted from the bitstream so as to find the reference area used by the encoder.
- the motion prediction information is composed of the reference frame index and the motion vector residual.
- the motion vector predictor is added to the motion vector residual in order to obtain the motion vector by motion vector decoding module 70.
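As a minimal sketch of the reconstruction attributed to module 70, assuming motion vectors are held as (x, y) integer pairs (a representation chosen here purely for illustration), the predictor and the residual can be added component by component:

```python
def decode_motion_vector(predictor, residual):
    """Return predictor + residual, component by component."""
    return (predictor[0] + residual[0], predictor[1] + residual[1])


# Hypothetical values: predictor (4, -2) and decoded residual (1, 3).
mv = decode_motion_vector(predictor=(4, -2), residual=(1, 3))
print(mv)  # (5, 1)
```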
- Motion vector decoding module 70 applies motion vector decoding for each current block encoded by motion prediction. Once an index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the current block can be decoded and used to perform motion compensation by module 66. The reference image portion indicated by the decoded motion vector is extracted from a reference image 68 to perform the motion compensation 66. The motion vector field data 71 is updated with the decoded motion vector in order to be used for the prediction of subsequent decoded motion vectors. Finally, a decoded block is obtained. Post filtering is applied by post filtering module 67 similarly to post filtering module 415 applied at the encoder as described with reference to Figure 5. A decoded video signal 69 is finally provided by the decoder 60.
- The aim of SAO filtering is to improve the quality of the reconstructed frame by sending additional data in the bitstream, in contrast to the deblocking filter, where no information is transmitted.
- each pixel is classified into a predetermined class or group and the same offset value is added to every pixel sample of the same class/group.
- One offset is encoded in the bitstream for each class.
- SAO loop filtering has two SAO types: an Edge Offset (EO) type and a Band Offset (BO) type.
- An example of Edge Offset type is schematically illustrated in Figures 7A and 7B
- an example of Band Offset type is schematically illustrated in Figure 8.
- SAO filtering is applied CTU by CTU.
- the parameters needed to perform the SAO filtering (set of SAO parameters) are selected for each CTU at the encoder side and the necessary parameters are decoded and/or derived for each CTU at the decoder side.
- This offers the possibility of easily encoding and decoding the video sequence by processing each CTU at once without introducing delays in the processing of the whole frame.
- When SAO filtering is enabled, only one SAO type is used: either the Edge Offset type filter or the Band Offset type filter, according to the related parameters transmitted in the bitstream for each classification.
- One of the SAO parameters in HEVC is an SAO type parameter sao_type_idx, which indicates whether EO type, BO type or no SAO filtering is selected for the CTU concerned.
- the SAO parameters for a given CTU can be copied from the upper or left CTU, for example, instead of transmitting all the SAO data.
- One of the SAO parameters in HEVC is a sao_merge_up_flag, which when set indicates that the SAO parameters (other than the sao_merge_up_flag) for the subject CTU should be copied from the upper CTU.
- Another of the SAO parameters in HEVC is a sao_merge_left_flag, which when set indicates that the SAO parameters for the subject CTU should be copied from the left CTU.
- SAO filtering may be applied independently for different color components (e.g. YUV) of the frame.
- one set of SAO parameters may be provided for the luma component Y and another set of SAO parameters may be provided for both chroma components U and V in common.
- one or more SAO parameters may be used as common filtering parameters for two or more color components, while other SAO parameters are dedicated (per-component) filtering parameters for the color components.
- the SAO type parameter sao_type_idx is common to U and V, and so is an EO class parameter which indicates a class for EO filtering (see below), whereas a BO class parameter which indicates a group of classes for BO filtering has dedicated (per-component) SAO parameters for U and V.
- Edge Offset type involves determining an edge index for each pixel by comparing its pixel value to the values of two neighboring pixels. Moreover, these two neighboring pixels depend on a parameter which indicates the direction of these two neighboring pixels with respect to the current pixel. These directions are the 0-degree (horizontal direction), 45-degree (diagonal direction), 90-degree (vertical direction) and 135-degree (second diagonal direction). These four directions are schematically illustrated in Figure 7A.
- the table of Figure 7B gives the offset value to be applied to the pixel value of a particular pixel “C” according to the values of the two neighboring pixels Cn1 and Cn2 at the decoder side.
- when the value of C is less than the two values of the neighboring pixels Cn1 and Cn2, the offset to be added to the pixel value of the pixel C is “+O1”.
- when the value of C is less than one value of its neighboring pixels (either Cn1 or Cn2) and equal to the other, the offset to be added to this pixel sample value is “+O2”.
- when the value of C is greater than one value of its neighboring pixels (either Cn1 or Cn2) and equal to the other, the offset to be applied to this pixel sample is “-O3”.
- when the value of C is greater than the two values of Cn1 and Cn2, the offset to be applied to this pixel sample is “-O4”.
- each offset (O1, O2, O3, O4) is encoded in the bitstream.
- the sign to be applied to each offset depends on the edge index (or the Edge Index in the HEVC specifications) to which the current pixel belongs. According to the table represented in Figure 7B, for Edge Index 0 and for Edge Index 1 (O1, O2) a positive offset is applied. For Edge Index 3 and Edge Index 4 (O3, O4), a negative offset is applied to the current pixel.
- the direction for the Edge Offset amongst the four directions of Figure 7A is specified in the bitstream by a “sao_eo_class_luma” field for the luma component and a “sao_eo_class_chroma” field for both chroma components U and V.
- the difference between the pixel value of C and the pixel value of both its neighboring pixels Cn1 and Cn2 can be shared for current pixel C and its neighbors.
- the term sign(Cn1 - C) has already been computed for the previous pixels (to be precise, it was computed as C’-Cn2’ at a time when the current pixel C’ at that time was the present neighboring pixel Cn1 and the neighboring pixel Cn2’ was what is now the current pixel C).
- this sign(Cn1 - C) does not need to be computed again.
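The Edge Offset classification can be sketched as follows for the 0-degree (horizontal) direction only. This is an illustrative sketch, not the HEVC reference implementation: the function names are hypothetical, clipping to the valid pixel range is omitted, and the mapping of the four edge categories to +O1, +O2, -O3 and -O4 follows the usual HEVC convention (C below both neighbors, below one and equal to the other, above one and equal to the other, above both). The sign computed against the right-hand neighbor is negated and reused as the left-hand term for the next pixel, which is the sharing described above.

```python
def sign(x):
    """Return -1, 0 or +1."""
    return (x > 0) - (x < 0)


def edge_offset_filter(row, offsets):
    """Apply horizontal Edge Offset to the interior samples of one row.

    `offsets` holds the four absolute values (O1, O2, O3, O4).
    """
    o1, o2, o3, o4 = offsets
    # Key: sign(Cn1 - C) + sign(Cn2 - C); 0 means "no edge", so no offset.
    delta = {2: +o1, 1: +o2, 0: 0, -1: -o3, -2: -o4}
    out = list(row)
    if len(row) < 3:
        return out
    left = sign(row[0] - row[1])             # sign(Cn1 - C) for the first interior pixel
    for i in range(1, len(row) - 1):
        right = sign(row[i + 1] - row[i])    # sign(Cn2 - C)
        out[i] = row[i] + delta[left + right]
        left = -right                        # reused as sign(Cn1 - C) of the next pixel
    return out


print(edge_offset_filter([10, 8, 10, 10, 12, 10, 10], (1, 2, 3, 4)))
# [10, 9, 7, 12, 8, 12, 10]
```

Classification always reads the unfiltered input row, so filtering one pixel does not change how its neighbor is classified.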
- Band Offset type in SAO also depends on the pixel value of the sample to be processed.
- a class in SAO Band offset is defined as a range of pixel values. Conventionally, for all pixels within a range, the same offset is added to the pixel value. In the HEVC specifications, the number of offsets for the Band Offset filter is four for each reconstructed block or frame area of pixels (CTU), as schematically illustrated in Figure 8.
- SAO Band offset splits the full range of pixel values into 32 ranges of the same size. These 32 ranges are the bands (or classes) of SAO Band offset.
- Classifying the pixels into 32 ranges of the full interval requires checking only 5 bits for a fast implementation, i.e. only the first 5 bits (the 5 most significant bits) are checked to classify a pixel into one of the 32 classes/ranges of the full range.
- for a bitdepth of 8 bits (256 possible pixel values split into 32 bands), each band or class contains 8 pixel values.
- a group 40 of bands represented by the grey area (40), is used, the group having four successive bands 41, 42, 43 and 44, and information is signaled in the bitstream to identify the position of the group, for example the position of the first of the 4 bands.
- the syntax element representative of this position is the “sao_band_position” field in the HEVC specifications. This corresponds to the start of band 41 in Figure 8.
- 4 offsets corresponding respectively to the 4 bands are signaled in the bitstream.
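The Band Offset classification described above can be sketched as below. The function and parameter names are hypothetical, the offsets are taken as already signed, and the sketch makes two simplifying assumptions: no clipping of the result to the valid pixel range, and no wrap-around of the four-band group past band 31.

```python
def band_offset_filter(samples, band_position, offsets, bit_depth=8):
    """Add offsets[k] to every sample falling in band (band_position + k), k = 0..3."""
    shift = bit_depth - 5                  # keep only the 5 most significant bits
    out = []
    for p in samples:
        k = (p >> shift) - band_position   # band index relative to the first band of the group
        out.append(p + offsets[k] if 0 <= k < 4 else p)
    return out


# 8-bit samples; the group of four bands starts at band 8, i.e. covers values 64..95.
print(band_offset_filter([0, 63, 64, 71, 72, 95, 96, 255], 8, [1, 2, 3, 4]))
# [0, 63, 65, 72, 74, 99, 96, 255]
```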
- FIG. 9 is a flow chart illustrating the steps of a process to decode SAO parameters according to the HEVC specifications.
- the process of Figure 9 is applied for each CTU to generate a set of SAO parameters for all components.
- a predictive scheme is used for the CTU mode. This predictive mode involves checking if the CTU on the left of the current CTU uses the same SAO parameters (this is specified in the bitstream through a flag named “sao_merge_left_flag”). If not, a second check is performed with the CTU above the current CTU (this is specified in the bitstream through a flag named “sao_merge_up_flag”). This predictive technique enables the amount of data representing the SAO parameters for the CTU mode to be reduced. Steps of the process are set out below.
- In step 503, the “sao_merge_left_flag” is read from the bitstream 502 and decoded. If its value is true, then the process proceeds to step 504 where the SAO parameters of the left CTU are copied for the current CTU. This enables the types for YUV of the SAO filter for the current CTU to be determined in step 508.
- If the outcome is negative in step 503, then the “sao_merge_up_flag” is read from the bitstream and decoded. If its value is true, then the process proceeds to step 505 where the SAO parameters of the above CTU are copied for the current CTU. This enables the types of the SAO filter for the current CTU to be determined in step 508.
- If the outcome is negative in step 505, then the SAO parameters for the current CTU are read and decoded from the bitstream in step 507 for the Luma Y component and both U and V components (501) (551) for the type.
- the offsets for Chroma are independent.
- In step 508, the parameters are obtained and the type of SAO filter is determined.
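The merge-based decoding of Figure 9 can be sketched as follows. The callbacks `read_flag` and `read_params` are hypothetical stand-ins for the entropy decoder, and the check that a left or upper CTU actually exists (passed as None otherwise) is an assumption added here so that the sketch also covers CTUs on the picture border.

```python
def decode_ctu_sao(read_flag, read_params, left, up):
    """Return the SAO parameter set of the current CTU.

    `left` and `up` are the parameter sets of the left and upper CTUs,
    or None when that neighbor does not exist.
    """
    if left is not None and read_flag("sao_merge_left_flag"):
        return dict(left)                  # copy from the left CTU
    if up is not None and read_flag("sao_merge_up_flag"):
        return dict(up)                    # copy from the upper CTU
    return read_params()                   # explicit parameters in the bitstream


# Hypothetical bitstream: merge-left is 0, merge-up is 1.
flags = iter([False, True])
params = decode_ctu_sao(lambda name: next(flags), lambda: {"type": 2},
                        left={"type": 1}, up={"type": 0})
print(params)  # {'type': 0}
```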
- FIG. 10 is a flow chart illustrating steps of a process of parsing of SAO parameters in the bitstream 601 at the decoder side.
- the “sao_type_idx_X” syntax element is read and decoded.
- the code word representing this syntax element can use a fixed length code or could use any method of arithmetic coding.
- the syntax element sao_type_idx_X enables determination of the type of SAO applied for the frame area to be processed for the colour component Y or for both Chroma components U & V. For example, for a YUV 4:2:0 sequence, two components are considered: one for Y, and one for U and V.
- the “sao_type_idx_X” can take 3 values as follows depending on the SAO type encoded in the bitstream. ‘0’ corresponds to no SAO, ‘1’ corresponds to the Band Offset case illustrated in Figure 8 and ‘2’ corresponds to the Edge Offset type filter illustrated in Figures 7A and 7B.
- Although YUV color components are used in HEVC (sometimes called Y, Cr and Cb components), it will be appreciated that in other video coding schemes other color components may be used, for example RGB color components.
- the techniques of the present invention are not limited to use with YUV color components, and can be used with RGB color components or any other color components.
- a test is performed to determine if the “sao_type_idx_X” is strictly positive. If “sao_type_idx_X” is equal to “0”, this signifies that there is no SAO for this frame area (CTU) for Y if X is set equal to Y, and that there is no SAO for this frame area for U and V if X is set equal to U and V. The determination of the SAO parameters is then complete and the process proceeds to step 608. Otherwise, if the “sao_type_idx_X” is strictly positive, this signifies that SAO parameters exist for this CTU in the bitstream.
- In step 606, a loop is performed for four iterations.
- the four iterations are carried out in step 607, where the absolute value of offset j is read and decoded from the bitstream.
- These four offsets correspond either to the four absolute values of the offsets (O1, O2, O3, O4) of the four Edge indexes of SAO Edge Offset (see Figure 7B) or to the four absolute values of the offsets related to the four ranges of the SAO Band Offset (see Figure 8).
- MAX_abs_SAO_offset_value = (1 << (Min(bitDepth, 10) - 5)) - 1
- where << is the left (bit) shift operator.
- This formula means that the maximum absolute value of an offset is 7 for a pixel value bitdepth of 8 bits, and 31 for a pixel value bitdepth of 10 bits and beyond.
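The bound can be checked numerically with a short sketch (the function name is hypothetical):

```python
def max_abs_sao_offset(bit_depth):
    """Maximum absolute SAO offset: (1 << (min(bitDepth, 10) - 5)) - 1."""
    return (1 << (min(bit_depth, 10) - 5)) - 1


for depth in (8, 10, 12):
    print(depth, max_abs_sao_offset(depth))
# 8 7
# 10 31
# 12 31
```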
- the current HEVC standard amendment addressing extended bitdepth video sequences provides a similar formula for a pixel value having a bitdepth of 12 bits and beyond.
- the absolute value decoded may be a quantized value which is dequantized before it is applied to pixel values at the decoder for SAO filtering. An indication of use or not of this quantization is transmitted in the slice header.
- the sign is signaled in the bitstream as a second part of the offset if the absolute value of the offset is not equal to 0.
- the bit of the sign is bypassed when CABAC is used.
- the signs of the offsets for the Band Offset mode are decoded in steps 609 and 610, except for each offset that has a zero value, before the following step 604 is performed in order to read in the bitstream and to decode the position “sao_band_position_X” of the SAO band as illustrated in Figure 8.
- if X is set equal to Y, the read syntax element is “sao_eo_class_luma” and if X is set equal to U and V, the read syntax element is “sao_eo_class_chroma”.
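The parsing order of Figure 10 for one component X can be sketched as follows. Everything here is illustrative: `read` is a hypothetical callback returning the next decoded value of a named syntax element, the names `sao_offset_abs` and `sao_offset_sign` follow HEVC naming but do not appear in the text above, and the convention that a sign value of 1 means a negative offset is an assumption.

```python
def parse_sao_component(read):
    """Parse the SAO parameters of one component X."""
    sao_type = read("sao_type_idx")
    if sao_type == 0:                      # no SAO for this frame area
        return {"type": 0}

    # Four absolute offset values are always read first.
    abs_offsets = [read("sao_offset_abs") for _ in range(4)]

    if sao_type == 1:                      # Band Offset
        offsets = []
        for a in abs_offsets:
            # A sign is present only for non-zero offsets (assumed: 1 = negative).
            negative = a != 0 and read("sao_offset_sign") == 1
            offsets.append(-a if negative else a)
        return {"type": 1, "offsets": offsets,
                "band_position": read("sao_band_position")}

    # Edge Offset: signs are implied, positive for O1, O2 and negative for O3, O4.
    o1, o2, o3, o4 = abs_offsets
    return {"type": 2, "offsets": [o1, o2, -o3, -o4],
            "eo_class": read("sao_eo_class")}


def make_reader(values):
    """Hypothetical reader that replays already-decoded values in order."""
    it = iter(values)
    return lambda name: next(it)


print(parse_sao_component(make_reader([1, 3, 0, 2, 1, 1, 0, 1, 8])))
# {'type': 1, 'offsets': [-3, 0, 2, -1], 'band_position': 8}
```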
- FIG 11 is a flow chart illustrating how SAO filtering is performed on an image part according to the HEVC specifications, for example during the step 67 in Figure 6.
- this image part is a CTU.
- This same process 700 is also applied in the decoding loop (step 415 in Figure 4) at the encoder in order to produce the reference frames used for the motion estimation and compensation of the following frames.
- This process is related to the SAO filtering for one color component (thus suffix“_X” in the syntax elements has been omitted below).
- An initial step 701 comprises determining the SAO filtering parameters according to processes depicted in Figures 9 and 10.
- the SAO filtering parameters are determined by the encoder and the encoded SAO parameters are included in the bitstream. Accordingly, on the decoder side in step 701 the decoder reads and decodes the parameters from the bitstream.
- Step 701 obtains the sao_type_idx; if it equals 1, it also obtains the sao_band_position 702, and if it equals 2, it also obtains the sao_eo_class_luma or sao_eo_class_chroma (according to the color component processed). If the element sao_type_idx is equal to 0, the SAO filtering is not applied.
- Step 701 obtains also an offsets table 703 of the 4 offsets.
- a variable i used to successively consider each pixel Pi of the current block or frame area (CTU), is set to 0 in step 704.
- “frame area” and“image area” are used interchangeably in the present specification.
- a frame area in this example is a CTU in the
- In step 706, pixel Pi is extracted from the frame area 705, which contains N pixels. This pixel Pi is classified in step 707 according to the Edge Offset classification described with reference to Figures 7A & 7B or the Band Offset classification described with reference to Figure 8.
- the decision module 708 tests if Pi is in a class that is to be filtered using the conventional SAO filtering.
- the related offset value Offsetj is extracted in step 710 from the offsets table 703.
- This filtered pixel is inserted in step 713 into the filtered frame area 716.
- If Pi is not in a class to be SAO filtered, then Pi (709) is inserted in step 713 into the filtered frame area 716 without filtering.
- After step 713, the variable i is incremented in step 714 in order to filter the subsequent pixels of the current frame area 705 (if any - test 715).
- After all the pixels have been processed (test 715), the filtered frame area 716 is reconstructed and can be added to the SAO reconstructed frame (see frame 68 of Figure 6 or 416 of Figure 4).
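The per-pixel loop of Figure 11 can be sketched as below. The classifier is passed in as a function so that the same loop serves both SAO types. The names are hypothetical, clipping is omitted, and the example classifier is a simplified 8-bit Band Offset classifier written only for this illustration.

```python
def sao_filter_area(pixels, classify, offsets):
    """Filter one frame area; `classify(i, pixels)` returns a class index or None."""
    filtered = []
    for i, p in enumerate(pixels):           # steps 704, 714, 715: visit the N pixels
        j = classify(i, pixels)              # step 707: classification
        if j is None:                        # test 708: class not filtered
            filtered.append(p)
        else:
            filtered.append(p + offsets[j])  # step 710: offset j from table 703
    return filtered


def band_class(i, pixels, band_position=8, shift=3):
    """Hypothetical 8-bit Band Offset classifier: class 0..3, or None."""
    k = (pixels[i] >> shift) - band_position
    return k if 0 <= k < 4 else None


print(sao_filter_area([10, 64, 80, 200], band_class, [1, 2, 3, 4]))
# [10, 65, 83, 200]
```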
- JEM JVET exploration model
- Embodiments of the present invention described below are intended to improve the coding efficiency of SAO by using various techniques for deriving one or more SAO parameters of an image part in a current image from one or more SAO parameters of a collocated image part in a reference image. These techniques may be referred to as temporal derivation techniques for SAO parameters. Further embodiments described below are intended to improve the coding efficiency of SAO by using various techniques for deriving one or more SAO parameters of an image part in an image from one or more SAO parameters of another image part of the same image. These techniques may be referred to as spatial derivation techniques for SAO parameters.
- The first six of the first group of embodiments focus on improving the signalling efficiency.
- SAO filtering is performed CTU by CTU which can be resource intensive.
- a grouping (of one or more CTUs) is not used in SAO filtering in HEVC.
- a temporal derivation of SAO parameters is not used in HEVC.
- different groupings and use thereof in SAO filtering i.e. a non-temporal derivation (NTD) of SAO parameters using grouping
- a group of image parts is formed (by a grouping) and the SAO filtering parameters are determined based on this group/grouping of image parts. Also, where applicable, the use of a particular grouping or use of the temporal derivation is signalled for this group of image parts, rather than for each image part individually.
- the grouping to which it belongs is used to derive/determine/obtain the SAO parameters for the image part. The grouping therefore serves as an indicator/identifier for associated SAO parameters.
- a temporal derivation may be used to derive at least one of the SAO parameters of the image part from at least one SAO parameter of a collocated image part in a reference image.
- the collocated image part in the reference image therefore serves as a source image part for the image part to be derived.
- different image parts of the group can have different SAO parameters depending on the SAO parameters of the respective collocated image parts. Accordingly, with very light signalling, image parts belonging to a given group of image parts can determine/obtain/derive the SAO parameters to use in the SAO filtering. Also, the image parts belonging to the given group can use temporal derivation and benefit from different (and efficient) SAO parameters.
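A minimal sketch of this temporal derivation follows, assuming that "collocated" means "at the same CTU position in the reference image" (only one of the possible preset relationships) and that SAO parameter sets are stored in a dictionary keyed by CTU position. Both the data layout and the function name are hypothetical.

```python
def derive_temporal_sao(group, reference_sao):
    """Give each CTU of `group` a copy of the SAO parameters of its collocated CTU.

    `group` is a list of (x, y) CTU positions in the current image and
    `reference_sao` maps (x, y) positions in the reference image to parameter sets.
    """
    return {pos: dict(reference_sao[pos]) for pos in group}


reference_sao = {(0, 0): {"type": 1, "band_position": 8},
                 (1, 0): {"type": 2, "eo_class": 0}}
current = derive_temporal_sao([(0, 0), (1, 0)], reference_sao)
print(current[(1, 0)])  # {'type': 2, 'eo_class': 0}
```

Only the use of temporal derivation needs to be signalled for the group, yet each CTU of the group can end up with different parameters.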
- a collocated image/image part/CTU is an associated/corresponding image/image part/CTU of the image/image part/CTU.
- This collocation relationship is defined by a preset relationship between the two images/image parts/CTUs.
- the preset relationship may be that one (e.g. a reference image) is used to predict/determine a value for encoding/decoding/predicting the other (e.g. the current image being encoded/decoded).
- the preset relationship may be that they are located at the same position (e.g.
- the preset relationship may be that they are used to encode/decode/predict a value for the same pixel(s)/element(s) of the image.
- An embodiment according to the present invention improves the coding efficiency of these SAO parameters by enabling derivation of them based on a grouping. For example, an encoder makes a choice/determination to group one or more image parts, such as one or more CTUs. Then, rate distortion (RD) costs for coding/communicating the SAO parameters for different groups (groupings) are evaluated to determine the best performing group (grouping) for a plurality of image parts (e.g. CTUs). The following embodiments describe determining based on RD cost evaluations, but it is understood that according to alternative embodiments, criteria other than the RD cost are used to compare different groups (groupings).
- the SAO parameters set selected for (i.e. associated with) the group is transmitted/signalled once for any one of the CTUs in the group, and these parameters are shared with other CTUs in the group, for example using data or a flag for obtaining these parameters (e.g. using SAO Merge flags such as sao_merge_left_flag and/or sao_merge_up_flag).
- the SAO parameters set selected for (i.e. associated with) the group is transmitted/signalled for the first CTU (in the raster scan order) and these parameters are shared with the other CTUs in the group, for example using the SAO Merge flags (e.g. sao_merge_left_flag and/or sao_merge_up_flag).
- An advantage of using a grouping is that it provides a diversity/ variety for achieving a better RD compromise for SAO filtering parameters since the CTU level (e.g. as in SAO filtering in HEVC) is not always the best compromise. So when many groupings are compared/competing, a particularly high efficiency can be achieved. However, this must also be balanced with the cost of performing RD evaluation for many different groupings.
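The competition between groupings can be sketched with a Lagrangian cost J = D + lambda x R. This particular cost form is a common choice assumed here for illustration (the text only refers to an RD cost), and the distortion and rate figures below are made-up numbers, not measurements.

```python
def best_grouping(costs, lam):
    """Return the grouping with the lowest cost J = D + lam * R.

    `costs` maps a grouping name to a (distortion, rate) pair.
    """
    return min(costs, key=lambda g: costs[g][0] + lam * costs[g][1])


# Made-up figures: finer groupings lower the distortion but cost more bits.
costs = {"ctu": (100.0, 400), "column": (130.0, 80), "frame": (180.0, 10)}
for lam in (0.05, 0.5, 2.0):
    print(lam, best_grouping(costs, lam))
# 0.05 ctu
# 0.5 column
# 2.0 frame
```

Each additional candidate grouping requires one more distortion and rate evaluation, which is the encoder-side cost that has to be balanced against the gain.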
- First embodiment Figure 12A is a flow chart illustrating how SAO filtering is performed on an image part according to a first embodiment of the present invention.
- sample adaptive offset (SAO) filtering is performed 9000 on a group of image parts of the image, which comprises a plurality of image parts, using SAO parameters associated with that particular group (i.e. grouping).
- an encoder or a decoder performs the SAO filtering 9000 to obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
- the groupings with which SAO parameters are associated comprise one or both of: a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more; and a grouping for forming a group made up of M image parts in a column of the image, wherein M is three or more, or M is the height of the image.
- the groupings with which SAO parameters are associated further comprise one or more of: a grouping for forming a group made up of all of the plurality of image parts of the image; a grouping for forming a group made up of one image part of the image; a grouping for forming a group made up of two or more image parts of the image; a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more; a grouping for forming a group made up of M image parts in a column of the image, wherein M is three or more, or wherein M is the height of the image (i.e. the group is made up of image parts in the complete column of the image); a grouping for forming a group made up of 2 x 2 image parts of the image; a grouping for forming a rectangular group; and a grouping for forming a group made up of k image parts in a line/row of the image, wherein k is two or more, or k is the complete width of the image.
- the groupings further comprise one or more of: a grouping for forming a group made up of image part(s) which use(s) temporal derivation for at least one SAO parameter; a grouping for forming a group made up of image part(s) which use(s) temporal derivation with a modified image or image part for at least one SAO parameter; and a grouping for forming a group made up of image part(s) which use(s) temporal derivation with another image or image part which has been rotated by 45, 90 or 135 degrees for at least one SAO parameter.
- an image part is one of: a block; a unit; a partition; a portion; a coding tree block; a largest coding unit; or a coding tree unit, for processing or coding the image.
- an image part is a coding tree unit.
- an image part is a coding tree block.
- Figure 23 shows various different (non-temporal derivation or spatial derivation) groupings 1201-1206 for forming groups made up of image parts (e.g. CTUs) in a slice (or a frame or an image). It illustrates several CTU groupings for a frame or a slice containing 40 (8x5) CTUs. A more detailed description of these groupings and how they are used in the SAO filtering can be found in the embodiments described later; only a short introduction to these groupings is provided here.
- the first grouping 1201 forms a group made up of individual CTUs (a CTU level grouping). This first grouping requires one set of SAO parameters per CTU. It corresponds to the CTU-level derivation in the eighth embodiment.
- the second grouping 1202 forms a group made up of all CTUs of the entire image (or slice/frame).
- This second one 1202 is a frame level grouping.
- all CTUs of the image/frame (of the slice which is either the entire frame or a part thereof) share the same SAO parameters.
- Figure 25 illustrates an example for setting SAO parameters at a frame/slice level without using a new SAO classification (i.e. without adding a new classification to the HEVC SAO classification) at the encoder side, which is described in more detail later.
- An advantage of this embodiment is that it provides a different compromise in terms of rate distortion from the CTU level compromise, which can be better than the CTU level compromise in some cases.
- the amount of distortion experienced should be less than with the CTU level grouping but the rate will be very low. So which one is the best for a particular current frame/slice/image will depend on the characteristics of the current frame/slice/image (e.g. size thereof).
- the first and second groupings 1201, 1202 provide two extreme RD compromises. So, other groupings which are an intermediate grouping between the two extreme groupings (i.e. an intermediate level grouping) are also considered.
- the third grouping 1203 is an intermediate grouping which compromises between the CTU level and the frame level (i.e. the first and second groupings 1201, 1202).
- the third grouping 1203 forms a group made up of CTUs in a column of the image.
- Figure 26 illustrates an example for, at the encoder side, setting SAO parameters for a group made up of column CTUs.
- the fourth grouping 1204 forms a group made up of CTUs in a line or a row. As a frame or a slice is often rectangular, with a width larger than its height, the fourth grouping 1204 is often an RD compromise closer to the second grouping 1202 (frame level) than the third grouping 1203 is.
- NxN square CTU groupings
- the fifth grouping 1205 forms a group made up of 2x2 CTUs.
- the sixth grouping 1206 forms a group made up of 3x3 CTUs.
- the sixth grouping 1206 is used because it provides a good balance for the RD compromise for coding/communicating a sequence of images and/or images of high quality/resolution.
- Figure 27 illustrates an example for, at the encoder side, setting SAO parameters for such NxN CTU groupings.
- An advantage of such NxN CTUs groupings is that it is easy to create several RD compromises for SAO parameters (e.g. by varying N).
- the value N depends on the size of the frame/slice.
- N is equal to 2 or 3.
- N is 3. This offers an efficient compromise, for example when encoding/decoding a sequence of images (of higher quality/resolution).
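The spatial groupings introduced above can be illustrated with a small sketch that maps a CTU address (raster scan order) to the index of the group it belongs to. This is only an illustration under stated assumptions: the function name `group_id`, the string labels for the groupings, and the raster-order numbering of groups are hypothetical and are not taken from the specification.

```python
def group_id(ctu_addr, width_in_ctus, grouping):
    """Return the index of the group containing the CTU at ctu_addr.

    ctu_addr is assumed to be a raster-scan address; the grouping labels
    ("ctu", "frame", "column", "line", "2x2", "3x3") are illustrative names.
    """
    x = ctu_addr % width_in_ctus   # CTU column
    y = ctu_addr // width_in_ctus  # CTU row
    if grouping == "ctu":          # one group per CTU (grouping 1201)
        return ctu_addr
    if grouping == "frame":        # one group for the whole frame/slice (1202)
        return 0
    if grouping == "column":       # one group per CTU column (1203)
        return x
    if grouping == "line":         # one group per CTU row (1204)
        return y
    if grouping in ("2x2", "3x3"): # N x N square groups (1205, 1206)
        n = int(grouping[0])
        groups_per_row = -(-width_in_ctus // n)  # ceiling division
        return (y // n) * groups_per_row + (x // n)
    raise ValueError(f"unknown grouping: {grouping}")


def count_groups(width_in_ctus, height_in_ctus, grouping):
    """Number of distinct groups, i.e. of SAO parameter sets to signal."""
    total = width_in_ctus * height_in_ctus
    return len({group_id(a, width_in_ctus, grouping) for a in range(total)})
```

For the 8x5 CTU frame of Figure 23, the number of groups is also the number of SAO parameter sets needed, which is what makes each grouping a different rate-distortion compromise.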
- this variable can, in that case, have a dimension which is the CTU address:
- a group of image parts formed by one of the aforementioned temporal derivation groupings is also compared/competed with the N x N CTU groupings to set the SAO parameters using an RD cost evaluation shown in Figure 21.
- a grouping forming a group of NxN image parts is particularly advantageous when encoding/decoding a sequence of high quality/resolution images.
- sample adaptive offset (SAO) filtering is performed on an image comprising a plurality of image parts, wherein the performing comprises performing the SAO filtering on a group made up of N x N image parts of the image using SAO parameters associated with the group, wherein N is three or more. Two or more different groupings of the image parts are available, and the group made up of the N x N image parts is formed by one of the available groupings.
- one available grouping forms another group made up of M image parts in a column of the image, and M is three or more, or the height of the image.
- the two or more available groupings further comprise one or more of the aforementioned non-temporal derivation (i.e. spatial derivation) groupings.
- the two or more available groupings further comprise one or more of the aforementioned temporal derivation groupings.
- At least one available grouping is excluded from the comparison and/or evaluation (e.g. the rate-distortion evaluation is not performed for the at least one available grouping because the rate-distortion cost can be determined based on a previous evaluation/comparison or is just not required to make the determination).
- a grouping forming a group made up of image parts (CTUs) in a column of the image also offers a good balance between RD cost and computing complexity when encoding/decoding a sequence of high quality/resolution images.
- sample adaptive offset (SAO) filtering is performed on an image comprising a plurality of image parts, wherein the performing comprises performing SAO filtering on a group made up of M image parts in a column of the image, wherein M is three or more, or the height of the image.
- Two or more different groupings of the image parts are available, and the group made up of the M image parts in the column is formed by one of the available groupings.
- SAO filtering using two or more of the available groupings is compared based on a rate-distortion evaluation for each of those groupings, and one grouping is selected based on the comparison.
- one available grouping forms another group made up of N x N image parts of the image, wherein N is three or more.
- the two or more available groupings further comprise one or more of the aforementioned non-temporal derivation (i.e. spatial derivation) groupings.
- the two or more available groupings further comprise one or more of the aforementioned temporal derivation groupings.
- At least one available grouping is excluded from the comparison and/or evaluation (e.g. the rate-distortion evaluation is not performed for the at least one available grouping because the rate-distortion cost can be determined based on a previous evaluation/comparison or is just not required to make the determination).
- Figures 12B-12C are flow charts illustrating how SAO filtering is performed on an image part using two or more different groupings according to a fourth embodiment of the present invention.
- the performing the SAO filtering on an image 9000 comprises: determining a grouping 9100; and performing the SAO filtering using SAO parameters associated with the determined grouping 9200.
- an encoder or a decoder performs the determining 9100 and the SAO filtering 9200 to obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
- the determining 9100 comprises comparing SAO filtering using two or more of the available groupings 9110 and selecting one grouping based on this comparison 9120, the selected grouping being for use with performing the SAO filtering on the image parts.
- a group made up of N x N image parts or M image parts in a column of the image is formed by one of the available groupings.
- the two or more different groupings comprise one or both of: a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more; and another grouping for forming another group made up of M image parts in a column of the image, wherein M is three or more, or the height of the image.
- the two or more available groupings further comprise one or more of the aforementioned non-temporal derivation (e.g. spatial derivation) groupings.
- the two or more available groupings further comprise one or more of the aforementioned temporal derivation groupings.
- the comparison is based on rate-distortion evaluation for the two or more groupings.
- at least one available grouping is excluded from the comparison and/or evaluation (e.g. the rate-distortion evaluation is not performed for the at least one available grouping because the rate-distortion cost can be determined based on a previous evaluation/comparison or is just not required to make the determination).
- an encoder performs the comparing 9110 and the selecting 9120 to obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
- a corresponding decoder (of the encoder) also performs the comparing 9110 and the selecting 9120.
- the encoder communicates (e.g. provides/includes, in a bitstream) data or a flag(s) for use with the determining 9100, and a corresponding decoder (of the encoder) performs the determining 9100 using the data or flag(s) (obtained from the bitstream).
- Figures 13A-13B illustrate examples of how such a comparison based on rate-distortion evaluation is used to determine a grouping for the SAO filtering (and how an encoder signals the determined grouping).
- merge flags are used within the group so if applied to HEVC, the resulting bitstream can be decoded by an HEVC-compliant decoder. It is understood that according to an alternative embodiment, the merge flag may not be used within the group.
- the example of Figure 13A compares two available groupings and the example of Figure 13B compares more than two available groupings, and the description provided below applies to both.
- the current slice/frame 9701 is used to set the CTUStats table 9703 for each CTU 9702.
- This table 9703 is used to evaluate RD costs for an available grouping 9704 (e.g. any one of the aforementioned groupings such as a CTU level) and another (for Figure 13A) or more (for Figure 13B) available grouping 9705, 9706 (e.g. any one or more of the aforementioned groupings different from the available grouping 9704).
- the available grouping with the best RD cost (i.e. the best derivation) is selected/determined according to the rate distortion criterion computed for each available SAO parameter derivation 9710.
- the SAO parameters set for each CTU are set 9711 according to the derivation selected in step 9710. These SAO parameters are then associated with the selected/determined group and used to apply the SAO filtering 9713 in order to obtain the filtered frame/slice 9714.
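The competition between available groupings described above amounts to evaluating a rate-distortion cost for each grouping and keeping the one with the lowest cost. The sketch below assumes the usual Lagrangian form J = D + λR; the names `rd_cost` and `select_grouping`, and the example numbers, are purely illustrative.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def select_grouping(costs):
    """Return the label of the grouping with the smallest RD cost.

    costs maps a grouping label (hypothetical names) to its RD cost.
    """
    return min(costs, key=costs.get)


# Example with made-up distortion/rate figures for three groupings.
example_costs = {
    "ctu": rd_cost(distortion=100, rate_bits=400, lam=0.5),    # fine control, high rate
    "frame": rd_cost(distortion=250, rate_bits=20, lam=0.5),   # coarse control, low rate
    "3x3": rd_cost(distortion=150, rate_bits=60, lam=0.5),     # intermediate compromise
}
selected = select_grouping(example_costs)
```

A grouping that is excluded from the evaluation simply does not appear in the `costs` mapping.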
- the available groupings comprise a group of image parts formed by one of the aforementioned temporal derivation groupings, which is also compared/competed with the other available groupings, for example using a RD cost evaluation similar to that shown in Figure 21.
- the SAO parameters are shared among the CTUs of a group, and this sharing is achieved by the encoder signalling/providing in the bitstream which derivation of (i.e. grouping for) the SAO parameters is selected (e.g. CTU level, frame level, column, line, 2x2 CTUs, 3x3 CTUs, temporal derivation).
- a possible indexing scheme is shown in Table 2 below:
- Figure 30 illustrates an example of a corresponding decoding process when the CTU grouping is signaled in a slice header, which is described in more detail with reference to a later described embodiment.
- data or a flag(s) can be used to communicate the grouping selection and/or SAO parameters between an encoder and a decoder.
- data or a flag indicating a grouping index is provided in a slice header.
- data/flags might be used according to embodiments of the present invention. It is understood that according to other embodiments, modifications can be made on how the data/flags are used in these embodiments as long as the encoder and the decoder are able to obtain the same SAO parameters to use with the SAO filtering.
- Figures 14A-14B are flow charts illustrating how a determined grouping and/or SAO parameters is communicated according to a fifth embodiment of the present invention.
- performing SAO filtering according to any one of the aforegoing embodiments further comprises an encoder (after the grouping determination 9100) providing 9150, in a bitstream, data or a flag (e.g. a grouping/group index) indicating the determined grouping and the SAO parameters associated with the determined grouping.
- the SAO parameters may be provided with data for an image part in the grouping (e.g. the first processed/encoded image part/CTU in the raster scan order).
- performing SAO filtering according to any one of the aforegoing embodiments further comprises a decoder obtaining 9155, from a bitstream, the SAO parameters associated with the grouping, and performing the SAO filtering 9200 using the obtained SAO parameters to obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
- Figures 14C-14D are flow charts illustrating how a determined grouping and/or SAO parameters are communicated according to a sixth embodiment of the present invention.
- performing SAO filtering according to any one of the aforegoing embodiments further comprises an encoder (after the grouping determination 9100) providing 9170, in a bitstream: data or a flag (e.g. a grouping/group index) indicating a grouping; and data or a flag(s) (e.g. a sao merge flag up or sao merge flag left) indicating that SAO parameters for the SAO filtering are inferred from another image part of the image (non-temporal derivation/inference) or of another image (e.g. temporal derivation/inference).
- performing SAO filtering further comprises a decoder obtaining 9175, from a bitstream, data or a flag(s) (e.g. a sao merge flag up or sao merge flag left) indicating that SAO parameters for the SAO filtering are inferred from another image part of the image (non-temporal derivation/inference) or of another image (e.g. temporal derivation/inference), and performing the SAO filtering 9200 using the obtained data or flag(s) to infer SAO parameters and obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
- the data or flag(s) (e.g. a sao merge flag up or sao merge flag left) indicating inferring of the SAO parameters is provided/included in the bitstream when the determined grouping is a particular grouping, e.g. a CTU level, and when the determined grouping is not the particular grouping, the data/flag(s) is not provided/included in the bitstream.
- a decoder first obtains the data/flag(s) (e.g. a grouping/group index) indicating a grouping and determines the grouping using the obtained data/flag(s).
- the decoder obtains and uses the data/flag(s) indicating inferring of the SAO parameters only when the determined grouping is that particular grouping. If the determined grouping is not that particular grouping, the encoder provides the SAO parameters in the bitstream, and the decoder obtains the provided SAO parameters without obtaining/using the data/flag(s) indicating inferring of the SAO parameters.
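The conditional signalling described above can be sketched as a small parser: the grouping index is read first, merge flags are read only for the particular grouping (here assumed to be the CTU level), and explicit SAO parameters are read otherwise. This is a simplified sketch, not the specification's syntax: the symbol list stands in for already entropy-decoded values, `CTU_LEVEL = 0` is a hypothetical index, and reading the "up" flag only when the "left" flag is zero follows the HEVC convention rather than anything stated here.

```python
CTU_LEVEL = 0  # hypothetical grouping index for the CTU-level grouping


def parse_sao_signalling(symbols, particular_grouping=CTU_LEVEL):
    """Parse already-decoded symbols for one image part (illustrative only).

    symbols is a flat sequence: grouping index, then either merge flags
    (for the particular grouping) and/or an explicit SAO parameter set.
    """
    it = iter(symbols)
    result = {"grouping": next(it), "merge_left": None,
              "merge_up": None, "params": None}
    if result["grouping"] == particular_grouping:
        # Merge flags are only present for the particular grouping.
        result["merge_left"] = next(it)
        if not result["merge_left"]:
            result["merge_up"] = next(it)
        if result["merge_left"] or result["merge_up"]:
            return result  # parameters are inferred, nothing more to read
    # Otherwise the SAO parameters are read explicitly from the bitstream.
    result["params"] = next(it)
    return result
```

In the text the grouping index is carried once (for example in a slice header) rather than per image part; it is folded into the same symbol list here only to keep the sketch short.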
- performing SAO filtering further comprises an encoder providing, in a bitstream, data or a flag(s) (e.g. sao merge flags enabled flag) indicating whether either or both of the data indicating a grouping or the data indicating inferring of the SAO parameters is available for use.
- a decoder obtains, from a bitstream, the data (e.g. sao merge flags enabled flag) indicating whether either or both of the data indicating a grouping or the data indicating inferring of the SAO parameters is available for use, and when the data indicates either or both data is available, obtains, from the bitstream, the available data and uses the obtained available data to determine the grouping or infer the SAO parameters.
- the decoder does not obtain the unavailable data and obtains, from the bitstream, the SAO parameters associated with the grouping.
- a temporal derivation of SAO parameters may be used.
- a group can be any two or more CTUs, for example a whole image, a slice, a line of CTUs, a column of CTUs, N lines of CTUs, N columns of CTUs, where N is an integer greater than 1.
- a group could also be NxN CTUs, where N is an integer greater than 1, or MxN CTUs or NxM CTUs, where M > 1 and N > 1.
- a group of image parts can be a CTU, and each constituent block of the CTU can be an image part. In such a case, each block of a CTU may have its own SAO parameters, but the signalling to use temporal derivation of the SAO parameters can be made for the CTU as a whole.
- a flag temporal merge can be used to signal the use of temporal derivation for all image parts of the group.
- the manner in which the SAO parameters are derived in the temporal derivation is not particularly limited except that at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the collocated image part in a reference image.
- the at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the collocated image part by copying the SAO parameter of the collocated image part.
- One, more than one, or all SAO parameters may be copied.
- one, more than one, or all SAO parameters may be copied only when the SAO filtering is of a particular type (edge or band).
- each CTU of a current image 2001 derives its SAO parameters temporally from a collocated CTU in a reference image 2002.
- the SAO parameters for the CTU 2003 in the current image 2001 are obtained by copying the SAO parameters from its collocated CTU 2005 in the reference image 2002.
- the SAO parameters for the CTU 2004 in the current image 2001 are obtained by copying the SAO parameters from its collocated CTU 2006 in the reference image 2002.
- CTU 2005 uses EO filtering with a direction of 0 degrees
- CTU 2006 uses BO filtering.
- the CTU 2003 also uses EO filtering with a direction of 0 degrees.
- the CTU 2004 also uses BO filtering.
- all the SAO parameters are copied in this embodiment, including the SAO type parameter sao_type_idx, parameters such as EO class (specifying a direction of EO filtering) and BO group sao_band_position (specifying a first class of a group of classes), and offsets.
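The copy-based temporal derivation described above can be sketched as follows: each CTU of the current image takes the full SAO parameter set of the collocated CTU (same address) in the reference image. The `SaoParams` container and its field names are assumptions made for illustration; only the idea of copying type, EO class, band position and offsets comes from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SaoParams:
    """Illustrative container for one CTU's SAO parameters (names assumed)."""
    sao_type: str              # "off", "edge" or "band"
    eo_class: int              # EO direction index, meaningful for "edge"
    band_position: int         # first band of the group, meaningful for "band"
    offsets: Tuple[int, ...]   # the four offsets


def derive_temporal(reference_params: List[SaoParams]) -> List[SaoParams]:
    """Derive SAO parameters for every CTU of the current image by copying
    all parameters from the collocated CTU of the reference image."""
    # SaoParams is immutable, so reusing the objects is equivalent to a copy.
    return [ref for ref in reference_params]


# Reference image with an EO CTU (direction index 0) and a BO CTU,
# mirroring the CTU 2005 / CTU 2006 example.
reference = [
    SaoParams("edge", 0, 0, (1, 0, 0, -1)),
    SaoParams("band", 0, 12, (2, 1, -1, 0)),
]
current = derive_temporal(reference)
```

A variant that copies only some parameters, or copies only for a particular SAO type, would replace the list comprehension with a per-field selection.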
- Figure 16 is a flow chart for use in explaining a method of decoding an image in the seventh embodiment.
- a first syntax element is read from the bitstream 2103 and decoded.
- This first syntax element in this example is a simple temporal merge flag which indicates for the whole image whether or not temporal derivation of SAO parameters is to be used.
- a reference image means another image of a sequence of images (previous or future image) which is used to perform temporal prediction for an image to be encoded.
- a reference image means another image of the sequence (previous or future image) which is used to perform temporal derivation of SAO parameters.
- the reference images for the temporal derivation of SAO parameters may be the same as the reference images for the temporal prediction, or may be different.
- the HEVC specification uses the term “reference frame” instead of “reference image” and refidx is usually referred to as a reference frame index accordingly.
- the terms “reference image” and “reference frame” are used interchangeably in the present specification.
- the decoder has a storage unit 2106, which may be called a Decoded Picture Buffer (DPB), which stores the SAO parameters for each CTU of the reference image.
- the DPB 2106 stores the SAO parameters for each CTU explicitly, without relying on merge flags such as merge_up and merge_left, because reading merge flags as part of the temporal derivation of the SAO parameters increases the complexity and slows down the derivation.
- in step 2107, the SAO parameters stored in the DPB 2106 for the collocated CTU in the reference image identified by refidx, or by Ly and refidx, are obtained. These are then set as the SAO parameters 2108 for the current CTU.
- the SAO parameters comprise dedicated SAO parameters for each color component X (Y, U, V) and in steps 2109-2111 SAO filtering is performed for each color component in turn using the dedicated SAO parameters for the color component concerned.
- the SAO parameters may be common to two or more components, for example U and V, or a mixture of common and dedicated (per-component) SAO parameters may be used.
- the SAO parameters for the CTUs of the group are read from the bitstream, for example using the process of Figure 5.
- Figure 16 relates to the steps carried out on the decoder side.
- the steps involve reading and decoding the syntax elements for the group of image parts (whole image in this case) from the bitstream and then performing SAO filtering on the image parts of the group.
- the same SAO filtering as on the decoder side is performed on the image parts of the group to ensure that the encoder has the same reference images as the decoder.
- the syntax elements do not need to be read and decoded from the bitstream, as the related information is available in the encoder already.
- the determination of whether or not to use the temporal derivation of SAO parameters for the group is made on the encoder side in this embodiment.
- the choice of reference image for the temporal derivation is made on the encoder side.
- the reference image is simply the first reference image of the first list L0. In that case, no syntax elements are necessary to identify refidx and Ly and step 2104 can be omitted. This removes some signalling and simplifies the decoder design.
- the eighth embodiment relates to an encoding process.
- a temporal derivation of SAO parameters is applied to a group of image parts.
- a temporal derivation is applied to a whole image.
- a non-temporal derivation of the SAO parameters is used in which SAO parameters are determined by the encoder for each image part (CTU) and signalled in the bitstream.
- This may be referred to as a CTU-level non-temporal derivation of SAO parameters.
- the decoder reads from the bitstream the first syntax element (e.g. temporal merge flag) and when it indicates temporal derivation is not applied to the group the decoder reads the per-CTU SAO parameters from the bitstream and filters each CTU according to the SAO parameters for the CTU concerned, for example using the decoding process of Figure 5.
- the temporal derivation and the CTU-level non-temporal derivation are available derivations and the encoder selects one of them to apply to the group (e.g. frame or slice).
- FIG 17 is a flow chart illustrating steps carried out by an encoder to determine SAO parameters for the CTUs of a group (frame or slice) in the CTU-level non-temporal derivation of SAO parameters.
- the process starts with a current CTU (1101).
- First the statistics for all possible SAO types and classes are accumulated in the variable CTUStats (1102).
- the process of Step 1102 is described below with reference to Figure 18.
- the RD cost for the SAO Merge Left is evaluated if the left CTU is in the current slice (1103), as is the RD cost of the SAO Merge Up (1104).
- RD costs are also compared to disable SAO independently for the Luma and the Chroma components (1113, 1114).
- the use of a new SAO parameters set (1115) is compared to “merging” or sharing the SAO parameters set (1116) from the left and up CTU.
- Figure 18 is a flow chart illustrating steps of an example of statistics computation at the encoder side that can be applied for the Edge Offset type filter, in the case of the conventional SAO filtering. A similar approach may also be used for the Band Offset type filter.
- Figure 18 illustrates the setting of the variable CTUStats containing all information needed to derive the best rate-distortion offset for each class. Moreover, it illustrates the selection of the best SAO parameters set for the current CTU.
- each SAO type is evaluated.
- the variables Sum_j and SumNbPix_j are set to zero in an initial step 801.
- the current frame area 803 contains N pixels.
- j is the current range number to determine the four offsets (related to the four edge indexes shown in Figure 7B for Edge Offset type or to the 32 ranges of pixel values shown in Figure 8 for Band Offset type).
- Sum_j is the sum of the differences between the pixels in the range j and their original pixels.
- SumNbPix_j is the number of pixels in the frame area, the pixel value of which belongs to the range j.
- in step 802, a variable i, used to successively consider each pixel P_i of the current frame area, is set to zero. Then, the first pixel P_i of the frame area 803 is extracted in step 804.
- in step 805, the class of the current pixel is determined by checking the conditions defined in Figure 7B. Then a test is performed in step 805. During step 805, a check is performed to determine if the class of the pixel value P_i corresponds to the value “none of the above” of
- following step 806, the next step is 807, where the related SumNbPix_j (i.e. the sum of the number of pixels for the class determined in step 805) is incremented and the difference between P_i and its original value P_i^org is added to Sum_j.
- the variable i is incremented in order to consider the next pixel of the frame area 803.
- each offset Offset_j is an integer value.
- the ratio defined in this formula may be rounded, either to the closest value or using the ceiling or floor function.
- Each offset Offset_j is an optimal offset Ooptj in terms of distortion
- the encoder uses the statistics set in table CTUStats.
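The statistics gathering and the derivation of the distortion-optimal offset can be sketched as below. Several points are assumptions rather than statements of the specification: the per-pixel classifier is passed in as a hypothetical `classify` function (returning `None` for pixels that belong to no class), the difference is taken as original minus reconstructed so that the optimal offset is the plain ratio Sum_j / SumNbPix_j, and rounding to the closest integer is one of the rounding options the text allows.

```python
def accumulate_stats(recon, orig, classify, num_classes):
    """Accumulate Sum_j and SumNbPix_j over a frame area.

    recon, orig: sequences of reconstructed and original pixel values.
    classify(i): hypothetical classifier giving the class j of pixel i,
                 or None when the pixel belongs to no class.
    """
    sums = [0] * num_classes     # Sum_j
    counts = [0] * num_classes   # SumNbPix_j
    for i, (p, p_org) in enumerate(zip(recon, orig)):
        j = classify(i)
        if j is None:
            continue             # pixel not counted in any class
        sums[j] += p_org - p     # sign convention assumed: original - reconstructed
        counts[j] += 1
    return sums, counts


def optimal_offsets(sums, counts):
    """Distortion-optimal integer offset per class: the rounded mean difference."""
    return [round(s / n) if n else 0 for s, n in zip(sums, counts)]


# Tiny example: five pixels, two classes, last pixel unclassified.
recon = [10, 12, 20, 21, 30]
orig = [12, 14, 18, 21, 35]
sums, counts = accumulate_stats(
    recon, orig, lambda i: None if i == 4 else (0 if i < 2 else 1), 2)
offsets = optimal_offsets(sums, counts)
```

Note that Python's `round` rounds exact halves to the nearest even integer; an encoder would fix one rounding rule (closest, ceiling or floor) and apply it consistently.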
- the distortion can be obtained by the following formula:
- the same computing is applied for Chroma components.
- the Lambda of the rate-distortion cost is fixed for the three components. For SAO parameters merged with the left CTU, the rate is only one flag, which is CABAC coded.
- the encoding process illustrated in Figure 19 is applied in order to find the best offset in terms of rate distortion criterion, offset referred to as ORDj. This process is applied in steps 1109 to 1112.
- the rate distortion value Jj is initialized to the maximum possible value.
- a loop on Oj from Ooptj to 0 is applied in step 902. Note that Oj is modified by 1 at each new iteration of the loop. If Ooptj is negative, the value Oj is incremented and if Ooptj is positive, the value Oj is decremented.
- the rate distortion cost related to Oj is computed in step 903 according to the following formula:
- $J(O_j) = \mathrm{SumNbPix}_j \times O_j \times O_j - \mathrm{Sum}_j \times O_j \times 2 + \lambda R(O_j)$, where $\lambda$ is the Lagrange parameter and $R(O_j)$ is a function which provides the number of bits needed for the code word associated with $O_j$.
- This algorithm of Figures 18 and 19 provides a best ORDj for each class j. This algorithm is repeated for each of the four directions of Figure 7A. Then the direction that provides the best rate distortion cost (sum of Jj for each direction) is selected as the direction to be used for the current CTU.
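The offset search of Figure 19 can be sketched as a loop that walks from the distortion-optimal offset Ooptj towards 0 in steps of 1 and keeps the offset with the smallest cost J(Oj) = SumNbPix_j × Oj × Oj − Sum_j × Oj × 2 + λ R(Oj). The function name and the rate model passed as `rate` are hypothetical; the real R(Oj) depends on the code words of the codec.

```python
def best_rd_offset(o_opt, sum_j, nb_pix_j, lam, rate):
    """Search offsets from o_opt down (or up) to 0 and return (ORDj, Jj).

    rate(o) is a hypothetical function returning the number of bits
    needed to code offset o.
    """
    best_o, best_j = 0, float("inf")   # Jj starts at the maximum possible value
    step = -1 if o_opt > 0 else 1      # move towards zero
    o = o_opt
    while True:
        j = nb_pix_j * o * o - sum_j * o * 2 + lam * rate(o)
        if j < best_j:
            best_o, best_j = o, j
        if o == 0:
            break
        o += step
    return best_o, best_j


def example_rate(o):
    """Made-up rate model: larger magnitudes cost more bits."""
    return abs(o) + 1
```

With a small λ the distortion-optimal offset is kept, while a large λ pushes the choice towards 0 because the bits spent on the offset outweigh the distortion gain.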
- the next step involves finding the best position of the SAO band position of Figure 8. This is determined with the encoding process set out in Figure 20.
- the RD cost Jj for each range has been computed with the encoding process of Figure 19 with the optimal offset ORDj in terms of rate distortion.
- the rate distortion value J is initialized to the maximum possible value.
- a loop on the 28 positions j of 4 consecutive classes is run in step 1002.
- the variable Jj corresponding to the RD cost of the band (of 4 consecutive classes) is initialized to 0 in step 1003.
- the loop on the four consecutive offset j is run in step 1004.
- Test 1008 checks whether or not the loop on the 28 positions has ended. If not, the process continues in step 1002, otherwise the encoding process returns the best band position as being the current value of sao_band_position 1009.
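The band position search of Figure 20 can be sketched as a sliding window over the per-range costs Jj: for each of the 28 candidate positions, the costs of 4 consecutive classes are summed, and the position with the smallest total is returned. The function name and default arguments are illustrative; the 28 positions and the band of 4 classes follow the text.

```python
def best_band_position(j_per_range, num_positions=28, band_width=4):
    """Return (best position, its cost) over bands of consecutive classes.

    j_per_range[j] is the RD cost Jj of range j, already computed with the
    RD-optimal offset of that range.
    """
    best_pos, best_cost = 0, float("inf")  # J starts at the maximum possible value
    for pos in range(num_positions):
        # Cost of the band is the sum of the costs of its consecutive classes.
        cost = sum(j_per_range[pos:pos + band_width])
        if cost < best_cost:
            best_pos, best_cost = pos, cost
    return best_pos, best_cost


# Example: 32 ranges, only ranges 10..13 bring an RD gain (negative cost).
costs = [0] * 32
costs[10:14] = [-5, -3, -4, -1]
position, cost = best_band_position(costs)
```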
- CTUStats table in the case of determining the SAO parameters at the CTU level is created by the process of Figure 17. This corresponds to evaluating the CTU level in terms of the rate-distortion compromise. The evaluation may be performed for the whole image or for just the current slice.
- FIG. 21 shows the RD cost evaluation of temporal derivation at Slice level.
- First the distortion for the current colour component X is set equal to 0 (1601).
- the temporal SAO parameters set of the collocated CTU in a reference frame (Ly, refidx) (1605) is extracted (1604) from the DPB (1603). If the SAO parameters set (1605) is equal to OFF (No SAO), the next CTU is processed (1610).
- the distortion Distortion TEMPORAL X is incremented by an amount equal to the associated distortion of the offset Oi (1609). This is the same process as the RD cost evaluation for a merge of SAO parameters as described previously. Please note that sao_band_position is set equal to 0 when the SAO type is equal to an Edge type. When the distortions of all offsets have been added to Distortion TEMPORAL X (1608), the next CTU is processed (1610).
- the RDCost for the temporal mode at Slice level, for component X, is set equal to the sum of this computed distortion Distortion TEMPORAL X and λ multiplied by the rate for this temporal mode at Slice level (1611).
- This rate is equal to the rate of the signalling of temporal mode plus, if needed, the rate of the reference frame index refidx plus, if needed, the rate of the list Ly.
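The slice-level evaluation of the temporal derivation can be sketched as a loop over CTUs that skips CTUs whose collocated parameters are OFF, accumulates the distortion change of each copied offset from the CTU statistics, and finally adds λ times the signalling rate. The data layout is an assumption made for illustration: statistics are stored per CTU as a mapping from class to (Sum, SumNbPix), the collocated parameters as a mapping from class to offset (or `None` for OFF), and the per-offset distortion uses the same expression as the offset RD cost without its rate term.

```python
def temporal_rd_cost(ctu_stats, collocated_offsets, lam, rate_bits):
    """RD cost of the temporal mode at slice level for one colour component.

    ctu_stats[c]: dict class -> (Sum, SumNbPix) for CTU c (layout assumed).
    collocated_offsets[c]: dict class -> offset copied from the collocated
                           CTU, or None when SAO is OFF for that CTU.
    rate_bits: bits for signalling the temporal mode, plus refidx and the
               list Ly when they need to be signalled.
    """
    distortion = 0
    for stats, offsets in zip(ctu_stats, collocated_offsets):
        if offsets is None:
            continue  # SAO OFF for the collocated CTU: go to the next CTU
        for cls, o in offsets.items():
            s, n = stats[cls]
            distortion += n * o * o - 2 * s * o
    return distortion + lam * rate_bits


example_cost = temporal_rd_cost(
    ctu_stats=[{0: (4, 2)}, {0: (3, 1)}],
    collocated_offsets=[{0: 2}, None],
    lam=0.5,
    rate_bits=4,
)
```

The resulting cost is then compared with the cost of the CTU-level derivation, and the cheaper derivation is selected.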
- the two evaluations are then compared and the one with the best performance is selected.
- the selected derivation (temporal or CTU level) is then signalled to the decoder in the bitstream, for example using the first syntax element as described in connection with the seventh embodiment.
- FIG 22 illustrates the competition between the CTU level for SAO and for temporal derivation at encoder side.
- the current slice/frame 1901 is used to set the CTUStats table (1903) for each CTU (1902).
- This table (1903) is used to evaluate the CTU level derivation (1904) and the temporal derivation for the whole slice (1915) as described previously in Figure 21.
- This table (1903) is also used to evaluate several reference frames for temporal derivation.
- the best derivation for the slice is selected according to the rate distortion criterion computed for each available derivation (1910).
- the SAO parameters sets for each CTU are set (1911) according to the derivation selected in step 1910.
- these SAO parameters are then used to apply the SAO filtering (1913) in order to obtain the filtered frame/slice.
- the selected derivation may be signalled in the slice header, for example using a syntax element indicating temporal derivation (which the decoder reads, see 2101 and 2201 in Figures 13 and 14).
- the temporal derivation was put into competition with one alternative non-temporal method of deriving the SAO parameters.
- two alternative methods are in competition with the temporal derivation.
- Figure 23 shows various different groupings 1201-1206 of CTUs in a slice.
- a first grouping 1201 has individual CTUs. This first grouping requires one set of SAO parameters per CTU. It corresponds to the CTU-level derivation in the eighth embodiment.
- a second grouping 1202 makes all CTUs of the entire image one group.
- all CTUs of the frame (and hence of the slice, which is either the entire frame or a part thereof) share the same SAO parameters.
- the encoder first computes a set of SAO parameters to be shared by all CTUs of the image. Then, in the first method, these SAO parameters are set for the first CTU of the slice. For each remaining CTU from the second CTU to the last CTU of the slice, the sao_merge_left flag is set equal to 1 if the flag exists (that is, if the current CTU has a left CTU). Otherwise, the sao_merge_up flag is set equal to 1.
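The first method of sharing one SAO parameter set across the whole slice can be sketched as follows: the first CTU carries the parameters explicitly, every later CTU that has a left neighbour merges left, and every CTU at the start of a row merges up. The function name and the string labels are illustrative, and the sketch assumes a slice covering full CTU rows.

```python
def frame_level_merge_decisions(width_in_ctus, height_in_ctus):
    """Return, per CTU in raster order, how its SAO parameters are signalled:
    "new" (explicit parameters), "left" (sao_merge_left = 1) or
    "up" (sao_merge_up = 1)."""
    decisions = []
    for addr in range(width_in_ctus * height_in_ctus):
        if addr == 0:
            decisions.append("new")    # first CTU carries the shared set
        elif addr % width_in_ctus != 0:
            decisions.append("left")   # a left CTU exists: merge left
        else:
            decisions.append("up")     # first CTU of a row: merge up
    return decisions


decisions = frame_level_merge_decisions(8, 5)
```

Because every flag after the first CTU takes the same value, a context-adaptive entropy coder spends very few bits on them, which is why this method stays cheap despite using per-CTU flags.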
- Figure 24 shows an example of CTUs with SAO parameters set according to the first method. This method has the advantage that no signalling of the grouping to the decoder is required.
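- The first method of sharing can be sketched as follows. This is a minimal illustration, assuming the slice covers the whole frame and starts at CTU address 0 in raster-scan order; the function name and the dictionary layout are hypothetical and are not taken from any codec implementation.

```python
def frame_level_merge_flags(width_in_ctus, height_in_ctus, shared_params):
    """Return one entry per CTU, in raster-scan order, for the whole-frame grouping."""
    ctus = []
    for addr in range(width_in_ctus * height_in_ctus):
        col = addr % width_in_ctus
        if addr == 0:
            # The first CTU carries the shared SAO parameters explicitly.
            ctus.append({"params": shared_params})
        elif col > 0:
            # A left CTU exists, so sao_merge_left_flag is present and set to 1.
            ctus.append({"sao_merge_left_flag": 1})
        else:
            # No left CTU: sao_merge_left_flag does not exist, so merge with the CTU above.
            ctus.append({"sao_merge_up_flag": 1})
    return ctus
```

- For a frame that is 3 CTUs wide and 2 CTUs high, only CTU 0 carries parameters, CTU 3 (first CTU of the second line) merges up, and every other CTU merges left.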
- groupings do not increase the rate too much. This is because the merge flags are generally CABAC coded in the same context. Since for the second group (entire image) these flags all have the same value (1), the rate consumed by these flags is very low. This follows because they always have the same value and the probability is 1.
- the grouping is signalled to the decoder in the bitstream.
- the SAO parameters are also signalled as SAO parameters for the group (whole image), for example in the slice header.
- the signalling of the grouping consumes bandwidth.
- the merge flags can be dispensed with, saving the rate related to the merge flags, so that overall the rate is reduced.
- the first and second groupings 1201 and 1202 provide very different rate-distortion compromises.
- the first grouping 1201 is at one extreme, giving very fine control of the SAO parameters (CTU by CTU), which should lower distortion, but at the expense of a lot of signalling.
- the second grouping is at the other extreme, giving very coarse control of the SAO parameters (one set for the whole image), which raises distortion but has very light signalling.
- the determination is done for a whole image and all CTUs of the slice/frame share the same SAO parameters.
- FIG 25 is an example of the setting of SAO parameters for a frame/slice level using the first method of sharing SAO parameters (i.e. without new SAO classifications at encoder side). This figure is based on Figure 17.
- the CTUStats table is set for each CTU (in the same way as the CTU level encoding choice). This CTUStats can be used for the traditional CTU level (1302).
- the table FrameStats is set by adding each value for all CTUs of the table CTUStats (1303). Then the same process as for CTU level is applied to find the best SAO parameters (1305 to 1315).
- the selected SAO parameters set at step 1315 is set for the first CTU of the slice/frame. Then, for each CTU from the second CTU to the last CTU of the slice/frame, the sao_merge_left_flag is set equal to 1 if it exists; otherwise the sao_merge_up_flag is set equal to 1 (indeed, for the second CTU to the last CTU, a merge Left or Up or both exist) (1317).
- the syntax of the SAO parameters set is unchanged from that presented in Figure 9. At the end of the process the SAO parameters are set for the whole slice/frame.
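- The accumulation of CTUStats into a group-level table such as FrameStats can be sketched as below. The layout of the statistics is an assumption made for illustration: each CTU is taken to store, per (SAO type, category) key, a pair of sample count and summed difference between original and reconstructed samples. The helper names are hypothetical.

```python
def accumulate_stats(ctu_stats, ctu_addresses):
    """Sum per-CTU statistics over a group of CTUs.

    ctu_stats[addr] maps a (sao_type, category) key to a (count, diff_sum) pair.
    This layout is assumed for the sketch, not taken from the source text.
    """
    group = {}
    for addr in ctu_addresses:
        for key, (count, diff_sum) in ctu_stats[addr].items():
            old_count, old_diff = group.get(key, (0, 0))
            group[key] = (old_count + count, old_diff + diff_sum)
    return group


def column_addresses(column, width_in_ctus, height_in_ctus):
    """Raster-scan addresses of the CTUs in one column of the frame."""
    return list(range(column, width_in_ctus * height_in_ctus, width_in_ctus))
```

- Passing all CTU addresses of the frame gives a FrameStats-like table; passing the addresses of one column or one line gives the corresponding table for that smaller group, with no new SAO classification of the samples.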
- CTUStats table in the case of determining the SAO parameters for the whole image (frame level) is created by the process of Figure 25. This corresponds to evaluating the frame level in terms of the rate-distortion compromise.
- the encoder also evaluates the CTU level non-temporal derivation and the temporal derivation in terms of their respective rate-distortion compromises. Each evaluation is performed for the whole image in this case. The three evaluations are then compared and the one with the best performance is selected. The selected derivation (temporal or CTU level or frame level) is then signalled to the decoder in the bitstream.
- the signalling of the selected derivation can be made in many different ways.
- a grouping index can be signalled.
- the first syntax element can then still be used to signal whether the SAO parameters for all CTUs of the slice are derived temporally or not (e.g. temporal merge flag), supplemented by the grouping index in the case when temporal derivation is not used.
- the CTU level may have grouping index 0 and the frame level may have grouping index 1.
- the first syntax element may be adapted to signal everything, for example CTU level and frame level may have index 0 and index 1 respectively and temporal derivation may have another index such as 2. In this case, in Figures 21, 22 and 24 the first syntax element is changed accordingly.
- the example of determining the SAO parameters in Figure 25 corresponds to the first method of sharing SAO parameters as it uses the merge flags to share the SAO parameters among all CTUs of the image (see steps 1316 and 1317). These steps can be omitted if the second method of sharing SAO parameters is used.
- the CTU-level non-temporal derivation is in competition with the temporal derivation.
- the CTU-level non-temporal derivation is not available and instead the frame-level non-temporal derivation is in competition with the temporal derivation.
- the CTU and Frame levels used in the ninth embodiment offer extreme rate-distortion compromises. It is also possible to include other groupings intermediate between the CTU and frame levels which can offer other rate-distortion compromises.
- a third grouping 1203 makes a column of CTUs a group as in the third embodiment.
- FIG 26 is an example of the setting of SAO parameters sets for the third grouping 1203 at the encoder side. This Figure is based on Figure 17. To reduce the number of steps in the figure, the modules 1105 to 1115 have been merged in one step 1405 in this Figure 26.
- the CTUStats table is set for each CTU. This CTUStats can be used for the traditional CTU level (1302) encoding choice.
- the table ColumnStats is set by adding each value (1405) from CTUStats (1402), for each CTU of the current column (1404). Then the new SAO parameters are determined as for the CTU level (1406) encoding choice (cf. Figure 17).
- the RD cost to share the SAO parameters with the previous left column is also evaluated (1407), in the same way as the sharing of SAO parameters set between left and up CTU (1103, 1104) is evaluated. If the sharing of SAO parameters gives a better RD cost (1408) than the RD cost for the new SAO parameters set, the sao_merge_left_flag is set equal to 1 for the first CTU of the column. This CTU has the address number equal to the value “Column”. Otherwise, the SAO parameters set for this first CTU of the column is set equal (1409) to the new SAO parameters obtained in step 1406.
- step 1412 can be processed once per frame.
- the advantage of this CTU grouping is another RD compromise between the CTU level encoding choice and the frame level which can be useful for some conditions.
- merge flags are used within the group, which means that the third grouping can be introduced without modifying the decoder (i.e. the grouping can be HEVC-compliant).
- the Merge between columns doesn’t need to be checked. It means that steps 1407, 1408 and 1410 are removed from the process of Figure 26.
- the advantage of removing this possibility is a simplification of the implementation and the ability to parallelize the process. This has a small impact on coding efficiency.
- Another possible compromise intermediate between the CTU level and the frame level can be offered by a fourth grouping 1204 in Figure 23 which makes a line of CTUs a group.
- a similar process to that of Figure 25 can be applied.
- the variable ColumnStats is replaced by LineStats.
- the new SAO parameters and the merge with the up CTU are evaluated based on this LineStats table (steps 1406 and 1407).
- the step 1410 is replaced by setting the sao_merge_up_flag to 1 for the first CTU of the line. For all CTUs of the slice/frame except the first CTU of each line, the sao_merge_left_flag is set equal to 1.
- the advantage of the line is another RD compromise between the CTU level and the frame level. Please note that a frame or slice is most of the time a rectangle whose width is larger than its height. So the line CTU grouping 1204 is expected to be an RD compromise closer to the frame CTU grouping 1202 than the column CTU grouping 1203.
- the line CTU grouping can be HEVC compliant if the merge flags are used within the groups.
- RD compromises can be offered by putting two or more columns of CTUs or two or more lines of CTUs together as a group.
- the process of Figure 25 can be adapted to determine SAO parameters for such groups.
- the number N of columns or lines in a group may depend on the number of groups that are targeted.
- Another possible grouping includes split columns or split lines, where the split is tailored to the current slice/frame.
- the grouping 1205 makes 2x2 CTUs a group.
- the grouping 1206 makes 3x3 CTUs a group (one example of the second embodiment).
- Figure 27 shows an example of how to determine the SAO parameters for such groupings.
- the table NxNStats (1507) is set (1504, 1505, 1506) based on CTUStats. This table is used to determine the new SAO parameters (1508) and their RD cost, in addition to the RD cost for a Left (1510) sharing or Up (1509) sharing of SAO parameters. If the best RD cost is that of the new SAO parameters (1511), the SAO parameters of the first CTU (top left CTU) of the NxN group are set equal to these new SAO parameters (1514).
- the sao_merge_up_flag of the first CTU (top left CTU) of the NxN group is set equal to 1 and the sao_merge_left_flag to 0 (1515).
- the sao_merge_left_flag of the first CTU (Top left CTU) of the NxN group is set equal to 1 (1516). Then the sao_merge_left_flag and sao_merge_up_flag are set correctly for the other CTUs of the NxN group in order to form the SAO parameters for the current NxN group (1517).
- Figure 28 illustrates this setting for a 3x3 SAO group.
- the top left CTU is set equal to the SAO parameters determined in step 1508 to 1516.
- the sao_merge_left_flag is set equal to 1.
- the sao_merge_left_flag is the first flag encoded or decoded and, as it is set to 1, there is no need to set the sao_merge_up_flag to 0.
- the sao_merge_left_flag is set equal to 0 and the sao_merge_up_flag is set equal to 1.
- the sao_merge_left_flag is set equal to 1.
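- The flag pattern inside one NxN group can be sketched as follows. This is an illustrative reading of the 3x3 example, assuming the entry for the top left CTU (new parameters, or a merge with a neighbouring group) has already been decided; the function name and data layout are hypothetical.

```python
def nxn_group_flags(n_cols, n_rows, first_ctu):
    """Flags for one group of n_cols x n_rows CTUs, indexed as group[row][col].

    first_ctu is the entry already chosen for the top left CTU of the group.
    """
    group = []
    for row in range(n_rows):
        line = []
        for col in range(n_cols):
            if row == 0 and col == 0:
                line.append(first_ctu)
            elif col == 0:
                # First CTU of a lower line of the group: merge with the CTU above.
                line.append({"sao_merge_left_flag": 0, "sao_merge_up_flag": 1})
            else:
                # Any other CTU: merge left, so the up flag need not be set.
                line.append({"sao_merge_left_flag": 1})
        group.append(line)
    return group
```

- The dimensions are passed separately so that a group clipped by the frame border (fewer columns or lines than N) can be handled with the same pattern.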
- The advantage of the NxN CTU groupings is to create several RD compromises for SAO. As for the other groupings, these groupings can be HEVC compliant if merge flags within the groups are used. As for the other groupings, the test of Merge left and Merge up between groups can be dispensed with in Figure 27. So steps 1509, 1510, 1512, 1513, 1515 and 1516 can be removed, especially when N is high.
- the value N depends on the size of the frame/slice.
- the advantage of this embodiment is to obtain an efficient RD compromise.
- only N equal to 2 and 3 are evaluated. This offers an efficient compromise.
- Figure 29 illustrates an example of how to select the SAO parameter derivation using a rate-distortion compromise comparison.
- the first method of sharing SAO parameters among the CTUs of a group is used. Accordingly, merge flags are used within groups. If applied to HEVC, the resulting bitstream can be decoded by an HEVC-compliant decoder.
- the current slice/frame 1701 is used to set the CTUStats table (1703) for each CTU.
- This table (1703) is used to evaluate the CTU level (1704), the temporal derivation (1715), the frame/slice grouping (1705), the column grouping (1706), the line grouping (1707), the 2x2 CTUs grouping (1708) or 3x3 CTUs grouping (1709) or all other described CTUs groupings as described previously.
- the best derivation (a non-temporal derivation with a CTU grouping or the temporal derivation) is selected according to the rate distortion criterion computed for each available derivation (1710).
- the SAO parameters sets for each CTU are set (1711) according to the derivation selected in step 1710. These SAO parameters are then used to apply the SAO filtering (1713) in order to obtain the filtered frame/slice.
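- The selection of the best derivation can be sketched as a minimum search over rate-distortion costs. The text only refers to a rate distortion criterion; the Lagrangian form J = D + lambda * R used here is a common choice and is an assumption, as are the candidate names and the function name.

```python
def select_derivation(candidates, lagrange_multiplier):
    """Pick the derivation with the lowest cost J = D + lambda * R.

    candidates maps a derivation name to a (distortion, rate_in_bits) pair.
    """
    def cost(name):
        distortion, rate = candidates[name]
        return distortion + lagrange_multiplier * rate

    return min(candidates, key=cost)
```

- With a larger multiplier, derivations with light signalling (frame level, temporal) tend to win; with a smaller one, fine-grained derivations such as the CTU level become more attractive.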
- the second method of sharing SAO parameters among the CTUs of the CTU grouping may be used instead of the first method. Both methods have the advantage of offering a coding efficiency increase.
- a second advantage, obtained when the first method is used but not when the second method is used, is that this competition method doesn’t require any additional SAO filtering or classification. Indeed, the main impacts on encoder complexity are the step 1702, which needs SAO classification for all possible SAO types, and the step 1713, which filters the samples. All other CTU grouping evaluations are only some additions of values already obtained during the CTU level encoding choice (set in the table CTUStats).
- the encoder signals in the bitstream which derivation of the SAO parameters is selected (CTU level, frame level, column, line, 2x2 CTUs, 3x3 CTUs, temporal derivation).
- a possible indexing scheme is shown in Table 2 below:
- the derivation index is also referred to as a grouping index hereinafter.
- Figure 30 is a flow chart illustrating a decoding process when the CTU grouping is signaled in the slice header according to the second method of sharing SAO parameters among the CTUs of the group.
- the corresponding CTUs grouping index (1804) is used to select the CTUs grouping method (1805).
- This grouping method will be applied to extract the SAO syntax and to determine the SAO parameters set for each CTU (1806). Then the next slice header syntax element is decoded. If the CTU grouping index (1804) corresponds to the temporal derivation, other parameters can be extracted from the bitstream such as the reference frame index and/or other parameters necessary for the temporal derivation.
- the CTUs grouping index uses a unary max code in the slice header. In that case, the CTUs groupings are ordered according to their probabilities of occurrences (highest to lowest).
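- A unary max code for the grouping index can be sketched as below, assuming it denotes a truncated unary binarization: index k is written as k ones followed by a terminating zero, and the terminator is omitted for the largest index. The function names and the bit-string representation are hypothetical.

```python
def encode_unary_max(index, max_index):
    """Truncated unary code: `index` ones, then a zero unless index == max_index."""
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    bits = "1" * index
    if index < max_index:
        bits += "0"
    return bits


def decode_unary_max(bits, max_index):
    """Return (index, number_of_bits_consumed) read from the start of `bits`."""
    index = 0
    while index < max_index and bits[index] == "1":
        index += 1
    consumed = index + (1 if index < max_index else 0)
    return index, consumed
```

- Because index 0 costs a single bit and each further index costs one more, assigning the smallest indexes to the most probable groupings keeps the average signalling cost low.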
- At least one non-temporal derivation is an intermediate level derivation (SAO parameters not at CTU level or at group level).
- When applied to a group, it causes the group (e.g. frame or slice) to be subdivided into subdivided parts (CTU groupings 1203-1206, e.g. columns of CTUs, lines of CTUs, NxN CTUs, etc.) and derives SAO parameters for each of the subdivided parts.
- Each subdivided part is made up of two or more said image parts (CTUs).
- the advantage of the intermediate level derivation(s) is introduction of one or more effective rate-distortion compromises.
- the intermediate level derivation(s) can be used without the CTU-level derivation or without the frame-level derivation or without either of those two derivations.
- the temporal derivation is in competition with CTU level derivation and the frame level derivation.
- the twelfth embodiment builds on this and adds one or more of the intermediate groupings so that the competition includes CTU level, frame level, one or more groupings intermediate between the CTU and frame levels, and the temporal derivation.
- the temporal derivation is in competition with CTU level derivation but not the frame level derivation.
- the thirteenth embodiment builds on this and adds one or more NxN CTU groups so that the competition includes CTU level, one or more NxN CTU groups, and the temporal derivation.
- the temporal derivation is in competition with CTU level derivation but not the frame level derivation.
- the eighth embodiment builds on this and adds the third grouping 1203 (column of CTUs) or the fourth grouping 1204 (line of CTUs) or both the third and fourth groupings 1203 and 1204.
- the competition therefore includes CTU level, the third and/or fourth grouping, and the temporal derivation.
- the ninth and eleventh to fourteenth embodiments each promote diversity for the SAO parameter derivation to be applied to a group by making at least first and second said non temporal derivations available, the first non-temporal derivation when applied to a group causing the group to have SAO parameters at a first level, and the second non-temporal derivation when applied to a group causing the group to have SAO parameters at a second level different from the first level.
- the levels may be any two levels from the frame level to a CTU level.
- the levels may correspond to the groupings 1201-1206 in Figure 23.
- the smallest grouping is the first grouping 1201 in which each CTU is a group and there is one set of SAO parameters per CTU.
- a set of SAO parameters can be applied to a smaller block than the CTU.
- the non-temporal derivation is not at the CTU level, frame level or an intermediate level between the CTU and frame levels but at a sub-CTU level (a level smaller than an image part).
- index 0 means that each CTU is divided into 16 blocks and each may have its own SAO parameters.
- Index 1 means that each CTU is divided into 4 blocks, again each having its own SAO parameters.
- the selected derivation is then signalled to the decoder in the bitstream.
- the signalling may comprise a temporal/non-temporal syntax element plus a depth syntax element (e.g. using the indexing scheme above).
- a combined syntax element may be used to signal temporal/non-temporal and the depth.
- Temporal derivation could be assigned index 6, for example, with the non-temporal derivations having index 0-5.
- At least one non-temporal derivation when applied to a group causes the group to be subdivided into subdivided parts and derives SAO parameters for each of the subdivided parts, and each image part is made up of two or more said sub-divided parts.
- At least first and second said non-temporal derivations are available, the first non-temporal derivation when applied to a group causing the group to have SAO parameters at a first level, and the second non-temporal derivation when applied to a group causing the group to have SAO parameters at a second level different from the first level.
- the levels may be any two levels from the frame level to a sub-CTU level.
- the levels may correspond to the groupings 1201-1206 in Figure 23.
- the selected derivation of the SAO parameters may be signalled for a slice, which means that the temporal derivation (when selected) is used for all CTUs of the slice.
- the available non-temporal derivations include derivations having SAO parameters at different levels (depths) lower than the slice or frame level. However, in such cases it is not possible to determine at the CTU level (or at the chosen level of the SAO parameters) whether to use temporal derivation or not.
- the SAO parameters derivation is modified so that a temporal derivation at the CTU level is available, rather than only a temporal derivation at the group level.
- the temporal derivation at the CTU level is not applied to a group of image parts as in the previous embodiments.
- this temporal derivation is in competition with a temporal derivation applied to a group of image parts. For example, the competition is between the 3x3 grouping and a group using the temporal derivation at CTU level.
- a level of the SAO parameters is selected for a slice or frame, which may include the CTU level. Then, when the CTU level is selected it is selected whether to use a temporal derivation or non-temporal derivation for each CTU of the slice or frame.
- a temporal derivation or non-temporal derivation may be selected per CTU group (e.g. each column of CTUs) of the slice or frame.
- the temporal derivation does still apply to a group of two or more CTUs (image parts).
- One or more CTU groups within the slice may then use temporal derivation (with each CTU deriving an SAO parameter from a collocated CTU of a reference image), whilst other CTU groups use a non-temporal derivation.
- the SAO merge flags are usable between groups of the CTUs grouping. As depicted in Figure 31, for the 2x2 CTU grouping, the SAO merge Left and SAO merge up are kept for each group of 2x2 CTUs. But they are removed for CTUs inside the group. Please note that only the sao_merge_left_flag is used for the grouping 1203 of a column of CTUs, and only the sao_merge_up_flag is used for the grouping 1204 of a line of CTUs.
- a flag signals if the current CTU group shares its SAO parameters or not. If it is true, a syntax element representing one of the previous groups is signalled. So each group of a slice can be predicted by a previous group except the first one. This improves the coding efficiency by adding several new possible predictors.
- a depth of the SAO parameters was selected for a slice, including depths smaller than a CTU, making it possible to have a set of SAO parameters per block in a CTU.
- no depth could be selected and all CTUs of the slice had to use temporal derivation.
- the SAO parameters derivation is modified so that a depth is selected for the slice and then it is selected for an image part at the selected depth whether or not to use temporal derivation.
- the depths may be the ones in Table 3.
- the SAO parameters derivation is modified so that a temporal derivation at the sub-CTU level is available, rather than only a temporal derivation at the group level.
- the temporal derivation at the sub-CTU level is not applied to a group of image parts as in the previous embodiments. However, this temporal derivation is in competition with a temporal derivation applied to a group of image parts.
- a level of the SAO parameters is selected for a slice or frame, which may include the sub-CTU level. Then, when the sub-CTU level is selected it is selected whether to use a temporal derivation or non-temporal derivation for each block of the slice or frame.
- a temporal derivation or non-temporal derivation may be selected per CTU or per CTU group (e.g. each column of CTUs) of the slice or frame.
- the temporal derivation does still apply to a group of two or more blocks (image parts).
- One or more CTUs or CTU groups within the slice may then use temporal derivation (with each block deriving an SAO parameter from a collocated block of a reference image), whilst other CTUs or CTU groups use a non-temporal derivation.
- the benefit of selecting between temporal and non-temporal SAO parameter derivation per CTU or CTU group is achieved in addition to the benefit of applying the temporal derivation on a CTU or CTU group basis.
- one possibility is to remove the SAO merge flags for all levels. It means that steps 503, 504, 505 and 506 of Figure 9 are removed.
- the advantage is that it reduces significantly the signalling of SAO and consequently it reduces the bitrate. Moreover, it simplifies the design by removing 2 syntax elements at CTU level.
- the merge flags are kept for CTU level but removed for all other CTU groupings.
- the advantage is a flexibility of the CTU level.
- the merge flags are used for CTU when the SAO signalling is lower than or equal to the CTU level (1/16 CTU or 1/4 CTU) and removed for other CTUs groupings having larger groups.
- the merge flags are important for small block sizes because a SAO parameters set is costly compared to the number of samples that it can improve. In that case, these syntax elements reduce the cost of SAO parameters signalling. For large groups, the SAO parameters set is less costly so the usage of merge flags is not efficient. So the advantage of these embodiments is a coding efficiency increase.
- the level where the SAO merge flags are enabled is explicitly signalled in the bitstream.
- a flag indicates if the SAO merge flags are used or not.
- the flag may be included after the index of the CTUs grouping (or the depth) in the slice header.
- the competition between the different permitted derivations is modified so that only one derivation is permitted in the encoder for any given slice or frame.
- the permitted derivation may be determined in dependence upon one or more characteristics of the slice or frame. For example, the permitted derivation may be selected based on the slice type (Intra, Inter P, Inter B), quantization level (QP) of the slice, or position in the hierarchy of a Group of Pictures (GOP).
- the Intra Frames and the Inter frames at the highest position in the hierarchy of the GOP structure or with the low QP may be permitted only to use the CTU level.
- the other frames which have lower positions in the GOP hierarchy or a high QP may be permitted only to use temporal derivation.
- the different parameters can be set depending on the rate distortion compromise.
- the advantage of this embodiment is a complexity reduction. Instead of evaluating two or more competing derivations just one derivation is selected, which can be useful for a hardware encoder.
- a first derivation is associated with first groups of the image (e.g. Intra slices) and a second derivation is associated with second groups of the image (e.g. Inter P slices). It is determined whether a group to be filtered is a first group or a second group. If it is determined that the group to be filtered is a first group, the first derivation is used to filter the image parts of the group, and if it is determined that the group to be filtered is a second group, the second derivation is used to filter the image parts of the group. Evaluation of the two derivations is not required.
- Whether a group to be filtered is determined to be a first group or a second group may depend on one or more of:
- a slice type; a frame type of the image to which the group to be filtered belongs;
- the first derivation may have fewer image parts per group than the second derivation.
- the competition for a given slice or frame is still permitted but the set of competing derivations is adapted to the slice or frame.
- the set of competing derivations may depend on the slice type (Intra, Inter P, Inter B), quantization level (QP) of the slice, or position in the hierarchy of a Group of Pictures (GOP).
- the set of competing derivations may depend on the slice type.
- the set preferably contains groupings with groups containing small numbers of CTUs (e.g. CTU level, 2x2 CTU, 3x3 CTU, and Column). Also, if depths lower than a CTU are available (as in the tenth embodiment), these depths are preferably also included. Of course, the temporal derivation is not used.
- the set of derivations preferably contains groupings with groups containing large numbers of CTUs such as Line, Frame level, and the temporal derivation. However, smaller groupings can also be considered down to the CTU level.
- the advantage of this embodiment is a coding efficiency increase thanks to the use of derivations adapted for a slice or frame.
- the set of derivations can be different for an Inter B slice from that for an Inter P slice.
- the set of competing derivations depends on the characteristics of the frame in the GOP. This is especially beneficial for frames which vary in quality (QP) based on a quality hierarchy. For the frames with the highest quality or highest position in the hierarchy, the set of competing derivations should include groups containing few CTUs or even sub-CTU depths (same as for Intra slices above). For frames with a lower quality or lower position in the hierarchy, the set of competing derivations should include groups with more CTUs.
- the set of competing derivations can be defined in the sequence parameters set.
- a first set of derivations is associated with first groups of the image (e.g. Intra slices) and a second set of derivations is associated with second groups of the image (e.g. Inter P slices). It is determined whether a group to be filtered is a first group or a second group. If it is determined that the group to be filtered is a first group, a derivation is selected from the first set of derivations and used to filter the image parts of the group, and if it is determined that the group to be filtered is a second group, a derivation is selected from the second set of derivations and used to filter the image parts of the group. Evaluation of derivations not in the associated set of derivations is not required.
- Whether a group to be filtered is a first group or a second group may be determined as in the preceding embodiment. For example, when the first groups have a higher quality or higher position in the quality hierarchy than the second groups, the first set of derivations may have at least one derivation with fewer image parts per group than the derivations of the second set of derivations.
- the temporal derivation involves simply copying SAO parameters from a collocated CTU (or from a collocated block within a CTU if SAO parameters at the block level are used).
- In a video, there are generally background and moving objects.
- a large part can be static.
- When the SAO temporal derivation is applied on this static part for several consecutive frames, the SAO filtering should filter nothing, especially for edge offset. As a result, the temporal derivation will not be selected.
- the set of SAO parameters from the previous frame is changed according to some defined rules.
- Figure 32 is an example of an algorithm to produce such a modification of the set of SAO parameters.
- a 90° rotation is applied to the edge classification. If sao_eo_class_Luma or sao_eo_class_Chroma (2301) from the collocated CTU is equal to 0, which corresponds to edge type 0° (2302), the edge type for the current frame (2310) is set equal to 1 (2303) corresponding to SAO edge type 90°.
- sao_eo_class_X is set equal to 0.
- the edge offset type 135° indicated by sao_eo_class_X is rotated to edge offset type 45° (2307).
- the edge offset type 45° indicated by sao_eo_class_X is rotated to edge offset type 135° (2309).
- the offset values are not changed.
- the effect of the algorithm of Figure 32 is to apply a rotation.
- the changes to the edge classification parameters may be effected by using a mapping table.
- in the mapping table there is an entry for each existing edge index, which maps to a corresponding “new” edge index.
- the mapping table implements the required rotation.
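- A mapping table for the 90° rotation can be sketched as below. The numbering of the edge classes (0 for 0°, 1 for 90°, 2 for 135°, 3 for 45°) follows the HEVC convention for sao_eo_class and is an assumption here beyond the 0 and 1 values mentioned in the text; the dictionary layout and function name are hypothetical.

```python
# Assumed sao_eo_class numbering (HEVC convention):
# 0 -> 0 degrees, 1 -> 90 degrees, 2 -> 135 degrees, 3 -> 45 degrees.
ROTATE_90 = {0: 1, 1: 0, 2: 3, 3: 2}


def rotate_edge_class(params, mapping=ROTATE_90):
    """Return a copy of a collocated SAO parameter set with its edge class remapped.

    Offsets are left untouched. Sets that are not of edge-offset type are
    returned unchanged, which is a simplification made for this sketch.
    """
    rotated = dict(params)
    if rotated.get("sao_type") == "EO":
        rotated["sao_eo_class"] = mapping[rotated["sao_eo_class"]]
    return rotated
```

- Other rotations would only need a different table; applying the 90° table twice returns the original edge class, as expected for directions defined modulo 180°.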
- Figure 33 illustrates this temporal rotation by 90°.
- the temporal derivation with 90° rotation is applied to a whole frame or slice (e.g. as in the seventh embodiment).
- the 45° and the 135° rotations can be considered instead of 90°.
- the rotation of temporal SAO parameters sets is the 90° rotation. This gives the best coding efficiency.
- band offsets are not copied and SAO is not applied on this CTU.
- a default SAO parameter set can be used for the CTUs concerned.
- the “rotation” temporal derivation is introduced.
- the “rotation” temporal derivation is put in competition with the “copying” temporal derivation as shown in Figure 34.
- the competition is applied to each slice or each frame.
- the best temporal derivation may be selected based on a rate-distortion criterion.
- the “copying” temporal derivation was in competition with one or more non-temporal derivations (different groupings or different depths).
- the “rotation” temporal derivation is put into competition with the same non-temporal derivation(s) instead of the “copying” temporal derivation.
- the “copying” temporal derivation was in competition with one or more non-temporal derivations (different groupings or different depths).
- both the “copying” and “rotation” temporal derivations are put into competition with the same non-temporal derivation(s) instead of just the “copying” temporal derivation.
- Table 4 below shows the competing derivations when the eleventh embodiment is modified in this way:
- the “copying” and “rotation” temporal derivations are in competition with one another.
- these two temporal derivations and further “rotation” temporal derivations are used cyclically.
- a first frame F0 is followed by second, third, fourth and fifth frames F1-F4.
- the first frame F0 does not use temporal derivation of SAO parameters.
- the “copying” temporal derivation is applied (i.e. copying the SAO parameters from F0).
- the temporal derivation is a 90° rotation of SAO parameters of F0.
- the temporal derivation is a 135° rotation of SAO parameters of F0.
- the temporal derivation is a 45° rotation of SAO parameters of F0.
- F0 is a reference image for each of F1 to F4.
- Frame F2 (temporal ‘90°’ Frame 1)
- Frame F3 (temporal ‘45°’ Frame 2)
- Frame F4 (temporal ‘90°’ Frame 3)
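- The cyclic use of rotations can be checked with a short sketch. It assumes that edge directions are angles taken modulo 180°, that F1 copies F0, and that entries of the form "Frame F2 (temporal 90° Frame 1)" denote a rotation relative to the preceding frame; these readings and all names below are assumptions made for illustration.

```python
EDGE_ANGLES = (0, 45, 90, 135)

# Rotation of each frame's edge direction relative to F0 (F1 is a plain copy).
RELATIVE_TO_F0 = {"F1": 0, "F2": 90, "F3": 135, "F4": 45}

# The same sequence expressed as a rotation relative to the preceding frame.
RELATIVE_TO_PREVIOUS = [("F1", 0), ("F2", 90), ("F3", 45), ("F4", 90)]


def rotate_angle(angle, rotation):
    """Rotate an edge direction; directions are equivalent modulo 180 degrees."""
    return (angle + rotation) % 180


def chained_angles(start_angle):
    """Edge direction of F1..F4 when each frame rotates the previous frame's direction."""
    angles = {}
    angle = start_angle
    for frame, rotation in RELATIVE_TO_PREVIOUS:
        angle = rotate_angle(angle, rotation)
        angles[frame] = angle
    return angles
```

- Under these assumptions both descriptions give the same direction for every frame, and the four frames F1 to F4 together visit all four edge-filtering directions for any starting direction in F0.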
- the direction of edge filtering of an image part may be switched successively through all possible edge- filtering directions.
- SAO filtering is performed CTU by CTU.
- temporal derivation is introduced, and to improve the signalling efficiency, a group of image parts is formed and the use of temporal prediction is signalled for this group of image parts, rather than for each image part individually.
- the “rotation” temporal derivation is applied to all CTUs of a slice or frame.
- a rotation temporal derivation is signalled for a group (slice, frame, column, line, NxN CTUs, etc.) composed of two or more image parts (CTUs).
- the image parts (CTUs) may still have different SAO parameters depending on the SAO parameters of the respective collocated image parts.
- Signalling the temporal derivation at the slice or frame level is useful for compatibility with the embodiments described previously, in which a grouping of CTUs is selectable for the slice or frame from among plural groupings (e.g. the groupings 1201-1206 in Figure 23), the selected grouping also being signalled at the slice or frame level.
- it is not essential to signal the use of temporal derivation at the slice or frame level. This applies whether there is just one type of temporal derivation, e.g. “copy” or “rotation”, or the type can be selected from plural different types.
- the signalling of the use of temporal derivation can be at the CTU level or at the block level (i.e. sub-CTU).
- a syntax element may be provided per CTU to indicate whether or not rotation temporal derivation is used for the CTU concerned.
- a syntax element may be provided per block (i.e. sub-CTU) to indicate whether or not rotation temporal derivation is used for the block concerned.
- the process of Figure 35 is performed CTU by CTU.
- the sao_merge_temporal_flag_X is extracted from the bitstream if other merge flags are off (2613). If sao_merge_temporal_flag_X is equal to 1, a syntax element representing a reference frame is extracted from the bitstream (2614). Please note that this step is not needed if only one reference frame is used for the derivation. Then a syntax element representing a rotation of the parameters is decoded (2615). Please note that this step is not needed if no “rotation” option is available. This would be the case if the only type of temporal derivation is the basic “copy” type.
- step 2615 is not performed if the collocated CTU in the reference frame is not EO type. Then the respective sets of SAO parameters for the 3 color components are copied from the collocated CTU to the current CTU. Processing then moves to the next CTU in step 2610.
- an advantage of temporal merge flag signalling compared to temporal/CTU grouping signalling at slice level is a simplification of the encoder design for some implementations. Indeed, there is no need to wait for the encoding of the whole frame before starting SAO selection, unlike in the slice-level approach. But the impact of the extra signalling at the CTU level on the coding efficiency is not negligible.
- the syntax element per CTU extracted in step 2615 may indicate the selected temporal derivation, e.g. using an index.
- the syntax element could also specify the angle of rotation. In this way, in the same slice or frame, some CTUs may have no temporal derivation, other CTUs may use “copy”, still others may use “rotate by 90°”, and so on.
- Signalling a grouping for a slice or frame and then signalling for each group of two or more CTUs whether to use temporal derivation or not or, if two or more temporal derivations are in competition with one another, which one of them is selected, is an effective way to achieve adaptability without having per-CTU syntax elements. For example, if the selected grouping for a slice is 3x3 CTUs, some groups may have no temporal derivation, other groups may use “copy”, still others may use “rotate by 90°”, and so on.
- because the number of groups is only 1/9th of the number of CTUs, the number of syntax elements is correspondingly smaller compared to per-CTU signalling, yet the different CTUs in each group may still have different SAO parameters depending on the collocated CTUs.
- rotation temporal derivations are introduced. These rotation temporal derivations are preferred examples from a wider class of transformations that can be applied to change the direction of EO filtering in a CTU of the current frame compared to the direction of EO filtering in a collocated CTU of a reference frame.
- the direction-changing transformation could be a reflection about the x-axis or y-axis. Such a reflection has the effect of swapping two directions and leaving the other two directions unchanged. It could also be a reflection about a diagonal line at 45° or 135°.
- the effect of the algorithm of Figure 32 is to apply a transformation
- the changes to the edge classification parameters may be effected by using a mapping table.
- in the mapping table there is an entry for each existing edge index which maps to a corresponding “new” edge index.
- the mapping table implements the required transformation.
- This embodiment is applicable to the first group of embodiments (which use a group-wise derivation) or to the second group of embodiments (which do not use a group-wise derivation).
- Third group of Embodiments are applicable to the first group of embodiments (which use a group-wise derivation) or to the second group of embodiments (which do not use a group-wise derivation).
- temporal derivation of SAO parameters was introduced, either as a group-wise derivation (applied to a group of two or more image parts) or for individual image parts.
- new spatial derivations of SAO parameters are introduced. These may be group-wise derivations or for individual image parts.
- a group can be any two or more CTUs, for example a whole image, a slice, a line of CTUs, a column of CTUs, N lines of CTUs, N columns of CTUs, where N is an integer greater than 1.
- a group could also be NxN CTUs, where N is an integer greater than 1, or MxN CTUs or NxM CTUs, where M > 1 and N > 1.
- a group of image parts can be a CTU, and each constituent block of the CTU can be an image part.
- each block of a CTU may have its own SAO parameters, but the signalling to use spatial derivation of the SAO parameters can be made for the CTU as a whole.
- a flag temporal merge can be used to signal the use of temporal derivation for all image parts of the group.
- the manner in which the SAO parameters are derived in the spatial derivation is not particularly limited except that the source image part belongs to another group of image parts in the same image as the subject group.
- the source image part and the image part to be derived are at the same positions in their respective groups. For example, in a 3x3 CTU grouping, there are 9 positions from the top left to the bottom right.
- the other group is, for example, the left group of the subject group
- at least one SAO parameter of an image part at position 1 (the top left position, say) in the subject group is derived from an SAO parameter of the image part at the same position (position 1 or top left position) in the left group.
- This image part in the left group serves as a source image part for the image part to be derived in the subject group. The same is true for each other position in the subject group.
- the at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the source image part by copying the SAO parameter of the source image part.
- One, more than one, or all SAO parameters may be copied.
- one, more than one, or all SAO parameters may be copied only when the SAO filtering is of a particular type (edge or band).
- spatial and temporal group-wise derivations are both “group-wise sourcing derivations”. Each involves applying a group-wise sourcing derivation of SAO parameters to a group of two or more image parts, the group-wise sourcing derivation permitting different image parts belonging to the group to have different SAO parameters and comprising deriving at least one said SAO parameter of an image part belonging to the group from an SAO parameter of another image part serving as a source image part for the image part to be derived.
- the source image part is a collocated image part in a reference image having a position in the reference image collocated with a position of the image part to be derived in its image.
- the source image part belongs to another group of image parts in the same image as the image part to be derived, said source image part and said image part to be derived being at the same positions in their respective groups.
- the rotation derivation was a non-group-wise temporal derivation.
- a spatial rotation derivation is used as a derivation, i.e. where the SAO parameters of a CTU in a current image are derived by rotation from the SAO parameters of another CTU of the same image (as opposed to being derived by rotation from the SAO parameters of a collocated CTU of a reference image).
- the other CTU in the “rotation” spatial derivation may be a left CTU or an upper CTU, in which case a sao_merge_rotation_left flag or sao_merge_rotation_up flag may be used to signal when the rotation spatial derivation is selected.
- Figure 36 shows two examples where the other CTU is the left CTU and the rotation from the left CTU to the current CTU is 90 degrees.
- the rotation spatial derivation may be in competition with the temporal copy derivation and/or the rotation temporal derivation.
- the “rotation” derivation is applied on a spatial basis to generate additional SAO merge parameter set candidates to predict the SAO parameter set of the current CTU. Accordingly, the “rotation” can be applied to increase the list of SAO Merge candidates or to find new SAO Merge candidates for empty positions.
- the advantage of using the twenty-seventh embodiment, instead of using several SAO parameter sets taken from previously decoded SAO parameter sets, is an increase in coding efficiency. Moreover, it offers additional flexibility for encoder implementation by accessing only a limited number of already encoded SAO parameter sets.
- Figure 37 is a flow chart representing one example of the possible usage of the rotation derivation of SAO parameters.
- the sao_merge_rotation_Left_X flag is extracted from the bitstream if other merge flags are off (3613). If sao_merge_rotation_Left_X is equal to 1, for each color component YUV of the current CTU the set of SAO parameters is derived from the set of SAO parameters for the same component of the left CTU by applying rotation to the edge classification as described in the twenty-fifth embodiment.
- the SAO parameters other than the direction may be simply copied.
- the rotation spatial derivation was applied to one CTU.
- a group-based rotation spatial derivation is applied. Then, each CTU of a current group derives its SAO parameters by rotation from the CTU at the same position in another group of the same image.
- the group may be 3x3 CTUs.
- the other group may be a group above or on the left.
- group-based spatial derivation may be in competition with a group-based temporal derivation (either copy or rotation or both).
- group-based spatial derivation may be in competition with a group-based “copy” spatial derivation (which may be copy-left and/or copy up).
- a rotation spatial derivation was introduced.
- the rotation temporal derivation is one of a wider class of possible direction-transforming temporal derivations
- the rotation spatial derivation is one of a wider class of possible direction-changing spatial derivations.
- the direction-changing spatial derivation may be applied to an individual CTU or to a group of CTUs. It may be in competition with other spatial and/or temporal derivations.
- Figure 38 shows a system 191, 195 comprising at least one of an encoder 150 or a decoder 100 and a communication network 199 according to embodiments of the present invention.
- the system 195 is for processing and providing a content (for example, a video and audio content for displaying/outputting or streaming video/audio content) to a user, who has access to the decoder 100, for example through a user interface of a user terminal comprising the decoder 100 or a user terminal that is communicable with the decoder 100.
- a user terminal may be a computer, a mobile phone, a tablet or any other type of a device capable of providing/displaying the (provided/streamed) content to the user.
- the system 195 obtains/receives a bitstream 101 (in the form of a continuous stream or a signal - e.g. while earlier video/audio are being displayed/output) via the communication network 199.
- the system 191 is for processing a content and storing the processed content, for example a video and audio content processed for displaying/outputting/streaming at a later time.
- the system 191 obtains/receives a content comprising an original sequence of images 151, which is received and processed (including filtering with a deblocking filter according to the present invention) by the encoder 150, and the encoder 150 generates a bitstream 101 that is to be communicated to the decoder 100 via a communication network 199.
- the bitstream 101 is then communicated to the decoder 100 in a number of ways, for example it may be generated in advance by the encoder 150 and stored as data in a storage apparatus in the communication network 199 (e.g. on a server or a cloud storage) until a user requests the content (i.e. the bitstream data) from the storage apparatus, at which point the data is communicated/streamed to the decoder 100 from the storage apparatus.
- the system 191 may also comprise a content providing apparatus for providing/streaming, to the user (e.g. by communicating data for a user interface to be displayed on a user terminal), content information for the content stored in the storage apparatus (e.g.
- the encoder 150 generates the bitstream 101 and communicates/streams it directly to the decoder 100 as and when the user requests the content.
- the decoder 100 then receives the bitstream 101 (or a signal) and performs filtering with a deblocking filter according to the invention to obtain/generate a video signal 109 and/or audio signal, which is then used by a user terminal to provide the requested content to the user.
- the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit.
- Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.
- computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave.
- Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
- a computer program product may include a computer-readable medium.
- such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
- any connection is properly termed a computer-readable medium.
- For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- Disk and disc includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- processors such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
- processors may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
- the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set).
- Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
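The direction-changing derivations listed above (a mapping table from each existing edge index to a “new” edge index, applied to an individual CTU or position by position to a group) can be illustrated with a minimal Python sketch. It assumes the HEVC numbering of edge-offset classes (0: horizontal, 1: vertical, 2: 135° diagonal, 3: 45° diagonal) and treats a rotation as adding an angle modulo 180°; the sign convention, the dictionary layout of a CTU's SAO parameters, and the names `EO_ANGLE`, `rotation_table`, `derive_ctu` and `derive_group` are all hypothetical and are not taken from the specification.

```python
# Assumed HEVC numbering of edge-offset (EO) classes -> direction angle in degrees.
EO_ANGLE = {0: 0, 1: 90, 2: 135, 3: 45}
ANGLE_EO = {angle: eo for eo, angle in EO_ANGLE.items()}


def rotation_table(degrees):
    """Map each existing EO class to the class obtained after rotating its direction."""
    if degrees % 45 != 0:
        raise ValueError("rotation must be a multiple of 45 degrees")
    # An edge direction is a line, so angles are taken modulo 180.
    return {eo: ANGLE_EO[(angle + degrees) % 180] for eo, angle in EO_ANGLE.items()}


def derive_ctu(source, table):
    """Derive the SAO parameters of one CTU from a source CTU (hypothetical layout)."""
    derived = dict(source)  # parameters other than the direction are simply copied
    if source["type"] == "EO":  # only edge-type filtering has a direction to change
        derived["eo_class"] = table[source["eo_class"]]
    return derived


def derive_group(source_group, degrees):
    """Group-wise derivation: each position is derived from the same position
    in the source group (a collocated group or, say, the left group)."""
    table = rotation_table(degrees)
    return [derive_ctu(ctu, table) for ctu in source_group]


# Illustrative values only: three CTUs of a source group.
left_group = [
    {"type": "EO", "eo_class": 0, "offsets": [1, 0, 0, -1]},
    {"type": "BO", "band_position": 12, "offsets": [2, 1, -1, 0]},
    {"type": "EO", "eo_class": 3, "offsets": [1, 1, -1, -1]},
]
current_group = derive_group(left_group, 90)
```

With a 90° rotation the horizontal and vertical classes swap and the two diagonal classes swap, so each CTU of `current_group` can still differ from its neighbours even though a single derivation was chosen for the whole group. A reflection could be expressed by replacing `rotation_table` with a different table.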
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present invention provides a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the method comprising performing SAO filtering on a group made up of N × N image parts of the image using SAO parameters associated with the group, wherein N is three or more.
Description
VIDEO CODING AND DECODING
The present invention relates to video coding and decoding.
Recently, the Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16’s VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. The main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well beyond the targeted 50% for the final standard.
The JVET exploration model (JEM) uses all the HEVC tools. One of these tools is sample adaptive offset (SAO) filtering. However, SAO is less efficient in the JEM reference software than in the HEVC reference software. This arises from fewer evaluations and from signalling inefficiencies compared to other loop filters.
US 9769450 discloses an SAO filter for three dimensional or 3D Video Coding or 3DVC such as implemented by the HEVC standard. The filter directly re-uses SAO filter parameters of an independent view or a coded dependent view to encode another dependent view, or re-uses only part of the SAO filter parameters of the independent view or a coded dependent view to encode another dependent view. The SAO parameters are re-used by copying them from the independent view or coded dependent view.
US 2014/0192860 A1 relates to the scalable extension of HEVC. HEVC scalable extension aims at allowing coding/decoding of a video having multiple scalability layers, each layer being made up of a series of frames. Coding efficiency is improved by inferring, or deriving, SAO parameters to be used at an upper layer (e.g. an enhancement layer) from the SAO parameters actually used at a lower (e.g. base) layer. This is because inferring some SAO parameters makes it possible to avoid transmitting them.
It is desirable to improve the coding efficiency of images subjected to the SAO filtering.
It is an aim of embodiments of the present invention to address one or more problems or disadvantages of the foregoing SAO filter.
According to aspects of the present invention there are provided a method, a device, a signal, and a program as set forth in the appended claims. According to other aspects of the invention, there are provided a computer readable storage medium or a non-transitory computer-readable storage medium storing the program as set forth in the appended claims and a bitstream generated using a method as set forth in the appended claims. Other features of the invention will be apparent from the dependent claims, and the description which follows.
According to a first aspect of the present invention, there is provided a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the method comprising: performing SAO filtering on a group made up of N x N image parts of the image using SAO parameters associated with the group, wherein N is three or more.
Suitably, two or more different groupings of said image parts are available, and the group made up of the N x N image parts is formed by one of said available groupings.
Suitably, the method further comprises: comparing SAO filtering using two or more of the available groupings; and selecting one grouping based on the comparison. Suitably, one available grouping forms another group made up of M image parts in a column of the image, and M is three or more. Suitably, one available grouping forms another group made up of image parts in a complete column of the image (e.g. M is the height of the image).
According to a second aspect of the present invention, there is provided a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the method comprising: performing SAO filtering on a group made up of M image parts in a column of the image, wherein M is three or more.
According to a third aspect of the present invention, there is provided a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the method comprising: performing SAO filtering on a group made up of image parts in a complete column of the image.
Suitably, in the second or third aspect two or more different groupings of said image parts are available, and the group is formed by one of said available groupings. Suitably, the method further comprises: comparing SAO filtering using two or more of the available groupings; and selecting one grouping based on the comparison.
According to a fourth aspect of the present invention, there is provided a method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, two or more different groupings of said image parts being available, the method comprising: determining a grouping; and performing the SAO filtering using SAO parameters associated with the determined grouping, wherein the two or more different groupings comprise one or more of: a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more; another grouping for forming another group made up of M image parts in a column of the image, wherein M is three or more; and another grouping for forming another group made up of image parts in a complete column of the image.
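To make the three groupings named in the first to fourth aspects concrete, the following Python sketch partitions a grid of image parts (CTUs) into N x N groups, into groups of M image parts in a column, and into complete columns. It is only an illustration under stated assumptions: CTUs are addressed by hypothetical `(x, y)` grid coordinates, groups at the right and bottom borders are simply truncated (the text above does not say how borders are handled), and the function names are invented for this example.

```python
def nxn_groups(width, height, n):
    """Partition a width x height grid of CTUs into groups of up to n x n CTUs."""
    groups = []
    for top in range(0, height, n):
        for left in range(0, width, n):
            groups.append([(x, y)
                           for y in range(top, min(top + n, height))
                           for x in range(left, min(left + n, width))])
    return groups


def column_groups(width, height, m=None):
    """Groups of m CTUs in a column; m=None means the complete column."""
    m = height if m is None else m
    return [[(x, y) for y in range(top, min(top + m, height))]
            for x in range(width)
            for top in range(0, height, m)]


# Illustrative frame of 8 x 6 CTUs.
groups_3x3 = nxn_groups(8, 6, 3)          # N = 3
groups_col3 = column_groups(8, 6, 3)      # M = 3 CTUs per column group
groups_full_col = column_groups(8, 6)     # complete columns
```

For this 8 x 6 frame the 3 x 3 grouping yields 6 groups instead of 48 individual CTUs, so one set of SAO parameters (or one derivation choice) per group needs far fewer syntax elements than per-CTU signalling.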
Suitably, the determining comprises: comparing SAO filtering using two or more of the available groupings; and selecting one grouping based on the comparison.
Suitably, the determining comprises obtaining, from a bitstream: the SAO parameters associated with the grouping; data indicating a grouping, and determining the grouping using the obtained data; and/or data indicating inferring of the SAO parameters for the SAO filtering from another image part of the image or of another image, and inferring the SAO parameters using the obtained data.
Suitably, the method further comprises: obtaining, from a bitstream, data indicating which one of the data indicating a grouping or the data indicating inferring of the SAO parameters is used; and when the data indicates either or both data is available, obtaining, from the bitstream, the available data and using the obtained available data to determine the grouping or inferring the SAO parameters for the SAO filtering. Suitably, the method further comprises, when the data indicates either of the data is not available for use, obtaining, from the bitstream, the SAO parameters associated with the grouping.
Suitably, the method further comprises providing, in a bitstream: the SAO parameters associated with the grouping; data indicating a grouping; or data indicating SAO parameters for the SAO filtering are inferred from another image part of the image or of another image. Suitably, the method further comprises providing, in a bitstream, data indicating which one of the data indicating a grouping or the data indicating inferring of the SAO parameters is used. Suitably, the method further comprises, when said data indicates either of the data is not available for use, not including the unavailable data in the bitstream.
Suitably, the data indicating the grouping and/or the data indicating inferring of the SAO parameters is explicitly signalled in the bitstream. Suitably, the data indicating the grouping and/or the data indicating inferring of the SAO parameters is implicitly signalled by the bitstream (i.e. without explicitly signalling in the bitstream).
According to a fifth aspect of the present invention, there is provided a method of encoding an image or a sequence of images, the method comprising performing sample adaptive offset (SAO) filtering according to the method of any one of the aforegoing first to fourth aspects.
According to a sixth aspect of the present invention, there is provided a method of decoding an image or a sequence of images, the method comprising performing sample adaptive offset (SAO) filtering according to the method of any one of the aforegoing first to fourth aspects.
According to a seventh aspect of the present invention, there is provided a device for performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the device comprising a means for performing sample adaptive offset (SAO) filtering according to the method of any one of the aforegoing first to fourth aspects.
According to an eighth aspect, there is provided a device for encoding an image or a sequence of images, the device comprising a device of the seventh aspect.
According to a ninth aspect, there is provided a device for decoding an image or a sequence of images, the device comprising a device of the seventh aspect. According to a tenth aspect, there is provided a program which, when executed, causes the method of any one of the aforegoing first to sixth aspects to be performed.
Suitably, in one of the aforegoing first to tenth aspects, the comparison is based on rate-distortion evaluation for the two or more groupings. Suitably, in one of the aforegoing first to tenth aspects, at least one available grouping is excluded from the comparison and/or evaluation.
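The comparison of groupings by rate-distortion evaluation can be sketched as follows. This Python example assumes the commonly used Lagrangian cost J = D + λ·R, which the text above does not prescribe; the function name `select_grouping`, the grouping labels and the distortion/rate numbers are hypothetical and chosen only to show the mechanics, including the option of excluding a grouping from the comparison.

```python
def select_grouping(candidates, lam, excluded=()):
    """Return (name, cost) of the grouping with the lowest cost J = D + lam * R.

    candidates maps a grouping name to a (distortion, rate_in_bits) pair that an
    encoder would have measured after SAO filtering with that grouping.
    """
    best_name, best_cost = None, float("inf")
    for name, (distortion, rate) in candidates.items():
        if name in excluded:  # a grouping may be left out of the evaluation
            continue
        cost = distortion + lam * rate
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


# Arbitrary illustrative measurements, not results from any codec.
candidates = {
    "ctu": (1000.0, 400),     # finest adaptation, most signalling
    "3x3": (1100.0, 60),
    "column": (1150.0, 40),
    "frame": (1500.0, 5),     # coarsest adaptation, least signalling
}
best = select_grouping(candidates, lam=2.0)
best_without_3x3 = select_grouping(candidates, lam=2.0, excluded=("3x3",))
```

The sketch shows the trade-off the groupings are meant to exploit: a coarser grouping usually costs some distortion but saves rate, and the selected grouping is the one for which the combined cost is smallest.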
Suitably, in one of the aforegoing aspects, the two or more different (and/or available) groupings further comprise one or more of a grouping(s) for forming: a group made up of p x q image parts of the image, wherein p and q are one or larger; a group made up of j x j image parts of the image, wherein j is two; a group made up of an image part of the image; a group made up of all the image parts of the image; a group made up of image parts in a line of the image; a group made up of k image parts in a row of the image, wherein k is three or more/the width of the image; a group made up of image part(s) which use(s) temporal derivation for at least one SAO parameter; a group made up of image part(s) which use(s) temporal derivation with a modified image or image part for at least one SAO parameter; and a group made up of image part(s) which use(s) temporal derivation with another image or image part which has been rotated by 45, 90 or 135 degrees for at least one SAO parameter.
Suitably, in one of the aforegoing aspects, an image part is one of: a block; a unit; a partition; a portion; a coding tree block; a largest coding unit; or a coding tree unit, for processing or coding the image. Suitably, in one of the aforegoing aspects, an image part is a coding tree unit or a coding tree block.
Suitably, in one of the aforegoing aspects, the data indicating a grouping is an index or an identifier for the grouping (e.g. an element index of an array/list/table of groupings). Suitably, in one of the aforegoing aspects, the data indicating inferring of the SAO parameters is a flag(s) indicating at least one SAO parameter is to be copied from another image part of the image or of another image. Suitably, in one of the aforegoing aspects, the data indicating which one of the data indicating a grouping or the data indicating inferring of the SAO parameters is used is a flag(s) indicating enabling (e.g. used) or disabling (e.g. not used) of the data.
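The three kinds of data described above (enable flags, a grouping index into a table of groupings, and an inference flag) can be pictured with a small Python sketch. The order of the symbols, the contents of the `GROUPINGS` table and the function name `parse_sao_signalling` are assumptions made purely for illustration; the actual syntax is defined by the embodiments and figures, not by this example. The input is a list of already entropy-decoded symbol values rather than raw bits.

```python
# Hypothetical table of available groupings; the index is what would be signalled.
GROUPINGS = ["ctu", "2x2", "3x3", "column", "line", "frame"]


def parse_sao_signalling(symbols):
    """Interpret decoded symbols in an assumed order:
    grouping_enabled, inference_enabled, [grouping_index], [infer_flag]."""
    it = iter(symbols)
    grouping_enabled = bool(next(it))
    inference_enabled = bool(next(it))
    result = {"grouping": None, "inferred": False}
    if grouping_enabled:
        # Data indicating the grouping: an index into the table of groupings.
        result["grouping"] = GROUPINGS[next(it)]
    if inference_enabled:
        # Data indicating inferring: a flag saying the parameters are copied
        # from another image part of the image or of another image.
        result["inferred"] = bool(next(it))
    # When nothing is inferred, explicit SAO parameters are expected to follow.
    result["explicit_params_expected"] = not result["inferred"]
    return result


only_grouping = parse_sao_signalling([1, 0, 2])
grouping_and_inference = parse_sao_signalling([1, 1, 3, 1])
neither = parse_sao_signalling([0, 0])
```

Data that an enable flag marks as not in use is never read, which mirrors the statement that unavailable data is not included in the bitstream.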
Suitably, in one of the aforegoing aspects, the SAO parameters are parameters used in SAO filtering, for example a control data for controlling a SAO filter. For example, the SAO parameters comprise one or more of: an SAO (filter) type parameter indicating whether it is an Edge Offset (EO) or a Band Offset (BO) type (or whether there is no SAO filtering at all); a direction for the Edge Offset; an SAO band (range); an SAO band position; and an SAO offset to be applied with the SAO filter.
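As a rough picture of such a parameter set, the Python sketch below stores the SAO type, the edge-offset direction, the band position and the offsets in one record, and applies a band offset to a row of samples. It assumes HEVC-style band offset (32 equal bands, offsets applied to four consecutive bands starting at the band position, with wrap-around); the class name `SaoParams`, its field names and the helper `apply_band_offset` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SaoParams:
    sao_type: str = "OFF"                 # "OFF" (no SAO), "EO" (edge) or "BO" (band)
    eo_class: Optional[int] = None        # direction for the Edge Offset
    band_position: Optional[int] = None   # first of the four consecutive bands
    offsets: List[int] = field(default_factory=lambda: [0, 0, 0, 0])


def apply_band_offset(samples, params, bit_depth=8):
    """Add the band offsets to samples that fall in the four signalled bands."""
    if params.sao_type != "BO":
        return list(samples)
    shift = bit_depth - 5                 # 32 equal bands over the sample range
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = ((s >> shift) - params.band_position) % 32
        if k < 4:                         # sample lies in one of the four bands
            s = min(max(s + params.offsets[k], 0), max_val)  # clip to valid range
        out.append(s)
    return out


# Illustrative values only.
params = SaoParams(sao_type="BO", band_position=12, offsets=[2, 1, -1, 0])
filtered = apply_band_offset([95, 96, 104, 112, 120, 128, 255], params)
```

When a group of image parts shares one such record, the same `params` object would be applied to every image part of the group.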
According to an eleventh aspect, there is provided a signal carrying an information dataset for an image or a sequence of images represented by a bitstream, the image comprising a plurality of image parts, wherein the information dataset comprises data for performing SAO filtering using SAO parameters associated with a group made up of: N x N image parts of the image, wherein N is three or more; M image parts in a column of the image, wherein M is three or more; or image parts in a complete column of the image.
Further features, aspects, and advantages of the present invention will become apparent from the following description of embodiments with reference to the attached drawings. Each of the embodiments of the present invention described below can be implemented solely or as a combination of a plurality of the embodiments. Also, features from different embodiments can be combined where necessary or where the combination of elements or features from individual embodiments in a single embodiment is beneficial.
Reference will now be made, by way of example, to the accompanying drawings, in which:
Figure 1 is a diagram for use in explaining a coding structure used in HEVC;
Figure 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the invention may be implemented;
Figure 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented;
Figure 4 is a flow chart illustrating steps of an encoding method according to embodiments of the invention;
Figure 5 is a flow chart illustrating steps of a loop filtering process in accordance with one or more embodiments of the invention;
Figure 6 is a flow chart illustrating steps of a decoding method according to embodiments of the invention;
Figures 7A and 7B are diagrams for use in explaining edge-type SAO filtering in HEVC;
Figure 8 is a diagram for use in explaining band-type SAO filtering in HEVC;
Figure 9 is a flow chart illustrating the steps of a process to decode SAO parameters according to the HEVC specifications;
Figure 10 is a flow chart illustrating in more detail one of the steps of the Figure 9 process;
Figure 11 is a flow chart illustrating how SAO filtering is performed on an image part according to the HEVC specifications;
Figure 12A is a flow chart illustrating how SAO filtering is performed on an image part according to a first embodiment of the present invention;
Figures 12B-12C are flow charts illustrating how SAO filtering is performed on an image part according to a fourth embodiment of the present invention;
Figures 13A and 13B are flow charts illustrating how a grouping is determined using a rate-distortion compromise comparison according to the fourth embodiment of the present invention;
Figures 14A-14B are flow charts illustrating how a determined grouping and/or SAO parameters are communicated according to a fifth embodiment of the present invention;
Figures 14C-14D are flow charts illustrating how a determined grouping and/or SAO parameters are communicated according to a sixth embodiment of the present invention;
Figure 15 is a schematic view for use in explaining a temporal derivation of SAO parameters in a seventh embodiment of the present invention;
Figure 16 is a flow chart for use in explaining a method of decoding an image in the seventh embodiment;
Figure 17 is a flow chart illustrating steps carried out by an encoder to determine SAO parameters for the CTUs of a group (frame or slice) in a CTU-level non-temporal derivation of SAO parameters in an eighth embodiment of the present invention;
Figure 18 shows one of the steps of Figure 17 in more detail;
Figure 19 shows another one of the steps of Figure 17 in more detail;
Figure 20 shows yet another one of the steps of Figure 17 in more detail;
Figure 21 is a flow chart for use in explaining how to evaluate a cost of a temporal derivation in the eighth embodiment;
Figure 22 is a flow chart for use in explaining how to compare the costs of the temporal derivation and a further, non-temporal derivation, in the eighth embodiment;
Figure 23 shows various different groupings 1201-1206 of CTUs in a slice;
Figure 24 is a diagram showing image parts of a frame in a non-temporal derivation of SAO parameters in which a first method of sharing SAO parameters is used;
Figure 25 is a flowchart of an example of a process for setting SAO parameters in the non-temporal derivation of Figure 24;
Figure 26 is a flowchart of an example of a process for setting of SAO parameters in another non-temporal derivation using the first sharing method to share SAO parameters among a column of CTUs;
Figure 27 is a flowchart of an example of a process for setting of SAO parameters in yet another non-temporal derivation using the first sharing method to share SAO parameters among a group of NxN CTUs;
Figure 28 is a diagram showing image parts of one NxN group in the non-temporal derivation of Figure 27;
Figure 29 illustrates an example of how to select the SAO parameter derivation in an eleventh embodiment of the present invention;
Figure 30 is a flow chart illustrating a decoding process suitable for a second method of sharing SAO parameters among image parts of a group;
Figure 31 is a diagram showing image parts of multiple 2x2 groups in a sixteenth embodiment of the present invention;
Figure 32 is a schematic view for use in explaining a process of deriving SAO parameters in a temporal rotation derivation of SAO parameters in accordance with a twentieth embodiment of the present invention;
Figure 33 is a schematic view of the temporal rotation derivation of Figure 32;
Figure 34 is a schematic view for use in explaining a process of deriving SAO parameters in which different temporal derivations are available;
Figure 35 is a flowchart for use in explaining a decoding process in a twenty-fifth embodiment of the present invention;
Figure 36 is a schematic view for use in explaining a process of deriving SAO parameters in a spatial rotation derivation of SAO parameters in accordance with a twentieth embodiment of the present invention;
Figure 37 is a flowchart for use in explaining a decoding process in the twenty-seventh embodiment; and
Figure 38 is a diagram showing a system comprising an encoder or a decoder and a communication network according to embodiments of the present invention.
Embodiments of the present invention will be described hereinafter in detail, with reference to the accompanying drawings. It is to be understood that the following embodiments are not intended to limit the claims of the present invention, and that not all of the combinations of the aspects that are described according to the following embodiments are necessarily required with respect to the means to solve the problems according to the present invention.
In order to illustrate how the present invention may be put into effect, some of the embodiments described herein are based on how encoding and decoding processes are performed according to the High Efficiency Video Coding (HEVC) standard. However, the present invention is not limited thereto. It is understood that other embodiments of the present invention may be based on any process or device that involves SAO filtering being performed on an image or an image part. For example, an SAO filter according to an embodiment of the present invention may be used in any image/video encoding or decoding process or device, such as a future video coding standard compliant device.
It is understood that where a HEVC compliant method/process or device (e.g. an encoder, a decoder, a SAO filter of HEVC) is described in relation to an embodiment of the present invention, not all features of the HEVC compliant method/process or device or SAO filter need to be included in the embodiment. As long as those features that interact with other parts of the embodiment are included, the embodiment of the invention can be put into effect.
It is also understood that according to an embodiment of the present invention, a decoder according to a later described embodiment is provided in a user terminal such as a computer, a mobile phone (a cellular phone), a tablet or any other type of a device (e.g. a display apparatus) capable of providing/displaying a content to a user. According to yet another embodiment, an encoder according to a later described embodiment is provided in an image capturing apparatus which also comprises a camera, a video camera or a network camera (e.g. a closed-circuit television or video surveillance camera) which captures and provides the content for the encoder to encode (and communicate/transmit thereafter).
Figure 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video standard. A video sequence 1 is made up of a succession of digital images i. Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.
An image 2 of the sequence may be divided into slices 3. A slice may in some instances constitute an entire image. These slices are divided into non-overlapping Coding Tree Units (CTUs). A Coding Tree Unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to macroblock units that were used in several previous video standards. A CTU is also sometimes referred to as a Largest Coding Unit (LCU). A CTU has luma and chroma component parts, each of which component parts is called a Coding Tree Block (CTB). These different color components are not shown in Figure 1.
A CTU is generally of size 64 pixels x 64 pixels. Each CTU may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 5 using a quadtree decomposition.
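The quadtree decomposition of a CTU into CUs can be sketched as follows. This is a minimal illustration only, not the HEVC partitioning process itself: the function name `split_ctu`, the `min_size` parameter and the `should_split` callback are hypothetical names introduced here, and the callback stands in for whatever decision (typically a rate-distortion comparison) an encoder would make.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in pixels


def split_ctu(x: int, y: int, size: int, min_size: int,
              should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively divide a square block into four quadrants (quadtree).

    `should_split` is a hypothetical encoder decision callback.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # leaf: this block becomes one CU
    half = size // 2
    cus: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(split_ctu(x + dx, y + dy, half, min_size, should_split))
    return cus


# Split the 64x64 CTU once, then split only its top-left 32x32 quadrant again.
cus = split_ctu(0, 0, 64, 8,
                lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
print(cus)
```

In this example the CTU ends up as four 16x16 CUs in its top-left quadrant plus three 32x32 CUs, and the CUs together cover the CTU without overlap.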
Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU). The maximum size of a PU or TU is equal to the CU size. A Prediction Unit corresponds to the partition of the CU for prediction of pixel values. Various different partitions of a CU into PUs are possible as shown by 606 including a partition into 4 square PUs and two different partitions into 2 rectangular PUs. A Transform Unit is an elementary unit that is subjected to spatial transformation using DCT. A CU can be partitioned into TUs based on a quadtree representation 607.
Each slice is embedded in one Network Abstraction Layer (NAL) unit. In addition, the coding parameters of the video sequence are stored in dedicated NAL units called parameter sets. In HEVC and H.264/AVC two kinds of parameter set NAL units are employed: first, a Sequence Parameter Set (SPS) NAL unit that gathers all parameters that are unchanged during the whole video sequence. Typically, it handles the coding profile, the size of the video frames and other parameters. Secondly, a Picture Parameter Set (PPS) NAL unit includes parameters that may change from one image (or frame) to another of a sequence. HEVC also includes a Video Parameter Set (VPS) NAL unit which contains parameters describing the overall structure of the bitstream. The VPS is a new type of parameter set defined in HEVC, and applies to all of the layers of a bitstream. A layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer. HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer.
Figure 2 illustrates a data communication system in which one or more embodiments of the invention may be implemented. The data communication system comprises a transmission device, in this case a server 201, which is operable to transmit data packets of a data stream to a receiving device, in this case a client terminal 202, via a data communication network 200. The data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN). Such a network may be for example a wireless network (Wifi / 802.11a or b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks. In a particular embodiment of the invention the data communication system may be a digital television broadcast system in which the server 201 sends the same data content to multiple clients.
The data stream 204 provided by the server 201 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 201 or received by the server 201 from another data provider, or generated at the server 201. The server 201 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
In order to obtain a better ratio of the quality of transmitted data to quantity of transmitted data, the compression of the video data may be for example in accordance with the HEVC format or H.264/AVC format.
The client 202 receives the transmitted bitstream and decodes the reconstructed bitstream to reproduce video images on a display device and the audio data by a loud speaker.
Although a streaming scenario is considered in the example of Figure 2, it will be appreciated that in some embodiments of the invention the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
In one or more embodiments of the invention a video image is transmitted with data representative of compensation offsets for application to reconstructed pixels of the image to provide filtered pixels in a final image.
Figure 3 schematically illustrates a processing device 300 configured to implement at least one embodiment of the present invention. The processing device 300 may be a device such as a micro-computer, a workstation or a light portable device. The device 300 comprises a communication bus 313 connected to:
-a central processing unit 311, such as a microprocessor, denoted CPU;
-a read only memory 307, denoted ROM, for storing computer programs for implementing the invention;
-a random access memory 312, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to embodiments of the invention; and
-a communication interface 302 connected to a communication network 303 over which digital data to be processed are transmitted or received.
Optionally, the apparatus 300 may also include the following components:
-a data storage means 304 such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
-a disk drive 305 for a disk 306, the disk drive being adapted to read data from the disk 306 or to write data onto said disk;
-a screen 309 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 310 or any other pointing means.
The apparatus 300 can be connected to various peripherals, such as for example a digital camera 320 or a microphone 308, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 300.
The communication bus provides communication and interoperability between the various elements included in the apparatus 300 or connected to it. The representation of the bus is not limiting and in particular the central processing unit is operable to communicate instructions to any element of the apparatus 300 directly or by means of another element of the apparatus 300.
The disk 306 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.
The executable code may be stored either in read only memory 307, on the hard disk 304 or on a removable digital medium such as for example a disk 306 as described previously.
According to a variant, the executable code of the programs can be received by means of the communication network 303, via the interface 302, in order to be stored in one of the storage means of the apparatus 300 before being executed, such as the hard disk 304.
The central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means. On powering up, the program or programs that are stored in a non-volatile memory, for example on the hard disk 304 or in the read only memory 307, are transferred into the random access memory 312, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Figure 4 illustrates a block diagram of an encoder according to at least one embodiment of the invention. The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, at least one corresponding step of a method implementing at least one embodiment of encoding an image of a sequence of images according to one or more embodiments of the invention.
An original sequence of digital images i0 to in 401 is received as an input by the encoder 400. Each digital image is represented by a set of samples, known as pixels.
A bitstream 410 is output by the encoder 400 after implementation of the encoding process. The bitstream 410 comprises a plurality of encoding units or slices, each slice comprising a slice header for transmitting encoding values of encoding parameters used to encode the slice and a slice body, comprising encoded video data.
The input digital images i0 to in 401 are divided into blocks of pixels by module 402. The blocks correspond to image portions and may be of variable sizes (e.g. 4x4, 8x8, 16x16, 32x32, 64x64, 128x128 pixels and several rectangular block sizes can be also considered). A coding mode is selected for each input block. Two families of coding modes are provided: coding modes based on spatial prediction coding (Intra prediction), and coding modes based on temporal prediction (Inter coding, Merge, SKIP). The possible coding modes are tested.
Module 403 implements an Intra prediction process, in which the given block to be encoded is predicted by a predictor computed from pixels of the neighbourhood of said block to be encoded. An indication of the selected Intra predictor and the difference between the given block and its predictor is encoded to provide a residual if the Intra coding is selected.
Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405. Firstly a reference image/picture from among a set of reference images/pictures 416 is selected, and a portion of the reference image/picture, also called reference area or image portion, which is the closest area to the given block to be encoded, is selected by the motion estimation module 404. Motion compensation module 405 then predicts the block to be encoded using the selected area. The difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module 405. The selected reference area is indicated by a motion vector.
Thus in both cases (spatial and temporal prediction), a residual is computed by subtracting the prediction from the original block.
In the INTRA prediction implemented by module 403, a prediction direction is encoded. In the temporal prediction, at least one motion vector is encoded.
Information relative to the motion vector and the residual block is encoded if the Inter prediction is selected. To further reduce the bitrate, assuming that motion is homogeneous, the motion vector is encoded by difference with respect to a motion vector predictor. Motion vector predictors of a set of motion information predictors are obtained from the motion vectors field 418 by a motion vector prediction and coding module 417.
The encoder 400 further comprises a selection module 406 for selection of the coding mode by applying an encoding cost criterion, such as a rate-distortion criterion. In order to further reduce redundancies, a transform (such as DCT) is applied by transform module 407 to the residual block; the transformed data obtained is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409. Finally, the encoded residual block of the current block being encoded is inserted into the bitstream 410.
The encoder 400 also performs decoding of the encoded image in order to produce a reference image for the motion estimation of the subsequent images. This enables the encoder and the decoder receiving the bitstream to have the same reference frames. The dequantization module 411 performs dequantization of the quantized data, followed by an inverse transform by inverse transform module 412. The intra prediction module 413 uses the prediction information to determine which predictor to use for a given block and the motion compensation module 414 actually adds the residual obtained by module 412 to the reference area obtained from the set of reference images/pictures 416.
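The reason the encoder decodes its own output can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the HEVC process: the transform and entropy coding stages are omitted, a uniform quantizer with a hypothetical step size `step` is used, and the helper names `quantize`, `dequantize` and `reconstruct` are introduced here for illustration only.

```python
from typing import List


def quantize(residual: List[int], step: int) -> List[int]:
    """Uniform quantization with rounding to the nearest level (assumed scheme)."""
    return [(abs(r) + step // 2) // step * (1 if r >= 0 else -1) for r in residual]


def dequantize(levels: List[int], step: int) -> List[int]:
    """Inverse of the assumed quantizer: scale each level back by the step size."""
    return [level * step for level in levels]


def reconstruct(prediction: List[int], levels: List[int], step: int) -> List[int]:
    """Add the dequantized residual to the prediction, as a decoder would."""
    return [p + d for p, d in zip(prediction, dequantize(levels, step))]


original = [100, 104, 97, 90]
prediction = [98, 98, 98, 98]
residual = [o - p for o, p in zip(original, prediction)]  # [2, 6, -1, -8]

levels = quantize(residual, step=4)                       # what gets transmitted
encoder_reference = reconstruct(prediction, levels, step=4)
decoder_output = reconstruct(prediction, levels, step=4)
print(levels, encoder_reference)
```

Because quantization is lossy, the reconstruction differs from the original block. Running the same reconstruction at the encoder gives it exactly the reference samples the decoder will have, which is why the encoder predicts from reconstructed frames rather than from the original frames.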
Post filtering is then applied by module 415 to filter the reconstructed frame of pixels. In the embodiments of the invention an SAO loop filter is used in which compensation offsets are added to the pixel values of the reconstructed pixels of the reconstructed image.
Figure 5 is a flow chart illustrating steps of a loop filtering process according to at least one embodiment of the invention. In an initial step 51, the encoder generates the reconstruction of the full frame. Next, in step 52 a deblocking filter is applied on this first reconstruction in order to generate a deblocked reconstruction 53. The aim of the deblocking filter is to remove block artifacts generated by residual quantization and block motion compensation or block Intra prediction. These artifacts are visually important at low bitrates. The deblocking filter operates to smooth the block boundaries according to the characteristics of two neighboring blocks. The encoding mode of each block, the quantization parameters used for the residual coding, and the neighboring pixel differences in the boundary are taken into account. The same criterion/classification is applied for all frames and no additional data is transmitted. The deblocking filter improves the visual quality of the current frame by removing blocking artifacts and it also improves the motion estimation and motion compensation for subsequent frames. Indeed, high frequencies of the block artifact are removed, and so these high frequencies do not need to be compensated for with the texture residual of the following frames.
After the deblocking filter, the deblocked reconstruction is filtered by a sample adaptive offset (SAO) loop filter in step 54 using SAO parameters determined in accordance with embodiments of the invention. The resulting frame 55 may then be filtered with an adaptive loop filter (ALF) in step 56 to generate the reconstructed frame 57 which will be displayed and used as a reference frame for the following Inter frames.
In step 54 each pixel of the frame region is classified into a class or group. The same offset value is added to every pixel value which belongs to a certain class or group.
The derivation of the SAO parameters for the sample adaptive offset filtering in different embodiments of the present invention will be explained in more detail hereafter with reference to Figures 12 to 38.
Figure 6 illustrates a block diagram of a decoder 60 which may be used to receive data from an encoder according to an embodiment of the invention. The decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, a corresponding step of a method implemented by the decoder 60.
The decoder 60 receives a bitstream 61 comprising encoding units, each one being composed of a header containing information on encoding parameters and a body containing the encoded video data. As explained with respect to Figure 4, the encoded video data is entropy encoded, and the motion vector predictors’ indexes are encoded, for a given block, on a predetermined number of bits. The received encoded video data is entropy decoded by module 62. The residual data are then dequantized by module 63 and then an inverse transform is performed by module 64 to obtain pixel values.
The mode data indicating the coding mode are also entropy decoded and based on the mode, an INTRA type decoding or an INTER type decoding is performed on the encoded blocks of image data.
In the case of INTRA mode, an INTRA predictor is determined by intra prediction module 65 based on the intra prediction mode specified in the bitstream.
If the mode is INTER, the motion prediction information is extracted from the bitstream so as to find the reference area used by the encoder. The motion prediction information is composed of the reference frame index and the motion vector residual. The motion vector predictor is added to the motion vector residual in order to obtain the motion vector by motion vector decoding module 70.
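The differential coding of motion vectors can be sketched as below. This is a minimal illustration assuming integer two-component vectors; the type alias `MotionVector` and the function names `encode_mv` and `decode_mv` are hypothetical, and the selection of the predictor itself is not modeled.

```python
from typing import Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical) displacement


def encode_mv(mv: MotionVector, predictor: MotionVector) -> MotionVector:
    """Encoder side: transmit only the difference to the predictor."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


def decode_mv(residual: MotionVector, predictor: MotionVector) -> MotionVector:
    """Decoder side: add the predictor back to the motion vector residual."""
    return (residual[0] + predictor[0], residual[1] + predictor[1])


predictor = (12, -3)   # e.g. derived from already decoded neighbouring blocks
mv = (13, -3)          # motion vector chosen by motion estimation
mv_residual = encode_mv(mv, predictor)
print(mv_residual, decode_mv(mv_residual, predictor))
```

When motion is homogeneous the residual is close to zero, as in this example, and so costs fewer bits than the full vector.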
Motion vector decoding module 70 applies motion vector decoding for each current block encoded by motion prediction. Once an index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the current block can be decoded and used to perform motion compensation by module 66. The reference image portion indicated by the decoded motion vector is extracted from a reference image 68 to perform the motion compensation 66. The motion vector field data 71 is updated with the decoded motion vector in order to be used for the prediction of subsequent decoded motion vectors.
Finally, a decoded block is obtained. Post filtering is applied by post filtering module 67 similarly to post filtering module 415 applied at the encoder as described with reference to Figure 5. A decoded video signal 69 is finally provided by the decoder 60.
The aim of SAO filtering is to improve the quality of the reconstructed frame by sending additional data in the bitstream in contrast to the deblocking filter where no information is transmitted. As mentioned above, each pixel is classified into a predetermined class or group and the same offset value is added to every pixel sample of the same class/group. One offset is encoded in the bitstream for each class. SAO loop filtering has two SAO types: an Edge Offset (EO) type and a Band Offset (BO) type. An example of Edge Offset type is schematically illustrated in Figures 7A and 7B, and an example of Band Offset type is schematically illustrated in Figure 8.
In HEVC, SAO filtering is applied CTU by CTU. In this case the parameters needed to perform the SAO filtering (set of SAO parameters) are selected for each CTU at the encoder side and the necessary parameters are decoded and/or derived for each CTU at the decoder side. This offers the possibility of easily encoding and decoding the video sequence by processing each CTU at once without introducing delays in the processing of the whole frame. Moreover, when SAO filtering is enabled, only one SAO type is used: either the Edge Offset type filter or the Band Offset type filter according to the related parameters transmitted in the bitstream for each classification. One of the SAO parameters in HEVC is an SAO type parameter sao_type_idx which indicates for the CTU whether EO type, BO type or no SAO filtering is selected for the CTU concerned.
The SAO parameters for a given CTU can be copied from the upper or left CTU, for example, instead of transmitting all the SAO data. One of the SAO parameters in HEVC is a sao_merge_up_flag, which when set indicates that the SAO parameters (other than the sao_merge_up_flag) for the subject CTU should be copied from the upper CTU. Another of the SAO parameters in HEVC is a sao_merge_left_flag, which when set indicates that the SAO parameters for the subject CTU should be copied from the left CTU.
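The copying of SAO parameters from a neighbouring CTU can be sketched as follows. This is an illustration only and does not reproduce the HEVC parsing process: the function `resolve_sao_params`, the dictionary representation of a parameter set and its keys are hypothetical, and the left flag is assumed to be checked before the up flag, which is the order used in the process of Figure 9.

```python
from typing import Dict, Optional

SaoParams = Dict[str, object]  # hypothetical container for one CTU's SAO parameters


def resolve_sao_params(sao_merge_left_flag: bool,
                       sao_merge_up_flag: bool,
                       left: Optional[SaoParams],
                       up: Optional[SaoParams],
                       explicit: Optional[SaoParams]) -> SaoParams:
    """Return the SAO parameters to use for the current CTU."""
    if sao_merge_left_flag and left is not None:
        return dict(left)   # copy from the CTU on the left
    if sao_merge_up_flag and up is not None:
        return dict(up)     # copy from the CTU above
    if explicit is None:
        raise ValueError("no merge flag set and no explicit SAO parameters given")
    return dict(explicit)   # parameters transmitted for this CTU


left_ctu = {"type": "EO", "eo_class": 1, "offsets": (2, 1, 1, 2)}
up_ctu = {"type": "BO", "band_position": 12, "offsets": (1, -2, 3, -4)}
current = resolve_sao_params(False, True, left_ctu, up_ctu, None)
print(current)
```

Returning a copy rather than the neighbour's own object keeps each CTU's parameter set independent in this sketch; an implementation could equally share a reference.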
SAO filtering may be applied independently for different color components (e.g. YUV) of the frame. For example, one set of SAO parameters may be provided for the luma component Y and another set of SAO parameters may be provided for both chroma components U and V in common. Also, within the set of SAO parameters one or more SAO parameters may be used as common filtering parameters for two or more color components, while other SAO parameters are dedicated (per-component) filtering parameters for the color components. For example, in HEVC, the SAO type parameter sao_type_idx is common to U and V, and so is an EO class parameter which indicates a class for EO filtering (see below), whereas a BO class parameter which indicates a group of classes for BO filtering has dedicated (per-component) SAO parameters for U and V.
A description of the Edge Offset type in HEVC is now provided with reference to Figures 7A and 7B.
Edge Offset type involves determining an edge index for each pixel by comparing its pixel value to the values of two neighboring pixels. Moreover, these two neighboring pixels depend on a parameter which indicates the direction of these two neighboring pixels with respect to the current pixel. These directions are the 0-degree (horizontal direction), 45-degree (diagonal direction), 90-degree (vertical direction) and 135-degree (second diagonal direction). These four directions are schematically illustrated in Figure 7A.
The table of Figure 7B gives the offset value to be applied to the pixel value of a particular pixel “C” according to the value of the two neighboring pixels Cn1 and Cn2 at the decoder side.
When the value of C is less than the two values of neighboring pixels Cn1 and Cn2, the offset to be added to the pixel value of the pixel C is “+O1”. When the pixel value of C is less than one pixel value of its neighboring pixels (either Cn1 or Cn2) and C is equal to one value of its neighbors, the offset to be added to this pixel sample value is “+O2”.
When the pixel value of C is greater than one of the pixel values of its neighbors (Cn1 or Cn2) and the pixel value of C is equal to one value of its neighbors, the offset to be applied to this pixel sample is “-O3”. When the value of C is greater than the two values of Cn1 and Cn2, the offset to be applied to this pixel sample is “-O4”.
When none of the above conditions is met on the current sample and its neighbors, no offset value is added to the current pixel C as depicted by the Edge Index value“2” of the table.
It is important to note that for the particular case of the Edge Offset type, the absolute value of each offset (O1, O2, O3, O4) is encoded in the bitstream. The sign to be applied to each offset depends on the edge index (or the Edge Index in the HEVC specifications) to which the current pixel belongs. According to the table represented in Figure 7B, for Edge Index 0 and for Edge Index 1 (O1, O2) a positive offset is applied. For Edge Index 3 and Edge Index 4 (O3, O4), a negative offset is applied to the current pixel.
In the HEVC specifications, the direction for the Edge Offset amongst the four directions of Figure 7A is specified in the bitstream by a “sao_eo_class_luma” field for the luma component and a “sao_eo_class_chroma” field for both chroma components U and V.
The SAO Edge Index corresponding to the index value is obtained by the following formula:
$\mathrm{EdgeIndex} = \mathrm{sign}(C - \mathrm{Cn2}) - \mathrm{sign}(\mathrm{Cn1} - C) + 2$
where the definition of the function sign(.) is given by the following relationships:
$\mathrm{sign}(x) = 1$, when $x > 0$
$\mathrm{sign}(x) = -1$, when $x < 0$
$\mathrm{sign}(x) = 0$, when $x = 0$.
In order to simplify the Edge Offset determination for each pixel, the difference between the pixel value of C and the pixel value of both its neighboring pixels Cn1 and Cn2 can be shared for current pixel C and its neighbors. Indeed, when SAO Edge Offset filtering is applied using a raster scan order of pixels of the current CTU or frame, the term sign(Cn1 - C) has already been computed for the previous pixels (to be precise, it was computed as C’ - Cn2’ at a time when the current pixel C’ at that time was the present neighboring pixel Cn1 and the neighboring pixel Cn2’ was what is now the current pixel C). As a consequence this sign(Cn1 - C) does not need to be computed again.
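The edge index formula and the sign convention for the four offsets can be sketched as follows for the 0-degree (horizontal) direction. This is a minimal illustration rather than a conforming HEVC filter: the function names are hypothetical, the first and last pixels of the row are assumed to be left unfiltered, and clipping of the result to the valid sample range is an assumption of this sketch.

```python
from typing import List, Sequence


def sign(x: int) -> int:
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def edge_index(c: int, cn1: int, cn2: int) -> int:
    """EdgeIndex = sign(C - Cn2) - sign(Cn1 - C) + 2, in the range 0..4."""
    return sign(c - cn2) - sign(cn1 - c) + 2


def apply_edge_offset(c: int, cn1: int, cn2: int,
                      abs_offsets: Sequence[int], max_value: int = 255) -> int:
    """Apply +O1, +O2, -O3 or -O4 to pixel C; abs_offsets holds absolute values."""
    idx = edge_index(c, cn1, cn2)
    if idx == 2:
        return c  # none of the edge conditions is met: no offset
    o1, o2, o3, o4 = abs_offsets
    signed = {0: o1, 1: o2, 3: -o3, 4: -o4}[idx]
    return min(max(c + signed, 0), max_value)  # clipping is an assumption here


def filter_row_horizontal(row: Sequence[int], abs_offsets: Sequence[int]) -> List[int]:
    """Filter one row using the left and right neighbours (0-degree direction)."""
    out = list(row)
    for i in range(1, len(row) - 1):
        # Neighbours are read from the unfiltered input row.
        out[i] = apply_edge_offset(row[i], row[i - 1], row[i + 1], abs_offsets)
    return out


filtered = filter_row_horizontal([20, 10, 20, 30, 20], (2, 1, 1, 2))
print(filtered)
```

In the example row, the local minimum (10) is raised by O1 and the local maximum (30) is lowered by O4, so the filter smooths both towards their neighbours, while the pixel lying on a monotonic slope is left unchanged.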
A description of the Band Offset type is now provided with reference to Figure 8. Band Offset type in SAO also depends on the pixel value of the sample to be processed. A class in SAO Band offset is defined as a range of pixel values. Conventionally, for all pixels within a range, the same offset is added to the pixel value. In the HEVC specifications, the number of offsets for the Band Offset filter is four for each reconstructed block or frame area of pixels (CTU), as schematically illustrated in Figure 8.
One implementation of SAO Band offset splits the full range of pixel values into 32 ranges of the same size. These 32 ranges are the bands (or classes) of SAO Band offset. The minimum value of the range of pixel values is systematically 0 and the maximum value depends on the bit depth of the pixel values according to the following relationship: Max = 2^Bitdepth - 1. Classifying the pixels into 32 ranges of the full interval requires checking only 5 bits in a fast implementation, i.e. only the first 5 bits (the 5 most significant bits) are checked to classify a pixel into one of the 32 classes/ranges of the full range.
For example, when the bitdepth is 8 bits per pixel, the maximum value of a pixel can be 255. Hence, the range of pixel values is between 0 and 255. For this bitdepth of 8 bits, each band or class contains 8 pixel values.
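The 5-bit classification can be illustrated with a right shift. The sketch below is an illustration only; the function name `band_index` is hypothetical.

```python
def band_index(pixel, bitdepth=8):
    """Band (class) of a pixel value, given by its 5 most significant bits."""
    return pixel >> (bitdepth - 5)


# For a bitdepth of 8, pixel values 0..7 fall in band 0, 8..15 in band 1, etc.
print([band_index(p) for p in (0, 7, 8, 255)])
```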
According to the HEVC specifications, a group 40 of bands, represented by the grey area (40), is used, the group having four successive bands 41, 42, 43 and 44, and information is signaled in the bitstream to identify the position of the group, for example the position of the first of the 4 bands. The syntax element representative of this position is the “sao_band_position” field in the HEVC specifications. This corresponds to the start of band 41 in Figure 8. According to the HEVC specifications, 4 offsets corresponding respectively to the 4 bands are signaled in the bitstream.
Figure 9 is a flow chart illustrating the steps of a process to decode SAO parameters according to the HEVC specifications. The process of Figure 9 is applied for each CTU to generate a set of SAO parameters for all components. In order to avoid encoding one set of SAO parameters per CTU (which is very costly), a predictive scheme is used for the CTU mode. This predictive mode involves checking if the CTU on the left of the current CTU uses the same SAO parameters (this is specified in the bitstream through a flag named “sao_merge_left_flag”). If not, a second check is performed with the CTU above the current CTU (this is specified in the bitstream through a flag named “sao_merge_up_flag”). This predictive technique enables the amount of data representing the SAO parameters for the CTU mode to be reduced. Steps of the process are set out below.
In step 503, the “sao_merge_left_flag” is read from the bitstream 502 and decoded. If its value is true, then the process proceeds to step 504 where the SAO parameters of the left CTU are copied for the current CTU. This enables the types for YUV of the SAO filter for the current CTU to be determined in step 508.
If the outcome is negative in step 503 then the “sao_merge_up_flag” is read from the bitstream and decoded. If its value is true, then the process proceeds to step 505 where the SAO parameters of the above CTU are copied for the current CTU. This enables the types of the SAO filter for the current CTU to be determined in step 508.
If the outcome is negative in step 505, then the SAO parameters for the current CTU are read and decoded from the bitstream in step 507 for the Luma Y component and both U and V components (501) (551) for the type. The offsets for Chroma are independent.
The details of this step are described later with reference to Figure 10. After this step, the parameters are obtained and the type of SAO filter is determined in step 508.
In subsequent step 511 a check is performed to determine if the three colour components (Y and U & V) for the current CTU have been processed. If the outcome is positive, the determination of the SAO parameters for the three components is complete and the next CTU can be processed in step 510. Otherwise (only Y has been processed), U and V are processed together and the process restarts from initial step 512 previously described.
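The predictive (merge) scheme of Figure 9 can be sketched as follows. This is a simplified illustration and not the normative decoding process: `read_flag` and `parse_new_params` are hypothetical callbacks standing in for the entropy decoder, and it is assumed here that a merge flag is only read when the corresponding neighboring CTU exists.

```python
def decode_ctu_sao(read_flag, parse_new_params, left_params, up_params):
    """Return the SAO parameters of the current CTU.

    left_params / up_params are the parameters of the left / above CTU,
    or None when that neighbor is not available (assumption).
    """
    if left_params is not None and read_flag("sao_merge_left_flag"):
        return left_params          # copy from the CTU on the left
    if up_params is not None and read_flag("sao_merge_up_flag"):
        return up_params            # copy from the CTU above
    return parse_new_params()       # explicit parameters in the bitstream


# Usage with scripted flag values: left merge refused, up merge accepted.
flags = iter([False, True])
params = decode_ctu_sao(lambda name: next(flags),
                        lambda: {"src": "bitstream"},
                        {"src": "left"}, {"src": "up"})
print(params)
```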
Figure 10 is a flow chart illustrating steps of a process of parsing of SAO parameters in the bitstream 601 at the decoder side. In an initial step 602, the “sao_type_idx_X” syntax element is read and decoded. The code word representing this syntax element can use a fixed length code or could use any method of arithmetic coding. The syntax element sao_type_idx_X enables determination of the type of SAO applied for the frame area to be processed for the colour component Y or for both Chroma components U & V. For example, for a YUV 4:2:0 sequence, two components are considered: one for Y, and one for U and V. The “sao_type_idx_X” can take 3 values as follows depending on the SAO type encoded in the bitstream. ‘0’ corresponds to no SAO, ‘1’ corresponds to the Band Offset case illustrated in Figure 8 and ‘2’ corresponds to the Edge Offset type filter illustrated in Figures 3A and 3B.
Incidentally, although YUV color components are used in HEVC (sometimes called Y, Cr and Cb components), it will be appreciated that in other video coding schemes other color components may be used, for example RGB color components. The techniques of the present invention are not limited to use with YUV color components, and can be used with RGB color components or any other color components.
In the same step 602, a test is performed to determine if “sao_type_idx_X” is strictly positive. If “sao_type_idx_X” is equal to “0”, this signifies that there is no SAO for this frame area (CTU) for Y if X is set equal to Y, and that there is no SAO for this frame area for U and V if X is set equal to U and V. The determination of the SAO parameters is then complete and the process proceeds to step 608. Otherwise, if “sao_type_idx_X” is strictly positive, this signifies that SAO parameters exist for this CTU in the bitstream.
Then the process proceeds to step 606 where a loop is performed for four iterations. The four iterations are carried out in step 607 where the absolute value of offset j is read and decoded from the bitstream. These four offsets correspond either to the four absolute values of the offsets (O1, O2, O3, O4) of the four Edge indexes of SAO Edge Offset (see Figure 7B) or to the four absolute values of the offsets related to the four ranges of the SAO band Offset (see Figure 8).
Note that for the coding of an SAO offset, a first part is transmitted in the bitstream corresponding to the absolute value of the offset. This absolute value is coded with a unary code. The maximum value for an absolute value is given by the following formula:
MAX_abs_SAO_offset_value = (1 << (Min(bitDepth, 10) - 5)) - 1
where << is the left (bit) shift operator.
This formula means that the maximum absolute value of an offset is 7 for a pixel value bitdepth of 8 bits, and 31 for a pixel value bitdepth of 10 bits and beyond.
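The formula can be checked numerically with a direct transcription (the function name is hypothetical):

```python
def max_abs_sao_offset(bitdepth):
    """Maximum absolute SAO offset value for a given pixel bitdepth."""
    return (1 << (min(bitdepth, 10) - 5)) - 1


for depth in (8, 10, 12):
    print(depth, max_abs_sao_offset(depth))
```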
The current HEVC standard amendment addressing extended bitdepth video sequences provides a similar formula for a pixel value having a bitdepth of 12 bits and beyond. The absolute value decoded may be a quantized value which is dequantized before it is applied to pixel values at the decoder for SAO filtering. An indication of whether or not this quantization is used is transmitted in the slice header.
For Edge Offset type, only the absolute value is transmitted because the sign can be inferred as explained previously.
For Band Offset type, the sign is signaled in the bitstream as a second part of the offset if the absolute value of the offset is not equal to 0. The bit of the sign is bypassed when CABAC is used.
After step 607, the process proceeds to step 603 where a test is performed to determine if the type of SAO corresponds to the Band Offset type (sao_type_idx_X == 1).
If the outcome is positive, the signs of the offsets for the Band Offset mode are decoded in steps 609 and 610, except for each offset that has a zero value, before the following step 604 is performed in order to read in the bitstream and to decode the position “sao_band_position_X” of the SAO band as illustrated in Figure 8.
If the outcome is negative in step 603 (“sao_type_idx_X” is equal to 2), this signifies that the Edge Offset type is used. Consequently, the Edge Offset class (corresponding to the direction 0, 45, 90 and 135 degrees) is extracted from the bitstream 601 in step 605. If X is equal to Y, the read syntax element is “sao_eo_class_luma” and if X is set equal to U and V, the read syntax element is “sao_eo_class_chroma”.
When the four offsets have been decoded, the reading of the SAO parameters is complete and the process proceeds to step 608.
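The parsing order of Figure 10 can be summarised in a short sketch. It is an illustration under stated assumptions rather than a conforming parser: `read(name)` is a hypothetical callback returning the already entropy-decoded value of the named syntax element, the element names `sao_offset_abs`, `sao_offset_sign` and `sao_eo_class` are used here only as labels, and a sign value of 1 is assumed to denote a negative offset.

```python
def parse_sao_params(read):
    """Parse the SAO parameters of one colour component of one CTU."""
    sao_type = read("sao_type_idx")
    params = {"type": sao_type}
    if sao_type == 0:                      # no SAO for this frame area
        return params
    # Four absolute offset values are always read when SAO is enabled.
    abs_offsets = [read("sao_offset_abs") for _ in range(4)]
    if sao_type == 1:                      # Band Offset
        # A sign is present only for offsets with a non-zero absolute value.
        signs = [read("sao_offset_sign") if a != 0 else 0 for a in abs_offsets]
        params["offsets"] = [-a if s else a for a, s in zip(abs_offsets, signs)]
        params["band_position"] = read("sao_band_position")
    else:                                  # Edge Offset: signs are inferred
        o1, o2, o3, o4 = abs_offsets
        params["offsets"] = [o1, o2, -o3, -o4]
        params["eo_class"] = read("sao_eo_class")
    return params


# Usage with a scripted sequence of decoded values (Band Offset case).
values = iter([1, 3, 0, 2, 1, 1, 0, 0, 12])
print(parse_sao_params(lambda name: next(values)))
```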
Figure 11 is a flow chart illustrating how SAO filtering is performed on an image part according to the HEVC specifications, for example during the step 67 in Figure 6. In HEVC, this image part is a CTU. This same process 700 is also applied in the decoding loop (step 415 in Figure 4) at the encoder in order to produce the reference frames used for the motion estimation and compensation of the following frames. This process is related to the SAO filtering for one color component (thus suffix“_X” in the syntax elements has been omitted below).
An initial step 701 comprises determining the SAO filtering parameters according to processes depicted in Figures 9 and 10. The SAO filtering parameters are determined by the encoder and the encoded SAO parameters are included in the bitstream. Accordingly, on the decoder side in step 701 the decoder reads and decodes the parameters from the bitstream.
Step 701 obtains the sao_type_idx and, if it equals 1, also obtains the sao_band_position 702; if it equals 2, it also obtains the sao_eo_class_luma or sao_eo_class_chroma (according to the color component processed). If the element sao_type_idx is equal to 0, the SAO filtering is not applied. Step 701 also obtains an offsets table 703 of the 4 offsets.
A variable i, used to successively consider each pixel Pi of the current block or frame area (CTU), is set to 0 in step 704. Incidentally, “frame area” and “image area” are used interchangeably in the present specification. A frame area in this example is a CTU in the HEVC standard. In step 706, pixel Pi is extracted from the frame area 705 which contains N pixels. This pixel Pi is classified in step 707 according to the Edge offset classification described with reference to Figures 7A & 7B or Band offset classification as described with reference to Figure 8. The decision module 708 tests if Pi is in a class that is to be filtered using the conventional SAO filtering.
If Pi is in a filtered class, the related class number j is identified and the related offset value Offset_j is extracted in step 710 from the offsets table 703. In the case of the conventional SAO filtering this Offset_j is then added to the pixel value Pi in step 711 in order to produce the filtered pixel value Pi' 712. This filtered pixel Pi' is inserted in step 713 into the filtered frame area 716.
If Pi is not in a class to be SAO filtered then Pi (709) is inserted in step 713 into the filtered frame area 716 without filtering.
After step 713, the variable i is incremented in step 714 in order to filter the subsequent pixels of the current frame area 705 (if any - test 715). After all the pixels have been processed (i>=N) in step 715, the filtered frame area 716 is reconstructed and can be added to the SAO reconstructed frame (see frame 68 of Figure 6 or 416 of Figure 4).
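The per-pixel loop of Figure 11 can be sketched for the Band Offset case as follows. This is a minimal illustration, assuming a frame area given as a flat list of pixel values, a group of four successive bands that does not wrap around the end of the range, and clipping of the filtered value to the valid pixel range (the clipping is an assumption not stated in the description above). The function name is hypothetical.

```python
def sao_band_filter(area, band_position, offsets, bitdepth=8):
    """Apply SAO Band Offset to every pixel Pi of a frame area."""
    max_value = (1 << bitdepth) - 1
    filtered = []
    for p in area:                                    # i = 0 .. N-1
        j = (p >> (bitdepth - 5)) - band_position     # class relative to the group
        if 0 <= j < 4:                                # Pi is in a filtered class
            p = min(max(p + offsets[j], 0), max_value)
        filtered.append(p)                            # unfiltered pixels are copied
    return filtered


# Bands 10..13 cover pixel values 80..111 for a bitdepth of 8.
print(sao_band_filter([0, 80, 88, 100, 111, 112, 255], 10, [1, -2, 3, -4]))
```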
As noted above, the JVET exploration model (JEM) for the future VVC standard uses all the HEVC tools. One of these tools is sample adaptive offset (SAO) filtering. However, SAO is less efficient in the JEM reference software than in the HEVC reference software. This arises from fewer evaluations and from signalling inefficiencies compared to other loop filters.
Embodiments of the present invention described below are intended to improve the coding efficiency of SAO by using various techniques for deriving one or more SAO parameters of an image part in a current image from one or more SAO parameters of a collocated image part in a reference image. These techniques may be referred to as temporal derivation techniques for SAO parameters. Further embodiments described below are intended to improve the coding efficiency of SAO by using various techniques for deriving one or more SAO parameters of an image part in an image from one or more SAO parameters of another image part of the same image. These techniques may be referred to as spatial derivation techniques for SAO parameters.
First group of embodiments
The first six embodiments of the first group focus on improving the signalling efficiency. In HEVC, SAO filtering is performed CTU by CTU, which can be resource intensive. A grouping (of one or more CTUs) is not used in SAO filtering in HEVC. Also, a temporal derivation of SAO parameters is not used in HEVC. In the first six embodiments of the first group, different groupings and the use thereof in SAO filtering (i.e. a non-temporal derivation (NTD) of SAO parameters using grouping) are introduced. Then a temporal derivation of at least one SAO parameter is introduced in the later described embodiments.
To improve the signalling efficiency, a group of image parts is formed (by a grouping) and the SAO filtering parameters are determined based on this group/grouping of image parts. Also, where applicable, the use of a particular grouping or use of the temporal derivation is signalled for this group of image parts, rather than for each image part individually. For each image part of the group, the grouping to which it belongs is used to derive/determine/obtain the SAO parameters for the image part. The grouping therefore serves as an indicator/identifier for associated SAO parameters. Also, a temporal derivation may be used to derive at least one of the SAO parameters of the image part from at least one SAO parameter of a collocated image part in a reference image. The collocated image part in the reference image therefore serves as a source image part for the image part to be derived. As a result, different image parts of the group can have different SAO parameters depending on the SAO parameters of the respective collocated image parts. Accordingly, with very light signalling, image parts belonging to a given group of image parts can determine/obtain/derive the SAO parameters to use in the SAO filtering. Also, the image parts belonging to the given group can use temporal derivation and benefit from different (and efficient) SAO parameters.
It is understood that here a collocated image/image part/CTU is an associated/corresponding image/image part/CTU of the image/image part/CTU. This collocation relationship (association/correspondence) is defined by a preset relationship between the two images/image parts/CTUs. For example, the preset relationship may be that one (e.g. a reference image) is used to predict/determine a value for encoding/decoding/predicting the other (e.g. the current image being encoded/decoded). The preset relationship may be that they are located at the same position (e.g. have the same offset value from an origin of some kind in the image/image part/CTU) and/or are either of the same size or have differing sizes but with positions therein determinable by applying an offset or a scaling factor. The preset relationship may be that they are used to encode/decode/predict a value for the same pixel(s)/element(s) of the image.
Poor efficiency in SAO filtering can come from fewer evaluations and from signalling inefficiencies compared to other loop filters. For example, signalling the SAO parameters at CTU level can be very costly. An embodiment according to the present invention improves the coding efficiency of these SAO parameters by enabling derivation of them based on a grouping. For example, an encoder makes a choice/determination to group one or more image parts, such as one or more CTUs. Then, rate distortion (RD) costs for coding/communicating the SAO parameters for different groups (groupings) are evaluated to determine the best performing group (grouping) for a plurality of image parts (e.g. CTUs). The following embodiments describe determining based on RD cost evaluations but it is understood that according to alternative embodiments, criteria other than the RD cost are used to compare different groups (groupings).
According to an embodiment, the SAO parameters set selected for (i.e. associated with) the group is transmitted/signalled once for any one of the CTUs in the group, and these parameters are shared with other CTUs in the group, for example using data or a flag for obtaining these parameters (e.g. using SAO Merge flags such as sao_merge_left_flag and/or sao_merge_up_flag). According to another embodiment, the SAO parameters set selected for (i.e. associated with) the group is transmitted/signalled for the first CTU (in the raster scan order) and these parameters are shared with the other CTUs in the group, for example using the SAO Merge flags (e.g. sao_merge_left_flag and/or sao_merge_up_flag).
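One possible way of sharing the parameters of the first CTU through the Merge flags can be sketched as follows. This is an illustration under the assumption that a group is described by the set of (x, y) coordinates of its CTUs; the function name and the mode labels are hypothetical. For a rectangular group only the first CTU in raster scan order carries explicit parameters; for other shapes more than one CTU may need explicit parameters.

```python
def signalling_for_group(group):
    """Map each CTU (x, y) of a group to the way its SAO parameters are obtained."""
    modes = {}
    # Visit the CTUs of the group in raster scan order (row by row).
    for (x, y) in sorted(group, key=lambda c: (c[1], c[0])):
        if (x - 1, y) in group:
            modes[(x, y)] = "merge_left"   # sao_merge_left_flag = 1
        elif (x, y - 1) in group:
            modes[(x, y)] = "merge_up"     # sao_merge_up_flag = 1
        else:
            modes[(x, y)] = "explicit"     # parameters coded in the bitstream
    return modes


# A 2x2 group whose top-left CTU is at column 2, row 1.
print(signalling_for_group({(2, 1), (3, 1), (2, 2), (3, 2)}))
```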
An advantage of using a grouping is that it provides a diversity/ variety for achieving a better RD compromise for SAO filtering parameters since the CTU level (e.g. as in SAO filtering in HEVC) is not always the best compromise. So when many groupings are compared/competing, a particularly high efficiency can be achieved. However, this must also be balanced with the cost of performing RD evaluation for many different groupings.
First embodiment
Figure 12A is a flow chart illustrating how SAO filtering is performed on an image part according to a first embodiment of the present invention. In the first embodiment, sample adaptive offset (SAO) filtering is performed 9000 on a group of image parts of the image, which comprises a plurality of image parts, using SAO parameters associated with that particular group (i.e. grouping). For example, an encoder or a decoder performs the SAO filtering 9000 to obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
The groupings with which SAO parameters are associated comprise one or both of: a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more; and a grouping for forming a group made up of M image parts in a column of the image, wherein M is three or more, or M is the height of the image.
According to a variant of the first embodiment, the groupings with which SAO parameters are associated further comprise one or more of: a grouping for forming a group made up of all of the plurality of image parts of the image; a grouping for forming a group made up of one image part of the image; a grouping for forming a group made up of two or more image parts of the image; a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more; a grouping for forming a group made up of M image parts in a column of the image, wherein M is three or more, or wherein M is the height of the image (i.e. the group is made up of image parts in the complete column of the image); a grouping for forming a group made up of 2 x 2 image parts of the image; a grouping for forming a rectangular group; and a grouping for forming a group made up of k image parts in a line/row of the image, wherein k is two or more/the complete width of the image.
According to a variant, the groupings further comprise one or more of: a grouping for forming a group made up of image part(s) which use(s) temporal derivation for at least one SAO parameter; a grouping for forming a group made up of image part(s) which use(s) temporal derivation with a modified image or image part for at least one SAO parameter; and a grouping for forming a group made up of image part(s) which use(s) temporal derivation with another image or image part which has been rotated by 45, 90 or 135 degrees for at least one SAO parameter.
It is understood that according to an embodiment, an image part is one of: a block; a unit; a partition; a portion; a coding tree block; a largest coding unit; or a coding tree unit, for processing or coding the image. According to another embodiment, an image part is a coding tree unit. According to yet another embodiment, an image part is a coding tree block. The following embodiments are described with a coding tree unit as an image part but it is understood that other embodiments of the present invention may be implemented with any one of the aforementioned block/unit/partition/portion.
Figure 23 shows various different (non-temporal derivation or spatial derivation) groupings 1201-1206 for forming groups made up of image parts (e.g. CTUs) in a slice (or a frame or an image). It illustrates several CTU groupings for a frame or a slice containing 40 (8x5) CTUs. A more detailed description of these groupings and of how they are used in the SAO filtering can be found in the later described embodiments, and only a short introduction to these groupings is provided here.
The first grouping 1201 forms a group made up of individual CTUs (a CTU level grouping). This first grouping requires one set of SAO parameters per CTU. It corresponds to the CTU-level derivation in the eighth embodiment.
The second grouping 1202 forms a group made up of all CTUs of the entire image (or slice/frame). This second one 1202 is a frame level grouping. Thus, in contrast to the CTU-level derivation for the first grouping, all CTUs of the image/frame (of the slice which is either the entire frame or a part thereof) share the same SAO parameters. Figure 25 illustrates an example for setting SAO parameters at a frame/slice level without using a new SAO classification (i.e. without adding a new classification to the HEVC SAO classification) at the encoder side, which is described in more detail later. An advantage of this embodiment is that it provides a different compromise in terms of rate distortion to the CTU level compromise, which can be better than the CTU level compromise in some cases. With the frame level grouping, the reduction in distortion achieved should be less than with the CTU level grouping but the rate will be very low. So which one is the best for a particular current frame/slice/image will depend on the characteristics of the current frame/slice/image (e.g. size thereof).
The first and second groupings 1201, 1202 provide two extreme RD compromises. So, other groupings which are an intermediate grouping between the two extreme groupings (i.e. an intermediate level grouping) are also considered.
The third grouping 1203 is an intermediate grouping which compromises between the CTU level and the frame level (i.e. the first and second groupings 1201, 1202). The third grouping 1203 forms a group made up of CTUs in a column of the image. Figure 26 illustrates an example for, at the encoder side, setting SAO parameters for a group made up of column CTUs.
Another intermediate compromise between the CTU level and the frame level can be offered by the fourth grouping 1204, which forms a group made up of CTUs in a line or a row. As a frame or a slice is often rectangular with a width larger than its height, the fourth grouping 1204 is often an RD compromise closer to the second grouping 1202 (frame level) than the third grouping 1203.
Other possible intermediate compromises between the CTU level and the frame level can be offered by square (NxN) CTU groupings such as the fifth grouping 1205 and the sixth grouping 1206 as shown in Figure 23. The fifth grouping 1205 forms a group made up of 2x2 CTUs. The sixth grouping 1206 forms a group made up of 3x3 CTUs.
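The six groupings can be illustrated by mapping each CTU position to a group identifier. The sketch below is illustrative only: the grouping names and the function names are hypothetical, and it is assumed that NxN groups at the right and bottom borders are simply truncated when the frame dimensions are not multiples of N.

```python
def group_id(x, y, grouping, width=8):
    """Identifier of the group containing the CTU at column x, row y."""
    if grouping == "ctu":       # first grouping 1201: one group per CTU
        return y * width + x
    if grouping == "frame":     # second grouping 1202: one group for all CTUs
        return 0
    if grouping == "column":    # third grouping 1203
        return x
    if grouping == "line":      # fourth grouping 1204
        return y
    if grouping in ("2x2", "3x3"):   # fifth and sixth groupings 1205, 1206
        n = int(grouping[0])
        groups_per_row = -(-width // n)          # ceiling division
        return (y // n) * groups_per_row + (x // n)
    raise ValueError("unknown grouping: " + grouping)


def count_groups(grouping, width=8, height=5):
    """Number of groups, i.e. of SAO parameter sets, for a width x height frame."""
    return len({group_id(x, y, grouping, width)
                for y in range(height) for x in range(width)})


for name in ("ctu", "frame", "column", "line", "2x2", "3x3"):
    print(name, count_groups(name))
```

For the 8x5 frame of the example, the number of SAO parameter sets to signal ranges from 40 (CTU level) down to 1 (frame level), with the other groupings in between.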
According to an embodiment, the sixth grouping 1206 is used because it provides a good balance for the RD compromise for coding/communicating a sequence of images and/or images of high quality/resolution. Figure 27 illustrates an example for, at the encoder side, setting SAO parameters for such NxN CTU groupings. An advantage of such NxN CTUs groupings is that it is easy to create several RD compromises for SAO parameters (e.g. by varying N). In one embodiment, the value N depends on the size of the frame/slice. In a preferred embodiment, N is equal to 2 or 3. In a preferred embodiment, N is 3. This offers an efficient compromise, for example when encoding/decoding a sequence of images (of higher quality/resolution). One way of implementing this grouping of CTUs at the encoder is by independently obtaining/maintaining/managing a variable CTUStats for each CTU. For example, this variable can, in that case, have a dimension which is the CTU address:
CTUStats[CTU_Address][COLOUR_COMPONENT][SAO_type][CLASS][0].
Similarly, according to a variant, a group of image parts formed by one of the aforementioned temporal derivation groupings is also compared/competed with the NxN CTU groupings to set the SAO parameters using an RD cost evaluation shown in Figure 21.
Second embodiment
As discussed earlier, a grouping forming a group of NxN image parts (CTUs) is particularly advantageous when encoding/decoding a sequence of high quality/resolution images. So in the second embodiment, sample adaptive offset (SAO) filtering is performed on an image comprising a plurality of image parts, wherein the performing comprises performing the SAO filtering on a group made up of N x N image parts of the image using SAO parameters associated with the group, wherein N is three or more. Two or more different groupings of the image parts are available, and the group made up of the N x N image parts is formed by one of the available groupings. SAO filtering using two or more of the available groupings is compared based on rate-distortion evaluation for the two or more available groupings, and one grouping is selected based on this comparison.
According to a variant of the embodiment, one available grouping forms another group made up of M image parts in a column of the image, and M is three or more, or the height of the image. According to a variant, the two or more available groupings further comprise one or more of the aforementioned non-temporal derivation (i.e. spatial derivation) groupings. According to a variant, the two or more available groupings further comprise one or more of the aforementioned temporal derivation groupings.
According to a variant, at least one available grouping is excluded from the comparison and/or evaluation (e.g. the rate-distortion evaluation is not performed for the at least one available grouping because the rate-distortion cost can be determined based on a previous evaluation/comparison or is just not required to make the determination).
Third embodiment
A grouping forming a group made up of image parts (CTUs) in a column of the image also offers a good balance between RD cost and computing complexity when encoding/decoding a sequence of high quality/resolution images. So in the third embodiment, sample adaptive offset (SAO) filtering is performed on an image comprising a plurality of image parts, wherein the performing comprises performing SAO filtering on a group made up of M image parts in a column of the image, wherein M is three or more, or the height of the image. Two or more different groupings of the image parts are available, and the group made up of the M image parts in the column is formed by one of the available groupings. SAO filtering using two or more of the available groupings is compared based on rate-distortion evaluation for the two or more available groupings, and one grouping is selected based on the comparison.
According to a variant of the embodiment, one available grouping forms another group made up of N x N image parts of the image, wherein N is three or more. According to a variant, the two or more available groupings further comprise one or more of the aforementioned non-temporal derivation (i.e. spatial derivation) groupings. According to a variant, the two or more available groupings further comprise one or more of the aforementioned temporal derivation groupings.
According to a variant, at least one available grouping is excluded from the comparison and/or evaluation (e.g. the rate-distortion evaluation is not performed for the at least one available grouping because the rate-distortion cost can be determined based on a previous evaluation/comparison or is just not required to make the determination).
Fourth embodiment
Figures 12B-12C are flow charts illustrating how SAO filtering is performed on an image part using two or more different groupings according to a fourth embodiment of the present invention. In the fourth embodiment, two or more different groupings of the image parts are available, and the performing the SAO filtering on an image 9000 comprises: determining a grouping 9100; and performing the SAO filtering using SAO parameters associated with the determined grouping 9200. For example, an encoder or a decoder performs the determining 9100 and the SAO filtering 9200 to obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
The determining 9100 comprises comparing SAO filtering using two or more of the available groupings 9110 and selecting one grouping based on this comparison 9120, the selected grouping being for use with performing the SAO filtering on the image parts. According to a variant of the embodiment, a group made up of N x N image parts or M image parts in a column of the image is formed by one of the available groupings.
According to a variant of the embodiment, the two or more different groupings comprise one or both of: a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more; and another grouping for forming another group made up of M image parts in a column of the image, wherein M is three or more, or the height of the image. According to a variant, the two or more available groupings further comprise one or more of the aforementioned non-temporal derivation (e.g. spatial derivation) groupings. According to a variant, the two or more available groupings further comprise one or more of the aforementioned temporal derivation groupings.
The comparison is based on rate-distortion evaluation for the two or more groupings. According to a variant, at least one available grouping is excluded from the comparison and/or evaluation (e.g. the rate-distortion evaluation is not performed for the at least one available grouping because the rate-distortion cost can be determined based on a previous evaluation/comparison or is just not required to make the determination).
For example, an encoder performs the comparing 9110 and the selecting 9120 to obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image. According to a variant, a corresponding decoder (of the encoder) also performs the comparing 9110 and the selecting 9120. According to a variant, the encoder communicates (e.g. provides/includes, in a bitstream) data or a flag(s) for use with the determining 9100, and a corresponding decoder (of the encoder) performs the determining 9100 using the data or flag(s) (obtained from the bitstream).
Figures 13A-13B illustrate examples of how such a comparison based on rate-distortion evaluation is used to determine a grouping for the SAO filtering (and how an encoder signals the determined grouping). In these examples, merge flags are used within the group so if applied to HEVC, the resulting bitstream can be decoded by an HEVC-compliant decoder. It is understood that according to an alternative embodiment, the merge flag may not be used within the group.
The example of Figure 13A compares two available groupings and the example of Figure 13B compares more than two available groupings, and the description provided below applies to both. The current slice/frame 9701 is used to set the CTUStats table 9703 for each CTU 9702. This table 9703 is used to evaluate RD costs for an available grouping 9704 (e.g. any one of the aforementioned groupings such as a CTU level) and another (for Figure 13A) or more (for Figure 13B) available grouping 9705, 9706 (e.g. any one or more of the aforementioned groupings different from the available grouping 9704).
The available grouping with the best RD cost (i.e. the best derivation) is selected/determined according to the rate distortion criterion computed for each available SAO parameter derivation 9710. The SAO parameters set for each CTU are set 9711 according to the derivation selected in step 9710. These SAO parameters are then associated with the selected/determined group and used to apply the SAO filtering 9713 in order to obtain the filtered frame/slice 9714.
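The selection in step 9710 can be sketched as follows. This is a minimal illustration only, assuming each available grouping has already been reduced to a (distortion, rate) pair and that the cost takes the usual Lagrangian form J = D + λR; the grouping names and the numbers are hypothetical and are not taken from the figures.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_grouping(candidates, lam):
    """Return (best_name, costs) for a dict of name -> (distortion, rate_bits).

    The grouping with the lowest RD cost wins, mirroring step 9710.
    """
    costs = {name: rd_cost(d, r, lam) for name, (d, r) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


# Hypothetical candidates: fine groupings lower the distortion but cost more bits.
candidates = {
    "ctu_level": (-500, 400),
    "frame_level": (-300, 20),
    "column": (-380, 90),
}
best, costs = select_grouping(candidates, lam=1.0)
```

With a small λ the fine-grained grouping tends to win, and with a large λ the coarse one does, which is the trade-off the comparison is meant to resolve.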
It is understood that according to a variant of the embodiment, the available groupings comprise a group of image parts formed by one of the aforementioned temporal derivation groupings, which is also compared/competed with the other available groupings, for example using a RD cost evaluation similar to that shown in Figure 21.
One other possibility to increase the coding efficiency at encoder side is to test all possible SAO groupings but this increases the computing complexity and the encoding time compared to the example of Figure 29 where only a small subset of the groupings is evaluated.
According to an embodiment, the SAO parameters are shared among the CTUs of a group, and this sharing is achieved by the encoder signalling/providing in the bitstream which derivation of (i.e. grouping for) the SAO parameters is selected (e.g. CTU level, frame level, column, line, 2x2 CTUs, 3x3 CTUs, temporal derivation). A possible indexing scheme is shown in Table 2 below:
Table 2
Because the majority of the derivation index values (values 0 to 5) signal groupings, the derivation index is also referred to as a grouping index hereinafter. Figure 30 illustrates an example of a corresponding decoding process when the CTU grouping is signaled in a slice header, which is described in more detail with reference to a later described embodiment.
Fifth embodiment
As discussed above, data or a flag(s) can be used to communicate the grouping selection and/or SAO parameters between an encoder and a decoder. For example, according to an embodiment, data or a flag indicating a grouping index is provided in a slice header. The following two embodiments describe how such data/flags might be used according to embodiments of the present invention. It is understood that according to other embodiments, modifications can be made to how the data/flags are used in these embodiments as long as the encoder and the decoder are able to obtain the same SAO parameters to use with the SAO filtering.
Figures 14A-14B are flow charts illustrating how a determined grouping and/or SAO parameters are communicated according to a fifth embodiment of the present invention.
As shown in Figure 14A, in the fifth embodiment performing SAO filtering according to any one of the aforegoing embodiments further comprises an encoder (after the grouping determination 9100) providing 9150, in a bitstream, data or a flag (e.g. a grouping/group index) indicating the determined grouping and the SAO parameters associated with the determined grouping. For example, the SAO parameters may be provided with data for an image part in the grouping (e.g. the first processed/encoded image part/CTU in the raster scan order).
As shown in Figure 14B, in the fifth embodiment performing SAO filtering according to any one of the aforegoing embodiments further comprises a decoder obtaining 9155, from a bitstream, the SAO parameters associated with the grouping, and performing the SAO filtering 9200 using the obtained SAO parameters to obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
Sixth embodiment
Figures 14C-14D are flow charts illustrating how a determined grouping and/or SAO parameters are communicated according to a sixth embodiment of the present invention.
As shown in Figure 14C, in the sixth embodiment performing SAO filtering according to any one of the aforegoing embodiments further comprises an encoder (after the grouping determination 9100) providing 9170, in a bitstream: data or a flag (e.g. a grouping/group index) indicating a grouping; and data or a flag(s) (e.g. a sao_merge_up_flag or sao_merge_left_flag) indicating SAO parameters for the SAO filtering are inferred from another image part of the image (non-temporal derivation/inference) or of another image (e.g. temporal derivation/inference).
As shown in Figure 14D, in the sixth embodiment performing SAO filtering according to any one of the aforegoing embodiments further comprises a decoder obtaining 9175, from a bitstream, data or a flag(s) (e.g. a sao_merge_up_flag or sao_merge_left_flag) indicating SAO parameters for the SAO filtering are inferred from another image part of the image (non-temporal derivation/inference) or of another image (e.g. temporal derivation/inference), and performing the SAO filtering 9200 using the obtained data or flag(s) to infer SAO parameters and obtain filtered (reconstructed) image parts, and hence a filtered (reconstructed) image.
According to a variant of the embodiment, the data or a flag(s) (e.g. a sao_merge_up_flag or sao_merge_left_flag) indicating inferring of the SAO parameters is provided/included in the bitstream when the determined grouping is a particular grouping, e.g. a CTU level, and when the determined grouping is not the particular grouping, the data/flag(s) is not provided/included in the bitstream. In such a case, a decoder first obtains the data/flag(s) (e.g. a grouping/group index) indicating a grouping and determines the grouping using the obtained data/flag(s). The decoder obtains and uses the data/flag(s) indicating inferring of the SAO parameters only when the determined grouping is that particular grouping. If the determined grouping is not that particular grouping, the encoder provides the SAO parameters in the bitstream, and the decoder obtains the provided SAO parameters without obtaining/using the data/flag(s) indicating inferring of the SAO parameters.
According to a variant of the sixth embodiment, performing SAO filtering further comprises an encoder providing, in a bitstream, data or a flag(s) (e.g. sao_merge_flags_enabled_flag) indicating whether either or both of the data indicating a grouping or the data indicating inferring of the SAO parameters is available for use. When the data indicates either or both of the data is not available for use, the unavailable data is not included/provided in the bitstream. A decoder then obtains, from a bitstream, the data (e.g. sao_merge_flags_enabled_flag) indicating whether either or both of the data indicating a grouping or the data indicating inferring of the SAO parameters is available for use, and when the data indicates either or both data is available, obtains, from the bitstream, the available data and uses the obtained available data to determine the grouping or infer the SAO parameters. When the data indicates either or both data is not available for use, the decoder does not obtain the unavailable data and obtains, from the bitstream, the SAO parameters associated with the grouping.
It is understood that other combinations of the grouping and/or SAO parameters or indicative/inference data/flag(s) may also be used according to another embodiment of the present invention as long as the grouping and the SAO parameters for the SAO filtering can be determined.
Seventh embodiment
As already discussed, a temporal derivation of SAO parameters may be used. Also as discussed, a group can be any two or more CTUs, for example a whole image, a slice, a line of CTUs, a column of CTUs, N lines of CTUs, N columns of CTUs, where N is an integer greater than 1. A group could also be NxN CTUs, where N is an integer greater than 1, or MxN CTUs or NxM CTUs, where M > 1 and N > 1. Alternatively, a group of image parts can be a CTU, and each constituent block of the CTU can be an image part. In such a case, each block of a CTU may have its own SAO parameters, but the signalling to use temporal derivation of the SAO parameters can be made for the CTU as a whole.
In the simplest case, where there is only one type of temporal derivation, a flag temporal merge can be used to signal the use of temporal derivation for all image parts of the group.
The manner in which the SAO parameters are derived in the temporal derivation is not particularly limited except that at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the collocated image part in a reference image. In the simplest case, the at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the collocated image part by copying the SAO parameter of the collocated image part. One, more than one, or all SAO parameters may be copied. Alternatively, one, more than one, or all SAO parameters may be copied only when the SAO filtering is of a particular type (edge or band).
Instead of copying, a temporal derivation of SAO parameters which involves modifying one, more than one, or all SAO parameters of the collocated image part (source image part) may also be used, as described later.
In the seventh embodiment, the group of image parts is a whole image. Referring now to Figure 15, each CTU of a current image 2001 derives its SAO parameters temporally from a collocated CTU in a reference image 2002. For example, the SAO parameters for the CTU 2003 in the current image 2001 are obtained by copying the SAO parameters from its collocated CTU 2005 in the reference image 2002. Similarly, the SAO parameters for the CTU 2004 in the current image 2001 are obtained by copying the SAO parameters from its collocated CTU 2006 in the reference image 2002. In this example, CTU 2005 uses EO filtering with a direction of 0 degrees, and CTU 2006 uses BO filtering. As a result of copying the SAO parameters from CTU 2005, the CTU 2003 also uses EO filtering with a direction of 0 degrees. As a result of copying the SAO parameters from CTU 2006, the CTU 2004 also uses BO filtering. Although not shown in Figure 15, all the SAO parameters are copied in this embodiment, including the SAO type parameter sao_type_idx, parameters such as EO class (specifying a direction of EO filtering) and BO group sao_band_position (specifying a first class of a group of classes), and offsets.
It can be seen that different CTUs such as CTU 2003 and CTU 2004 within the same CTU group can have different SAO parameters, even though the use of temporal derivation is signalled once for the whole CTU group (whole image in this embodiment).
Figure 16 is a flow chart for use in explaining a method of decoding an image in the seventh embodiment.
In step 2101 a first syntax element is read from the bitstream 2103 and decoded. This first syntax element in this example is a simple temporal merge flag which indicates for the whole image whether or not temporal derivation of SAO parameters is to be used. In step 2102 it is checked if the syntax element indicates temporal derivation is to be used. If the outcome is “YES”, at least a second syntax element is extracted from the bitstream. This second syntax element is a reference frame index refidx which identifies a reference image to be used for the temporal derivation. If bidirectional temporal prediction is used, a third syntax element is extracted from the bitstream 2103. This is a list index Ly indicating whether the reference frame index is from List 0 (L0) or List 1 (L1). In this embodiment, the same reference frame is used for the temporal derivation of SAO parameters in all CTUs of the group (whole image).
In the context of temporal prediction, a reference image means another image of a sequence of images (previous or future image) which is used to perform temporal prediction for an image to be encoded. In the context of SAO parameter derivation, a reference image means another image of the sequence (previous or future image) which is used to perform temporal derivation of SAO parameters. The reference images for the temporal derivation of SAO parameters may be the same as the reference images for the temporal prediction, or may be different.
Incidentally, the HEVC specification uses the term “reference frame” instead of “reference image” and refidx is usually referred to as a reference frame index accordingly. The terms “reference image” and “reference frame” are used interchangeably in the present specification.
A loop through all the CTUs of the image is then started in step 2105. In this embodiment, the decoder has a storage unit 2106, which may be called a Decoded Picture Buffer (DPB), which stores the SAO parameters for each CTU of the reference image. Preferably, the DPB 2106 stores the SAO parameters for each CTU explicitly, without relying on merge flags such as merge_up and merge_left, because reading merge flags as part of the temporal derivation of SAO parameters increases the complexity and slows down the derivation.
In step 2107 the SAO parameters stored in the DPB 2106 for the collocated CTU in the reference image identified by refidx or by Ly and refidx are obtained. These are then set as the SAO parameters 2108 for the current CTU. In this embodiment it is assumed that the SAO parameters comprise dedicated SAO parameters for each color component X (Y, U, V) and in steps 2109-2111 SAO filtering is performed for each color component in turn using the dedicated SAO parameters for the color component concerned. As noted above, in other embodiments, the SAO parameters may be common to two or more components, for example U and V, or a mixture of common and dedicated (per-component) SAO parameters may be used. After finishing the processing of the current CTU processing moves to the next CTU, or the processing ends if the current CTU is the last CTU of the image.
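The decoder-side flow of steps 2101 to 2108 can be sketched as below. This is only an illustrative sketch: the `syntax` dictionary stands in for already-decoded syntax elements, the key names (`temporal_merge_flag`, `list_idx`, `ref_idx`) and the layout of `dpb` are assumptions made for the example, and SAO filtering itself (steps 2109 to 2111) is left out.

```python
from copy import deepcopy


def derive_sao_for_image(syntax, dpb, num_ctus):
    """Return per-CTU SAO parameters copied from the collocated CTUs.

    syntax: dict of decoded syntax elements (hypothetical container).
    dpb: dict mapping (list_idx, ref_idx) to a list of per-CTU SAO
         parameter dicts, with None where the reference CTU had SAO off.
    Returns None when temporal derivation is not signalled, in which case
    the caller would read the SAO parameters from the bitstream instead.
    """
    if not syntax["temporal_merge_flag"]:
        return None
    # The same reference image is used for every CTU of the group.
    key = (syntax.get("list_idx", 0), syntax.get("ref_idx", 0))
    ref_params = dpb[key]
    # Copy explicitly stored parameters; no merge flags are followed.
    return [deepcopy(ref_params[addr]) for addr in range(num_ctus)]


# Hypothetical reference image with two CTUs, as in the Figure 15 example.
dpb = {
    (0, 0): [
        {"type": "EO", "eo_class": 0, "offsets": [1, 0, 0, -1]},
        {"type": "BO", "band_position": 12, "offsets": [2, 1, 0, -1]},
    ]
}
current = derive_sao_for_image({"temporal_merge_flag": 1}, dpb, num_ctus=2)
```

Two CTUs of the same group end up with different SAO parameters even though the temporal derivation was signalled only once, which is the behaviour described for CTUs 2003 and 2004.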
It will be appreciated that in step 2107 no SAO parameters will be obtainable from the collocated CTU in the reference image if the collocated CTU did not use SAO filtering (sao_type_idx = 0). This situation is addressed in subsequent embodiments. It means that even when temporal derivation of SAO parameters is applied to a group of image parts in the present invention, there may be some image parts of the group that do not derive an SAO parameter from an SAO parameter of the respective collocated image part. For example, in the simplest case, no SAO filtering may be performed on an image part for which the SAO parameters are unobtainable using the temporal derivation.
Although not shown in Figure 16, if the outcome of the test in step 2102 is that the SAO parameters derivation is not the temporal derivation, the SAO parameters for the CTUs of the group (whole image in this case) are read from the bitstream, for example using the process of Figure 5.
Figure 16 relates to the steps carried out on the decoder side. The steps involve reading and decoding the syntax elements for the group of image parts (whole image in this case) from the bitstream and then performing SAO filtering on the image parts of the group. On the encoder side, the same SAO filtering as on the decoder side is performed on the image parts of the group to ensure that the encoder has the same reference images as the decoder. This means using the same derivation of SAO parameters as the decoder and using the same reference image for the temporal derivation as the decoder. However, on the encoder side the syntax elements do not need to be read and decoded from the bitstream, as the related information is available in the encoder already. The determination of whether or not to use the temporal derivation of SAO parameters for the group (whole image in this case) is made on the encoder side in this embodiment. Similarly, the choice of reference image for the temporal derivation is made on the encoder side.
In a variant of the seventh embodiment, the reference image is simply the first reference image of the first list L0. In that case, no syntax elements are necessary to identify refidx and Ly and step 2104 can be omitted. This removes some signalling and simplifies the decoder design.
Eighth Embodiment
The eighth embodiment relates to an encoding process. In the preceding embodiment a temporal derivation of SAO parameters is applied to a group of image parts. For example, in the seventh embodiment a temporal derivation is applied to a whole image.
In the eighth embodiment, when the temporal derivation is not applied, a non-temporal derivation of the SAO parameters is used in which SAO parameters are determined by the encoder for each image part (CTU) and signalled in the bitstream. This may be referred to as a CTU-level non-temporal derivation of SAO parameters. The decoder reads from the bitstream the first syntax element (e.g. temporal merge flag) and, when it indicates temporal derivation is not applied to the group, the decoder reads the per-CTU SAO parameters from the bitstream and filters each CTU according to the SAO parameters for the CTU concerned, for example using the decoding process of Figure 5.
In the eighth embodiment the temporal derivation and the CTU-level non-temporal derivation are available derivations and the encoder selects one of them to apply to the group (e.g. frame or slice).
Figure 17 is a flow chart illustrating steps carried out by an encoder to determine SAO parameters for the CTUs of a group (frame or slice) in the CTU-level non-temporal derivation of SAO parameters. The process starts with a current CTU (1101). First the statistics for all possible SAO types and classes are accumulated in the variable CTUStats (1102). The process of step 1102 is described below with reference to Figure 18. According to the values set in the variable CTUStats, the RD cost for the SAO Merge Left is evaluated if the Left CTU is in the current slice (1103), as is the RD cost of the SAO Merge Up (1104). Using the statistics in CTUStats (1102), new SAO parameters are evaluated for Luma (1105) and for both Chroma components (1109) (both Chroma components because the Chroma components share the same SAO type in the HEVC standard). For each SAO type (1006), the best RD offsets and other parameters for Band Offset classification are obtained (1107). Steps 1107 and 1110 are explained below for Edge and Band classification with reference to Figure 19 and Figure 20 respectively. All RD costs are computed using their respective SAO parameters (1108). In the same way, for both Chroma components, the optimal RD offsets and parameters are selected (1111). All these RD costs are compared in order to select the best SAO parameters set (1115). These RD costs are also compared to disable SAO independently for the Luma and the Chroma components (1113, 1114). The use of a new SAO parameters set (1115) is compared to the SAO parameters set “Merging” or sharing (1116) from the left and up CTU.
Figure 18 is a flow chart illustrating steps of an example of statistics computation at the encoder side that can be applied for the Edge Offset type filter, in the case of conventional SAO filtering. A similar approach may also be used for the Band Offset type filter.
Figure 18 illustrates the setting of the variable CTUStats, which contains all the information needed to derive the best rate-distortion offset for each class. Moreover, it illustrates the selection of the best SAO parameters set for the current CTU. For each colour component Y, U, V (or RGB) (811) each SAO type is evaluated. For each SAO type (812) the variables Sumj and SumNbPixj are set to zero in an initial step 801. The current frame area 803 contains N pixels. j is the current range number to determine the four offsets (related to the four edge indexes shown in Figure 7B for Edge Offset type or to the 32 ranges of pixel values shown in Figure 8 for Band Offset type). Sumj is the sum of the differences between the pixels in the range j and their original pixels. SumNbPixj is the number of pixels in the frame area, the pixel value of which belongs to the range j.
In step 802, a variable i, used to successively consider each pixel Pi of the current frame area, is set to zero. Then, the first pixel Pi of the frame area 803 is extracted in step 804.
In step 805, the class of the current pixel is determined by checking the conditions defined in Figure 7B. Then a test is performed in step 806. During step 806, a check is performed to determine if the class of the pixel value Pi corresponds to the value “none of the above” of Figure 7B. If the outcome is positive, then the value “i” is incremented in step 808 in order to consider the next pixels of the frame area 803.
Otherwise, if the outcome is negative in step 806, the next step is 807 where the related SumNbPixj (i.e. the sum of the number of pixels for the class determined in step 805) is incremented and the difference between Pi and its original value Porgi is added to Sumj. In the next step 808, the variable i is incremented in order to consider the next pixels of the frame area 803.
Then a test is performed to determine if all pixels have been considered and classified. If the outcome is negative, the process loops back to step 804 described above. Otherwise, if the outcome is positive, the process proceeds to step 810 where the variable CTUStats for the current colour component X, the SAO type SAO_type and the current class j is set equal to Sumj for the first value and SumNbPixj for the second value. These variables can be used to compute, for example, the optimal offset parameter Offsetj of each class j. This offset Offsetj may be the average of the differences between the pixels of class j and their original values. Thus, Offsetj is given by the following formula:
$$\text{Offset}_j = \frac{\text{Sum}_j}{\text{SumNbPix}_j}$$
Note that the offset Offsetj is an integer value. As a consequence, the ratio defined in this formula may be rounded, either to the closest value or using the ceiling or floor function.
Each offset Offsetj is an optimal offset Ooptj in terms of distortion.
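The accumulation of Figure 18 and the resulting per-class offset can be sketched as follows. This is a simplified sketch under stated assumptions: the class of each pixel is supplied as an input rather than computed from the conditions of Figure 7B, class 0 stands for "none of the above", and the difference is taken as original minus reconstructed so that adding the offset moves a pixel towards its original value.

```python
def accumulate_stats(rec, org, classes, num_classes=4):
    """Accumulate Sum_j and SumNbPix_j over one frame area.

    rec, org: reconstructed and original pixel values.
    classes: class index per pixel, 0 meaning "none of the above".
    """
    sums = [0] * (num_classes + 1)
    counts = [0] * (num_classes + 1)
    for p, p_org, j in zip(rec, org, classes):
        if j == 0:
            continue  # unclassified pixels do not contribute
        sums[j] += p_org - p  # assumed sign convention (original - reconstructed)
        counts[j] += 1
    return sums, counts


def optimal_offsets(sums, counts):
    """Offset_j = Sum_j / SumNbPix_j, rounded to an integer (0 for empty classes).

    Python's round() rounds exact halves to the nearest even integer; a codec
    would fix its own rounding rule.
    """
    return [round(s / n) if n else 0 for s, n in zip(sums, counts)]


rec = [10, 12, 20, 22, 30]
org = [12, 14, 20, 20, 30]
classes = [1, 1, 0, 4, 4]
sums, counts = accumulate_stats(rec, org, classes)
offsets = optimal_offsets(sums, counts)
```

Index 0 of the returned lists is unused and kept only so that list positions line up with class numbers.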
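To evaluate an RD cost for a merge of SAO parameters, the encoder uses the statistics set in table CTUStats. Taking the SAO Merge Left as an example, and considering the type for Luma Left_Type_Y and the four related offsets O_Left_0, O_Left_1, O_Left_2, O_Left_3, the distortion can be obtained by the following formula:
$$
\begin{aligned}
\text{Distortion\_Left\_Y} ={}& \big(\text{CTUStats}[Y][\text{Left\_Type\_Y}][0][1] \times \text{O\_Left\_0} \times \text{O\_Left\_0} - \text{CTUStats}[Y][\text{Left\_Type\_Y}][0][0] \times \text{O\_Left\_0} \times 2\big) \gg \text{Shift} \\
&+ \big(\text{CTUStats}[Y][\text{Left\_Type\_Y}][1][1] \times \text{O\_Left\_1} \times \text{O\_Left\_1} - \text{CTUStats}[Y][\text{Left\_Type\_Y}][1][0] \times \text{O\_Left\_1} \times 2\big) \gg \text{Shift} \\
&+ \big(\text{CTUStats}[Y][\text{Left\_Type\_Y}][2][1] \times \text{O\_Left\_2} \times \text{O\_Left\_2} - \text{CTUStats}[Y][\text{Left\_Type\_Y}][2][0] \times \text{O\_Left\_2} \times 2\big) \gg \text{Shift} \\
&+ \big(\text{CTUStats}[Y][\text{Left\_Type\_Y}][3][1] \times \text{O\_Left\_3} \times \text{O\_Left\_3} - \text{CTUStats}[Y][\text{Left\_Type\_Y}][3][0] \times \text{O\_Left\_3} \times 2\big) \gg \text{Shift}
\end{aligned}
$$
The variable Shift is designed for a distortion adjustment. The distortion should be negative as SAO is a post filtering.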
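The merge distortion above reduces to one term per class, which the following sketch evaluates. It assumes the statistics for the relevant SAO type have already been extracted as (Sum, SumNbPix) pairs, matching the first and second values stored in CTUStats; the function name and the sample numbers are illustrative only.

```python
def merge_distortion(class_stats, offsets, shift=0):
    """Distortion change obtained by reusing a neighbour's offsets.

    class_stats: list of (Sum_j, SumNbPix_j) pairs for the neighbour's SAO type.
    offsets: the neighbour's offsets, one per class.
    Each term is (SumNbPix_j * O_j^2 - 2 * Sum_j * O_j) >> shift.
    """
    total = 0
    for (sum_j, nb_pix_j), o in zip(class_stats, offsets):
        total += (nb_pix_j * o * o - sum_j * o * 2) >> shift
    return total


# Hypothetical statistics for four edge classes and the left CTU's offsets.
stats_left_type = [(4, 2), (0, 0), (0, 0), (-2, 2)]
left_offsets = [2, 0, 0, -1]
distortion_left = merge_distortion(stats_left_type, left_offsets)
```

A negative result means the merged offsets reduce the distortion relative to leaving the CTU unfiltered.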
The same computation is applied for the Chroma components. The Lambda of the rate-distortion cost is fixed for the three components. For SAO parameters merged with the left CTU, the rate is only 1 flag, which is CABAC coded. The encoding process illustrated in Figure 19 is applied in order to find the best offset in terms of the rate-distortion criterion, referred to as ORDj. This process is applied in steps 1109 to 1112.
In an initial step 901 of the encoding process of Figure 19, the rate distortion value Jj is initialized to the maximum possible value. Then a loop on Oj from Ooptj to 0 is applied in step 902. Note that Oj is modified by 1 at each new iteration of the loop. If Ooptj is negative, the value Oj is incremented and if Ooptj is positive, the value Oj is decremented. The rate distortion cost related to Oj is computed in step 903 according to the following formula:
$$J(O_j) = \text{SumNbPix}_j \times O_j \times O_j - \text{Sum}_j \times O_j \times 2 + \lambda R(O_j)$$
where λ is the Lagrange parameter and R(Oj) is a function which provides the number of bits needed for the code word associated with Oj.
The expression ‘SumNbPixj x Oj x Oj - Sumj x Oj x 2’ gives the improvement in terms of the distortion provided by the use of the offset Oj. If J(Oj) is less than Jj, then Jj = J(Oj) and ORDj is set equal to Oj in step 904. If Oj is equal to 0 in step 905, the loop ends and the best ORDj for the class j is selected.
This algorithm of Figures 18 and 19 provides a best ORDj for each class j. This algorithm is repeated for each of the four directions of Figure 7A. Then the direction that provides the best rate distortion cost (sum of Jj for each direction) is selected as the direction to be used for the current CTU.
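The per-class search of Figure 19 can be sketched as below. The loop follows the text (start at the distortion-optimal offset and step towards 0, keeping the cheapest candidate); the rate model `toy_rate` is a placeholder assumption, since the actual code-word lengths depend on the entropy coder.

```python
def toy_rate(offset):
    """Hypothetical rate model: larger magnitudes cost more bits."""
    return abs(offset) + 1


def best_rd_offset(sum_j, nb_pix_j, o_opt, lam, rate=toy_rate):
    """Return (ORD_j, J_j) for one class, scanning O_j from o_opt to 0."""
    best_cost = float("inf")  # step 901: start from the maximum possible value
    best_offset = 0
    step = -1 if o_opt > 0 else 1  # move towards zero
    o = o_opt
    while True:
        # step 903: J(O_j) = SumNbPix_j*O_j^2 - 2*Sum_j*O_j + lambda*R(O_j)
        cost = nb_pix_j * o * o - sum_j * o * 2 + lam * rate(o)
        if cost < best_cost:  # step 904
            best_cost, best_offset = cost, o
        if o == 0:  # step 905
            break
        o += step
    return best_offset, best_cost


result_low_lambda = best_rd_offset(sum_j=4, nb_pix_j=2, o_opt=2, lam=1)
result_high_lambda = best_rd_offset(sum_j=4, nb_pix_j=2, o_opt=2, lam=10)
```

With a small λ the distortion-optimal offset survives, while a large λ pushes the choice to 0 because the bits spent on the offset outweigh the distortion gain.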
This algorithm (Figures 18 and 19) for selecting the offset values at the encoder side for the Edge offset tool can be easily applied to the Band Offset filter to select the best position (SAO_band_position) where j is in the interval [0,32] instead of the interval [1,4] in Figure 18. It involves changing the value 4 to 32 in modules 801, 810, 811. More specifically, for the 32 classes of Figure 8, the parameter Sumj (j=[0,32]) is computed. This corresponds to computing for each range j, the difference between the current pixel value (Pi) and its original value (Porgi), each pixel of the image belonging to a single range j. Then the best offset in terms of rate distortion ORDj is computed for the 32 classes, with the same process as described in Figure 19.
The next step involves finding the best position of the sao_band_position of Figure 8. This is determined with the encoding process set out in Figure 20. The RD cost Jj for each range has been computed with the encoding process of Figure 19 with the optimal offset ORDj in terms of rate distortion. In Figure 20, in an initial step 1001 the rate distortion value J is initialized to the maximum possible value. Then a loop on the 28 positions i of 4 consecutive classes is run in step 1002. Next, the variable Ji corresponding to the RD cost of the band (of 4 consecutive classes) is initialized to 0 in step 1003. Then the loop on the four consecutive offsets j is run in step 1004. Ji is incremented by the RD costs of the four classes Jj in step 1005 (j = i to i+3).
If this cost Ji is less than the best RD cost J, J is set to Ji and sao_band_position = i in step 1007, and the next step is step 1008.
Otherwise, the next step is step 1008.
Test 1008 checks whether or not the loop on the 28 positions has ended. If not, the process continues in step 1002; otherwise the encoding process returns the best band position as being the current value of sao_band_position (1009).
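The band position search of Figure 20 amounts to a sliding-window minimum over the per-class RD costs. The sketch below assumes the 32 costs Jj are already available; the number of candidate positions is left as a parameter and defaults to the 28 mentioned in the text, and the sample costs are made up for illustration.

```python
def best_band_position(class_costs, num_positions=28, band_width=4):
    """Return (band_position, J) minimising the summed cost of a band.

    class_costs: RD cost J_j of each of the 32 classes.
    num_positions: number of start positions examined (28 as stated in the text).
    """
    best_cost = float("inf")  # step 1001
    best_pos = 0
    for i in range(num_positions):  # step 1002
        cost_i = sum(class_costs[i:i + band_width])  # steps 1003 to 1005
        if cost_i < best_cost:
            best_cost, best_pos = cost_i, i  # step 1007
    return best_pos, best_cost


# Hypothetical costs: only a few classes around index 10 are worth filtering.
costs = [0] * 32
costs[9] = 2
costs[10:14] = [-5, -7, -3, -1]
band = best_band_position(costs)
```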
Thus, the CTUStats table in the case of determining the SAO parameters at the CTU level is created by the process of Figure 17. This corresponds to evaluating the CTU level in terms of the rate-distortion compromise. The evaluation may be performed for the whole image or for just the current slice.
A further evaluation is carried out for the temporal derivation. Again, this temporal derivation may apply to the whole image or just to the current slice. Figure 21 shows the RD cost evaluation of temporal derivation at slice level. First the distortion for the current colour component X is set equal to 0 (1601). For each CTU number nbCTU from 0 to LastCTU (1602), the temporal SAO parameters set of the collocated CTU in a reference frame (Ly, refidx) (1605) is extracted (1604) from the DPB (1603). If the SAO parameters set (1605) is equal to OFF (no SAO), the next CTU is processed (1610). Otherwise, for each in turn of the four offsets (1607), the distortion Distortion_TEMPORAL_X is incremented by an amount equal to the associated distortion of the offset Oi (1609). This is the same process as the RD cost evaluation for a merge of SAO parameters as described previously. Please note that sao_band_position is set equal to 0 when the SAO type is equal to an Edge type. When the distortion of all offsets has been added to Distortion_TEMPORAL_X (1608), the next CTU is processed (1610). When the CTU number nbCTU is equal to LastCTU (1610), the RD cost for the temporal mode at slice level, for component X, is set equal to the sum of this computed distortion Distortion_TEMPORAL_X and λ multiplied by the rate for this temporal mode at slice level (1611). This rate is equal to the rate of the signalling of the temporal mode plus, if needed, the rate of the reference frame index refidx and, if needed, the rate of the list Ly.
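The slice-level evaluation of Figure 21 can be sketched as follows for one colour component. The data layout is an assumption made for the example: each CTU's statistics are a dict from an SAO type key to a list of (Sum, SumNbPix) pairs, and each collocated parameter set is a dict (or None when SAO is off) carrying its type, band position and four offsets.

```python
def temporal_rd_cost(ctu_stats, ref_sao, lam, rate_bits):
    """RD cost of the temporal derivation for one colour component.

    ctu_stats: per-CTU dict mapping SAO type -> list of (Sum_j, SumNbPix_j).
    ref_sao: per-CTU collocated SAO parameters, or None when SAO is off.
    rate_bits: bits for the temporal mode signalling (plus refidx/Ly if sent).
    """
    distortion = 0  # step 1601
    for stats, params in zip(ctu_stats, ref_sao):
        if params is None:
            continue  # collocated CTU has no SAO: nothing to add
        # band_position is 0 for edge types, so the first four classes are used
        pos = params["band_position"]
        classes = stats[params["type"]][pos:pos + 4]
        for (sum_j, nb_pix_j), o in zip(classes, params["offsets"]):
            distortion += nb_pix_j * o * o - sum_j * o * 2  # step 1609
    return distortion + lam * rate_bits  # step 1611


# Hypothetical three-CTU slice: edge offset, SAO off, band offset.
bo_stats = [(0, 0)] * 32
bo_stats[12] = (3, 3)
bo_stats[13] = (2, 4)
ctu_stats = [{"EO_0": [(4, 2), (0, 0), (0, 0), (-2, 2)]}, {}, {"BO": bo_stats}]
ref_sao = [
    {"type": "EO_0", "band_position": 0, "offsets": [2, 0, 0, -1]},
    None,
    {"type": "BO", "band_position": 12, "offsets": [1, 1, 0, 0]},
]
cost = temporal_rd_cost(ctu_stats, ref_sao, lam=2, rate_bits=3)
```

The same CTUStats tables serve both this evaluation and the CTU-level one, so no additional pass over the pixels is needed to compare the two derivations.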
The two evaluations are then compared and the one with the best performance is selected. The selected derivation (temporal or CTU level) is then signalled to the decoder in the bitstream, for example using the first syntax element as described in connection with the seventh embodiment. Figure 22 illustrates the competition between the CTU level for SAO and for temporal derivation at encoder side. The current slice/frame 1901 is used to set the CTUStats table (1903) for each CTU (1902). This table (1903) is used to evaluate the CTU level derivation (1904) and the temporal derivation for the whole slice (1915) as described previously in Figure 21. This table (1903) is also used to evaluate several reference frames for temporal derivation. The best derivation for the slice is selected according to the rate distortion criterion computed for each available derivation (1910). The SAO parameters sets for each CTU are set (1911) according to the derivation selected in step 1910. These SAO parameters are then used to apply the SAO filtering (1913) in order to obtain the filtered frame/slice. The selected derivation may be signalled in the slice header, for example using a syntax element indicating temporal derivation (which the decoder reads, see 2101 and 2201 in Figures 13 and 14).
Ninth Embodiment
In the eighth embodiment the temporal derivation was put into competition with one alternative non-temporal method of deriving the SAO parameters. In the ninth embodiment two alternative methods are in competition with the temporal derivation.
Figure 23 shows various different groupings 1201-1206 of CTUs in a slice.
A first grouping 1201 has individual CTUs. This first grouping requires one set of SAO parameters per CTU. It corresponds to the CTU-level derivation in the eighth embodiment.
A second grouping 1202 makes all CTUs of the entire image one group. Thus, in contrast to the CTU-level derivation, all CTUs of the frame (and hence the slice which is either the entire frame or a part thereof) share the same SAO parameters.
To make all CTUs of the image share the same SAO parameters, one of two methods can be used. In both methods, the encoder first computes a set of SAO parameters to be shared by all CTUs of the image. Then, in the first method, these SAO parameters are set for the first CTU of the slice. For each remaining CTU from the second CTU to the last CTU of the slice, the sao_merge_left_flag is set equal to 1 if the flag exists (that is, if the current CTU has a left CTU). Otherwise, the sao_merge_up_flag is set equal to 1. Figure 24 shows an example of CTUs with SAO parameters set according to the first method. This method has the advantage that no signalling of the grouping to the decoder is required. Also, no changes to the decoder are required to introduce the groupings and only the encoder is changed. The groupings could therefore be introduced in an encoder based on HEVC without modifying the HEVC decoder. Surprisingly, groupings do not increase the rate too much. This is because the merge flags are generally CABAC coded in the same context. Since for the second grouping (entire image) these flags all have the same value (1), the rate consumed by these flags is very low: the context model adapts so that the estimated probability of that value approaches 1.
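The first method can be sketched as below for a slice that covers the whole frame in raster-scan order, which is an assumption of this example. The per-CTU dictionaries are an illustrative representation, not the bitstream syntax.

```python
def frame_level_merge_flags(width_in_ctus, height_in_ctus, shared_params):
    """Assign SAO syntax per CTU so that all CTUs share one parameter set.

    The first CTU carries the parameters explicitly; every other CTU merges
    left when a left neighbour exists, and merges up otherwise.
    """
    ctus = []
    for addr in range(width_in_ctus * height_in_ctus):
        x = addr % width_in_ctus
        if addr == 0:
            ctus.append({"params": shared_params})
        elif x > 0:
            ctus.append({"sao_merge_left_flag": 1})
        else:
            # first column: no left CTU, so the left flag is absent
            ctus.append({"sao_merge_up_flag": 1})
    return ctus


shared = {"type": "EO", "eo_class": 0, "offsets": [1, 0, 0, -1]}
layout = frame_level_merge_flags(3, 2, shared)
```

Because every flag written is equal to 1, the cost of these flags under an adaptive binary coder stays small, as noted above.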
In the second method of making all CTUs of the image share the same SAO parameters, the grouping is signalled to the decoder in the bitstream. The SAO parameters are also signalled as SAO parameters for the group (whole image), for example in the slice header. In this case, the signalling of the grouping consumes bandwidth. However, the merge flags can be dispensed with, saving the rate related to the merge flags, so that overall the rate is reduced.
The first and second groupings 1201 and 1202 provide very different rate-distortion compromises. The first grouping 1201 is at one extreme, giving very fine control of the SAO parameters (CTU by CTU), which should lower distortion, but at the expense of a lot of signalling. The second grouping is at the other extreme, giving very coarse control of the SAO parameters (one set for the whole image), which raises distortion but has very light signalling.
Next, a description will be given of how to determine in the encoder the SAO parameters for the second grouping 1202. In the second grouping 1202 the determination is done for a whole image and all CTUs of the slice/frame share the same SAO parameters.
Figure 25 is an example of the setting of SAO parameters at the frame/slice level using the first method of sharing SAO parameters (i.e. without new SAO classifications at the encoder side). This figure is based on Figure 17. At the beginning of the process, the CTUStats table is set for each CTU (in the same way as for the CTU level encoding choice). This CTUStats table can be used for the traditional CTU level (1302). Then the table FrameStats is set by adding each value of the table CTUStats over all CTUs (1303). Then the same process as for the CTU level is applied to find the best SAO parameters (1305 to 1315). To set the SAO parameters for all CTUs of the frame, the SAO parameters set selected at step 1315 is assigned to the first CTU of the slice/frame. Then, for each CTU from the second CTU to the last CTU of the slice/frame, the sao_merge_left_flag is set equal to 1 if it exists; otherwise the sao_merge_up_flag is set equal to 1 (indeed, for the second CTU to the last CTU, a merge Left or Up or both exist) (1317). The syntax of the SAO parameters set is unchanged from that presented in Figure 9. At the end of the process the SAO parameters are set for the whole slice/frame.
Thus, the CTUStats table in the case of determining the SAO parameters for the whole image (frame level) is created by the process of Figure 25. This corresponds to evaluating the frame level in terms of the rate-distortion compromise.
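The accumulation of FrameStats from CTUStats can be sketched as follows. As a simplifying assumption, each CTU's statistics are flattened into a one-dimensional list of numbers, whereas the tables in the description have several dimensions; the function name is hypothetical.

```python
def accumulate_stats(ctu_stats, ctu_addresses):
    """Sum the per-CTU statistics, element by element, over the given CTU addresses."""
    total = [0] * len(ctu_stats[0])
    for addr in ctu_addresses:
        for i, value in enumerate(ctu_stats[addr]):
            total[i] += value
    return total


# Three CTUs, each with two (hypothetical) statistic entries.
ctu_stats = [[1, 2], [3, 4], [5, 6]]
frame_stats = accumulate_stats(ctu_stats, range(len(ctu_stats)))
```

Because only additions of values already collected per CTU are needed, no further SAO classification of the samples is required to evaluate the frame level.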
As described previously in connection with the eighth embodiment, the encoder also evaluates the CTU level non-temporal derivation and the temporal derivation in terms of their respective rate-distortion compromises. Each evaluation is performed for the whole image in this case. The three evaluations are then compared and the one with the best performance is selected. The selected derivation (temporal or CTU level or frame level) is then signalled to the decoder in the bitstream.
The signalling of the selected derivation can be made in many different ways. For example, a grouping index can be signalled. The first syntax element can then still be used to signal whether the SAO parameters for all CTUs of the slice are derived temporally or not (e.g. temporal merge flag), supplemented by the grouping index in the case when temporal derivation is not used. For example, the CTU level may have grouping index 0 and the frame level may have grouping index 1. Alternatively, the first syntax element may be adapted to signal everything, for example CTU level and frame level may have index 0 and index 1 respectively and temporal derivation may have another index such as 2. In this case, in Figures 21, 22 and 24 the first syntax element is changed accordingly.
The example of determining the SAO parameters in Figure 25 corresponds to the first method of sharing SAO parameters as it uses the merge flags to share the SAO parameters among all CTUs of the image (see steps 1316 and 1317). These steps can be omitted if the second method of sharing SAO parameters is used.
Incidentally, if the second method is used, and merge flags are not used within the group (image), the process of Figure 17 should be modified appropriately, in particular to not evaluate the RD costs in 1103 and 1104.
Tenth Embodiment
In the eighth embodiment the CTU-level non-temporal derivation is in competition with the temporal derivation. In the tenth embodiment the CTU-level non-temporal derivation is not available and instead the frame-level non-temporal derivation is in competition with the temporal derivation.
Eleventh Embodiment
The CTU and Frame levels used in the ninth embodiment offer extreme rate-distortion compromises. It is also possible to include other groupings intermediate between the CTU and frame levels which can offer other rate-distortion compromises. Referring again to Figure 15 a third grouping 1203 makes a column of CTUs a group as in the third embodiment.
Figure 26 is an example of the setting of SAO parameters sets for the third grouping 1203 at the encoder side. This Figure is based on Figure 17. To reduce the number of steps in the figure, the modules 1105 to 1115 have been merged in one step 1405 in this Figure 26. At the beginning of the process, the CTUStats table is set for each CTU. This CTUStats table can be used for the traditional CTU level (1302) encoding choice. For each column (1403) of the current slice/frame, the table ColumnStats is set by adding each value (1405) from CTUStats (1402), for each CTU of the current column (1404). Then the new SAO parameters are determined as for the CTU level (1406) encoding choice (cf. Figure 17). If it is not the first column, the RD cost of sharing the SAO parameters with the previous left column is also evaluated (1407), in the same way as the sharing of a SAO parameters set with the left and up CTUs (1103, 1104) is evaluated. If the sharing of SAO parameters gives a better RD cost (1408) than the RD cost for the new SAO parameters set, the sao_merge_left_flag is set equal to 1 for the first CTU of the column. This CTU has the address number equal to the value “Column”. Otherwise, the SAO parameters set for this first CTU of the column is set equal (1409) to the new SAO parameters obtained in step 1406.
For all other CTUs of the column (1411), the sao_merge_left_flag is set equal to 0 if it exists and the sao_merge_up_flag is set equal to 1. Then the SAO parameters set for the next column can be processed (1403). Please note that, except for the first line of CTUs, all other CTUs of the frame have the sao_merge_left_flag equal to 0 if it exists and the sao_merge_up_flag equal to 1. So, step 1412 can be processed once per frame.
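A minimal sketch of the resulting signalling for the column grouping is given below. It assumes that the RD comparison between new parameters and sharing with the left column has already been made and is passed in as a list of booleans; the function name, argument names and string labels are hypothetical.

```python
def column_grouping_flags(ctus_wide, ctus_high, merge_with_left_column):
    """Map each CTU address to its SAO signalling for the column grouping.

    merge_with_left_column[c] is True when column c reuses the parameters
    of column c - 1 (ignored for c == 0, which has no left column).
    """
    flags = {}
    for row in range(ctus_high):
        for col in range(ctus_wide):
            addr = row * ctus_wide + col
            if row == 0:
                # First CTU of the column: its address equals the column index.
                if col > 0 and merge_with_left_column[col]:
                    flags[addr] = "merge_left"
                else:
                    flags[addr] = "new"
            else:
                # sao_merge_left_flag = 0 (if present), sao_merge_up_flag = 1.
                flags[addr] = "merge_up"
    return flags


flags = column_grouping_flags(3, 2, [False, True, False])
```

Only the first line of CTUs depends on the per-column decision; the CTUs below it always merge upward, so that part can be written once per frame.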
The advantage of this CTU grouping is another RD compromise between the CTU level encoding choice and the frame level which can be useful for some conditions. Also, in this example, merge flags are used within the group, which means that the third grouping can be introduced without modifying the decoder (i.e. the grouping can be HEVC-compliant).
In one variant, the Merge between columns doesn’t need to be checked. It means that steps 1407, 1408 and 1410 are removed from the process of Figure 26. The advantage of removing this possibility is a simplification of the implementation and the ability to parallelize the process. This has a small impact on coding efficiency.
Another possible compromise intermediate between the CTU level and the frame level can be offered by a fourth grouping 1204 in Figure 23 which makes a line of CTUs a group. To determine the SAO parameters for this fourth grouping, a process similar to that of Figure 26 can be applied. In that case, the variable ColumnStats is replaced by LineStats. Step 1403 is replaced by “For Line = 0 to Num_CTU_in_Height”. Step 1404 is replaced by “For CTU_in_line = 0 to Num_CTU_in_Width”. Step 1405 is replaced by LineStats[][][][] += CTUStats[Line * Num_CTU_in_Width + CTU_in_line][][][][]. The new SAO parameters and the merge with the up CTU are evaluated based on this LineStats table (steps 1406 and 1407). Step 1410 is replaced by setting the sao_merge_up_flag to 1 for the first CTU of the Line. For all CTUs of the slice/frame except the first CTU of each Line, the sao_merge_left_flag is set equal to 1.
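The per-line accumulation can be sketched as below, using the raster-scan address Line * width + CTU_in_line given in the text. As an assumption for brevity, each CTU's statistics are a flat list of numbers rather than a multi-dimensional table, and the function name is hypothetical.

```python
def line_stats(ctu_stats, num_ctu_in_width, num_ctu_in_height):
    """Return one accumulated statistics list per line of CTUs."""
    stats_len = len(ctu_stats[0])
    per_line = []
    for line in range(num_ctu_in_height):
        acc = [0] * stats_len
        for ctu_in_line in range(num_ctu_in_width):
            # Raster-scan address of the CTU inside the current line.
            ctu = ctu_stats[line * num_ctu_in_width + ctu_in_line]
            for i in range(stats_len):
                acc[i] += ctu[i]
        per_line.append(acc)
    return per_line


# A 3x2 frame of CTUs with a single (hypothetical) statistic each.
stats = line_stats([[1], [2], [3], [4], [5], [6]], 3, 2)
```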
The advantage of the line is another RD compromise between the CTU level and Frame level. Please note that the frame or slice are most of the time rectangles and their width is larger than their height. So the line CTUs grouping 1204 is expected to be an RD compromise closer to the frame CTU grouping 1202 than the column CTU grouping 1203.
As for the other CTU groupings 1202 and 1203, the line CTU grouping can be HEVC compliant if the merge flags are used within the groups.
As for the column CTU grouping 1203 the evaluation of merging 2 lines can be removed.
Further RD compromises can be offered by putting two or more columns of CTUs or two or more lines of CTUs together as a group. The process of Figure 25 can be adapted to determine SAO parameters to such groups.
In one embodiment, the number N of columns or lines in a group may depend on the number of groups that are targeted.
The use of several columns or lines for the CTU groupings may be particularly advantageous when the slices or frames are large (for HD, 4K or beyond).
As described previously, in one variant, the merge between these groups containing two or more columns or two or more lines doesn’t need to be evaluated.
Another possible grouping includes split columns or split lines, where the split is tailored to the current slice/frame.
Another possible compromise between the CTU level and the frame level can be offered by square CTU groupings 1205 and 1206 as illustrated in Figure 23. The grouping 1205 makes 2x2 CTUs a group. The grouping 1206 makes 3x3 CTUs a group (one example of the second embodiment).
Figure 27 shows an example of how to determine the SAO parameters for such groupings. For each NxN group (1503), the table NxNStats (1507) is set (1504, 1505, 1506) based on CTUStats. This table is used to determine the new SAO parameters (1508) and their RD cost, in addition to the RD cost for a Left (1510) sharing or Up (1509) sharing of SAO parameters. If the best RD cost is that of the new SAO parameters (1511), the SAO parameters of the first CTU (top left CTU) of the NxN group are set equal to these new SAO parameters (1514). If the best RD cost is that of sharing SAO parameters with the up NxN group (1512), the sao_merge_up_flag of the first CTU (top left CTU) of the NxN group is set equal to 1 and the sao_merge_left_flag to 0 (1515). If the best RD cost is that of sharing SAO parameters with the left NxN group (1513), the sao_merge_left_flag of the first CTU (top left CTU) of the NxN group is set equal to 1 (1516). Then the sao_merge_left_flag and sao_merge_up_flag are set correctly for the other CTUs of the NxN group in order to form the SAO parameters for the current NxN group (1517). Figure 28 illustrates this setting for a 3x3 SAO group. The top left CTU is set equal to the SAO parameters determined in steps 1508 to 1516. For the 2 other top CTUs, the sao_merge_left_flag is set equal to 1. As the sao_merge_left_flag is the first flag encoded or decoded and as it is set to 1, there is no need to set the sao_merge_up_flag to 0. For the 2 other CTUs in the first column, the sao_merge_left_flag is set equal to 0 and the sao_merge_up_flag is set equal to 1. For the other CTUs, the sao_merge_left_flag is set equal to 1.
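The flag pattern inside one NxN group can be sketched as follows. The sketch assumes that the CTUs below the top left CTU (the left-hand column of the group) are the ones that merge upward, and it ignores incomplete groups at the frame border; the function name and labels are hypothetical.

```python
def nxn_group_flags(n):
    """Return an n x n grid describing how each CTU of a group gets its SAO parameters."""
    rows = []
    for r in range(n):
        row = []
        for c in range(n):
            if r == 0 and c == 0:
                # Top left CTU: new parameters, or a merge with the left/up group,
                # whichever the RD comparison selected.
                row.append("group_decision")
            elif c == 0:
                # Left column of the group: sao_merge_left_flag = 0, sao_merge_up_flag = 1.
                row.append("merge_up")
            else:
                # All remaining CTUs: sao_merge_left_flag = 1.
                row.append("merge_left")
        rows.append(row)
    return rows


grid = nxn_group_flags(3)
```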
The advantage of the NxN CTU groupings is to create several RD compromises for SAO. As for the other groupings, these groupings can be HEVC compliant if merge flags within the groups are used. As for the other groupings, the test of Merge left and Merge up between groups can be dispensed with in Figure 27. So steps 1509, 1510, 1512, 1513, 1515 and 1516 can be removed, especially when N is high.
In one variant, the value N depends on the size of the frame/slice. The advantage of this embodiment is to obtain an efficient RD compromise.
In a preferred variant, only N equal to 2 and 3 are evaluated. This offers an efficient compromise.
The possible groupings are in competition with one another and with the temporal derivation as the SAO parameter derivation to be selected for the current slice. Figure 29 illustrates an example of how to select the SAO parameter derivation using a rate-distortion compromise comparison.
In this example, the first method of sharing SAO parameters among the CTUs of a group is used. Accordingly, merge flags are used within groups. If applied to HEVC, the resulting bitstream can be decoded by an HEVC-compliant decoder.
The current slice/frame 1701 is used to set the CTUStats table (1703) for each CTU (1702). This table (1703) is used to evaluate the CTU level (1704), the temporal derivation (1715), the frame/slice grouping (1705), the column grouping (1706), the line grouping (1707), the 2x2 CTUs grouping (1708) or 3x3 CTUs grouping (1709), or any of the other CTU groupings described previously. The best derivation (a non-temporal derivation with a CTU grouping or the temporal derivation) is selected according to the rate distortion criterion computed for each available derivation (1710). The SAO parameters sets for each CTU are set (1711) according to the derivation selected in step 1710. These SAO parameters are then used to apply the SAO filtering (1713) in order to obtain the filtered frame/slice.
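The selection in step 1710 can be sketched as a minimum search over the candidate derivations. The text only refers to a rate distortion criterion; the Lagrangian cost J = D + lambda * R used here is a common choice and is an assumption, as are the function name, the candidate names and the numbers.

```python
def select_derivation(candidates, lam):
    """Pick the derivation with the lowest cost J = D + lam * R.

    candidates maps a derivation name to a (distortion, rate_in_bits) pair.
    """
    best_name, best_cost = None, float("inf")
    for name, (distortion, rate) in candidates.items():
        cost = distortion + lam * rate
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


# Purely illustrative figures, not measured results.
candidates = {
    "ctu_level": (100.0, 400),
    "frame_level": (180.0, 20),
    "temporal": (150.0, 5),
}
best = select_derivation(candidates, lam=0.5)
```

With these illustrative figures the temporal derivation has the lowest cost, because its small rate outweighs its higher distortion at the chosen lambda.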
The second method of sharing SAO parameters among the CTUs of the CTU grouping may be used instead of the first method. Both methods have the advantage of offering a coding efficiency increase. A second advantage, obtained when the first method is used but not when the second method is used, is that this competition method doesn’t require any additional SAO filtering or classification. Indeed, the main impacts on encoder complexity are step 1702, which needs SAO classification for all possible SAO types, and step 1713, which filters the samples. All other CTU grouping evaluations are only additions of values already obtained during the CTU level encoding choice (set in the table CTUStats).
One other possibility to increase the coding efficiency at encoder side is to test all possible SAO groupings but this should increase the encoding time compared to the example of Figure 29 where a small subset of groupings is evaluated.
As mentioned just now, it is also possible to use the second method of sharing SAO parameters among the CTUs of a group. In this case, the encoder signals in the bitstream which derivation of the SAO parameters is selected (CTU level, frame level, column, line, 2x2 CTUs, 3x3 CTUs, temporal derivation). A possible indexing scheme is shown in Table 2 below:
Table 2
Because the majority of the derivation index values (values 0 to 5) signal groupings, the derivation index is also referred to as a grouping index hereinafter.
Figure 30 is a flow chart illustrating a decoding process when the CTU grouping is signalled in the slice header according to the second method of sharing SAO parameters among the CTUs of the group. First the flag SaoEnabledFlag is extracted from the bitstream (1801). If SAO is not enabled, the next slice header syntax element is decoded (1807) and SAO will not be applied to the current slice. Otherwise the decoder extracts N bits from the slice header (1803). N depends on the number of available CTU groupings. Ideally the number of CTU groupings should be equal to 2 to the power of N. The corresponding CTU grouping index (1804) is used to select the CTU grouping method (1805). This grouping method will be applied to extract the SAO syntax and to determine the SAO parameters set for each CTU (1806). Then the next slice header syntax element is decoded. If the CTU grouping index (1804) corresponds to the temporal derivation, other parameters can be extracted from the bitstream such as the reference frame index and/or other parameters necessary for the temporal derivation.
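The extraction of the N-bit grouping index (steps 1803 to 1805) can be sketched as below. The sketch assumes a fixed-length code read most significant bit first, with N being the smallest number of bits able to represent every index. The ordering of the list GROUPINGS is an assumed indexing for illustration, and the function names are hypothetical.

```python
# Assumed index order, for illustration only.
GROUPINGS = ["ctu_level", "frame_level", "column", "line", "2x2", "3x3", "temporal"]


def index_bits(num_groupings):
    """Smallest N such that 2**N >= num_groupings (at least 1 bit)."""
    return max(1, (num_groupings - 1).bit_length())


def read_grouping_index(bits, pos, num_groupings):
    """Read an N-bit index, MSB first, from a list of 0/1 values starting at pos."""
    n = index_bits(num_groupings)
    value = 0
    for bit in bits[pos:pos + n]:
        value = (value << 1) | bit
    return value, pos + n


index, next_pos = read_grouping_index([1, 1, 0, 1], 0, len(GROUPINGS))
```

With seven derivations, three bits are read and one code word stays unused, which is why a number of groupings equal to a power of two is described as ideal.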
The advantage of the signalling at slice header of the CTUs grouping is its low impact on the bitrate.
But when the number of slices is significant for a frame, it may be desirable to reduce this signalling. So, in one variant, the CTUs grouping index uses a unary max code in the slice header. In that case, the CTUs groupings are ordered according to their probabilities of occurrences (highest to lowest).
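A unary max (truncated unary) code can be sketched as follows. The bit convention used here, a run of 1s terminated by a 0 that is omitted for the largest value, is an assumption, and the function names are hypothetical.

```python
def encode_unary_max(value, max_value):
    """Encode value in 0..max_value as `value` ones followed by a terminating zero.

    The terminating zero is dropped when value == max_value.
    """
    bits = [1] * value
    if value < max_value:
        bits.append(0)
    return bits


def decode_unary_max(bits, pos, max_value):
    """Decode one unary max code word starting at pos; return (value, new_pos)."""
    value = 0
    while value < max_value and bits[pos] == 1:
        value += 1
        pos += 1
    if value < max_value:
        pos += 1  # consume the terminating zero
    return value, pos


code = encode_unary_max(2, 6)
decoded = decode_unary_max(code, 0, 6)
```

Index 0 costs a single bit, so placing the most probable grouping first keeps the average slice header cost low.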
In the eleventh embodiment, at least one non-temporal derivation is an intermediate level derivation (SAO parameters not at CTU level or at group level). When applied to a group it causes the group (e.g. frame or slice) to be subdivided into subdivided parts (CTU groupings 1203-1206, e.g. columns of CTUs, lines of CTUs, NxN CTUs, etc.) and derives SAO parameters for each of the subdivided parts. Each subdivided part is made up of two or more said image parts (CTUs). The advantage of the intermediate level derivation(s) is the introduction of one or more effective rate-distortion compromises. The intermediate level derivation(s) can be used without the CTU-level derivation or without the frame-level derivation or without either of those two derivations.
Twelfth Embodiment
In the ninth embodiment the temporal derivation is in competition with the CTU level derivation and the frame level derivation. The twelfth embodiment builds on this and adds one or more of the intermediate groupings so that the competition includes CTU level, frame level, one or more groupings intermediate between the CTU and frame levels, and the temporal derivation.
Thirteenth Embodiment
In the eighth embodiment the temporal derivation is in competition with CTU level derivation but not the frame level derivation. The thirteenth embodiment builds on this and adds one or more NxN CTU groups so that the competition includes CTU level, one or more NxN CTU groups, and the temporal derivation.
Fourteenth Embodiment
In the eighth embodiment the temporal derivation is in competition with the CTU level derivation but not the frame level derivation. The fourteenth embodiment builds on this and adds the third grouping 1203 (column of CTUs) or the fourth grouping 1204 (line of CTUs) or both the third and fourth groupings 1203 and 1204. The competition therefore includes CTU level, the third and/or fourth grouping, and the temporal derivation.
The ninth and eleventh to fourteenth embodiments each promote diversity for the SAO parameter derivation to be applied to a group by making at least first and second said non-temporal derivations available, the first non-temporal derivation when applied to a group causing the group to have SAO parameters at a first level, and the second non-temporal derivation when applied to a group causing the group to have SAO parameters at a second level different from the first level. The levels may be any two levels from the frame level to a CTU level. The levels may correspond to the groupings 1201-1206 in Figure 23.
Fifteenth Embodiment
In the eighth to fourteenth embodiments, the smallest grouping is the first grouping 1201 in which each CTU is a group and there is one set of SAO parameters per CTU. However, in the fifteenth embodiment, a set of SAO parameters can be applied to a smaller block than the CTU. In this case, the non-temporal derivation is not at the CTU level, frame level or an intermediate level between the CTU and frame levels but at a sub-CTU level (a level smaller than an image part).
In this case, instead of signalling a grouping it is effective to signal an index representing a depth of the SAO parameters. Table 3 below shows one example of a possible indexing scheme:
Table 3
The index 0 means that each CTU is divided into 16 blocks and each may have its own SAO parameters. Index 1 means that each CTU is divided into 4 blocks, again each having its own SAO parameters.
These different depths of SAO parameters are put in competition with the temporal derivation and the encoder selects one derivation (either temporal derivation or non-temporal derivation at one of available depths). The selection may be based on a RD comparison.
The selected derivation is then signalled to the decoder in the bitstream. The signalling may comprise a temporal/non-temporal syntax element plus a depth syntax element (e.g. using the indexing scheme above). Alternatively, a combined syntax element may be used to signal temporal/non-temporal and the depth. Temporal derivation could be assigned index 6, for example, with the non-temporal derivations having index 0-5.
In the fifteenth embodiment, at least one non-temporal derivation when applied to a group causes the group to be subdivided into subdivided parts and derives SAO parameters for each of the subdivided parts, and each image part is made up of two or more said sub-divided parts.
In the fifteenth embodiment, as in the ninth and eleventh to fourteenth embodiments, at least first and second said non-temporal derivations are available, the first non-temporal derivation when applied to a group causing the group to have SAO parameters at a first level, and the second non-temporal derivation when applied to a group causing the group to have SAO parameters at a second level different from the first level. The levels may be any two levels from the frame level to a sub-CTU level. The levels may correspond to the groupings 1201-1206 in Figure 23.
Sixteenth Embodiment
According to an embodiment, the selected derivation of the SAO parameters may be signalled for a slice, which means that the temporal derivation (when selected) is used for all CTUs of the slice. According to yet another embodiment, the available non-temporal derivations include derivations having SAO parameters at different levels (depths) lower than the slice or frame level. However, in such cases it is not possible to determine at the CTU level (or at the chosen level of the SAO parameters) whether to use temporal derivation or not.
In the sixteenth embodiment, the SAO parameters derivation is modified so that a temporal derivation at the CTU level is available, rather than only a temporal derivation at the group level. The temporal derivation at the CTU level is not applied to a group of image parts as in the previous embodiments. However, this temporal derivation is in competition with a temporal derivation applied to a group of image parts. For example, the competition is between the 3x3 grouping and a group using the temporal derivation at CTU level.
For example, in one implementation, first a level of the SAO parameters is selected for a slice or frame, which may include the CTU level. Then, when the CTU level is selected it is selected whether to use a temporal derivation or non-temporal derivation for each CTU of the slice or frame.
Also, when the selected level of the SAO parameters for a slice is an intermediate level between the slice level and the CTU level, a temporal derivation or non-temporal derivation may be selected per CTU group (e.g. each column of CTUs) of the slice or frame. In this case, the temporal derivation does still apply to a group of two or more CTUs (image parts). One or more CTU groups within the slice may then use temporal derivation (with each CTU deriving an SAO parameter from a collocated CTU of a reference image), whilst other CTU groups use a non-temporal derivation. In this case, the benefit of selecting between temporal and non-temporal SAO parameter derivation per CTU group is achieved in addition to the benefit of applying the temporal derivation on a group basis. This is illustrated in Figure 31 for a 2x2 CTU grouping (grouping 1205 in Figure 23). This is also applicable to a 3x3 CTU grouping.
In Figure 31 the SAO merge flags are usable between groups of the CTU grouping. As depicted in Figure 31, for the 2x2 CTU grouping, the SAO merge left and SAO merge up flags are kept for each group of 2x2 CTUs, but they are removed for CTUs inside the group. Please note that only the sao_merge_left_flag is used for the grouping 1203 of a column of CTUs, and only the sao_merge_up_flag is used for the grouping 1204 of a line of CTUs.
In a variant, a flag signals if the current CTU group shares its SAO parameters or not. If it is true, a syntax element representing one of the previous groups is signalled. So each group of a slice can be predicted by a previous group except the first one. This improves the coding efficiency by adding several new possible predictors.
Seventeenth Embodiment
In the sixteenth embodiment a depth of the SAO parameters was selected for a slice, including depths smaller than a CTU, making it possible to have a set of SAO parameters per block in a CTU. However, when the use of temporal derivation was selected no depth could be selected and all CTUs of the slice had to use temporal derivation.
In the seventeenth embodiment, the SAO parameters derivation is modified so that a depth is selected for the slice and then it is selected for an image part at the selected depth whether or not to use temporal derivation. The depths may be the ones in Table 3.
In the seventeenth embodiment, the SAO parameters derivation is modified so that a temporal derivation at the sub-CTU level is available, rather than only a temporal derivation at the group level. The temporal derivation at the sub-CTU level is not applied to a group of image parts as in the previous embodiments. However, this temporal derivation is in competition with a temporal derivation applied to a group of image parts.
For example, in one implementation, first a level of the SAO parameters is selected for a slice or frame, which may include the sub-CTU level. Then, when the sub-CTU level is selected it is selected whether to use a temporal derivation or non-temporal derivation for each block of the slice or frame.
Also, when the selected level of the SAO parameters for a slice is an intermediate level between the slice level and the block level, a temporal derivation or non-temporal derivation may be selected per CTU or per CTU group (e.g. each column of CTUs) of the slice or frame. In this case, the temporal derivation does still apply to a group of two or more blocks (image parts). One or more CTUs or CTU groups within the slice may then use temporal derivation (with each block deriving an SAO parameter from a collocated block of a reference image), whilst other CTUs or CTU groups use a non-temporal derivation. In this case, the benefit of selecting between temporal and non-temporal SAO parameter derivation per CTU or CTU group is achieved in addition to the benefit of applying the temporal derivation on a CTU or CTU group basis.
In the seventeenth embodiment and in the sixteenth embodiment one possibility is to remove the SAO merge flags for all levels. It means that steps 503, 504, 505 and 506 of Figure 9 are removed. The advantage is that it significantly reduces the signalling of SAO and consequently reduces the bitrate. Moreover, it simplifies the design by removing 2 syntax elements at CTU level.
In one variant, the merge flags are kept for CTU level but removed for all other CTU groupings. The advantage is a flexibility of the CTU level.
In another variant, the merge flags are used for CTUs when the SAO signalling level is lower than or equal to the CTU level (1/16 CTU or ¼ CTU) and removed for other CTU groupings having larger groups.
The merge flags are important for small block sizes because a SAO parameters set is costly compared to the amount of samples that it can improve. In that case, these syntax elements reduce the cost of SAO parameters signalling. For large groups, the SAO parameters set is less costly so the usage of merge flags is not efficient. So the advantage of these embodiments is a coding efficiency increase.
In another variant, the level where the SAO merge flags are enabled is explicitly signalled in the bitstream. For example, a flag indicates if the SAO merge flags are used or not. The flag may be included after the index of the CTUs grouping (or the depth) in the slice header.
This allows the encoder to select efficiently whether or not to use the SAO merge flags.
Eighteenth Embodiment
In the eighth to fifteenth embodiments there is competition between the temporal derivation and at least one alternative derivation method not using temporal derivation. Similarly, in the sixteenth and seventeenth embodiments there is competition between groupings or depths, with temporal derivation being possible for each grouping or depth. Whilst such competition is useful in identifying an efficient SAO parameters derivation for the slice or frame, it can place quite a big burden on the encoder which has to perform an evaluation for each candidate derivation. This burden may be undesirable, especially for a hardware encoder.
Accordingly, in the eighteenth embodiment, the competition between the different permitted derivations (e.g. in the eighth embodiment the competition between non-temporal derivation at the CTU level and temporal derivation) is modified so that only one derivation is permitted in the encoder for any given slice or frame. The permitted derivation may be determined in dependence upon one or more characteristics of the slice or frame. For example, the permitted derivation may be selected based on the slice type (Intra, Inter P, Inter B), quantization level (QP) of the slice, or position in the hierarchy of a Group of Pictures (GOP). As a result, for certain slices or frames, only temporal derivation is permitted, while for other slices or frames only non-temporal derivation is permitted, for example non-temporal derivation at the CTU level. For example, the Intra Frames and the Inter frames at the highest position in the hierarchy of the GOP structure or with the low QP may be permitted only to use the CTU level. And the other frames which have lower positions in the GOP hierarchy or a high QP may be permitted only to use temporal derivation. The different parameters can be set depending on the rate distortion compromise. The advantage of this embodiment is a complexity reduction. Instead of evaluating two or more competing derivations just one derivation is selected, which can be useful for a hardware encoder.
Thus, in the eighteenth embodiment a first derivation is associated with first groups of the image (e.g. Intra slices) and a second derivation is associated with second groups of the image (e.g. Inter P slices). It is determined whether a group to be filtered is a first group or a second group. If it is determined that the group to be filtered is a first group, the first derivation is used to filter the image parts of the group, and if it is determined that the group to be filtered is a second group, the second derivation is used to filter the image parts of the group. Evaluation of the two derivations is not required.
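A minimal sketch of such a rule-based choice is shown below, following the example in which Intra frames and Inter frames at the top of the GOP hierarchy or with a low QP use the CTU level, and the other frames use temporal derivation. The QP threshold, the convention that hierarchy level 0 is the highest position, and all names are hypothetical.

```python
def permitted_derivation(slice_type, qp, gop_level, qp_threshold=32, top_level=0):
    """Return the single SAO derivation the encoder is allowed to use for a slice.

    slice_type is "I", "P" or "B"; gop_level == top_level marks the highest
    position in the GOP hierarchy (assumed convention).
    """
    if slice_type == "I":
        return "ctu_level"
    if gop_level == top_level or qp < qp_threshold:
        return "ctu_level"
    return "temporal"


choice = permitted_derivation("B", qp=37, gop_level=3)
```

Because the decision depends only on slice characteristics, no rate-distortion evaluation of competing derivations is performed.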
Whether a group to be filtered is determined to be a first group or a second group may depend on one or more of:
a slice type;
a frame type of the image to which the group to be filtered belongs;
a position in a quality hierarchy of a Group of Pictures of the image to which the group to be filtered belongs;
a quality of the image to which the group to be filtered belongs; and
a quantisation parameter applicable to the group to be filtered.
For example, when the first groups have a higher quality or higher position in the quality hierarchy than the second groups, the first derivation may have fewer image parts per group than the second derivation.
Nineteenth Embodiment
In the eighteenth embodiment a particular derivation of the SAO parameters was selected for a given slice or frame. However, if the encoder has the capacity to evaluate a limited number of competing derivations, it is unnecessary to eliminate the competition altogether. In the nineteenth embodiment, the competition for a given slice or frame is still permitted but the set of competing derivations is adapted to the slice or frame. For example, the set of competing derivations may depend on the slice type (Intra, Inter P, Inter B), quantization level (QP) of the slice, or position in the hierarchy of a Group of Pictures (GOP).
The set of competing derivations may depend on the slice type.
For Intra slices, the set preferably contains groupings with groups containing small numbers of CTUs (e.g. CTU level, 2x2 CTU, 3x3 CTU, and Column). Also, if depths lower than a CTU are available (as in the fifteenth embodiment), these depths are preferably also included. Of course, the temporal derivation is not used.
For Inter slices, the set of derivations preferably contains groupings with groups containing large numbers of CTUs such as Line, Frame level, and the temporal derivation. However, smaller groupings can also be considered down to the CTU level.
The advantage of this embodiment is a coding efficiency increase thanks to the use of derivations adapted for a slice or frame.
In one variant, the set of derivations can be different for an Inter B slice from that for an Inter P slice.
In another variant, the set of competing derivations depends on the characteristics of the frame in the GOP. This is especially beneficial for frames which vary in quality (QP) based on a quality hierarchy. For the frames with the highest quality or highest position in the hierarchy, the set of competing derivations should include groups containing few CTUs or even sub-CTU depths (same as for Intra slices above). For frames with a lower quality or lower position in the hierarchy, the set of competing derivations should include groups with more CTUs.
The set of competing derivations can be defined in the sequence parameter set.
Thus, in the nineteenth embodiment a first set of derivations is associated with first groups of the image (e.g. Intra slices) and a second set of derivations is associated with second groups of the image (e.g. Inter P slices). It is determined whether a group to be filtered is a first group or a second group. If it is determined that the group to be filtered is a first group, a derivation is selected from the first set of derivations and used to filter the image parts of the group, and if it is determined that the group to be filtered is a second group, a derivation is selected from the second set of derivations and used to filter the image parts of the group. Evaluation of derivations not in the associated set of derivations is not required.
Whether a group to be filtered is a first group or a second group may be determined as in the preceding embodiment. For example, when the first groups have a higher quality or higher position in the quality hierarchy than the second groups, the first set of derivations may have at least one derivation with fewer image parts per group than the derivations of the second set of derivations.
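The association between slice type and set of competing derivations can be sketched as follows. This is only an illustrative sketch: the names `COMPETING_DERIVATIONS` and `select_derivation`, the string labels and the cost callback are hypothetical, and the membership of each set merely follows the examples given in the text above.

```python
from typing import Callable, Dict, List

# Hypothetical sets of competing derivations, one per slice type.
# The contents follow the examples in the text; they are not normative.
COMPETING_DERIVATIONS: Dict[str, List[str]] = {
    "INTRA": ["CTU", "2x2", "3x3", "COLUMN"],        # small groups, no temporal derivation
    "INTER_P": ["CTU", "3x3", "LINE", "FRAME", "TEMPORAL"],
    "INTER_B": ["LINE", "FRAME", "TEMPORAL"],         # variant: B set differs from P set
}


def select_derivation(slice_type: str, rd_cost: Callable[[str], float]) -> str:
    """Pick the lowest-cost derivation among the set associated with slice_type.

    Derivations outside the associated set are never evaluated.
    """
    candidates = COMPETING_DERIVATIONS[slice_type]
    return min(candidates, key=rd_cost)
```

With such a table, an encoder evaluates at most four derivations for an Intra slice, whatever the total number of derivations supported.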
Twentieth Embodiment
In the preceding embodiments, the temporal derivation involves simply copying SAO parameters from a collocated CTU (or from a collocated block within a CTU if SAO parameters at the block level are used). In a video, there are generally background and moving objects. When comparing a frame to its following frames, a large part can be static. When the SAO temporal derivation is applied on this static part for several consecutive frames, the SAO filtering should filter nothing, especially for edge offset. As a result, the temporal derivation will not be selected.
To solve this problem and increase the coding efficiency of the temporal derivation, in the twentieth embodiment the set of SAO parameters from the previous frame is changed according to some defined rules. Figure 32 is an example of an algorithm to produce such a modification of the set of SAO parameters. In this example, a 90° rotation is applied to the edge classification. If sao_eo_class_Luma or sao_eo_class_Chroma (2301) from the collocated CTU is equal to 0, which corresponds to edge type 0° (2302), the edge type for the current frame (2310) is set equal to 1 (2303), corresponding to SAO edge type 90°. If sao_eo_class_X is equal to 1 (2304), sao_eo_class_X (2305) is set equal to 0. In the same way, the edge offset type 135° (sao_eo_class_X equal to 2 (2306)) is rotated to edge offset type 45° (2307), and the edge offset type 45° (sao_eo_class_X equal to 3 (2308)) is rotated to edge offset type 135° (2309). The offset values are not changed.
It will be appreciated that although the effect of the algorithm of Figure 32 is to apply a rotation, in practice the changes to the edge classification parameters (sao_eo_class_Luma or sao_eo_class_Chroma) may be effected by using a mapping table. In the mapping table there is an entry for each existing edge index which maps to a corresponding “new” edge index. Thus, the mapping table implements the required rotation.
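A minimal sketch of such a mapping table for the 90° rotation is given below. The table name `ROTATE_90` and the function `rotate_eo_class` are hypothetical; the index-to-angle correspondence (0 for 0°, 1 for 90°, 2 for 135°, 3 for 45°) is the one used in the description of Figure 32.

```python
# Mapping table for a 90 degree rotation of the edge classification.
# Keys and values are sao_eo_class indices:
#   0 -> 0 deg, 1 -> 90 deg, 2 -> 135 deg, 3 -> 45 deg
ROTATE_90 = {
    0: 1,  # 0 deg   -> 90 deg
    1: 0,  # 90 deg  -> 0 deg
    2: 3,  # 135 deg -> 45 deg
    3: 2,  # 45 deg  -> 135 deg
}


def rotate_eo_class(sao_eo_class: int) -> int:
    """Return the edge index to use in the current frame; offsets are left untouched."""
    return ROTATE_90[sao_eo_class]
```

Applying the table twice returns the original index, which is consistent with edge directions being unsigned (a 180° rotation leaves a direction unchanged).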
Figure 33 illustrates this temporal rotation by 90°. In this example, it is assumed that the temporal derivation with 90° rotation is applied to a whole frame or slice (e.g. as in the seventh embodiment).
Of course, as variants, the 45° and the 135° rotations can be considered instead of 90°. Yet, in a preferred embodiment the rotation of temporal SAO parameters sets is the 90° rotation. This gives the best coding efficiency.
In one variant, when the temporal rotation is applied, band offsets are not copied and SAO is not applied on this CTU.
In another variant, as for the basic “copying” temporal derivation, for all CTUs for which an SAO parameter set is unobtainable (this means a CTU whose collocated CTU uses “no SAO” or all of whose collocated CTUs use “no SAO”), a default SAO parameter set can be used for the CTUs concerned.
Twenty-first Embodiment
In the twentieth embodiment, the “rotation” temporal derivation is introduced. In the twenty-first embodiment, the “rotation” temporal derivation is put in competition with the “copying” temporal derivation as shown in Figure 34. In this example the competition is applied to each slice or each frame. The best temporal derivation may be selected based on a rate-distortion criterion.
Twenty-second Embodiment
In several preceding embodiments, the “copying” temporal derivation was in competition with one or more non-temporal derivations (different groupings or different depths). In the twenty-second embodiment the “rotation” temporal derivation is put into competition with the same non-temporal derivation(s) instead of the “copying” temporal derivation.
Twenty-third Embodiment
In several preceding embodiments, the “copying” temporal derivation was in competition with one or more non-temporal derivations (different groupings or different depths). In the twenty-third embodiment both the “copying” and “rotation” temporal derivations are put into competition with the same non-temporal derivation(s) instead of just the “copying” temporal derivation. For example, Table 4 below shows the competing derivations when the eleventh embodiment is modified in this way:
Table 4
As a variant, further temporal derivations with 135° and 45° rotations respectively or with other rotation angles are possible.
Twenty-fourth Embodiment
In the twenty-first embodiment the “copying” and “rotation” temporal derivations are in competition with one another. In the twenty-fourth embodiment these two temporal derivations and further “rotation” temporal derivations are used cyclically.
In one exemplary cycle, a first frame F0 is followed by second, third, fourth and fifth frames F1-F4. The first frame F0 does not use temporal derivation of SAO parameters. For F1, the “copying” temporal derivation is applied (i.e. copying the SAO parameters from F0). For F2, the temporal derivation is a 90° rotation of SAO parameters of F0. For F3, the temporal derivation is a 135° rotation of SAO parameters of F0. For F4, the temporal derivation is a 45° rotation of SAO parameters of F0. In this case, F0 is a reference image for each of F1 to F4.
The same effect can be achieved by using the previous frame only as the reference frame:
Frame F0: (SAO parameters not derived temporally)
Frame F1: (temporal ‘copy’ Frame 0)
Frame F2: (temporal ‘90°’ Frame 1)
Frame F3: (temporal ‘45°’ Frame 2)
Frame F4: (temporal ‘90°’ Frame 3)
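The equivalence of the two cycles can be checked with the short sketch below. It assumes, as an interpretation not spelled out in the text, that a rotation by a given angle adds that angle to the edge direction modulo 180° (edge directions being unsigned). All names are hypothetical.

```python
# sao_eo_class index <-> edge direction in degrees (as in Figure 32).
ANGLE_OF_CLASS = {0: 0, 1: 90, 2: 135, 3: 45}
CLASS_OF_ANGLE = {angle: cls for cls, angle in ANGLE_OF_CLASS.items()}

# Rotations (in degrees) used for frames F1..F4; 0 stands for the "copy" derivation.
FROM_F0 = [0, 90, 135, 45]        # every frame refers to F0
FROM_PREVIOUS = [0, 90, 45, 90]   # every frame refers to the previous frame


def rotate(sao_eo_class: int, degrees: int) -> int:
    """Rotate an edge direction, wrapping modulo 180 degrees (assumed model)."""
    return CLASS_OF_ANGLE[(ANGLE_OF_CLASS[sao_eo_class] + degrees) % 180]


def cycle_from_f0(f0_class: int) -> list:
    """Edge classes of F1..F4 when each frame is derived directly from F0."""
    return [rotate(f0_class, d) for d in FROM_F0]


def cycle_from_previous(f0_class: int) -> list:
    """Edge classes of F1..F4 when each frame is derived from the previous frame."""
    classes, current = [], f0_class
    for d in FROM_PREVIOUS:
        current = rotate(current, d)
        classes.append(current)
    return classes
```

Under this model both cycles give the same edge class for each of F1 to F4, and the four frames together visit all four edge directions.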
By filtering an image part in a first image using the temporal copy derivation and filtering the same image part in two or more further images following the first image using different ones of the two or more temporal rotation derivations in a predetermined sequence, the direction of edge filtering of an image part may be switched successively through all possible edge-filtering directions.
Second group of embodiments
In HEVC, SAO filtering is performed CTU by CTU. In some of the first group of embodiments, temporal derivation is introduced, and to improve the signalling efficiency, a group of image parts is formed and the use of temporal prediction is signalled for this group of image parts, rather than for each image part individually.
Twenty-fifth Embodiment
In the twentieth embodiment, the “rotation” temporal derivation is applied to all CTUs of a slice or frame. In other words, a rotation temporal derivation is signalled for a group (slice, frame, column, line, NxN CTUs, etc.) composed of two or more image parts (CTUs). The image parts (CTUs) may still have different SAO parameters depending on the SAO parameters of the respective collocated image parts.
Signalling the temporal derivation at the slice or frame level is useful for compatibility with the embodiments described previously in which a grouping of CTUs is selectable for the slice or frame from among plural groupings (e.g. the groupings 1201-1206 in Figure 23), the selected grouping also being signalled at the slice or frame level. However, it is not essential to signal the use of temporal derivation at the slice or frame level. This applies whether there is just one type of temporal derivation, e.g. “copy” or “rotation”, or the type can be selected from plural different types. Instead, the signalling of the use of temporal derivation can be at the CTU level or at the block level (i.e. sub-CTU). In this case, a syntax element may be provided per CTU to indicate whether or not rotation temporal derivation is used for the CTU concerned. Equally, a syntax element may be provided per block (i.e. sub-CTU) to indicate whether or not rotation temporal derivation is used for the block concerned.
In the twenty-fifth embodiment neither temporal derivation nor a grouping is signalled at the slice level and all the SAO signalling is at CTU level. Figure 35 shows an example decoding process in this embodiment. In this example, one or more merge flags are used at CTU level to signal the SAO derivation including usage of the temporal derivation. A new SAO temporal merge flag is introduced compared to Figure 9.
The process of Figure 35 is performed CTU by CTU. For a current CTU, the sao_merge_temporal_flag_X is extracted from the bitstream if other merge flags are off (2613). If sao_merge_temporal_flag_X is equal to 1, a syntax element representing a reference frame is extracted from the bitstream (2614). Note that this step is not needed if only one reference frame is used for the derivation. Then a syntax element representing a rotation of the parameters is decoded (2615). Note that this step is not needed if no “rotation” option is available. This would be the case if the only type of temporal derivation is the basic “copy” type. Also, even if there is a rotation option, step 2615 is not performed if the collocated CTU in the reference frame is not EO type. Then the respective sets of SAO parameters for the 3 color components are copied from the collocated CTU to the current CTU. Processing then moves to the next CTU in step 2610.
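Steps 2613 to 2615 can be sketched as follows for one CTU and one color component. This is a simplified illustration under stated assumptions, not the decoding process of Figure 35 itself: the `SaoParams` structure, the function name, and the representation of the bitstream as an iterator of already entropy-decoded symbols are all hypothetical, and only a 90° rotation option is modelled.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterator, List, Optional, Tuple

ROTATE_90 = {0: 1, 1: 0, 2: 3, 3: 2}  # sao_eo_class mapping for a 90 degree rotation


@dataclass(frozen=True)
class SaoParams:
    sao_type: str                      # "EO", "BO" or "OFF"
    eo_class: int = 0                  # only meaningful when sao_type == "EO"
    offsets: Tuple[int, ...] = (0, 0, 0, 0)


def decode_temporal_merge(symbols: Iterator[int],
                          ref_frames: List[Dict[int, SaoParams]],
                          ctu_addr: int,
                          rotation_available: bool = True) -> Optional[SaoParams]:
    """Return the temporally derived parameters, or None if the flag is 0."""
    if next(symbols) == 0:             # sao_merge_temporal_flag_X (2613)
        return None
    # Reference frame element (2614): skipped when only one reference frame exists.
    ref_idx = next(symbols) if len(ref_frames) > 1 else 0
    collocated = ref_frames[ref_idx][ctu_addr]
    # Rotation element (2615): skipped when no rotation option is available
    # or when the collocated CTU is not of EO type.
    rotate = False
    if rotation_available and collocated.sao_type == "EO":
        rotate = next(symbols) == 1
    if rotate:
        return replace(collocated, eo_class=ROTATE_90[collocated.eo_class])
    return collocated                  # plain "copy" temporal derivation
```

The sketch makes the conditional parsing explicit: the number of symbols consumed for a CTU depends on the number of reference frames and on the SAO type of the collocated CTU.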
The advantage of the temporal merge flag signalling compared to temporal/CTU grouping signalling at slice level is a simplification of the encoder design for some implementations. Indeed, there is no need to wait for the encoding of the whole frame before starting SAO selection, unlike in the slice level approach. However, the impact of the extra signalling at the CTU level on the coding efficiency is not negligible.
When two or more temporal derivations are in competition with one another, for example “no temporal”, “copy”, “rotate by 90°”, “rotate by 135°”, “rotate by 45°”, the syntax element per CTU extracted in step 2615 may indicate the selected temporal derivation, e.g. using an index. The syntax element could also specify the angle of rotation. In this way, in the same slice or frame, some CTUs may have no temporal derivation, other CTUs may use “copy”, still others may use “rotate by 90°”, and so on. These solutions lead to an extremely fine adaptation of the SAO parameter derivation to the CTUs of a slice or frame.
Signalling a grouping for a slice or frame and then signalling for each group of two or more CTUs whether to use temporal derivation or not or, if two or more temporal derivations are in competition with one another, which one of them is selected, is an effective way to achieve adaptability without having per-CTU syntax elements. For example, if the selected grouping for a slice is 3x3 CTUs, some groups may have no temporal derivation, other groups may use “copy”, still others may use “rotate by 90°”, and so on. As the number of groups is only 1/9th of the number of CTUs, the number of syntax elements is correspondingly smaller compared to per-CTU signalling too, yet the different CTUs in each group may still have different SAO parameters depending on the collocated CTUs.
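The reduction in the number of syntax elements can be illustrated with a small calculation. The function name and the frame dimensions below are hypothetical; the sketch also assumes that groups at the right and bottom borders may be incomplete, in which case the ratio is only approximately 1/9.

```python
import math


def num_groups(width_ctus: int, height_ctus: int, n: int = 3) -> int:
    """Number of n x n CTU groups needed to cover a frame of the given size in CTUs."""
    return math.ceil(width_ctus / n) * math.ceil(height_ctus / n)


# Hypothetical frame of 30 x 18 CTUs: 540 per-CTU elements versus 60 per-group elements.
ctus = 30 * 18
groups = num_groups(30, 18)
```

For a frame of 30 x 17 CTUs the same function also gives 60 groups, because the last row of groups is only partially filled.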
Twenty-sixth Embodiment
In the twentieth to twenty-fifth embodiments rotation temporal derivations are introduced. These rotation temporal derivations are preferred examples from a wider class of transformations that can be applied to change the direction of EO filtering in a CTU of the current frame compared to the direction of EO filtering in a collocated CTU of a reference frame. For example, the direction-changing transformation could be a reflection about the x-axis or y-axis. Such a reflection has the effect of swapping two directions and leaving the other two directions unchanged. It could also be a reflection about a diagonal line at 45° or 135°.
As in the twentieth embodiment it will be appreciated that although the effect of the algorithm of Figure 32 is to apply a transformation, in practice the changes to the edge classification parameters (sao_eo_class_Luma or sao_eo_class_Chroma) may be effected by using a mapping table. In the mapping table there is an entry for each existing edge index which maps to a corresponding “new” edge index. Thus, the mapping table implements the required transformation.
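Mapping tables for the reflections mentioned above can be generated as in the following sketch. It assumes the usual geometric model in which a direction at angle a, mirrored about a line at angle t, ends up at angle 2t - a (modulo 180°); the function name is hypothetical.

```python
# sao_eo_class index <-> edge direction in degrees (as in Figure 32).
ANGLE_OF_CLASS = {0: 0, 1: 90, 2: 135, 3: 45}
CLASS_OF_ANGLE = {angle: cls for cls, angle in ANGLE_OF_CLASS.items()}


def reflection_table(axis_angle: int) -> dict:
    """Build the edge-index mapping table for a reflection about a line at axis_angle."""
    return {
        cls: CLASS_OF_ANGLE[(2 * axis_angle - angle) % 180]
        for cls, angle in ANGLE_OF_CLASS.items()
    }


X_AXIS = reflection_table(0)      # swaps 45 and 135 deg, keeps 0 and 90 deg
DIAGONAL_45 = reflection_table(45)  # swaps 0 and 90 deg, keeps 45 and 135 deg
```

Under this model a reflection about the y-axis gives the same table as one about the x-axis, and a reflection about the 135° diagonal gives the same table as one about the 45° diagonal, so the four reflections yield only two distinct tables.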
This embodiment is applicable to the first group of embodiments (which use a group-wise derivation) or to the second group of embodiments (which do not use a group-wise derivation).
Third group of embodiments
In the first and second groups of embodiments, temporal derivation of SAO parameters was introduced, either as a group-wise derivation (applied to a group of two or more image parts) or for individual image parts.
In a third group of embodiments, new spatial derivations of SAO parameters are introduced. These may be group-wise derivations or for individual image parts.
In the case of a group-wise spatial derivation, as in the first group of embodiments, a group can be any two or more CTUs, for example a whole image, a slice, a line of CTUs, a column of CTUs, N lines of CTUs, N columns of CTUs, where N is an integer greater than 1. A group could also be NxN CTUs, where N is an integer greater than 1, or MxN CTUs or NxM CTUs, where M > 1 and N > 1.
Alternatively, a group of image parts can be a CTU, and each constituent block of the CTU can be an image part. In such a case, each block of a CTU may have its own SAO parameters, but the signalling to use spatial derivation of the SAO parameters can be made for the CTU as a whole.
In the simplest case, where there is only one type of spatial derivation, a flag (analogous to the temporal merge flag) can be used to signal the use of the spatial derivation for all image parts of the group.
In the case of a group-wise spatial derivation the manner in which the SAO parameters are derived in the spatial derivation is not particularly limited except that the source image part belongs to another group of image parts in the same image as the subject group. The source image part and the image part to be derived are at the same positions in their respective groups. For example, in a 3x3 CTU grouping, there are 9 positions from the top left to the bottom right. If the other group is, for example, the left group of the subject group, then at least one SAO parameter of an image part at position 1 (the top left position, say) in the subject group is derived from an SAO parameter of the image part at the same position (position 1 or top left position) in the left group. This image part in the left group serves as a source image part for the image part to be derived in the subject group. The same is true for each other position in the subject group.
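The correspondence between an image part and its source image part in the left group can be sketched as follows for an NxN CTU grouping. The function names are hypothetical, CTU coordinates are assumed to be counted in CTU units from the top-left corner of the image, and the positions within a group are assumed to be numbered in raster order.

```python
from typing import Optional, Tuple


def position_in_group(ctu_x: int, ctu_y: int, n: int = 3) -> int:
    """1-based position of a CTU within its n x n group, from top left to bottom right."""
    return (ctu_y % n) * n + (ctu_x % n) + 1


def source_ctu_in_left_group(ctu_x: int, ctu_y: int, n: int = 3) -> Optional[Tuple[int, int]]:
    """Coordinates of the source CTU at the same position in the left group.

    Returns None when the subject group is in the first column of groups,
    i.e. when there is no left group.
    """
    if ctu_x // n == 0:
        return None
    return (ctu_x - n, ctu_y)
```

For example, in a 3x3 grouping the CTU at (4, 5) is at position 8 of its group, and its source in the left group is the CTU at (1, 5), which is also at position 8.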
In the simplest case, the at least one SAO parameter of an image part belonging to the group is derived from an SAO parameter of the source image part by copying the SAO parameter of the source image part. One, more than one, or all SAO parameters may be copied.
Alternatively, one, more than one, or all SAO parameters may be copied only when the SAO filtering is of a particular type (edge or band).
Instead of copying, a spatial derivation of SAO parameters which involves modifying one, more than one, or all SAO parameters of the source image part may be used.
It will be appreciated that spatial and temporal group-wise derivations are both “group-wise sourcing derivations”. Each involves applying a group-wise sourcing derivation of SAO parameters to a group of two or more image parts, the group-wise sourcing derivation permitting different image parts belonging to the group to have different SAO parameters and comprising deriving at least one said SAO parameter of an image part belonging to the group from an SAO parameter of another image part serving as a source image part for the image part to be derived. In the case of a temporal derivation the source image part is a collocated image part in a reference image having a position in the reference image collocated with a position of the image part to be derived in its image. In the case of a spatial derivation, the source image part belongs to another group of image parts in the same image as the image part to be derived, said source image part and said image part to be derived being at the same positions in their respective groups.
Twenty-seventh Embodiment
In the twenty-fifth embodiment the rotation derivation was a non-group-wise temporal derivation. However, in the twenty-seventh embodiment a spatial rotation derivation is used as a derivation, i.e. where the SAO parameters of a CTU in a current image are derived by rotation from the SAO parameters of another CTU of the same image (as opposed to being derived by rotation from the SAO parameters of a collocated CTU of a reference image).
Similarly to the “copy” spatial derivation, the other CTU in the “rotation” spatial derivation may be a left CTU or an upper CTU, in which case a sao_merge_rotation_left flag or sao_merge_rotation_up flag may be used to signal when the rotation spatial derivation is selected. Figure 36 shows two examples where the other CTU is the left CTU and the rotation from the left CTU to the current CTU is 90 degrees.
In one variant, the rotation spatial derivation may be in competition with the temporal copy derivation and/or the rotation temporal derivation.
In another variant, there are no temporal derivations and the rotation spatial derivation is in competition with the copy spatial derivation (which of course may be copy-left and/or copy-up).
In these cases, the “rotation” derivation is applied on a spatial basis to generate additional SAO merge parameter set candidates to predict the SAO parameter set of the current CTU. Accordingly, the “rotation” can be applied to increase the list of SAO Merge candidates or to find new SAO Merge candidates for empty positions.
The advantage of using the twenty-seventh embodiment instead of using several SAO parameter sets from previously decoded SAO parameter sets is an increase in coding efficiency. Moreover, it offers additional flexibility for encoder implementation by accessing only a limited number of already encoded SAO parameter sets.
Figure 37 is a flow chart representing one example of a possible usage of the rotation derivation of SAO parameters.
In the process for decoding a set of SAO parameters for a current CTU, the sao_merge_rotation_Left_X flag is extracted from the bitstream if other merge flags are off (3613). If sao_merge_rotation_Left_X is equal to 1, for each color component (Y, U, V) of the current CTU the set of SAO parameters is derived from the set of SAO parameters for the same component of the left CTU by applying rotation to the edge classification as described in the twenty-fifth embodiment. The SAO parameters other than the direction may be simply copied.
Twenty-eighth Embodiment
In the twenty-seventh embodiment, the rotation spatial derivation was applied to one CTU. In the twenty-eighth embodiment a group-based rotation spatial derivation is applied. Then, each CTU of a current group derives its SAO parameters by rotation from the CTU at the same position in another group of the same image. For example, the group may be 3x3 CTUs. The other group may be a group above or on the left.
Again, the group-based spatial derivation may be in competition with a group-based temporal derivation (either copy or rotation or both).
Similarly, the group-based spatial derivation may be in competition with a group-based “copy” spatial derivation (which may be copy-left and/or copy-up).
Twenty-ninth Embodiment
In the twenty-seventh and twenty-eighth embodiments a rotation spatial derivation was introduced. Just as the rotation temporal derivation is one of a wider class of possible direction-transforming temporal derivations, so the rotation spatial derivation is one of a wider class of possible direction-changing spatial derivations. The direction-changing spatial derivation may be applied to an individual CTU or to a group of CTUs. It may be in competition with other spatial and/or temporal derivations.
Thirtieth Embodiment
Figure 38 shows a system 191, 195 comprising at least one of an encoder 150 or a decoder 100 and a communication network 199 according to embodiments of the present invention. According to an embodiment, the system 195 is for processing and providing a content (for example, a video and audio content for displaying/outputting or streaming video/audio content) to a user, who has access to the decoder 100, for example through a user interface of a user terminal comprising the decoder 100 or a user terminal that is communicable with the decoder 100. Such a user terminal may be a computer, a mobile phone, a tablet or any other type of a device capable of providing/displaying the (provided/streamed) content to the user. The system 195 obtains/receives a bitstream 101 (in the form of a continuous stream or a signal - e.g. while earlier video/audio are being displayed/output) via the communication network 199. According to an embodiment, the system 191 is for processing a content and storing the processed content, for example a video and audio content processed for displaying/outputting/streaming at a later time. The system 191 obtains/receives a content comprising an original sequence of images 151, which is received and processed (including filtering with a deblocking filter according to the present invention) by the encoder 150, and the encoder 150 generates a bitstream 101 that is to be communicated to the decoder 100 via the communication network 199. The bitstream 101 is then communicated to the decoder 100 in a number of ways, for example it may be generated in advance by the encoder 150 and stored as data in a storage apparatus in the communication network 199 (e.g. on a server or a cloud storage) until a user requests the content (i.e. the bitstream data) from the storage apparatus, at which point the data is communicated/streamed to the decoder 100 from the storage apparatus.
The system 191 may also comprise a content providing apparatus for providing/streaming, to the user (e.g. by communicating data for a user interface to be displayed on a user terminal), content information for the content stored in the storage apparatus (e.g. the title of the content and other meta/storage location data for identifying, selecting and requesting the content), and for receiving and processing a user request for a content so that the requested content can be delivered/streamed from the storage apparatus to the user terminal. Alternatively, the encoder 150 generates the bitstream 101 and communicates/streams it directly to the decoder 100 as and when the user requests the content. The decoder 100 then receives the bitstream 101 (or a signal) and performs filtering with a deblocking filter according to the invention to obtain/generate a video signal 109 and/or audio signal, which is then used by a user terminal to provide the requested content to the user.
In the preceding embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit.
Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Claims
1. A method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the method comprising:
performing SAO filtering on a group made up of N x N image parts of the image using SAO parameters associated with the group, wherein N is three or more.
2. The method of claim 1, wherein two or more different groupings of said image parts are available, and the group made up of the N x N image parts is formed by one of said available groupings.
3. The method of claim 2 further comprising:
comparing SAO filtering using two or more of the available groupings; and
selecting one grouping based on the comparison.
4. The method of claim 3, wherein the comparison is based on rate-distortion evaluation for the two or more groupings.
5. The method of claim 3 or 4, wherein at least one available grouping is excluded from the comparison and/or evaluation.
6. The method of any one of claims 2 to 5, wherein one available grouping forms another group made up of M image parts in a column of the image, and M is three or more.
7. The method of any one of claims 2 to 5, wherein one available grouping forms another group made up of image parts in a complete column of the image.
8. A method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the method comprising:
performing SAO filtering on a group made up of M image parts in a column of the image, wherein M is three or more.
9. A method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the method comprising:
performing SAO filtering on a group made up of image parts in a complete column of the image.
10. The method of claim 8 or 9, wherein two or more different groupings of said image parts are available, and the group is formed by one of said available groupings.
11. The method of claim 10 further comprising:
comparing SAO filtering using two or more of the available groupings; and
selecting one grouping based on the comparison.
12. The method of claim 11, wherein the comparison is based on rate-distortion evaluation for the two or more groupings.
13. The method of claim 11 or 12, wherein at least one available grouping is excluded from the comparison and/or evaluation.
14. A method of performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, two or more different groupings of said image parts being available, the method comprising:
determining a grouping; and
performing the SAO filtering using SAO parameters associated with the determined grouping,
wherein the two or more different groupings comprise one or more of:
a grouping for forming a group made up of N x N image parts of the image, wherein N is three or more;
another grouping for forming another group made up of M image parts in a column of the image, wherein M is three or more; and
another grouping for forming another group made up of image parts in a complete column of the image.
15. The method of claim 14, wherein the two or more different groupings further comprise one or more of a grouping(s) for forming:
a group made up of p x q image parts of the image, wherein p and q are one or larger;
a group made up of j x j image parts of the image, wherein j is two;
a group made up of an image part of the image;
a group made up of all the image parts of the image;
a group made up of image parts in a line of the image;
a group made up of k image parts in a row of the image, wherein k is three or more/the width of the image;
a group made up of image part(s) which use(s) temporal derivation for at least one SAO parameter;
a group made up of image part(s) which use(s) temporal derivation with a modified image or image part for at least one SAO parameter; and
a group made up of image part(s) which use(s) temporal derivation with another image or image part which has been rotated by 45, 90 or 135 degrees for at least one SAO parameter.
16. The method of claim 14 or 15, wherein the determining comprises:
comparing SAO filtering using two or more of the available groupings; and
selecting one grouping based on the comparison.
17. The method of claim 16, wherein the comparison is based on rate-distortion evaluation for the two or more groupings.
18. The method of claim 16 or 17, wherein at least one available grouping is excluded from the comparison and/or evaluation.
19. The method of any one of claims 14 to 18 further comprising providing, in a bitstream:
the SAO parameters associated with the grouping;
data indicating a grouping; or
data indicating SAO parameters for the SAO filtering are inferred from another image part of the image or of another image.
20. The method of claim 19 further comprising providing, in a bitstream, data indicating which one of the data indicating a grouping or the data indicating inferring of the SAO parameters is used.
21. The method of claim 20, wherein when said data indicates either of the data is not available for use, not including the unavailable data in the bitstream.
22. The method of any one of claims 14 to 21, wherein the determining comprises obtaining, from a bitstream:
the SAO parameters associated with the grouping;
data indicating a grouping, and determining the grouping using the obtained data; and/or
data indicating inferring of the SAO parameters for the SAO filtering from another image part of the image or of another image, and inferring the SAO parameters using the obtained data.
23. The method of claim 22 further comprising:
obtaining, from a bitstream, data indicating which one of the data indicating a grouping or the data indicating inferring of the SAO parameters is used; and
when the data indicates either or both data is available, obtaining, from the bitstream, the available data and using the obtained available data to determine the grouping or inferring the SAO parameters for the SAO filtering.
24. The method of claim 23 further comprising:
when the data indicates either of the data is not available for use, obtaining, from the bitstream, the SAO parameters associated with the grouping.
25. A method of encoding an image or a sequence of images, the method comprising performing sample adaptive offset (SAO) filtering according to the method of any one of claims 1 to 21.
26. A method of decoding an image or a sequence of images, the method comprising performing sample adaptive offset (SAO) filtering according to the method of any one of claims 1 to 18 or 22 to 24.
27. A device for performing sample adaptive offset (SAO) filtering on an image comprising a plurality of image parts, the device comprising a means for performing sample adaptive offset (SAO) filtering according to the method of any one of claims 1 to 24.
28. A device for encoding an image or a sequence of images, the device comprising a device of claim 27.
29. A device for decoding an image or a sequence of images, the device comprising a device of claim 27.
30. A signal carrying an information dataset for an image or a sequence of images represented by a bitstream, the image comprising a plurality of image parts, wherein the information dataset comprises data for performing SAO filtering using SAO parameters associated with a group made up of:
N x N image parts of the image, wherein N is three or more;
M image parts in a column of the image, wherein M is three or more; or
image parts in a complete column of the image.
31. A program which, when executed, causes the method of any one of claims 1 to 26 to be performed.
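As an illustration only, the groupings recited in claim 14 and the rate-distortion based selection of claims 16 to 18 might be sketched as follows. This is a minimal sketch and not the applicant's implementation. The names `nxn_groups`, `column_groups`, `select_grouping` and `toy_rd_cost` are hypothetical, image parts are assumed to be addressed by (row, column) indices, groups at the image boundary are assumed to be clipped, and the cost function is a made-up stand-in for a real rate-distortion evaluation.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Part = Tuple[int, int]  # hypothetical (row, column) index of one image part
Group = List[Part]


def nxn_groups(rows: int, cols: int, n: int) -> List[Group]:
    """Partition a rows x cols grid of image parts into groups of up to n x n parts.

    Assumption: groups at the right and bottom edges are clipped to the image.
    """
    groups: List[Group] = []
    for r0 in range(0, rows, n):
        for c0 in range(0, cols, n):
            groups.append([(r, c)
                           for r in range(r0, min(r0 + n, rows))
                           for c in range(c0, min(c0 + n, cols))])
    return groups


def column_groups(rows: int, cols: int, m: Optional[int] = None) -> List[Group]:
    """Form groups of m vertically adjacent parts; m=None gives complete columns."""
    height = rows if m is None else m
    groups: List[Group] = []
    for c in range(cols):
        for r0 in range(0, rows, height):
            groups.append([(r, c) for r in range(r0, min(r0 + height, rows))])
    return groups


def toy_rd_cost(group: Group) -> float:
    """Made-up cost: a fixed signalling cost per group plus a per-part term."""
    return 10.0 + 0.5 * len(group)


def select_grouping(candidates: Dict[str, List[Group]],
                    rd_cost: Callable[[Group], float],
                    excluded: FrozenSet[str] = frozenset()) -> Tuple[str, float]:
    """Compare the candidate groupings and return the cheapest one and its cost.

    Groupings named in `excluded` are skipped, as an encoder might do to
    reduce the number of evaluations.
    """
    best_name, best_cost = None, float("inf")
    for name, groups in candidates.items():
        if name in excluded:
            continue
        cost = sum(rd_cost(g) for g in groups)
        if cost < best_cost:
            best_name, best_cost = name, cost
    if best_name is None:
        raise ValueError("no grouping left to evaluate")
    return best_name, best_cost
```

For a 4 x 6 grid of image parts, a caller could build `{"3x3": nxn_groups(4, 6, 3), "column_of_3": column_groups(4, 6, 3), "complete_column": column_groups(4, 6)}` and pass it to `select_grouping` together with a cost function. In a real encoder the cost would come from the distortion and bit rate obtained with the SAO parameters derived for each group.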
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1809237.9 | 2018-06-05 | ||
GB1809237.9A GB2574426A (en) | 2018-06-05 | 2018-06-05 | Video coding and decoding |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019234002A1 (en) | 2019-12-12 |
Family
ID=62975675
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2019/064459 WO2019234002A1 (en) | 2018-06-05 | 2019-06-04 | Video coding and decoding |
Country Status (3)
Country | Link |
---|---|
GB (1) | GB2574426A (en) |
TW (1) | TW202013964A (en) |
WO (1) | WO2019234002A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140192860A1 (en) | 2013-01-04 | 2014-07-10 | Canon Kabushiki Kaisha | Method, device, computer program, and information storage means for encoding or decoding a scalable video sequence |
US9769450B2 (en) | 2012-07-04 | 2017-09-19 | Intel Corporation | Inter-view filter parameters re-use for three dimensional video coding |
WO2018054286A1 (en) * | 2016-09-20 | 2018-03-29 | Mediatek Inc. | Methods and apparatuses of sample adaptive offset processing for video coding |
- 2018-06-05: GB, application GB1809237.9A, publication GB2574426A (en), not active (Withdrawn)
- 2019-05-28: TW, application TW108118392A, publication TW202013964A (en), status unknown
- 2019-06-04: WO, application PCT/EP2019/064459, publication WO2019234002A1 (en), active (Application Filing)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9769450B2 (en) | 2012-07-04 | 2017-09-19 | Intel Corporation | Inter-view filter parameters re-use for three dimensional video coding |
US20140192860A1 (en) | 2013-01-04 | 2014-07-10 | Canon Kabushiki Kaisha | Method, device, computer program, and information storage means for encoding or decoding a scalable video sequence |
WO2018054286A1 (en) * | 2016-09-20 | 2018-03-29 | Mediatek Inc. | Methods and apparatuses of sample adaptive offset processing for video coding |
Non-Patent Citations (2)
Title |
---|
CHIH-MING FU ET AL: "Sample Adaptive Offset in the HEVC Standard", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, US, vol. 22, no. 12, 1 December 2012 (2012-12-01), pages 1755 - 1764, XP011487153, ISSN: 1051-8215, DOI: 10.1109/TCSVT.2012.2221529 * |
LAROCHE (CANON) G ET AL: "Non-CE2: On SAO parameter signalling", no. JVET-K0201, 8 July 2018 (2018-07-08), XP030199094, Retrieved from the Internet <URL:http://phenix.int-evry.fr/jvet/doc_end_user/documents/11_Ljubljana/wg11/JVET-K0201-v2.zip JVET-K0201-v2.docx> [retrieved on 20180708] * |
Also Published As
Publication number | Publication date |
---|---|
TW202013964A (en) | 2020-04-01 |
GB2574426A (en) | 2019-12-11 |
GB201809237D0 (en) | 2018-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11601687B2 (en) | | Method and device for providing compensation offsets for a set of reconstructed samples of an image |
WO2020002117A2 (en) | | Methods and devices for performing sample adaptive offset (sao) filtering |
WO2019233997A1 (en) | | Prediction of sao parameters |
WO2019234000A1 (en) | | Prediction of sao parameters |
WO2019233999A1 (en) | | Video coding and decoding |
WO2019234002A1 (en) | | Video coding and decoding |
WO2019233998A1 (en) | | Video coding and decoding |
WO2019234001A1 (en) | | Video coding and decoding |
WO2024213516A1 (en) | | Image and video coding and decoding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19728943; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19728943; Country of ref document: EP; Kind code of ref document: A1 |