GB2582929A - Residual signalling - Google Patents

Residual signalling

Info

Publication number
GB2582929A
Authority
GB
United Kingdom
Prior art keywords
residual
mode
determining
flag
bit stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1904969.1A
Other versions
GB201904969D0 (en)
Inventor
Laroche Guillaume
Onno Patrice
Gisquet Christophe
Taquet Jonathan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to GB1904969.1A
Publication of GB201904969D0
Publication of GB2582929A
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/18 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Encoding or decoding image or video data, comprising: determining a motion prediction mode 1204, 1206, 1218 from a bit stream; decoding a residual flag 1208 from said bit stream; wherein said motion prediction mode is combined inter-intra prediction (CIIP) 1218 and said residual flag indicates the presence or absence of a residual. Preferably a residual flag corresponding to a luma component and a residual flag corresponding to a chroma component are included. Embodiments improve coding efficiency by allowing effective signalling of a combined inter-intra predictive skip mode in which there are no residual values coded. Acknowledged prior art only specifies a CIIP merge mode. Another independent claim is included for decoding an image by determining a motion prediction mode from a bitstream and determining a last significant coefficient value which indicates (preferably by taking a zero value) whether a residual is zero. In embodiments, once CIIP mode is determined, a parameter cu_cbf may be read from the bitstream to indicate whether or not a residual is included. The parameter cu_cbf may have an independent context for CABAC encoding. Furthermore, in embodiments sub-block transform for inter blocks (SBT) may be optionally enabled in CIIP mode.

Description

Intellectual Property Office Application No. GB1904969.1 RTM Date: 4 October 2019
The following terms are registered trade marks and should be read as such wherever they occur in this document: Wi-Fi (Page 8)
Intellectual Property Office is an operating name of the Patent Office www.gov.uk/ipo
RESIDUAL SIGNALLING
Field of invention
The present invention relates to video coding and decoding.
Background
Recently, the Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16's VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. The main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well beyond the targeted 50% for the final standard.
The JVET exploration model (JEM) uses all the HEVC tools, as well as various new motion prediction tools. One new tool of the current VVC is the Combined Inter MERGE / Intra prediction (CIIP), also known as Multi-hypothesis Intra/Inter prediction. This mode is enabled for the MERGE mode only, which means that it is not enabled for AMVP or Skip mode prediction. Consequently, this mode shall have a residual. This residual coding decreases the coding efficiency and increases the encoding complexity of VVC codecs.
Accordingly, a solution to at least one of the aforementioned problems is desirable. The present invention relates to the manner in which residuals are signalled in the bit stream (and encoded / decoded) so as to enable usage of the CIIP mode when there is a zero residual. This means that CIIP mode can be available for use more frequently and as such coding performance is improved. In particular, coding efficiency, complexity and encoder runtime may all be improved (to varying degrees depending on what is prioritised) when using embodiments of the present invention.
According to one aspect of the present invention, there is provided a method of decoding an image or image portion from a bit stream, the method comprising: determining a motion prediction mode from said bit stream; decoding a residual flag from said bit stream; wherein said motion prediction mode is combined inter-intra prediction (CIIP) and said residual flag indicates the presence or absence of a residual. In such a way coding efficiency is improved, encoder runtime can be reduced, and coding complexity is reduced.
Optionally, so as to further improve coding efficiency, the method may further comprise determining a context specific to said residual flag from said bit stream.
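The decoding aspect above can be illustrated with a minimal sketch. It assumes a toy bit reader over plain 0/1 values (a real decoder would use CABAC) and a hypothetical one-flag mode signalling; the names `BitReader` and `decode_cu` are illustrative only and do not come from the specification or from any reference software.

```python
from typing import List, Optional, Tuple


class BitReader:
    """Toy reader over a list of 0/1 values (assumption: no entropy coding)."""

    def __init__(self, bits: List[int]) -> None:
        self._bits = list(bits)
        self._pos = 0

    def read_flag(self) -> bool:
        bit = self._bits[self._pos]
        self._pos += 1
        return bit == 1


def decode_cu(reader: BitReader) -> Tuple[str, Optional[bool]]:
    # Hypothetical mode signalling: one flag selects CIIP versus any other mode.
    mode = "CIIP" if reader.read_flag() else "OTHER"
    has_residual: Optional[bool] = None
    if mode == "CIIP":
        # The residual flag indicates the presence or absence of a residual.
        has_residual = reader.read_flag()
    return mode, has_residual
```

With this sketch, the bit sequence `[1, 0]` would be read as a CIIP block with no residual, which is the case the background section says was previously unavailable.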
Optionally, so as to further improve coding efficiency, the image portion comprises a block comprising multiple sub-blocks, each sub-block having a corresponding residual flag in said bit stream.
Optionally, so as to further improve coding efficiency, the bit stream comprises a residual flag corresponding to a colour component. Optionally, the colour component comprises a luma component. Optionally, the bit stream further comprises a residual flag corresponding to a chroma component.
Optionally, so as to further improve coding efficiency, the residual is indicated to be zero if both the luma and chroma residual flags indicate zero residual.
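As a small illustration of the per-component option, the sketch below reads one luma flag and one chroma flag and treats the residual as zero only when both flags indicate zero residual. The flag order (luma first) and the convention that 1 means "residual present" are assumptions made for the example.

```python
from typing import Iterable, Tuple


def decode_component_flags(bits: Iterable[int]) -> Tuple[bool, bool, bool]:
    """Return (luma_flag, chroma_flag, residual_is_zero) from plain 0/1 bits."""
    it = iter(bits)
    luma_flag = next(it) == 1    # assumption: luma flag is signalled first
    chroma_flag = next(it) == 1  # assumption: one flag covers the chroma component
    # The residual is zero only if both component flags indicate zero residual.
    residual_is_zero = not luma_flag and not chroma_flag
    return luma_flag, chroma_flag, residual_is_zero
```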
Optionally, so as to further improve coding efficiency, the method further comprises determining a context specific to each residual flag from said bit stream.
Optionally, the residual flag corresponds to the coding unit for the image portion being decoded.
Optionally, for improved coding performance the method further comprises decoding a residual flag corresponding to a sub-block after decoding said residual flag corresponding to the coding unit in dependence on the value of the residual flag corresponding to the coding unit.
Optionally, so as to further improve coding efficiency, the residual flag corresponding to a sub-block is decoded if the residual flag corresponding to the coding unit indicates no residual.
Optionally, the residual flag corresponding to a sub-block is decoded if the residual flag corresponding to the coding unit indicates a residual and said motion prediction mode is MERGE CIIP.
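The dependency between the coding-unit flag and the sub-block flags can be sketched as follows. This example models only the last option above (sub-block flags are read when the coding-unit flag indicates a residual and the mode is MERGE CIIP); the mode string `"MERGE_CIIP"`, the function name and the plain-bit input are hypothetical.

```python
from typing import Iterable, List, Tuple


def decode_residual_flags(
    bits: Iterable[int], mode: str, num_sub_blocks: int
) -> Tuple[bool, List[bool]]:
    """Read the CU-level flag, then sub-block flags only when the rule allows."""
    it = iter(bits)
    cu_flag = next(it) == 1
    sub_flags: List[bool] = []
    if cu_flag and mode == "MERGE_CIIP":
        # One flag per sub-block, decoded after the CU-level flag.
        sub_flags = [next(it) == 1 for _ in range(num_sub_blocks)]
    return cu_flag, sub_flags
```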
Optionally, determining the residual flag comprises determining the status of a Skip mode from said bit stream.
Optionally, so as to further improve coding efficiency, the method further comprises determining a context for said residual flag, said context depending on the status of the Skip mode.
Optionally, to reduce coding complexity, determining the CIIP motion prediction mode occurs after determining that the CU is not another merge mode.
According to another aspect of the present invention, there is provided a method of decoding an image or image portion from a bit stream, the method comprising: determining a motion prediction mode from said bit stream; determining the value of a last significant coefficient; wherein said last significant coefficient indicates whether or not the residual is different to zero. In such a way coding efficiency is improved, coding complexity is reduced and encoder runtime can be reduced.
Optionally, the method further comprises setting a residual flag in dependence on the value of said last significant coefficient. Optionally, the method further comprises decoding said residual flag if the last significant position is equal to zero.
Optionally, the method further comprises inferring said residual flag prior to determining the motion prediction mode. Optionally, the method further comprises determining whether the motion prediction mode is CIIP and in the case of said determining, decoding the residual flag from the bitstream.
Optionally, for a less complex decoder the last significant coefficient is equal to the first coefficient.
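One possible reading of this aspect is sketched below: when the last significant position is the first coefficient and the decoded value there is zero, the block is treated as having no residual. This interpretation, and the function and parameter names, are assumptions for illustration rather than the method defined by the claims.

```python
def residual_flag_from_last_sig(last_sig_pos: int, last_sig_value: int) -> bool:
    """Return True when a non-zero residual is indicated.

    Assumption: a last significant position of 0 (the first coefficient)
    carrying a zero value signals an all-zero residual.
    """
    if last_sig_pos == 0 and last_sig_value == 0:
        return False
    return True
```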
Optionally, the residual flag is determined based on the value of sig_coeff_flag.
Encoder
According to another aspect of the present invention, there is provided a method of encoding an image or image portion into a bit stream, the method comprising: determining a motion prediction mode for said image or image portion; encoding a residual flag into said bit stream; wherein said motion prediction mode is Combined Inter Intra Prediction (CIIP) mode and said residual flag indicates the presence or absence of a residual. In such a way, encoder runtime can be reduced, and a simpler, more efficient coding process can be achieved.
Optionally, for improved coding performance, determining a motion prediction mode comprises determining a mode with the lowest rate distortion. Optionally, for improved coding performance, the method further comprises setting a preference for Skip mode if said residual flag indicates the absence of a residual.
Optionally, the residual flag corresponds to the coding unit for the image portion being encoded.
Optionally, the residual flag comprises multiple flags corresponding to the chroma and luma components.
Optionally, for improved coding performance, setting said preference for Skip mode is performed when all of the residual flags corresponding to the chroma and luma components indicate the absence of a residual.
Optionally, setting a preference for Skip mode comprises setting a BestIsSkip variable.
Optionally, for improved coding performance, setting a preference for Skip mode occurs after determining that the coding mode resulting in the lowest rate distortion has a zero residual.
Optionally, setting a preference for Skip mode is equal to true after determining that the coding mode resulting in the lowest rate distortion is a Skip mode and to false otherwise.
Optionally, said setting a preference for Skip mode is equal to true after determining that the coding mode resulting in the lowest rate distortion is CIIP and if there is no residual.
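The encoder-side options above can be combined into a short sketch: pick the candidate with the lowest rate distortion cost, then set the Skip preference to true when that candidate is a Skip mode, or is CIIP with no residual. The `Candidate` structure, the mode strings and `choose_mode` are hypothetical names; the cost values would come from the encoder's own rate distortion evaluation.

```python
from typing import NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    mode: str           # e.g. "SKIP", "MERGE", "CIIP" (illustrative labels)
    rd_cost: float      # rate distortion cost computed elsewhere
    has_residual: bool  # True if the coded residual is non-zero


def choose_mode(candidates: Sequence[Candidate]) -> Tuple[Candidate, bool]:
    """Return the lowest-cost candidate and the resulting Skip preference."""
    best = min(candidates, key=lambda c: c.rd_cost)
    # Preference for Skip: best mode is Skip, or CIIP without any residual.
    best_is_skip = best.mode == "SKIP" or (
        best.mode == "CIIP" and not best.has_residual
    )
    return best, best_is_skip
```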
Optionally, for a less complex encoder, setting a preference for Skip mode occurs at the beginning of a merge estimation process.
Optionally, the method further comprises evaluating said CIIP mode when said residual flag indicates the absence of a residual.
Optionally, determining a motion prediction mode for said image or image portion comprises determining that a variable indicating a preference for Skip mode is false.
Device
According to another aspect of the present invention, there is provided a device for decoding an image or image portion from a bit stream, comprising: means (for example a suitably programmed processor and associated memory) for determining a motion prediction mode from said bit stream; means (for example a suitably programmed processor and associated memory) for decoding a residual flag from said bit stream; wherein said motion prediction mode is combined inter-intra prediction (CIIP) and said residual flag indicates the presence or absence of a residual.
According to another aspect of the present invention, there is provided a device for decoding an image or image portion from a bit stream, comprising: means (for example a suitably programmed processor and associated memory) for determining a motion prediction mode from said bit stream; means (for example a suitably programmed processor and associated memory) for determining the value of a last significant coefficient; wherein said last significant coefficient indicates whether or not the residual is different to zero.
According to another aspect of the present invention, there is provided a device for encoding an image or image portion into a bit stream, comprising: means (for example a suitably programmed processor and associated memory) for determining a motion prediction mode for said image or image portion; means (for example a suitably programmed processor and associated memory) for encoding a residual flag into said bit stream; wherein said motion prediction mode is Combined Inter Intra Prediction (CIIP) mode and said residual flag indicates the presence or absence of a residual.
Yet further aspects of the present invention relate to a program as defined by claim 38. The program may be provided on its own or may be carried on, by or in a carrier medium. The carrier medium may be non-transitory, for example a storage medium, in particular a computer-readable storage medium. The carrier medium may also be transitory, for example a signal or other transmission medium. The signal may be transmitted via any suitable network, including the Internet.
Further features of the invention are characterised by the other independent and dependent claims. Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa.
Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly. Any apparatus feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.
It should also be appreciated that particular combinations of the various features described and defined in any aspects of the invention can be implemented and/or supplied and/or used independently.
Reference will now be made, by way of example, to the accompanying drawings, in which:
Figure 1 is a diagram for use in explaining a coding structure used in HEVC;
Figure 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the invention may be implemented;
Figure 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented;
Figure 4 is a flow chart illustrating steps of an encoding method according to embodiments of the invention;
Figure 5 is a flow chart illustrating steps of a decoding method according to embodiments of the invention;
Figures 6 and 7 show the labelling scheme used to describe blocks situated relative to a current block;
Figures 8(a) and (b) illustrate the Affine mode;
Figures 9(a) and (b) illustrate the Triangle mode;
Figure 10 illustrates the Combined Inter MERGE / Intra prediction (CIIP) mode;
Figure 11 illustrates an example decoding process of various Merge modes;
Figure 12 illustrates an example first stage of residual signalling;
Figure 13 shows an alternative residual signalling method;
Figure 14 illustrates sub-block transform (SBT) partitioning;
Figure 15 illustrates Intra Sub-Partitions;
Figure 16 shows an embodiment enabling SBT for CIIP mode;
Figures 17 and 18 show two embodiments for enabling SKIP mode for CIIP mode;
Figure 19 shows an example transform tree process which decodes the residual from the bit stream;
Figure 20 shows an example transform-unit process;
Figure 21 shows an alternative transform-unit process;
Figure 22 shows three example scanning patterns;
Figure 23 illustrates an example manner in which residual data is transmitted in a bitstream;
Figure 24 schematically illustrates a simplified residual coding process;
Figures 25 and 26 show alternative residual coding processes;
Figure 27 shows a simplified example of an encoding process for the CIIP mode;
Figure 28 shows a simplified example of a rate distortion (RD) evaluation process;
Figures 29-31, 33 and 34 show alternative encoding processes for the CIIP mode;
Figure 32 shows an alternative rate distortion (RD) evaluation process;
Figure 35 is a diagram showing a system comprising an encoder or a decoder and a communication network according to embodiments of the present invention;
Figure 36 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention;
Figure 37 is a diagram illustrating a network camera system; and
Figure 38 is a diagram illustrating a smart phone.
Detailed description
Figure 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video standard. A video sequence 1 is made up of a succession of digital images i.
Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.
An image 2 of the sequence may be divided into slices 3. A slice may in some instances constitute an entire image. These slices are divided into non-overlapping Coding Tree Units (CTUs). A Coding Tree Unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to macroblock units that were used in several previous video standards. A CTU is also sometimes referred to as a Largest Coding Unit (LCU). A CTU has luma and chroma component parts, each of which component parts is called a Coding Tree Block (CTB). These different color components are not shown in Figure 1.
A CTU is generally of size 64 pixels x 64 pixels for HEVC, yet for VVC this size can be 128 pixels x 128 pixels. Each CTU may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 5 using a quadtree decomposition.
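The iterative quadtree decomposition of a CTU into CUs can be sketched as a short recursion. The split decision is passed in as a callback because the specification does not fix it here (an encoder would typically decide by rate distortion cost); the names `quadtree_split` and `should_split`, and the minimum block size of 8, are assumptions for the example.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block in pixels


def quadtree_split(
    x: int,
    y: int,
    size: int,
    should_split: Callable[[int, int, int], bool],
    min_size: int = 8,  # assumption: smallest allowed coding unit
) -> List[Block]:
    """Recursively divide a square block into four quadrants where requested."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                blocks.extend(
                    quadtree_split(x + dx, y + dy, half, should_split, min_size)
                )
        return blocks
    return [(x, y, size)]
```

For instance, calling `quadtree_split(0, 0, 64, lambda x, y, s: s == 64)` splits a 64 x 64 CTU once, giving four 32 x 32 coding units that together cover the original area.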
Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU). The maximum size of a PU or TU is equal to the CU size. A Prediction Unit corresponds to the partition of the CU for prediction of pixel values. Various different partitions of a CU into PUs are possible as shown by 606 including a partition into 4 square PUs and two different partitions into 2 rectangular PUs. A Transform Unit is an elementary unit that is subjected to spatial transformation using DCT. A CU can be partitioned into TUs based on a quadtree representation 607.
Each slice is embedded in one Network Abstraction Layer (NAL) unit. In addition, the coding parameters of the video sequence are stored in dedicated NAL units called parameter sets. In HEVC and H.264/AVC two kinds of parameter set NAL units are employed: first, a Sequence Parameter Set (SPS) NAL unit that gathers all parameters that are unchanged during the whole video sequence. Typically, it handles the coding profile, the size of the video frames and other parameters. Secondly, a Picture Parameter Set (PPS) NAL unit includes parameters that may change from one image (or frame) to another of a sequence. HEVC also includes a Video Parameter Set (VPS) NAL unit which contains parameters describing the overall structure of the bitstream. The VPS is a new type of parameter set defined in HEVC, and applies to all of the layers of a bitstream. A layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer. HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer.
Figure 2 illustrates a data communication system in which one or more embodiments of the invention may be implemented. The data communication system comprises a transmission device, in this case a server 201, which is operable to transmit data packets of a data stream to a receiving device, in this case a client terminal 202, via a data communication network 200. The data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN). Such a network may be for example a wireless network (Wi-Fi / 802.11a or b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks. In a particular embodiment of the invention the data communication system may be a digital television broadcast system in which the server 201 sends the same data content to multiple clients.
The data stream 204 provided by the server 201 may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 using a microphone and a camera respectively.
In some embodiments data streams may be stored on the server 201 or received by the server 201 from another data provider, or generated at the server 201. The server 201 is provided with an encoder for encoding video and audio streams in particular to provide a compressed bitstream for transmission that is a more compact representation of the data presented as input to the encoder.
In order to obtain a better ratio of the quality of transmitted data to quantity of transmitted data, the compression of the video data may be for example in accordance with the HEVC format or H.264/AVC format or VVC format.
The client 202 receives the transmitted bitstream and decodes the reconstructed bitstream to reproduce video images on a display device and the audio data by a loud speaker. Although a streaming scenario is considered in the example of Figure 2, it will be appreciated that in some embodiments of the invention the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
In one or more embodiments of the invention a video image is transmitted with data representative of compensation offsets for application to reconstructed pixels of the image to provide filtered pixels in a final image.
Figure 3 schematically illustrates a processing device 300 configured to implement at least one embodiment of the present invention. The processing device 300 may be a device such as a micro-computer, a workstation or a light portable device. The device 300 comprises a communication bus 313 connected to:
- a central processing unit 311, such as a microprocessor, denoted CPU;
- a read only memory 306, denoted ROM, for storing computer programs for implementing the invention;
- a random access memory 312, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to embodiments of the invention; and
- a communication interface 302 connected to a communication network 303 over which digital data to be processed are transmitted or received.
Optionally, the apparatus 300 may also include the following components:
- a data storage means 304 such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
- a disk drive 305 for a disk 306, the disk drive being adapted to read data from the disk 306 or to write data onto said disk;
- a screen 309 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 310 or any other pointing means.
The apparatus 300 can be connected to various peripherals, such as for example a digital camera 320 or a microphone 308, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 300.
The communication bus provides communication and interoperability between the various elements included in the apparatus 300 or connected to it. The representation of the bus is not limiting and in particular the central processing unit is operable to communicate instructions to any element of the apparatus 300 directly or by means of another element of the apparatus 300.
The disk 306 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.
The executable code may be stored either in read only memory 306, on the hard disk 304 or on a removable digital medium such as for example a disk 306 as described previously.
According to a variant, the executable code of the programs can be received by means of the communication network 303, via the interface 302, in order to be stored in one of the storage means of the apparatus 300 before being executed, such as the hard disk 304.
The central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means. On powering up, the program or programs that are stored in a non-volatile memory, for example on the hard disk 304 or in the read only memory 306, are transferred into the random access memory 312, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Figure 4 illustrates a block diagram of an encoder according to at least one embodiment of the invention. The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, at least one corresponding step of a method implementing at least one embodiment of encoding an image of a sequence of images according to one or more embodiments of the invention.
An original sequence of digital images i0 to in 401 is received as an input by the encoder 400. Each digital image is represented by a set of samples, sometimes also referred to as pixels (hereinafter, they are referred to as pixels).
A bitstream 410 is output by the encoder 400 after implementation of the encoding process. The bitstream 410 comprises a plurality of encoding units or slices, each slice comprising a slice header for transmitting encoding values of encoding parameters used to encode the slice and a slice body, comprising encoded video data.
The input digital images i0 to in 401 are divided into blocks of pixels by module 402. The blocks correspond to image portions and may be of variable sizes (e.g. 4x4, 8x8, 16x16, 32x32, 64x64, 128x128 pixels and several rectangular block sizes can be also considered). A coding mode is selected for each input block. Two families of coding modes are provided: coding modes based on spatial prediction coding (Intra prediction), and coding modes based on temporal prediction (Inter coding, Merge, SKIP). The possible coding modes are tested.
Module 403 implements an Intra prediction process, in which the given block to be encoded is predicted by a predictor computed from pixels of the neighbourhood of said block to be encoded. An indication of the selected Intra predictor and the difference between the given block and its predictor is encoded to provide a residual if the Intra coding is selected.
Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405. Firstly a reference image from among a set of reference images 416 is selected, and a portion of the reference image, also called reference area or image portion, which is the closest area (closest in terms of pixel value similarity) to the given block to be encoded, is selected by the motion estimation module 404. Motion compensation module 405 then predicts the block to be encoded using the selected area. The difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module 405. The selected reference area is indicated using a motion vector.
Thus, in both cases (spatial and temporal prediction), a residual is computed by subtracting the predictor from the original block.
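As a minimal sketch of this subtraction, the Python below computes a residual block sample by sample and shows that adding it back to the predictor recovers the original block. The function names and the 2x2 sample values are illustrative assumptions, and the transform and quantization stages are deliberately left out, so the round trip shown here is lossless.

```python
def residual_block(original, predictor):
    """Residual = original - predictor, computed sample by sample."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predictor)]


def reconstruct_block(predictor, residual):
    """Decoder-side counterpart: predictor + residual."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predictor, residual)]


# Illustrative 2x2 blocks (values are made up for the example).
original = [[52, 55], [61, 59]]
predictor = [[50, 54], [60, 60]]

residual = residual_block(original, predictor)
rebuilt = reconstruct_block(predictor, residual)
```

In a real codec the residual would be transformed and quantized before being added back, so the reconstructed block would generally only approximate the original.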
In the INTRA prediction implemented by module 403, a prediction direction is encoded. In the Inter prediction implemented by modules 404, 405, 416, 418, 417, at least one motion vector or data for identifying such motion vector is encoded for the temporal prediction.
Information relevant to the motion vector and the residual block is encoded if the Inter prediction is selected. To further reduce the bitrate, assuming that motion is homogeneous, the motion vector is encoded by difference with respect to a motion vector predictor. Motion vector predictors from a set of motion information predictor candidates are obtained from the motion vectors field 418 by a motion vector prediction and coding module 417.
The encoder 400 further comprises a selection module 406 for selection of the coding mode by applying an encoding cost criterion, such as a rate-distortion criterion. In order to further reduce redundancies a transform (such as DCT) is applied by transform module 407 to the residual block, the transformed data obtained is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409. Finally, the encoded residual block of the current block being encoded is inserted into the bitstream 410.
The encoder 400 also performs decoding of the encoded image in order to produce a reference image (e.g. those in Reference images/pictures 416) for the motion estimation of the subsequent images. This enables the encoder and the decoder receiving the bitstream to have the same reference frames (reconstructed images or image portions are used). The inverse quantization ("dequantization") module 411 performs inverse quantization ("dequantization") of the quantized data, followed by an inverse transform by inverse transform module 412. The intra prediction module 413 uses the prediction information to determine which predictor to use for a given block and the motion compensation module 414 actually adds the residual obtained by module 412 to the reference area obtained from the set of reference images 416. Post filtering is then applied by module 415 to filter the reconstructed frame (image or image portions) of pixels. In the embodiments of the invention an SAO loop filter is used in which compensation offsets are added to the pixel values of the reconstructed pixels of the reconstructed image. It is understood that post filtering does not always have to be performed.
Any other type of post filtering may also be performed in addition to, or instead of, the SAO loop filtering.
Figure 5 illustrates a block diagram of a decoder 60 which may be used to receive data from an encoder according to an embodiment of the invention. The decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, a corresponding step of a method implemented by the decoder 60.
The decoder 60 receives a bitstream 61 comprising encoded units (e.g. data corresponding to a block or a coding unit), each one being composed of a header containing information on encoding parameters and a body containing the encoded video data. As explained with respect to Figure 4, the encoded video data is entropy encoded, and the motion vector predictors' indexes are encoded, for a given block, on a predetermined number of bits. The received encoded video data is entropy decoded by module 62. The residual data are then dequantized by module 63 and then an inverse transform is applied by module 64 to obtain pixel values.
The mode data indicating the coding mode are also entropy decoded and, based on the mode, an INTRA type decoding or an INTER type decoding is performed on the encoded blocks (units/sets/groups) of image data.
In the case of INTRA mode, an INTRA predictor is determined by intra prediction module 65 based on the intra prediction mode specified in the bitstream.
If the mode is INTER, the motion prediction information is extracted from the bitstream so as to find (identify) the reference area used by the encoder. The motion prediction information comprises the reference frame index and the motion vector residual. The motion vector predictor is added to the motion vector residual by motion vector decoding module 70 in order to obtain the motion vector. The various motion predictor tools used in VVC are discussed in more detail below with reference to Figures 6-10.
Motion vector decoding module 70 applies motion vector decoding for each current block encoded by motion prediction. Once an index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the current block can be decoded and used to apply motion compensation by module 66. The reference image portion indicated by the decoded motion vector is extracted from a reference image 68 to apply the motion compensation 66. The motion vector field data 71 is updated with the decoded motion vector in order to be used for the prediction of subsequent decoded motion vectors.
Finally, a decoded block is obtained. Where appropriate, post filtering is applied by post filtering module 67. A decoded video signal 69 is finally obtained and provided by the decoder 60.
Motion prediction (INTER) modes
HEVC uses 3 different INTER modes: the Inter mode (Advanced Motion Vector Prediction (AMVP)), the "classical" Merge mode (i.e. the "non-Affine Merge mode" or also known as "regular" Merge mode) and the "classical" Merge Skip mode (i.e. the "non-Affine Merge Skip" mode or also known as "regular" Merge Skip mode). The main difference between these modes is the data signalling in the bitstream. For the Motion vector coding, the current HEVC standard includes a competitive based scheme for Motion vector prediction which was not present in earlier versions of the standard. It means that several candidates are competing with the rate distortion criterion at encoder side in order to find the best motion vector predictor or the best motion information for respectively the Inter or the Merge modes (i.e. the "classical/regular" Merge mode or the "classical/regular" Merge Skip mode). An index corresponding to the best predictors or the best candidate of the motion information is then inserted in the bitstream, together with a 'residual' which represents the difference between the predicted value and the actual value. The decoder can derive the same set of predictors or candidates and uses the best one according to the decoded index. Using the residual, the decoder can then recreate the original value.
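The index-plus-residual idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sum of absolute differences stands in for the encoder's rate-distortion criterion, and the candidate list, function names and vector values are hypothetical.

```python
def encode_mv(mv, candidates):
    """Pick the candidate giving the smallest difference; return (index, mvd).

    The absolute-difference cost is an assumed proxy for a rate-distortion cost.
    """
    def cost(i):
        px, py = candidates[i]
        return abs(mv[0] - px) + abs(mv[1] - py)

    idx = min(range(len(candidates)), key=cost)
    px, py = candidates[idx]
    return idx, (mv[0] - px, mv[1] - py)


def decode_mv(idx, mvd, candidates):
    """Decoder side: rebuild the motion vector from the index and the residual."""
    px, py = candidates[idx]
    return (px + mvd[0], py + mvd[1])


# Both sides are assumed to derive the same candidate list.
candidates = [(4, -2), (10, 3)]
index, mvd = encode_mv((9, 4), candidates)
decoded = decode_mv(index, mvd, candidates)
```

Because the decoder derives the same candidate list, only the index and the (usually small) difference need to be transmitted.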
In the Screen Content Extension of HEVC, the new coding tool called Intra Block Copy (IBC) is signalled as any of those three INTER modes, the difference between IBC and the equivalent INTER mode being made by checking whether the reference frame is the current one. This can be implemented e.g. by checking the reference index of the list L0, and deducing this is Intra Block Copy if this is the last frame in that list. Another way is to compare the Picture Order Count of the current and reference frames: if equal, this is Intra Block Copy.
The design of the derivation of predictors and candidates is important in achieving the best coding efficiency without a disproportionate impact on complexity. In HEVC two motion vector derivations are used: one for Inter mode (Advanced Motion Vector Prediction (AMVP)) and one for Merge modes (Merge derivation process - for the classical Merge mode and the classical Merge Skip mode). The following describes the various motion predictor modes used in VVC.
Figures 6 and 7 show the labelling scheme used herein to describe blocks situated relative to a current block (i.e. the block currently being en/decoded) between frames (Fig. 6) and within the same frame (Fig. 7).
Affine mode
In HEVC, only a translational motion model is applied for motion compensation prediction (MCP). In the real world, however, there are many kinds of motion, e.g. zoom in/out, rotation, perspective motions and other irregular motions.
In the JEM, a simplified affine transform motion compensation prediction is applied and the general principle of Affine mode is described below based on an extract of document JVET-G1001 presented at a JVET meeting in Torino on 13-21 July 2017. This entire document is hereby incorporated by reference insofar as it describes other algorithms used in JEM.
As shown in Figure 8(a), the affine motion field of the block is described by two control point motion vectors.
The affine mode is a motion compensation mode like the Inter modes (AMVP, "classical" Merge, or "classical" Merge Skip). Its principle is to generate one motion information per pixel according to 2 or 3 neighbouring motion information. In the JEM, the affine mode derives one motion information for each 4x4 block as depicted in Figure 8(a) (each square is a 4x4 block, and the whole block in Figure 8(a) is a 16x16 block which is divided into 16 blocks of such square of 4x4 size - each 4x4 square block having a motion vector associated therewith). The Affine mode is available for the AMVP mode and the Merge modes (i.e. the classical Merge mode which is also referred to as "non-Affine Merge mode" and the classical Merge Skip mode which is also referred to as "non-Affine Merge Skip mode"), by enabling the affine mode with a flag. This flag is Context Adaptive Binary Arithmetic Coding (CABAC) coded.
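The per-sub-block derivation can be illustrated with the Python sketch below. It assumes the commonly used 4-parameter affine model driven by a top-left control point vector v0 and a top-right control point vector v1, evaluated at the centre of each 4x4 sub-block; that formula is not quoted in this description, so it should be read as an assumption, as should the function name and the use of floating-point arithmetic.

```python
def affine_subblock_mvs(v0, v1, width, height, sub=4):
    """Return {(x0, y0): (mvx, mvy)} with one vector per sub x sub sub-block.

    v0 / v1 are the assumed top-left / top-right control point motion vectors.
    """
    a = (v1[0] - v0[0]) / width  # change of mvx per sample along x
    b = (v1[1] - v0[1]) / width  # change of mvy per sample along x
    field = {}
    for y0 in range(0, height, sub):
        for x0 in range(0, width, sub):
            cx, cy = x0 + sub / 2, y0 + sub / 2  # sub-block centre
            field[(x0, y0)] = (a * cx - b * cy + v0[0],
                               b * cx + a * cy + v0[1])
    return field


# Identical control points reduce to a purely translational field.
translation = affine_subblock_mvs((1.0, 2.0), (1.0, 2.0), 16, 16)
# Different control points give a vector that varies across the block.
zoom = affine_subblock_mvs((0.0, 0.0), (16.0, 0.0), 16, 16)
```

For the 16x16 block of Figure 8(a) this yields 16 vectors, one per 4x4 square.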
CABAC is an extension of the arithmetic coding used in H.265/HEVC which separates the probabilities of a syntax element depending on a 'context' defined by a context variable. This corresponds to a conditional probability. The context variable may be derived from the value of the current syntax for the top left block (A2 in Figure 7 as described in more detail below) and the above left block (B3 in Figure 7), which are already decoded. It should be appreciated that other coding modes may be used -for example Context-Adaptive Variable-Length Coding (CAVLC), Golomb-rice Code, or simple binary representation called Fixed Length Coding -however CABAC has been found to provide the greatest bit-rate savings (at a cost of increased complexity).
Other modes
In addition to the regular MERGE mode and affine MERGE mode of the JEM software, the current VVC standard under definition contains 3 other MERGE modes. One mode is the Combined Inter MERGE / Intra prediction (CIIP), also known as Multi-Hypothesis Intra Inter (MHII) MERGE mode. Another one is the TRIANGLE MERGE mode and the last one is the Merge with Motion Vector Difference (MMVD) MERGE mode.
Triangle Mode
The TRIANGLE MERGE mode is a particular bi-prediction mode. Figure 9 illustrates this particular block predictor generation. The block predictor contains one triangle from a first block predictor (901 or 911) and a second triangle from a second block predictor (902 or 912).
There are two types of block generations. For the first one, the splitting between the two block predictors is from the top left corner to the bottom right corner as depicted in Figure 9(a). For the second one, the splitting between the two block predictors is from the top right corner to the bottom left corner as depicted in Figure 9(b). In addition, the samples near the frontier between both triangle sample predictors are filtered with a weighted average where the weight depends on the sample position.
MMVD
The MMVD MERGE mode is a specific regular MERGE mode candidate derivation. It can be considered as an independent MERGE candidates list. The selected MMVD MERGE candidate, for the current CU, is obtained by adding an offset value to one motion vector component (mvx or mvy) of an initial regular MERGE candidate. The offset value is added to the motion vector of the first list L0 or to the motion vector of the second list L1 depending on the configuration of these reference frames (both backward, both forward or forward and backward). The initial Merge candidate is signalled by means of an index. The offset value is signalled by means of a distance index selecting one of the 8 possible distances (1/4-pel, 1/2-pel, 1-pel, 2-pel, 4-pel, 8-pel, 16-pel, 32-pel) and a direction index giving the x or the y axis and the sign of the offset.
CIIP
The Combined Inter MERGE / Intra prediction (CIIP) MERGE can be considered as a combination of the regular MERGE mode and the Intra mode and is described below with reference to Figure 10. The block predictor for the current block (1001) of this mode is an average between a MERGE predictor block and an Intra predictor block as depicted in Figure 10. The MERGE predictor block is obtained with exactly the same process as the MERGE mode, so it is a temporal block (1002) or a bi-predictor of 2 temporal blocks. As such, a MERGE index is signalled for this mode in the same manner as the regular MERGE mode. The Intra predictor block is obtained based on the neighbouring samples (1003) of the current block (1001). The number of available Intra modes for the current block is however limited compared to an Intra block. Moreover, there is no Chroma Intra predictor block signalled for a CIIP block. The Chroma predictor is equal to the Luma predictor. As a consequence, 1, 2 or 3 bits are used to signal the Intra predictor for a CIIP block.
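The offset derivation can be sketched as follows in Python. The distance table follows the eight distances from quarter-pel to 32-pel; the mapping of the direction index to an axis and sign (0: +x, 1: -x, 2: +y, 3: -y) is an assumption made for this example, as are the function name and the use of exact fractions in pel units.

```python
from fractions import Fraction

# Eight possible distances, in pel units.
MMVD_DISTANCES = [Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(2),
                  Fraction(4), Fraction(8), Fraction(16), Fraction(32)]

# Assumed mapping of the direction index to (axis, sign).
MMVD_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def mmvd_candidate(base_mv, distance_idx, direction_idx):
    """Add the signalled offset to one component of the initial Merge candidate."""
    distance = MMVD_DISTANCES[distance_idx]
    sign_x, sign_y = MMVD_DIRECTIONS[direction_idx]
    return (base_mv[0] + sign_x * distance, base_mv[1] + sign_y * distance)


# Quarter-pel offset in the negative x direction from the base vector (3, -1).
refined = mmvd_candidate((3, -1), distance_idx=0, direction_idx=1)
```

Only one component changes for any given pair of indices, which matches the description of the offset being applied to mvx or mvy.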
The CIIP block predictor is obtained by a weighted average of the MERGE block predictor and the Intra block predictor. The weighting of the weighted average depends on the block size and/or the Intra predictor block selected.
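A weighted average of this kind can be sketched in Python as below. The actual weights depend on the block size and/or the selected Intra predictor and are not given here, so the equal default weights, the rounding rule and the function name are assumptions chosen only to make the example concrete.

```python
def ciip_predictor(merge_pred, intra_pred, w_merge=1, w_intra=1):
    """Weighted average of a MERGE predictor block and an Intra predictor block.

    The weights are illustrative; rounding is to the nearest integer.
    """
    total = w_merge + w_intra
    return [[(w_merge * m + w_intra * i + total // 2) // total
             for m, i in zip(mrow, irow)]
            for mrow, irow in zip(merge_pred, intra_pred)]


# Made-up 1x2 predictor blocks.
merge_block = [[100, 80]]
intra_block = [[60, 81]]

equal_mix = ciip_predictor(merge_block, intra_block)
intra_heavy = ciip_predictor(merge_block, intra_block, w_merge=1, w_intra=3)
```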
The obtained CIIP predictor is then added to the residual of the current block to obtain the reconstructed block. It should be noted that the CIIP mode is enabled only for non-Skipped blocks. Indeed, use of a CIIP Skip typically results in losses in compression performance and an increase in encoder complexity. This is because the CIIP mode often has a block residual, in contrast to the other Skip modes. Consequently, its signalling for the Skip mode increases the bitrate, so when the current CU is Skip, CIIP is avoided. A consequence of this restriction is that a CIIP block can't have a residual containing only 0 values, as it is not possible to encode a VVC block residual equal to 0. Indeed, in VVC the only way to signal a block residual equal to 0 for a Merge mode is to use the Skip mode, because the CU CBF flag is inferred to be equal to true for Merge modes. And when this CBF flag is true, the block residual can't be equal to 0.
The current invention can also be applied to alternatives to the CIIP mode, such as methods which use Intra filtering of the Inter predictor block instead of mixing an Inter predictor block and an Intra predictor block. Similarly, the invention can be applied to methods which apply an Intra filtering of the Inter block and mix the obtained Inter block with an Intra predictor block. In the same way the current invention can also be applied to methods which use neighbouring pixels of the current block to determine a modification of the Inter predictor block.
In such a way, 'CIIP' should be interpreted functionally in this specification - e.g. as being a mode which combines features of Inter and Intra prediction, and not necessarily as a label given to one specific mode.
Signalling of motion prediction (INTER) modes
These Merge modes are signalled inside the bitstream with their related syntax elements. Figure 11 illustrates the decoding process of these new Merge modes for the current CU. First the CU Skip flag is extracted from the bitstream (1101). If the CU is not Skip, the pred mode flag (1103) and/or Merge flag (1106) are decoded to identify if the current CU is a Merge CU. If the current CU is a Skip Merge (1102) or a Merge CU (1107), a MMVD Skip Flag or a MMVD Merge Flag is decoded (1108). If this flag is equal to 1 (1109), the current CU will be decoded with the MMVD MERGE mode; consequently the MMVD MERGE index is decoded (1110), followed by the MMVD distance index (1111) and the MMVD direction index (1112).
If the CU is not a MMVD MERGE CU (1109), the Merge sub block flag is decoded (1113). This flag is also denoted as Affine flag in the previous description. If the block is an AFFINE MERGE (or sub block MERGE), the Merge sub block index (or AFFINE MERGE index) is decoded (1115). If the current CU is not an AFFINE MERGE (1114) and not a Skip CU (1116), the CIIP MERGE flag is decoded (1120). If this block is coded with the CIIP MERGE (1121), the Regular MERGE index (1119) is decoded with the related Intra prediction mode (1122). Please note that the CIIP MERGE is available only for the Merge mode and not for the Skip mode. If the CIIP MERGE flag is set equal to 0 (1121) or if the current CU is Skipped (1116), the MERGE TRIANGLE flag is decoded (1117). If this CU is a TRIANGLE MERGE, a splitdir is decoded to identify the split direction (Figure 9a or Figure 9b) (1125). Then a first MERGE index is decoded (1123) and a second Merge index is then decoded (1119). If the current CU is not a TRIANGLE MERGE (1117), the current CU is a Regular MERGE CU and the MERGE index is decoded.
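The order in which the mode flags of Figure 11 are read can be summarised by the Python sketch below. It is a simplification under stated assumptions: the flag names, the `make_reader` helper standing in for the entropy decoder, and the returned mode labels are all hypothetical, and the indices that follow each flag (Merge index, distance index, split direction and so on) are omitted.

```python
def make_reader(flags):
    """Hypothetical stand-in for the entropy decoder.

    Returns preset flag values (default 0) and logs the order of the reads.
    """
    log = []

    def read_flag(name):
        log.append(name)
        return flags.get(name, 0)

    return read_flag, log


def decode_merge_mode(read_flag, is_skip):
    """Return the Merge mode of a Skip or Merge CU, following the Figure 11 order."""
    if read_flag("mmvd_flag"):
        return "MMVD"
    if read_flag("merge_subblock_flag"):
        return "AFFINE"
    # The CIIP flag is only read for non-Skip CUs.
    if not is_skip and read_flag("ciip_flag"):
        return "CIIP"
    if read_flag("triangle_flag"):
        return "TRIANGLE"
    return "REGULAR"


reader, merge_log = make_reader({"ciip_flag": 1})
merge_cu_mode = decode_merge_mode(reader, is_skip=False)

reader, skip_log = make_reader({"ciip_flag": 1})
skip_cu_mode = decode_merge_mode(reader, is_skip=True)
```

In the Skip case the CIIP flag is never read, so a Skip CU cannot be identified as CIIP in this flow.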
Residual signalling
As discussed above, a residual is encoded into the bitstream to indicate the difference between the value being predicted and the actual value being encoded. The following description outlines the manner in which this residual is signalled.
Coding Block Flag (CBF)
Figure 12 illustrates a first step of the residual signalling for a current CU (from the perspective of a decoder). In this figure the CBF value of the current CU is determined. The Coding Block Flag (CBF) specifies whether the current block contains a residual. First, if the current CU is a SKIP CU (1202), the cu_cbf flag is set equal to 0 (1203) and no residual is decoded for the current block. Indeed by definition, a SKIP CU has no residual. If the current CU is Intra (1204), the cu_cbf is set equal to 1 (1205). If the Intra block has no residual, this will be signalled by means of TU cbf flags as described in Figure CC. If the current CU is a MERGE CU (1206), the cu_cbf flag is set equal to 1 (1207). Indeed, if the current MERGE block has no residual, this will be signalled by means of the SKIP mode. If the current CU is not a MERGE CU (1206), i.e. it is coded with an Inter mode other than the SKIP or MERGE mode, the cu_cbf flag is decoded (1208). If this cbf flag is equal to 1 (1210) or if the current CU is a MERGE CU but not CIIP (1209), the cu_sbt_flag (1212) is decoded if the current CU has a size allowed for the SBT method (1211). In the same way, the other data related to SBT are decoded (1213) (1214) (1215). Please note that when the cu_cbf is equal to 0 (1210), the process is ended (1217) so no residual is decoded.
If the cbf flag is equal to 1, one or several residuals are decoded (1216). This "transform tree" process is described in Figure 19.
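The inference rules of Figure 12 can be condensed into the short Python sketch below. The mode labels and the `read_cu_cbf` callback, which stands in for decoding the flag from the bitstream, are hypothetical names introduced for the example.

```python
def cu_cbf_figure12(mode, read_cu_cbf):
    """Return cu_cbf for a CU; only non-Merge Inter CUs read it from the bitstream."""
    if mode == "SKIP":
        return 0              # a SKIP CU has no residual by definition
    if mode == "INTRA":
        return 1              # absence of residual is left to the TU cbf flags
    if mode == "MERGE":
        return 1              # inferred: a zero residual would use SKIP instead
    return read_cu_cbf()      # other Inter modes: explicitly decoded


# The callback value is ignored whenever the flag is inferred.
merge_cbf = cu_cbf_figure12("MERGE", lambda: 0)
inter_cbf = cu_cbf_figure12("INTER", lambda: 0)
```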
However, this process means that it is not possible to signal a zero residual for a non-Skip motion prediction mode. When a CU CBF is equal to 1, the block residual can't be a block with only 0 residuals. As such, the residual loop is required for each CIIP block evaluated by a real time encoder. Consequently, CIIP may not be evaluated by a real time encoder and such modes may be 'skipped' (i.e. not selected) even if the prediction is completely (or sufficiently) correct.
Figure 13 shows a modification which allows for signalling a zero residual when CIIP mode is used.
In this embodiment, the cu_cbf flag is decoded from the bitstream (i.e. not inferred) after it is determined that the current Merge mode is CIIP. Consequently, when the cu_cbf is false, no residual is decoded and added to the CIIP predictor.
Figure 13 illustrates the decoding or setting process of the cu_cbf flag. Compared to Figure 12, step 1209 has been removed and steps 1218 and 1219 have been added. If the current CU is a Merge mode (1206) and if the CU is CIIP (1218), the cu_cbf flag is decoded (1208).
Otherwise the cu_cbf is set equal to 1 (1207). When the cu_cbf is equal to 1 (1210) and when the CU is CIIP (1219), the transform_tree() decoding process is applied (1216); otherwise the SBT flag is decoded (1211, 1212).
By allowing a CIIP block without a residual, the design of the encoder/decoder is cleaner. Moreover, it improves real encoder implementation. Indeed, in real encoder implementation, the residual is not typically evaluated for an Inter block if the modes without residual have not been tested. Consequently, the CIIP is not evaluated often and rarely selected. With the proposed invention, the CIIP mode is more likely to be selected, improving coding efficiency.
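The modified flow of Figure 13 can be sketched in Python as follows. The boolean arguments, the `read_cu_cbf` callback standing in for the bitstream read, and the step labels returned by `residual_steps` are assumptions made for illustration.

```python
def cu_cbf_figure13(is_skip, is_intra, is_merge, is_ciip, read_cu_cbf):
    """cu_cbf is inferred as before, except that it is decoded for CIIP Merge CUs."""
    if is_skip:
        return 0
    if is_intra:
        return 1
    if is_merge and not is_ciip:
        return 1              # non-CIIP Merge: still inferred
    return read_cu_cbf()      # CIIP Merge and other Inter modes: decoded


def residual_steps(cu_cbf, is_ciip, sbt_size_allowed):
    """List the syntax steps that follow once cu_cbf is known."""
    if not cu_cbf:
        return []                          # no residual is decoded
    steps = []
    if not is_ciip and sbt_size_allowed:
        steps.append("cu_sbt_flag")        # SBT is not signalled for CIIP here
    steps.append("transform_tree")
    return steps


# A CIIP Merge CU can now carry cu_cbf equal to 0, i.e. a zero residual.
ciip_cbf = cu_cbf_figure13(False, False, True, True, lambda: 0)
```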
With this solution, encoder complexity is decreased (if using a corresponding encoder implementation) because the CIIP mode can be evaluated without a residual. As a consequence, the early termination conditions are more often true, and as such encoder runtime may be reduced.
The proposed solution improves the coding efficiency by signalling the absence of a residual for the CIIP Merge mode in a way other than through the Skip mode. The proposed solution gives better coding efficiency than a Skip CIIP mode because the statistics of the CIIP mode are not well suited to the Skip mode.
Specific context for cu_cbf for CIIP
In the current VVC design the cu_cbf flag is CABAC coded with only one context. In the proposed embodiment, when the cu_cbf is signalled for the CIIP Merge mode, an independent context is used to code the cu_cbf.
The advantage of this solution is an additional coding efficiency increase. Indeed, it has been found that the CIIP mode often has a residual (and more often than the other Inter modes); as such it is preferable, and gives rise to a coding efficiency gain, to separate the probabilities by using a second context dedicated to the cu_cbf of the CIIP Merge mode.
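The benefit of separating the probabilities can be illustrated with an idealised calculation in Python. The sketch computes the ideal code length of a flag sequence from its empirical probabilities, which is only a rough stand-in for an adaptive CABAC context, and the flag counts are invented for the example rather than measured.

```python
import math


def ideal_bits(flags):
    """Ideal code length (in bits) of a binary sequence under its own statistics."""
    n = len(flags)
    ones = sum(flags)
    bits = 0.0
    for count in (ones, n - ones):
        if count:
            bits -= count * math.log2(count / n)
    return bits


# Invented statistics: CIIP CUs mostly have a residual, other Inter CUs less often.
ciip_cbf = [1] * 9 + [0] * 1
other_cbf = [1] * 3 + [0] * 7

shared_bits = ideal_bits(ciip_cbf + other_cbf)            # one context for all
split_bits = ideal_bits(ciip_cbf) + ideal_bits(other_cbf)  # dedicated CIIP context
```

With skewed and differing statistics such as these, the two-context total is lower than the shared-context total, which is the effect the dedicated context is intended to exploit.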
SBT
The Sub-block transform for Inter blocks (SBT) is a particular Transform Unit (TU) splitting in addition to some particular transform selection based on the splitting parameters. Figure 14 illustrates the splitting which can be used when cu_sbt_flag is set equal to 1. In Figure 14, only the dashed blocks contain a residual. For example, for CU 1410, only the partition 1412 contains a residual and not the partition 1411. The same applies to the other example splittings, where the numerals 14m0, 14m1 and 14m2 (where m=1...7) indicate the CU, the partition not containing a residual, and the partition containing a residual respectively. These different partitions are signalled by the syntax elements cu_sbt_flag, cu_sbt_horizontal_flag and cu_sbt_pos_flag. This TU partitioning is allowed only for some TU sizes (1211).
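How the flags could map to partitions is sketched below in Python. The sketch covers only a split into two halves, and it assumes that cu_sbt_pos_flag equal to 0 places the residual in the first (top or left) partition; both points are assumptions for illustration, as are the function name and the rectangle convention (x, y, width, height).

```python
def sbt_partitions(x0, y0, width, height, horizontal_flag, pos_flag):
    """Return [(rectangle, has_residual), ...] for a half-split SBT CU."""
    if horizontal_flag:
        half = height // 2
        first = (x0, y0, width, half)
        second = (x0, y0 + half, width, height - half)
    else:
        half = width // 2
        first = (x0, y0, half, height)
        second = (x0 + half, y0, width - half, height)
    # Exactly one of the two partitions carries a residual.
    return [(first, pos_flag == 0), (second, pos_flag == 1)]


# 16x8 CU split horizontally, residual assumed to be in the bottom half.
parts = sbt_partitions(0, 0, 16, 8, horizontal_flag=1, pos_flag=1)
```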
ISP
For Intra blocks there are also some possible sub-partitionings. The Intra Sub-Partitions Coding Mode (ISP) is a particular partitioning scheme. In contrast to the SBT method discussed above, each sub-partition can contain a residual for both Luma and Chroma blocks. Figure EE illustrates the possible partitioning of a block (1510). The block can be split into four vertical blocks (1520) or four horizontal blocks (1530). For the small block sizes 8x4 and 4x8, only 2 sub-partitions are considered. The last sub-partition has its tu_cbf_luma always equal to 1 in the current VVC design.
Allow SBT for CIIP
In the current VVC, SBT is not allowed for the CIIP mode, as the trade-off between coding efficiency and complexity did not lead to any worthwhile gains with the current version of the CIIP mode. In an additional, complementary, embodiment, SBT mode is allowed for CIIP when the cu_cbf is coded for CIIP. Figure 16 illustrates this embodiment. Compared to Figure 13 the module 1219 has been removed. In that case, when the current CU is a CIIP MERGE mode (1206) and when the cu_cbf is true (1210), the cu_sbt_flag can be decoded (1212), subject to the size condition (1211).
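The ISP geometry can be sketched in Python as below. The sketch assumes that "vertical blocks" means the block is cut by vertical lines into tall sub-partitions; that reading, the function name and the rectangle convention (x, y, width, height) relative to the block origin are assumptions.

```python
def isp_partitions(width, height, vertical):
    """Return the ISP sub-partitions of a width x height Intra block."""
    # Only two sub-partitions are considered for the small 8x4 and 4x8 blocks.
    count = 2 if (width, height) in ((8, 4), (4, 8)) else 4
    if vertical:
        w = width // count
        return [(i * w, 0, w, height) for i in range(count)]
    h = height // count
    return [(0, i * h, width, h) for i in range(count)]


tall_parts = isp_partitions(16, 16, vertical=True)
small_parts = isp_partitions(8, 4, vertical=False)
```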
In such a way, the coding efficiency gains of SBT can be utilised in combination with the advantages discussed above.
SKIP CIIP + context for CIIP flag
The SKIP mode CIIP was proposed for the VVC standard but not adopted, due to increased encoder run time and decreased coding efficiency compared to enabling CIIP for the Merge mode only.
In terms of design the usage of the Skip mode to signal a no residual CIIP is 'cleaner' than other proposed solutions; however, the coding efficiency decrease removes the gain obtained by the CIIP mode.
Figure 17 illustrates an alternative way of enabling SKIP mode for CIIP which does not suffer from the disadvantages of the 'clean' proposal discussed above. Figure 17 is based on Figure 11. A key difference between these figures is that the condition (1116) has been removed.
In one embodiment the CIIP Skip is enabled and two contexts are considered for the CIIP Flag. The first context is used when the Skip mode is enabled and the second one when the Skip mode is disabled.
This embodiment allows for CIIP mode to have zero residual and improves the coding efficiency of the 'clean' solution but may in some circumstances be less efficient than other proposed embodiments described herein.
SKIP CIIP + context for CIIP flag
In one embodiment, the Skip CIIP is signalled and the CIIP flag is set at the end of all Merge modes in order to reduce (as far as possible) its cost of signalling. Figure 18 illustrates this embodiment; compared to Figure 17, the Triangle flag (1117) is decoded before the CIIP flag (1120).
With this solution, coding efficiency is improved and encoder complexity is decreased (if using a corresponding encoder implementation).
It should be appreciated that the Skip CIIP may be signalled with the CIIP flag set at the end of all Merge modes and with two contexts considered for the CIIP flag (i.e. a combination of the above two embodiments). The first context is used when the Skip mode is enabled and the second one when the Skip mode is disabled.
Such a combination provides additional coding efficiency gains.
Transform_tree
Figure 19 illustrates the transform tree process which decodes the residual from the bit stream. This process is a recursive process (1916). The input parameters of this process are the top left position of the current block (x0, y0) and the size of the current block (tbWidth, tbHeight) (1901); the outputs of this process are the input parameters for the transform unit process (shown in Figure CC). If the cu_sbt_flag is enabled (1902), the SBT method is applied. If the cu_sbt_horizontal_flag is enabled (1903), the block is split horizontally and two transform units are considered (1904) (1905). If the cu_sbt_horizontal_flag is set equal to 0, the block is split vertically and two transform units are considered (1906) (1907). The transform-unit parameters are set based on the different SBT syntax elements already decoded, as depicted in Figure 12, in order to produce the current sub-partitioning as depicted in Figure 14.
If the IntraSubPartitionsSplitType is set equal to ISP_HOR_SPLIT (1908), the transform unit process (1910) is applied for each of the NumIntraSubPartitions partitions (1909). Otherwise, if the IntraSubPartitionsSplitType is set equal to ISP_VER_SPLIT (1911), the transform unit process (1913) is applied for each of the NumIntraSubPartitions partitions (1912).
Please note that IntraSubPartitionsSplitType is only decoded for Intra blocks and not for Inter blocks. So for an Inter block, IntraSubPartitionsSplitType is set equal to 0 (ISP_NO_SPLIT), which is different from ISP_HOR_SPLIT and ISP_VER_SPLIT.
If the block is not split with SBT for Inter blocks or with ISP for Intra blocks, the block size (tbWidth, tbHeight) is compared to the maximum transform size MaxTbSizeY for the height and width (1914). If at least one of them is greater than MaxTbSizeY, the block is split into 2 to 4 blocks (1915) and the transform tree process (1916) is applied for each block. If the block size is less than or equal to the maximum transform size, the transform-unit process is applied for the current block (1917).
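The transform tree process described above can be illustrated with a minimal Python sketch. It is not the VVC or VTM implementation: the function name, the tuple layout, the numeric values chosen for ISP_HOR_SPLIT and ISP_VER_SPLIT, the default MaxTbSizeY of 64 and the assumption that SBT always splits the block into two equal halves are all illustrative assumptions.

```python
ISP_NO_SPLIT, ISP_HOR_SPLIT, ISP_VER_SPLIT = 0, 1, 2  # only ISP_NO_SPLIT = 0 is stated in the text


def transform_tree(x0, y0, tb_width, tb_height, cu_sbt_flag=False,
                   cu_sbt_horizontal_flag=False, isp_split_type=ISP_NO_SPLIT,
                   num_intra_sub_partitions=1, max_tb_size_y=64):
    """Return the transform units as (x, y, width, height, subTuIndex) tuples."""
    if cu_sbt_flag:  # 1902
        if cu_sbt_horizontal_flag:  # 1903: two TUs stacked vertically (1904, 1905)
            half = tb_height // 2   # assumption: equal halves
            return [(x0, y0, tb_width, half, 0),
                    (x0, y0 + half, tb_width, tb_height - half, 1)]
        half = tb_width // 2        # two TUs side by side (1906, 1907)
        return [(x0, y0, half, tb_height, 0),
                (x0 + half, y0, tb_width - half, tb_height, 1)]
    if isp_split_type == ISP_HOR_SPLIT:  # 1908 to 1910
        h = tb_height // num_intra_sub_partitions
        return [(x0, y0 + i * h, tb_width, h, i)
                for i in range(num_intra_sub_partitions)]
    if isp_split_type == ISP_VER_SPLIT:  # 1911 to 1913
        w = tb_width // num_intra_sub_partitions
        return [(x0 + i * w, y0, w, tb_height, i)
                for i in range(num_intra_sub_partitions)]
    if tb_width > max_tb_size_y or tb_height > max_tb_size_y:  # 1914
        # Halve only the dimension(s) exceeding the maximum: 2 to 4 blocks (1915).
        w = tb_width // 2 if tb_width > max_tb_size_y else tb_width
        h = tb_height // 2 if tb_height > max_tb_size_y else tb_height
        units = []
        for y in range(y0, y0 + tb_height, h):
            for x in range(x0, x0 + tb_width, w):
                units += transform_tree(x, y, w, h,
                                        max_tb_size_y=max_tb_size_y)  # recursion (1916)
        return units
    return [(x0, y0, tb_width, tb_height, 0)]  # single transform unit (1917)
```

For instance, a 128x64 block with MaxTbSizeY equal to 64 yields two 64x64 transform units, while a 128x128 block yields four.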
Transform unit
Figure 20 illustrates the transform-unit process. The input parameters of this process are the top left position of the current block (x0, y0), the size of the current block (tbWidth, tbHeight) and the subTuIndex (2001). The subTuIndex indicates the sub-partition index related to the SBT and ISP methods. According to a first condition (2002), the TU CBF flags for chroma, tu_cbf_cb (2003) and tu_cbf_cr (2004), are extracted from the bitstream. The first condition (2002) is given by the following pseudo code:
( IntraSubPartitionsSplitType = = ISP_NO_SPLIT && !( cu_sbt_flag && ( ( subTuIndex = = 0 && cu_sbt_pos_flag ) || ( subTuIndex = = 1 && !cu_sbt_pos_flag ) ) ) ) || ( IntraSubPartitionsSplitType != ISP_NO_SPLIT && ( subTuIndex < NumIntraSubPartitions - 1 ) )
This condition means that the TU CBF flags for Chroma are decoded:
- if cu_sbt_flag is false,
- or if cu_sbt_flag is enabled and if the current partition is the second partition,
- or if the TU is Intra and if ISP is enabled and if it is not the last partition of the block.
When these conditions are all false, tu_cbf_cb and tu_cbf_cr are both set equal to 0.
Whether or not tu_cbf_cb (2003) and tu_cbf_cr (2004) have been decoded (2002), a second condition, for Luma, is tested (2005). This condition is given by the following pseudo code:
( IntraSubPartitionsSplitType = = ISP_NO_SPLIT && !( cu_sbt_flag && ( ( subTuIndex = = 0 && cu_sbt_pos_flag ) || ( subTuIndex = = 1 && !cu_sbt_pos_flag ) ) ) ) || ( IntraSubPartitionsSplitType != ISP_NO_SPLIT && ( subTuIndex < NumIntraSubPartitions - 1 || !InferTuCbfLuma ) )
This condition means that the TU CBF flag for Luma is decoded:
- if cu_sbt_flag is false,
- or if cu_sbt_flag is enabled and if the current partition is the second partition,
- or if the TU is Intra and if ISP is enabled and if it is not the last partition of the block,
- or if the InferTuCbfLuma is false.
This condition is the same as for the Chroma TU CBF except for the condition "if InferTuCbfLuma is false". The variable InferTuCbfLuma has been set equal to 1 (1918) in the transform tree process, and it is set equal to false when at least one partition has a tu_cbf_cb set equal to false.
In addition to this condition (2005), if the current CU is Intra, or if tu_cbf_cb is different from 0 or tu_cbf_cr is different from 0 (2006), the tu_cbf_luma is extracted from the bitstream (2007). If the current CU is Inter and if both tu_cbf_cr and tu_cbf_cb are set equal to 0, the tu_cbf_luma is not decoded and is inferred to be equal to 1 (2018).
If none of these is true, and IntraSubPartitionsSplitType is equal to ISP_NO_SPLIT, the tu_cbf_luma is set equal to 0; otherwise it is set equal to 1.
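The chroma condition (2002) and the luma decision (2005, 2006, 2018) can be written out as a short Python sketch. The function names and the string return values are hypothetical, and the test (2006) is read here as "the CU is Intra, or one of the chroma CBF flags is non-zero", which is the complement of the inferred case (2018) described in the text.

```python
ISP_NO_SPLIT = 0  # value stated in the text; any other value means ISP is used


def sbt_partition_without_residual(cu_sbt_flag, cu_sbt_pos_flag, sub_tu_index):
    """True for the SBT sub-partition that carries no residual."""
    return bool(cu_sbt_flag and ((sub_tu_index == 0 and cu_sbt_pos_flag) or
                                 (sub_tu_index == 1 and not cu_sbt_pos_flag)))


def chroma_cbf_decoded(isp_split_type, cu_sbt_flag, cu_sbt_pos_flag,
                       sub_tu_index, num_intra_sub_partitions):
    """First condition (2002): are tu_cbf_cb and tu_cbf_cr read from the bitstream?"""
    if isp_split_type == ISP_NO_SPLIT:
        return not sbt_partition_without_residual(cu_sbt_flag, cu_sbt_pos_flag,
                                                  sub_tu_index)
    return sub_tu_index < num_intra_sub_partitions - 1


def luma_cbf_decision(isp_split_type, cu_sbt_flag, cu_sbt_pos_flag, sub_tu_index,
                      num_intra_sub_partitions, infer_tu_cbf_luma,
                      cu_is_intra, tu_cbf_cb, tu_cbf_cr):
    """Return 'decode', 'infer_1' or 'infer_0' for tu_cbf_luma."""
    if isp_split_type == ISP_NO_SPLIT:
        cond_2005 = not sbt_partition_without_residual(cu_sbt_flag, cu_sbt_pos_flag,
                                                       sub_tu_index)
    else:
        cond_2005 = (sub_tu_index < num_intra_sub_partitions - 1
                     or not infer_tu_cbf_luma)
    if cond_2005:
        if cu_is_intra or tu_cbf_cb or tu_cbf_cr:  # 2006
            return "decode"                        # 2007
        return "infer_1"                           # 2018: Inter CU, no chroma residual
    # Luma condition false: 0 without ISP, 1 otherwise.
    return "infer_0" if isp_split_type == ISP_NO_SPLIT else "infer_1"
```

Under these assumptions, an Inter CU without SBT and with both chroma flags at 0 always gets tu_cbf_luma inferred to 1, which is why a zero residual cannot be expressed at this level.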
When the TU CBF flags have been set, the transform-unit process decodes, if needed (2008) (2010), some information related to delta QP (2009) and/or transform Skip (2011).
If the tu_cbf_luma is true (2012), the data related to the Luma residual for the current block are decoded (2013).
If the tu_cbf_cb is true (2014), the data related to the cb component residual for the current block are decoded (2015).
If the tu_cbf_cr is true (2016), the data related to the cr component residual for the current block are decoded (2017).
The residual compression process of the current version of VVC is not able to encode a zero residual. Indeed, each residual shall contain at least one coefficient different from 0 because the first coded coefficient is always different from 0. As a result, the encoder may not evaluate all of the motion prediction modes, leading to a complexity increase, a coding efficiency decrease and an increase in encoder runtime.
Signal Residual CIIP with TU CBF
So as to solve at least some of the problems discussed above, the absence of a residual for CIIP is signalled by using the TU CBF flags. Figure 21 illustrates this embodiment. This Figure is based on Figure 20 illustrating the transform unit process. Compared to Figure 20, a new condition (2019) has been added. When the condition on Luma is true (2005) and if the CU is CIIP (2019), the tu_cbf_luma (2007) is decoded even if both tu_cbf_cb and tu_cbf_cr are equal to zero. In that case, when the 3 TU CBF flags are false, there is no residual for the CIIP MERGE CU.
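A minimal Python sketch of the added condition (2019) follows. The function name and the `read_bit` callback (standing in for the entropy decoder) are hypothetical, and the sketch only covers the case without ISP.

```python
def decode_tu_cbf_luma(luma_condition, cu_is_intra, cu_is_ciip,
                       tu_cbf_cb, tu_cbf_cr, read_bit):
    """Return tu_cbf_luma for one TU (no ISP), with the CIIP test (2019) added.

    luma_condition is the already evaluated condition (2005);
    read_bit is a callable returning the next decoded flag value.
    """
    if not luma_condition:
        return 0
    if cu_is_intra or tu_cbf_cb or tu_cbf_cr or cu_is_ciip:  # 2006 plus 2019
        return read_bit()                                    # decoded (2007)
    return 1                                                 # inferred (2018)


def tu_has_no_residual(tu_cbf_luma, tu_cbf_cb, tu_cbf_cr):
    """With the three TU CBF flags all false, the CIIP CU carries no residual."""
    return not (tu_cbf_luma or tu_cbf_cb or tu_cbf_cr)
```

In this sketch a CIIP CU with both chroma flags at 0 reads tu_cbf_luma from the bitstream, so the value 0 becomes expressible, whereas a non-CIIP Inter CU in the same situation still has the flag inferred to 1.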
A suitable encoder implementation may be required to obtain the full advantages of this solution; appropriate encoder implementations are described below in the section 'encoder'.
Specific context for TU CBF for CIIP
In a similar manner as discussed above, two separate contexts for the tu_cbf_luma may be used: one when the current CU is CIIP and one or more for the other tu_cbf_luma. Alternatively, two separate contexts for the tu_cbf_luma may be used: one when the current CU is CIIP and when tu_cbf_cr and tu_cbf_cb are both equal to 0, and one or more for the other tu_cbf_luma.
The advantage of using two separate contexts is an increase in coding efficiency.
Signal CIIP as an Intra mode
In one embodiment the CIIP Merge mode is considered as an Intra mode and not as an Inter mode. Consequently, the three TU CBF flags are extracted from the bitstream in the Transform unit process (cf. Figure 20). Indeed the tu_cbf_luma is not inferred to be equal to 1 (2018) when the CU is intra (2006).
In such a way, it is possible to encode a zero residual for CIIP mode by decoding (as opposed to inferring) tu_cbf_luma, tu_cbf_cb and tu_cbf_cr all equal to zero and consequently to have a residual equal to zero for the CIIP mode. This leads to a decrease in complexity, an increase in coding efficiency, and a reduction in encoder runtime.
A suitable encoder implementation may be required to obtain the full advantages of this solution; appropriate encoder implementations are described below in the section 'encoder'.
Residual Coding
When using transform coefficients, residuals often have statistical biases in how they are spread in the TU. Correspondingly, scanning them in particular ways allows exploitation of these biases to reduce the amount of bitstream generated. Figure 22 illustrates several aspects of this for HEVC and VVC.
Firstly, coefficients are organized in groups of 4x4 coefficients, commonly referred to as coefficient groups. There are thus four such groups in an 8x8 transform, depicted as the top left, top right, bottom left and bottom right in Figure 22. HEVC entropy coding signals which groups contain data using a so-called coefficient group flag. It should be noted that in HEVC, the position of the last coefficient is transmitted, so the last non-empty coefficient group can be known. Additionally, the first group (top left) is always transmitted. When the block has been transformed, and thus contains transform coefficients, this first group holds the lowest frequencies, such as the DC coefficient.
Then, the order in which coefficients are laid out in the bitstream matters too. Firstly, it is in reverse order: the last coefficient is transmitted first. Besides this, there are horizontal and vertical scans for 4x4 and 8x8 TUs in some cases where the prediction mode is INTRA. In other cases (INTER prediction, other cases of INTRA prediction), the scan is a diagonal one. Figure 22 illustrates the overall design: starting with the last coefficient (its group implicitly being non-empty and the corresponding flag not being transmitted), which for the sake of explanation is in the white group, coefficients are scanned according to the pattern of coefficients inside the group. Once all information for coefficients in said group has been read according to said scan, the next group is tested, which is always the green one: the coefficient scan order is thus also applied to the CGs.
In any case, for each group that is required to be explicitly signalled (i.e. all except the first and last ones), a flag has to be transmitted to determine whether said group holds residual data. This residual is detailed in the next section.
Figure 23 illustrates how the residual data is transmitted for a non-empty coefficient group, but also serves to illustrate a 4x4 TU (which contains a single coefficient group that is explicitly transmitted).
In particular, syntax elements named "last significant coefficient coeff x/y" are present to indicate for each TU the position of the last coefficient. More precisely, they allow the following to be derived:
- the last coefficient group: as there are no coefficients after the last one, the corresponding coefficient groups are empty;
- within that last coefficient group, how many coefficients are present, the other groups having their 16 coefficients explicitly signalled.
Then, for each transmitted coefficient of the group according to the scan, a flag called "significant_coeff_flag" (sig_coeff_flag in VVC) indicates whether the coefficient is zero: if it is, no other information is needed to know its value. This is very important, because transform residuals are very sparse after quantization, and the 0 value is the most common one. Indeed, it is the one that is relevant to the present invention.
Now that all non-zero coefficients are known, four iterative so-called maps of sequential information exist: each new map provides information about what coefficients need more information, i.e. about the next map.
Those maps are:
- whether each coefficient transmitted is non-zero ("sig_coeff_flag"): the decoder will have the complete map of flags decoded before moving to the next level;
- for each non-zero coefficient, whether the coefficient magnitude (absolute value) is greater than 1 ("coeff_abs_level_greater1_flag");
- for those greater than 1, whether it is greater than 2 ("coeff_abs_level_greater2_flag");
- for those greater than 2, the remainder of the magnitude (i.e. for a coefficient of magnitude "level", it is level-3) with a specific family of entropy codes ("Exponential-Golomb code of order 3", whose details are not important to the present invention);
- for all non-zero coefficients, the sign of the coefficient ("coeff_sign_flag").
Each level of information is iteratively determined as it needs the previous one, and each level produces a so-called map.
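To make the successive maps concrete, the following Python sketch rebuilds coefficient values from already decoded maps. It is a simplification under stated assumptions: the function name and list-based inputs are hypothetical, entropy decoding is not modelled, the per-group limits on the number of "greater than" flags used by real codecs are ignored, and a sign flag of 1 is taken to mean a negative coefficient.

```python
def reconstruct_levels(sig, gt1, gt2, remainder, sign):
    """Rebuild coefficient values, in scan order, from the decoded maps.

    sig:       one 0/1 flag per transmitted coefficient
    gt1:       one flag per non-zero coefficient (magnitude > 1)
    gt2:       one flag per coefficient with magnitude > 1 (magnitude > 2)
    remainder: one value (level - 3) per coefficient with magnitude > 2
    sign:      one flag per non-zero coefficient (1 means negative)
    """
    gt1_it, gt2_it = iter(gt1), iter(gt2)
    rem_it, sign_it = iter(remainder), iter(sign)
    levels = []
    for is_significant in sig:
        if not is_significant:
            levels.append(0)          # nothing more is known or needed
            continue
        magnitude = 1
        if next(gt1_it):
            magnitude = 2
            if next(gt2_it):
                magnitude = 3 + next(rem_it)
        levels.append(-magnitude if next(sign_it) else magnitude)
    return levels


# Five coefficients in scan order, one of them zero.
print(reconstruct_levels(sig=[1, 0, 1, 1, 1],
                         gt1=[0, 1, 1, 1],
                         gt2=[0, 1, 1],
                         remainder=[0, 4],
                         sign=[0, 1, 0, 1]))   # [1, 0, -2, 3, -7]
```

Each map is shorter than the previous one, which reflects the point made above: a map only carries information for the coefficients that the previous map flagged as needing more.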
In the current VVC standard, there are 4 syntax elements to signal the last significant coefficient position: last_sig_coeff_x_prefix, last_sig_coeff_y_prefix, last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, instead of 2 in HEVC. There are also several restrictions depending on the transform size for the usage of these flags. In the current version of the VVC software, the transformed block contains at least one coefficient different from 0. Indeed, if the last coefficient is the coefficient with the coordinates (0, 0), it means that the first coefficient of the transform is coded.
In the current version of VVC, it is impossible to have the first sig_coeff_flag of this list of coefficients (from the last significant coefficient to the first coefficient) equal to 0. Indeed it is inferred equal to 1 and not extracted from the bitstream. Consequently, it is impossible to have a residual equal to zero with this coding.
Figure 24 schematically illustrates a simplified residual coding process. First the syntax elements related to the x position of the last significant coefficient position (last_sig_coeff_x_prefix, last_sig_coeff_x_suffix) are decoded (2401). In the same way, the syntax elements related to the y position of the last significant coefficient position (last_sig_coeff_y_prefix, last_sig_coeff_y_suffix) are decoded (2402). These syntax elements are used to obtain the last significant position (2403). Then the coefficients from this last significant coefficient to the position 0 (top left coefficient (DC coefficient)) are determined (2404). If the sig_coeff_flag is inferred (2405), it is set equal to 1 (2407). The sig_coeff_flag is inferred when it is certain that the current coefficient is different from 0, for example when the current position is the last significant position. Indeed, by definition, the last significant coefficient is not equal to 0. If it is not inferred (2405), the sig_coeff_flag is decoded (2406). If sig_coeff_flag is equal to 1, the rest of the current coefficient is extracted from the bitstream (2409).
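The loop of Figure 24 can be sketched in Python as follows. The function name and the `read_bit` callback are hypothetical, positions are expressed as one-dimensional scan indices, and the only inference modelled is the one named in the text (the last significant position).

```python
def decode_sig_coeff_flags(last_pos, read_bit):
    """Return sig_coeff_flag for scan positions 0..last_pos (simplified)."""
    flags = [0] * (last_pos + 1)
    for pos in range(last_pos, -1, -1):   # from the last position down to DC (2404)
        if pos == last_pos:               # inferred (2405)
            flags[pos] = 1                # set equal to 1 (2407)
        else:
            flags[pos] = read_bit()       # decoded (2406)
    return flags
```

With `last_pos` equal to 0, no flag is read and the single coefficient is inferred to be non-zero, which illustrates why this coding cannot represent a residual equal to zero.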
Allow a Residual equal to 0 in residual coding process
As discussed above, the residual coding is modified to allow a residual equal to zero when CIIP is enabled. Figures 25 and 26 show two ways in which this can be achieved so as to decrease complexity, increase coding efficiency, and reduce encoder runtime.
Additional flag after the last significant coefficient signalling.
In one embodiment, to signal the residual of a CIIP CU, the last significant coefficient is set equal to the first coefficient in order to signal only one coefficient, and an additional flag is transmitted. This flag signals whether or not the current block contains a residual different from 0.
Figure 25 illustrates this embodiment. Compared to Figure 24 the modules 2410, 2411, 2412 and 2413 have been added.
If the last significant position is equal to 0 (2410) and if the CU is CIIP (2411), a flag is decoded (2412) to know whether the residual is equal to 0 (2413). If the flag indicates that the block contains no non-zero residual, the loop on coefficients (2404) is not processed and no other residual data is read for the current block.
sig_coeff signalling when the last coefficient is the first coefficient
In one embodiment, when the last significant coefficient is equal to the first coefficient and when the current residual is the residual of a CIIP CU, the sig_coeff_flag of the coefficient is explicitly signalled.
Figure 26 illustrates this embodiment. In this figure, when the sig_coeff_flag is inferred (2405), it is verified whether the current CU is CIIP (2411). If it is not CIIP, the sig_coeff_flag is set equal to 1 (2407) as usual. If the CU is CIIP, the sig_coeff_flag is decoded (2406).
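The two variants of Figures 25 and 26 can be contrasted in a short Python sketch. Function names and the `read_bit` callback are hypothetical. Two points are assumptions rather than statements of the text: the polarity of the additional flag (1 taken to mean "a residual is present"), and, for Figure 26, the restriction of the explicit flag to the case where the last significant position is the first coefficient, which follows the embodiment statement rather than the figure description.

```python
def coefficient_loop_is_skipped(last_pos, cu_is_ciip, read_bit):
    """Figure 25 variant: an extra flag after the last-position signalling."""
    if last_pos == 0 and cu_is_ciip:      # 2410, 2411
        residual_present = read_bit()     # 2412 (polarity is an assumption)
        return not residual_present       # 2413: skip the coefficient loop (2404)
    return False


def sig_coeff_flag_at_last_position(last_pos, cu_is_ciip, read_bit):
    """Figure 26 variant: explicit sig_coeff_flag instead of inference."""
    if cu_is_ciip and last_pos == 0:      # 2411
        return read_bit()                 # decoded (2406)
    return 1                              # inferred as usual (2407)
```

In both sketches a non-CIIP block reads nothing extra, so the bitstream for other modes is unchanged.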
Encoder
CIIP selection in VTM (encoding choice)
Figure 27 illustrates the VTM software implementation for the evaluation of the CIIP mode. The CIIP Merge mode is evaluated in the same process as the Regular Merge and Skip modes and in the same process as the MMVD Merge and Skip modes. The Affine and Triangle modes are evaluated in separate processes, as are all other Inter and Intra modes.
First the variable BestIsSkip is set equal to true if the current variable BestCU is a Skip CU (2701). The variable BestCU represents the best set of coding parameters among all already evaluated coding parameters for the current CU (e.g. current block size). If the variable BestIsSkip is false or if the CIIP mode is enabled for the current CU (2702), a first "fast" RD loop is computed (2705). The CIIP is enabled for the current CU if the following condition is true:
CUwidth * CUheight < 64 or CUwidth >= 128 or CUheight >= MAX_128_SIZE
The "fast" RD loop (2705) consists in evaluating the SAD and an estimated rate (2706) of each possible Merge candidate among all possible candidates between the regular, MMVD and CIIP Merge modes. This loop orders these possible candidates from the best candidate (in terms of RD cost) to the worst in a list Cand[] (2707). It should be noted that in this loop the best Intra predictor for each possible CIIP merge candidate is also selected. At the end of this fast loop, the ordered Merge candidates have been set (2708), and the maximum number for the Full RD evaluation is set to 4, or 5 if CIIP is allowed for the current CU size (2709). If CIIP is disabled and if the best parameters set for the current CU is the Skip mode, the maximum number of candidates for the Full RD loop is set equal to 70 (2704) (6 for Regular MRG + 64 MMVD) if the best mode is MMVD, or to 6 if not.
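The choice of the maximum number of candidates for the Full RD loop, as described above, can be summarised in a small Python sketch. The function and parameter names are hypothetical and the sketch reflects only the description in this section, not the VTM source code.

```python
def max_full_rd_candidates(best_is_skip, ciip_enabled, best_is_mmvd):
    """Maximum number of Merge candidates passed to the Full RD loop."""
    if not best_is_skip or ciip_enabled:   # 2702: the fast RD loop is run (2705)
        return 5 if ciip_enabled else 4    # 2709
    # CIIP disabled and the best parameter set so far is Skip (2704).
    return 70 if best_is_mmvd else 6       # 6 Regular MRG + 64 MMVD, or 6


print(max_full_rd_candidates(best_is_skip=True, ciip_enabled=False,
                             best_is_mmvd=True))   # 70
```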
There are 2 passes for the Full RD loop (2710). In the first one, the candidates are evaluated with their residuals and in the second one without their residuals. Each candidate index idx from 0 to MAX (2711) in the list Cand[] is evaluated. If the current pass is the no-residual pass and if Cand[idx] is a CIIP candidate (2712), the candidate Cand[idx] is replaced by its corresponding Regular Merge candidate (2713) (so it is the CIIP Merge candidate but not averaged with the Intra predictor). If the pass is the residual pass and if the variable BestIsSkip is true (2714), the process moves on to the next candidate (2711) (please note that none of the remaining candidates will be processed in that case, so this corresponds to a switch to the next pass without residual). Otherwise the Full RD evaluation process (2715) gives the best parameters set BestCU (2716). If the BestCU is not CIIP, the variable BestIsSkip is set equal to true if the cu_cbf of the best parameters set is equal to 0 (2718).
The full RD evaluation 2715 is detailed in Figure 28. First, the block residual is encoded with the Rate-distortion optimized quantization (RDOQ) process (2802) if the evaluated mode is not the Skip mode. Please note that RDOQ can provide an encoded residual equal to 0. Then all syntax elements are fully coded with the real CABAC states (2803). If the current parameters set is a CIIP and if the cu_cbf is equal to 0, the RD cost for the current parameters set is set equal to the maximum possible value (2805) and the process is ended (2809). Otherwise the RD cost is computed (2806), and if the RD cost is lower than the best RD cost (2807) the current mode and all related parameters become the new BestCU parameters set (2808).
It should be appreciated that Figures 27 and 28 represent summaries of the encoding process and the VTM software contains several other tricks for different modes, tools, etc. which are omitted for the sake of brevity.
When implementing one or more of the decoder embodiments described above, it is possible to reduce significantly the encoding time and increase the coding efficiency, and such advantages are assisted and increased by suitable encoding choices.
Allow BestIsSkip if CIIP has no residual
In one embodiment, when any of the previous embodiments are enabled, the encoding process sets BestIsSkip for a CIIP CU if this CU has no residual. In such a way, the encoder can skip the RD evaluation of the residual coding and as such decrease encoder runtime.
For example, when the embodiments described above with reference to Figure 13 or 14 are enabled, the condition is: "if CU is CIIP and cu_cbf is equal to 0". In that case the modules 2717 and 2719 can be removed in Figure 29 (below).
When the embodiment described above with reference to Figure 21 is enabled, the condition is: "if CU is CIIP and tu_cbf_luma is equal to 0 and tu_cbf_cb is equal to 0 and tu_cbf_cr is equal to 0".
When the embodiments described above with reference to Figures 25 or 26 are enabled, the condition becomes "if CU is CIIP and if the residual is equal to 0". This process can be performed at one or more different places in the Merge process selection, as described below with reference to Figures 29 to 31.
In the second loop of Merge process selection
Figure 29 illustrates this embodiment to set the BestIsSkip variable in the second loop of the Merge candidate estimation. Compared to Figure 27, the module 2719 has been added. In this module, when the BestCU is a CIIP Merge, the BestIsSkip variable is set equal to 1 if this BestCU has no residual. In that case the encoding time is reduced because the Full RD pass will not test a new residual for the other candidates in the list Cand[].
At the beginning of the Merge estimation process
Figure 30 illustrates this embodiment to set the BestIsSkip variable at the beginning of the Merge estimation process. Compared to Figure FF, the first step 2701 has been updated to set BestIsSkip to 1 when the BestCU is CIIP and the BestCU has no residual. Please note that at the beginning of the process BestCU contains the best parameters for the same or a larger block size.
The variable BestIsSkip is used for each Inter mode evaluation. The formula of 2701 can be applied to all these BestIsSkip variables. In this formula the variable BestIsSkip is set equal to true if BestCU is a Skip mode and to false otherwise. In addition, if the BestCU is CIIP and if there is no residual, the variable BestIsSkip is set equal to true.
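The updated formula of step 2701 amounts to a one-line predicate. The Python sketch below uses hypothetical function and parameter names; how "no residual" is detected depends on the decoder embodiment in use (cu_cbf equal to 0, all three TU CBF flags equal to 0, or a residual equal to 0), and the helper shown here only illustrates the TU CBF variant.

```python
def ciip_without_residual_tu_cbf(tu_cbf_luma, tu_cbf_cb, tu_cbf_cr):
    """'No residual' test for the Figure 21 embodiment: all three TU CBF flags are 0."""
    return tu_cbf_luma == 0 and tu_cbf_cb == 0 and tu_cbf_cr == 0


def best_is_skip(best_cu_is_skip, best_cu_is_ciip, best_cu_has_no_residual):
    """Updated step 2701: Skip, or CIIP without residual."""
    return best_cu_is_skip or (best_cu_is_ciip and best_cu_has_no_residual)


print(best_is_skip(False, True, ciip_without_residual_tu_cbf(0, 0, 0)))   # True
```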
Figure 31 illustrates the combination of the two encoder embodiments described above with reference to Figures 29 and 30.
Each of these embodiments provides a reduction in encoder run time.
Allow CIIP without residual in full RD loop
In one embodiment CIIP is allowed when the Rate Distortion (RD) loop found no residual as the BestCU parameters set. Figure 32 illustrates this embodiment. In this Figure, compared to Figure 28, the steps 2804 and 2805 are removed.
This embodiment provides advantages when one or more of the decoder embodiments are enabled.
In an additional embodiment this embodiment is combined with one or more of the previous encoder embodiments to provide additional advantages, at a potential cost of increased complexity.
No search of regular MRG candidate when No residual loop
In one embodiment the switch to a regular MERGE when the loop is "No residual" is disabled. Figure 33 illustrates this embodiment. This Figure is based on Figure 29 where the modules 2712 and 2713 have been removed. This results in a less complex encoder.
Change Condition for first loop
In one embodiment the first loop is not always enabled when CIIP is enabled, but only when BestIsSkip is false. Figure 34 illustrates this embodiment where the condition in 2702 has been changed compared to Figure 27.
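The difference between the full RD evaluation of Figure 28 and that of Figure 32 can be sketched in Python. The function name, its parameters and the `allow_ciip_without_residual` switch are hypothetical; the RD cost itself is taken as an input rather than computed.

```python
def full_rd_update(best_cost, rd_cost, is_ciip, cu_cbf,
                   allow_ciip_without_residual=False):
    """Return (new_best_cost, candidate_accepted) for one evaluated candidate.

    With allow_ciip_without_residual=False the sketch follows Figure 28:
    a CIIP candidate whose cu_cbf is 0 is rejected (2804, 2805).
    With True it follows Figure 32, where those two steps are removed.
    """
    if is_ciip and cu_cbf == 0 and not allow_ciip_without_residual:
        return best_cost, False        # cost forced to the maximum, process ends (2809)
    if rd_cost < best_cost:            # 2807
        return rd_cost, True           # candidate becomes the new BestCU (2808)
    return best_cost, False


print(full_rd_update(100.0, 80.0, is_ciip=True, cu_cbf=0))   # (100.0, False)
print(full_rd_update(100.0, 80.0, is_ciip=True, cu_cbf=0,
                     allow_ciip_without_residual=True))      # (80.0, True)
```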
Figures 29-34 illustrate that there are numerous encoder-side modifications possible which all reduce encoder runtime.
Implementation of the invention
Figure 35 shows a system 191, 195 comprising at least one of an encoder 150 or a decoder 100 and a communication network 199 according to embodiments of the present invention. According to an embodiment, the system 195 is for processing and providing a content (for example, a video and audio content for displaying/outputting or streaming video/audio content) to a user, who has access to the decoder 100, for example through a user interface of a user terminal comprising the decoder 100 or a user terminal that is communicable with the decoder 100. Such a user terminal may be a computer, a mobile phone, a tablet or any other type of a device capable of providing/displaying the (provided/streamed) content to the user. The system 195 obtains/receives a bitstream 101 (in the form of a continuous stream or a signal, e.g. while earlier video/audio are being displayed/output) via the communication network 199. According to an embodiment, the system 191 is for processing a content and storing the processed content, for example a video and audio content processed for displaying/outputting/streaming at a later time. The system 191 obtains/receives a content comprising an original sequence of images 151, which is received and processed (including filtering with a deblocking filter according to the present invention) by the encoder 150, and the encoder 150 generates a bitstream 101 that is to be communicated to the decoder 100 via a communication network 199. The bitstream 101 is then communicated to the decoder 100 in a number of ways, for example it may be generated in advance by the encoder 150 and stored as data in a storage apparatus in the communication network 199 (e.g. on a server or a cloud storage) until a user requests the content (i.e. the bitstream data) from the storage apparatus, at which point the data is communicated/streamed to the decoder 100 from the storage apparatus.
The system 191 may also comprise a content providing apparatus for providing/streaming, to the user (e.g. by communicating data for a user interface to be displayed on a user terminal), content information for the content stored in the storage apparatus (e.g. the title of the content and other meta/storage location data for identifying, selecting and requesting the content), and for receiving and processing a user request for a content so that the requested content can be delivered/streamed from the storage apparatus to the user terminal. Alternatively, the encoder 150 generates the bitstream 101 and communicates/streams it directly to the decoder 100 as and when the user requests the content. The decoder 100 then receives the bitstream 101 (or a signal) and performs filtering with a deblocking filter according to the invention to obtain/generate a video signal 109 and/or audio signal, which is then used by a user terminal to provide the requested content to the user.
Any step of the method/process according to the invention or functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the steps/functions may be stored on or transmitted over, as one or more instructions or code or program, a computer-readable medium, and executed by one or more hardware-based processing units such as a programmable computing machine, which may be a PC ("Personal Computer"), a DSP ("Digital Signal Processor"), a circuit, a circuitry, a processor and a memory, a general purpose microprocessor or a central processing unit, a microcontroller, an ASIC ("Application-Specific Integrated Circuit"), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
Embodiments of the present invention can also be realized by a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g. a chip set). Various components, modules, or units are described herein to illustrate functional aspects of devices/apparatuses configured to perform those embodiments, but do not necessarily require realization by different hardware units. Rather, various modules/units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors in conjunction with suitable software/firmware.
Embodiments of the present invention can be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium to perform the modules/units/functions of one or more of the above-described embodiments and/or that includes one or more processing units or circuits for performing the functions of one or more of the above-described embodiments, and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiments and/or controlling the one or more processing units or circuits to perform the functions of one or more of the above-described embodiments. The computer may include a network of separate computers or separate processing units to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a computer-readable medium such as a communication medium via a network or a tangible storage medium. The communication medium may be a signal/bitstream/carrier wave. The tangible storage medium is a "non-transitory computer-readable storage medium" which may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like. At least some of the steps/functions may also be implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").
Figure 36 is a schematic block diagram of a computing device 3600 for implementation of one or more embodiments of the invention. The computing device 3600 may be a device such as a micro-computer, a workstation or a light portable device. The computing device 3600 comprises a communication bus connected to:
- a central processing unit (CPU) 3601, such as a microprocessor;
- a random access memory (RAM) 3602 for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for encoding or decoding at least part of an image according to embodiments of the invention, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port for example;
- a read only memory (ROM) 3603 for storing computer programs for implementing embodiments of the invention;
- a network interface (NET) 3604, typically connected to a communication network over which digital data to be processed are transmitted or received. The network interface (NET) 3604 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 3601;
- a user interface (UI) 3605, which may be used for receiving inputs from a user or to display information to a user;
- a hard disk (HD) 3606, which may be provided as a mass storage device;
- an Input/Output module (IO) 3607, which may be used for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in the ROM 3603, on the HD 3606 or on a removable digital medium such as, for example, a disk.
According to a variant, the executable code of the programs can be received by means of a communication network, via the NET 3604, in order to be stored in one of the storage means of the computing device 3600, such as the HD 3606, before being executed.
The CPU 3601 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 3601 is capable of executing instructions from main RAM memory 3602 relating to a software application after those instructions have been loaded from the program ROM 3603 or the HD 3606, for example. Such a software application, when executed by the CPU 3601, causes the steps of the method according to the invention to be performed.
It is also understood that according to another embodiment of the present invention, a decoder according to an aforementioned embodiment is provided in a user terminal such as a computer, a mobile phone (a cellular phone), a tablet or any other type of device (e.g. a display apparatus) capable of providing/displaying content to a user. According to yet another embodiment, an encoder according to an aforementioned embodiment is provided in an image capturing apparatus which also comprises a camera, a video camera or a network camera (e.g. a closed-circuit television or video surveillance camera) which captures and provides the content for the encoder to encode. Two such examples are provided below with reference to Figures 37 and 38.
FIG. 37 is a diagram illustrating a network camera system 3700 including a network camera 3702 and a client apparatus 202.
The network camera 3702 includes an imaging unit 3706, an encoding unit 3708, a communication unit 3710, and a control unit 3712.
The network camera 3702 and the client apparatus 202 are mutually connected to be able to communicate with each other via the network 200.
The imaging unit 3706 includes a lens and an image sensor (e.g., a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS)), and captures an image of an object and generates image data based on the image. This image can be a still image or a video image.
The encoding unit 3708 encodes the image data by using said encoding methods explained above, or a combination of the encoding methods described above.
The communication unit 3710 of the network camera 3702 transmits the encoded image data encoded by the encoding unit 3708 to the client apparatus 202.
Further, the communication unit 3710 receives commands from the client apparatus 202. The commands include commands to set parameters for the encoding of the encoding unit 3708. The control unit 3712 controls other units in the network camera 3702 in accordance with the commands received by the communication unit 3710.
The client apparatus 202 includes a communication unit 3714, a decoding unit 3716, and a control unit 3718.
The communication unit 3714 of the client apparatus 202 transmits the commands to the network camera 3702.
Further, the communication unit 3714 of the client apparatus 202 receives the encoded image data from the network camera 3702.
The decoding unit 3716 decodes the encoded image data by using said decoding methods explained above, or a combination of the decoding methods explained above.
The control unit 3718 of the client apparatus 202 controls other units in the client apparatus 202 in accordance with the user operation or commands received by the communication unit 3714.
The control unit 3718 of the client apparatus 202 controls a display apparatus 2120 so as to display an image decoded by the decoding unit 3716.
The control unit 3718 of the client apparatus 202 also controls the display apparatus 2120 so as to display a GUI (Graphical User Interface) for designating values of the parameters for the network camera 3702, including the parameters for the encoding of the encoding unit 3708.
The control unit 3718 of the client apparatus 202 also controls other units in the client apparatus 202 in accordance with user operation input to the GUI displayed by the display apparatus 2120.
The control unit 3718 of the client apparatus 202 controls the communication unit 3714 of the client apparatus 202 so as to transmit the commands to the network camera 3702 which designate values of the parameters for the network camera 3702, in accordance with the user operation input to the GUI displayed by the display apparatus 2120.
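The command path of FIG. 37 can be pictured with a minimal sketch, given here for illustration only. The class names, the command layout (`type`, `values`) and the parameter names `qp` and `gop_size` are assumptions introduced for this sketch and do not appear in the specification; the encoder body is a placeholder rather than any of the encoding methods described above.

```python
from dataclasses import dataclass, field


@dataclass
class EncodingUnit:
    """Stands in for encoding unit 3708; parameter names are hypothetical."""
    params: dict = field(default_factory=lambda: {"qp": 32, "gop_size": 16})

    def encode(self, image_data: bytes) -> bytes:
        # Placeholder only: a real unit would apply an encoding method.
        return b"ENC" + image_data


@dataclass
class NetworkCamera:
    """Stands in for network camera 3702, with the control unit logic inlined."""
    encoder: EncodingUnit = field(default_factory=EncodingUnit)

    def receive_command(self, command: dict) -> None:
        # The control unit applies parameter-setting commands to the encoder.
        if command.get("type") == "set_encoding_params":
            self.encoder.params.update(command["values"])

    def capture_and_send(self, image_data: bytes) -> bytes:
        # Encoded data that would be handed to the communication unit.
        return self.encoder.encode(image_data)


class ClientApparatus:
    """Stands in for client apparatus 202."""

    def build_command(self, gui_values: dict) -> dict:
        # Turn values designated through the GUI into a command (assumed layout).
        return {"type": "set_encoding_params", "values": dict(gui_values)}


camera = NetworkCamera()
client = ClientApparatus()
camera.receive_command(client.build_command({"qp": 27}))
```

In this sketch only the designated parameter changes; parameters not named in the command keep their previous values.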
FIG. 38 is a diagram illustrating a smart phone 3800.
The smart phone 3800 includes a communication unit 3802, a decoding unit 3804, a control unit 3806 and a display unit 3808. The communication unit 3802 receives the encoded image data via network 200.
The decoding unit 3804 decodes the encoded image data received by the communication unit 3802.
The decoding / encoding unit 3804 decodes / encodes the encoded image data by using said decoding methods explained above.
The control unit 3806 controls other units in the smart phone 3800 in accordance with a user operation or commands received by the communication unit 3802.
For example, the control unit 3806 controls the display unit 3808 so as to display an image decoded by the decoding unit 3804. The smart phone 3800 may also comprise sensors 3812 and an image recording device 3810. In such a way, the smart phone 3800 may record images and encode the images (using a method described above).
The smart phone 3800 may subsequently decode the encoded images (using a method described above) and display them via the display unit 3808, or transmit the encoded images to another device via the communication unit 3802 and network 200.
Alternatives and modifications
While the present invention has been described with reference to embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. It will be appreciated by those skilled in the art that various changes and modifications might be made without departing from the scope of the invention, as defined in the appended claims. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
It is also understood that any result of comparison, determination, assessment, selection, execution, performing, or consideration described above, for example a selection made during an encoding or filtering process, may be indicated in or determinable/inferable from data in a bitstream, for example a flag or data indicative of the result, so that the indicated or determined/inferred result can be used in the processing instead of actually performing the comparison, determination, assessment, selection, execution, performing, or consideration, for example during a decoding process.
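As an illustration of the preceding paragraph, the following minimal sketch shows a selection being made once on the encoder side and signalled as a one-bit flag, so that the decoder reads the flag rather than repeating the selection. The function names, the two-candidate restriction and the use of a plain list of bits are assumptions made for this sketch; an actual codec would entropy-code the flag.

```python
def encoder_select(costs):
    """Pick the cheaper of two candidates and signal the choice as one bit."""
    if len(costs) != 2:
        raise ValueError("this sketch handles exactly two candidates")
    best = 0 if costs[0] <= costs[1] else 1
    bits = [best]  # the flag written to the (hypothetical) bitstream
    return best, bits


def decoder_select(bits):
    """Recover the encoder's choice from the flag without evaluating costs."""
    return bits[0]


choice, stream = encoder_select((3.5, 1.25))
decoded_choice = decoder_select(stream)
```

The decoder never sees the costs; it relies solely on the indicated result, which is the point made in the paragraph above.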
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

Claims (38)

  1. A method of decoding an image or image portion from a bit stream, the method comprising: determining a motion prediction mode from said bit stream; decoding a residual flag from said bit stream; wherein said motion prediction mode is combined inter-intra prediction (CIIP) and said residual flag indicates the presence or absence of a residual.
  2. A method as claimed in claim 1 further comprising determining a context specific to said residual flag from said bit stream.
  3. A method as claimed in any preceding claim wherein said image portion comprises a block comprising multiple sub-blocks, each sub-block having a corresponding residual flag in said bit stream.
  4. A method as claimed in any preceding claim wherein said bit stream comprises a residual flag corresponding to a colour component.
  5. A method as claimed in claim 4 wherein said colour component comprises a luma component.
  6. A method as claimed in claim 5 wherein said bit stream further comprises a residual flag corresponding to a chroma component.
  7. A method as claimed in claim 6 wherein said residual is indicated to be zero if both the luma and chroma residual flags indicate zero residual.
  8. A method as claimed in any of claims 3 to 7 further comprising determining a context specific to each residual flag from said bit stream.
  9. A method as claimed in any preceding claim wherein said residual flag corresponds to the coding unit for the image portion being decoded.
  10. A method as claimed in claim 9 further comprising decoding a residual flag corresponding to a sub-block after decoding said residual flag corresponding to the coding unit in dependence on the value of the residual flag corresponding to the coding unit.
  11. A method as claimed in claim 10 wherein said residual flag corresponding to a sub-block is decoded if the residual flag corresponding to the coding unit indicates no residual.
  12. A method as claimed in claim 10 wherein said residual flag corresponding to a sub-block is decoded if the residual flag corresponding to the coding unit indicates a residual and said motion prediction mode is MERGE CIIP.
  13. A method as claimed in any preceding claim wherein determining said residual flag comprises determining the status of a Skip mode from said bit stream.
  14. A method as claimed in claim 13 comprising determining a context for said residual flag, said context depending on the status of the Skip mode.
  15. A method as claimed in claim 13 or 14 wherein determining the CIIP motion prediction mode occurs after determining that the CU is not another merge mode.
  16. A method of decoding an image or image portion from a bit stream, the method comprising: determining a motion prediction mode from said bit stream; determining the value of a last significant coefficient; wherein said last significant coefficient indicates whether or not the residual is different to zero.
  17. A method as claimed in claim 16 further comprising setting a residual flag in dependence on the value of said last significant coefficient.
  18. A method as claimed in claim 17 comprising decoding said residual flag if the last significant position is equal to zero.
  19. A method as claimed in claim 17 or 18 comprising inferring said residual flag prior to determining the motion prediction mode, the method further comprising determining whether the motion prediction mode is CIIP and in the case of said determining, decoding the residual flag from the bitstream.
  20. A method as claimed in claim 19 wherein the last significant coefficient is equal to the first coefficient.
  21. A method as claimed in any of claims 16 to 20 wherein said residual flag is determined based on the value of sig_coeff_flag.
  22. A method of encoding an image or image portion into a bit stream, the method comprising: determining a motion prediction mode for said image or image portion; encoding a residual flag into said bit stream; wherein said motion prediction mode is Combined Inter Intra Prediction (CIIP) mode and said residual flag indicates the presence or absence of a residual.
  23. A method as claimed in claim 22 wherein determining a motion prediction mode comprises determining a mode with the lowest rate distortion.
  24. A method as claimed in claim 22 or 23 further comprising setting a preference for Skip mode if said residual flag indicates the absence of a residual.
  25. A method as claimed in claim 24 wherein said residual flag corresponds to the coding unit for the image portion being encoded.
  26. A method as claimed in claim 24 wherein said residual flag comprises multiple flags corresponding to the chroma and luma components.
  27. A method as claimed in claim 26 wherein setting said preference for Skip mode is performed when all of the residual flags corresponding to the chroma and luma components indicate the absence of a residual.
  28. A method as claimed in any of claims 24 to 27 wherein setting a preference for Skip mode comprises setting a BestIsSkip variable.
  29. A method as claimed in any of claims 24 to 28 wherein said setting a preference for Skip mode occurs after determining that the coding mode resulting in the lowest rate distortion has a zero residual.
  30. A method as claimed in claim 29 wherein said setting a preference for Skip mode is equal to true after determining that the coding mode resulting in the lowest rate distortion is a Skip mode and to false otherwise.
  31. A method as claimed in claim 29 wherein said setting a preference for Skip mode is equal to true after determining that the coding mode resulting in the lowest rate distortion is CIIP and if there is no residual.
  32. A method as claimed in any of claims 24 to 29 wherein said setting a preference for Skip mode occurs at the beginning of a merge estimation process.
  33. A method as claimed in any of claims 22 to 32 comprising evaluating said CIIP mode when said residual flag indicates the absence of a residual.
  34. A method as claimed in any of claims 22 to 32 wherein determining a motion prediction mode for said image or image portion comprises determining that a variable indicating a preference for Skip mode is false.
  35. A device for decoding an image or image portion from a bit stream, comprising: means for determining a motion prediction mode from said bit stream; means for decoding a residual flag from said bit stream; wherein said motion prediction mode is combined inter-intra prediction (CIIP) and said residual flag indicates the presence or absence of a residual.
  36. A device for decoding an image or image portion from a bit stream, comprising: means for determining a motion prediction mode from said bit stream; means for determining the value of a last significant coefficient; wherein said last significant coefficient indicates whether or not the residual is different to zero.
  37. A device for encoding an image or image portion into a bit stream, comprising: means for determining a motion prediction mode for said image or image portion; means for encoding a residual flag into said bit stream; wherein said motion prediction mode is Combined Inter Intra Prediction (CIIP) mode and said residual flag indicates the presence or absence of a residual.
  38. A program which, when executed by a computer or processor, causes the computer or processor to carry out the method of any one of claims 1 to 34.
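By way of illustration only, the decoding steps recited in claims 1 and 4 to 7 might be sketched as follows. This is not the claimed implementation: the syntax order (a mode flag followed by a luma and a chroma residual flag), the raw-bit `BitReader` and all names are assumptions made for this sketch, and a practical decoder would entropy-decode each flag with its own context.

```python
from dataclasses import dataclass


class BitReader:
    """Reads raw bits from a list (hypothetical stand-in for entropy decoding)."""

    def __init__(self, bits):
        self._bits = list(bits)
        self._pos = 0

    def read_flag(self) -> bool:
        bit = self._bits[self._pos]
        self._pos += 1
        return bool(bit)


@dataclass
class BlockInfo:
    is_ciip: bool
    luma_residual: bool
    chroma_residual: bool

    @property
    def has_residual(self) -> bool:
        # The residual is zero only if both flags indicate zero residual.
        return self.luma_residual or self.chroma_residual


def parse_block(reader: BitReader) -> BlockInfo:
    # Assumed syntax order: mode flag first, then per-component residual flags.
    is_ciip = reader.read_flag()
    if not is_ciip:
        # Other modes are outside this sketch and carry no flags here.
        return BlockInfo(False, False, False)
    luma = reader.read_flag()
    chroma = reader.read_flag()
    return BlockInfo(True, luma, chroma)


block = parse_block(BitReader([1, 1, 0]))  # CIIP block with a luma residual only
```

A decoder built along these lines could skip residual reconstruction for a CIIP block whenever `has_residual` is false.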
GB1904969.1A 2019-04-08 2019-04-08 Residual signalling Withdrawn GB2582929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1904969.1A GB2582929A (en) 2019-04-08 2019-04-08 Residual signalling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1904969.1A GB2582929A (en) 2019-04-08 2019-04-08 Residual signalling

Publications (2)

Publication Number Publication Date
GB201904969D0 GB201904969D0 (en) 2019-05-22
GB2582929A true GB2582929A (en) 2020-10-14

Family

ID=66809443

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1904969.1A Withdrawn GB2582929A (en) 2019-04-08 2019-04-08 Residual signalling

Country Status (1)

Country Link
GB (1) GB2582929A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220021894A1 (en) * 2019-04-09 2022-01-20 Beijing Dajia Internet Information Technology Co., Ltd. Methods and apparatuses for signaling of merge modes in video coding
US12096022B2 (en) * 2019-06-19 2024-09-17 Lg Electronics Inc. Image decoding method for performing inter-prediction when prediction mode for current block ultimately cannot be selected, and device for same

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11611759B2 (en) * 2019-05-24 2023-03-21 Qualcomm Incorporated Merge mode coding for video coding
KR20210149856A (en) * 2019-06-19 2021-12-09 엘지전자 주식회사 Method and apparatus for removing redundant signaling in video/video coding system
US20220239918A1 (en) * 2019-06-23 2022-07-28 Lg Electronics Inc. Method and device for syntax signaling in video/image coding system
CN114009016A (en) * 2019-06-23 2022-02-01 Lg 电子株式会社 Method and apparatus for removing redundant syntax from merged data syntax
CN118631997A (en) * 2019-06-25 2024-09-10 华为技术有限公司 Inter-frame prediction method and device
KR20210000689A (en) * 2019-06-25 2021-01-05 한국전자통신연구원 Method and Apparatus for Image Encoding and Decoding Thereof
EP3881530A4 (en) * 2019-08-23 2022-09-28 Tencent America LLC Method and apparatus for video coding
WO2021138476A1 (en) * 2019-12-30 2021-07-08 Beijing Dajia Internet Information Technology Co., Ltd. Coding of chrominance residuals

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012139192A2 (en) * 2011-04-15 2012-10-18 Research In Motion Limited Methods and devices for coding and decoding the position of the last significant coefficient
CN107995489A (en) * 2017-12-20 2018-05-04 北京大学深圳研究生院 A kind of combination forecasting method between being used for the intra frame of P frames or B frames

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
(AV1) AV1 Bitstream and Decoding Specification *
(CHIANG et al) CE10.1.1: Multi-hypothesis prediction for improving AMVP mode, skip or merge mode, and intra mode. JVET-L100, Macao October 2018. *
Versatile Video Coding Draft 5 (JVET-N1001), April 2019. *

Also Published As

Publication number Publication date
GB201904969D0 (en) 2019-05-22

Similar Documents

Publication Publication Date Title
TWI782904B (en) Merging filters for multiple classes of blocks for video coding
GB2582929A (en) Residual signalling
JP7530465B2 (en) Video Encoding and Decoding
JP7514236B2 (en) Video Encoding and Decoding
JP7514345B2 (en) Motion Vector Predictor Index Coding in Video Coding
GB2585017A (en) Video coding and decoding
US20220337814A1 (en) Image encoding/decoding method and device using reference sample filtering, and method for transmitting bitstream
US11743469B2 (en) Image encoding/decoding method and apparatus for selectively encoding size information of rectangular slice, and method for transmitting bitstream
GB2585019A (en) Residual signalling
JP7413576B2 (en) Video encoding and decoding
JP7321345B2 (en) video encoding and decoding
GB2585018A (en) Residual signalling
GB2611367A (en) Video coding and decoding
GB2628209A (en) Image and video coding and decoding
WO2023202956A1 (en) Video coding and decoding
GB2617626A (en) Data coding and decoding
GB2597616A (en) Video coding and decoding
GB2589735A (en) Video coding and decoding

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)