WO2021055640A1 - Methods and apparatuses for lossless coding modes in video coding - Google Patents

Methods and apparatuses for lossless coding modes in video coding

Info

Publication number
WO2021055640A1
Authority
WO
WIPO (PCT)
Prior art keywords
decoder
coding
residual block
lossless
transform
Prior art date
Application number
PCT/US2020/051326
Other languages
English (en)
Inventor
Tsung-Chuan MA
Xianglin Wang
Yi-Wen Chen
Xiaoyu XIU
Original Assignee
Beijing Dajia Internet Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co., Ltd.
Priority to CN202080054161.2A (patent CN114175653B)
Publication of WO2021055640A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124: Quantisation
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/184: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/90: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96: Tree coding, e.g. quad-tree coding

Definitions

  • This disclosure is related to video coding and compression. More specifically, this disclosure relates to methods and apparatus for lossless coding in video coding.
  • Video coding is performed according to one or more video coding standards.
  • Video coding standards include versatile video coding (VVC), joint exploration test model (JEM), high-efficiency video coding (H.265/HEVC), advanced video coding (H.264/AVC), moving picture expert group (MPEG) coding, or the like.
  • Video coding generally utilizes prediction methods (e.g., inter-prediction, intra-prediction, or the like) that take advantage of redundancy present in video images or sequences.
  • An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate, while avoiding or minimizing degradations to video quality.
  • Examples of the present disclosure provide methods and apparatus for lossless coding in video coding.
  • a method for decoding a video signal may include a decoder obtaining a plurality of coding units (CUs) that may include a lossless CU.
  • the decoder may also acquire at least one partially reconstructed absolute level in a local neighborhood of the lossless CU.
  • the decoder may further select a context model independent of a scalar quantizer state and based on the at least one partially reconstructed absolute level.
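The context selection described above can be sketched as follows. This is only an illustration under stated assumptions: the five-position template, the cap on each level, and the mapping from the neighborhood sum to a context index are hypothetical choices, not values taken from the disclosure. The point it shows is that no quantizer-state term enters the index.

```python
# Hypothetical template of already-decoded neighbours, as (dx, dy) offsets
# from the current coefficient position.
TEMPLATE = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def select_context(abs_levels, x, y, level_cap=3, max_ctx=3):
    """Pick a context index from partially reconstructed absolute levels only."""
    height = len(abs_levels)
    width = len(abs_levels[0])
    total = 0
    for dx, dy in TEMPLATE:
        nx, ny = x + dx, y + dy
        if nx < width and ny < height:
            # Cap each neighbour so one large level does not dominate the sum.
            total += min(abs_levels[ny][nx], level_cap)
    # No offset derived from a scalar quantizer state is added here:
    # the index depends on the local neighbourhood alone.
    return min((total + 1) // 2, max_ctx)
```

In a lossy path with dependent quantization, an extra offset tied to the quantizer state would typically shift the context set; the sketch omits it to mirror the lossless case described above.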
  • a method for decoding a video signal may include a decoder obtaining a plurality of CUs that may include a lossless CU.
  • the decoder may also acquire a transform block (TB) based on the lossless CU.
  • the decoder may further acquire a maximum number of context-coded bins (CCBs) for the TB.
  • the maximum number of CCB may be greater than a number of samples within the TB after coefficient zero-out times a preset value.
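As a rough sketch of the budget comparison above: the zero-out size of 32, the preset value of 1.75, and the `lossless_factor` multiplier are all assumptions chosen for illustration; the disclosure only states that the lossless budget may exceed the sample count after zero-out times a preset value.

```python
def samples_after_zero_out(width, height, zero_out_size=32):
    """Number of TB samples that survive coefficient zero-out (assumed 32x32 cap)."""
    return min(width, zero_out_size) * min(height, zero_out_size)


def ccb_budget(width, height, preset=1.75, lossless=False, lossless_factor=2.0):
    """Maximum number of context-coded bins for one transform block."""
    base = int(samples_after_zero_out(width, height) * preset)
    if lossless:
        # Hypothetical enlargement: any value above `base` satisfies the
        # "greater than" condition described in the text.
        return int(base * lossless_factor)
    return base
```

For a 64x64 TB, the lossy budget under these assumptions is 1792 bins, and the lossless budget is strictly larger.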
  • a method for decoding a video signal may include a decoder obtaining a plurality of CUs that may include a lossless CU.
  • the decoder may also determine that a transform coefficient coding scheme is applied to code a residual block based on the lossless CU.
  • the decoder may further signal a sign flag of transform coefficients as CCB using the transform coefficient coding scheme.
  • a method for decoding a video signal may include a decoder obtaining a plurality of coding units (CUs).
  • the decoder may also acquire a residual block based on the plurality of CUs.
  • the decoder may further adaptively rotate the residual block based on predefined procedures.
  • the predefined procedures may be followed by both an encoder and decoder.
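A minimal sketch of adaptive residual rotation follows. The disclosure does not specify the predefined procedure at this point, so the rule in `should_rotate` (rotate small intra blocks) is purely hypothetical; what matters is that encoder and decoder can both evaluate it from information they already share.

```python
def rotate_180(block):
    """Rotate a 2-D residual block by 180 degrees."""
    return [row[::-1] for row in block[::-1]]


def should_rotate(width, height, is_intra):
    # Hypothetical rule: it uses only block size and prediction type,
    # which both encoder and decoder know without extra signaling.
    return is_intra and width <= 8 and height <= 8


def apply_adaptive_rotation(block, is_intra):
    """Return the block rotated if the shared rule says so, else a copy."""
    height, width = len(block), len(block[0])
    if should_rotate(width, height, is_intra):
        return rotate_180(block)
    return [row[:] for row in block]
```

Because a 180-degree rotation is its own inverse, the decoder can undo the encoder's rotation by calling the same function again.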
  • a method for decoding a video signal may include a decoder obtaining a plurality of CUs that may include a lossless CU.
  • the decoder may also determine that a transform coefficient coding scheme is applied based on the lossless CU.
  • the decoder may further set a scanning order of residual block samples in the transform coefficient coding scheme to a scanning order used in residual coding scheme under transform skip mode in order to align the scanning order of both coding schemes.
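The scan alignment above can be illustrated with a small sketch. It assumes, as a labeled assumption rather than a statement from this text, that the regular transform coefficient scheme visits positions in reverse diagonal order while the transform skip scheme visits them in forward order; `scan_order` is a hypothetical helper.

```python
def diagonal_scan(width, height):
    """Forward up-right diagonal scan, returned as (x, y) positions."""
    order = []
    for d in range(width + height - 1):
        # Walk each anti-diagonal from bottom-left to top-right.
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                order.append((x, y))
    return order


def scan_order(width, height, lossless, transform_skip):
    forward = diagonal_scan(width, height)
    if transform_skip or lossless:
        # Lossless blocks reuse the transform skip order so both schemes agree.
        return forward
    return forward[::-1]
```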
  • a method for decoding a video signal may include a decoder obtaining a plurality of CUs.
  • the decoder may also obtain a last non-zero coefficient based on a coefficient zero-out operation applied to the plurality of CUs.
  • the decoder may further select a context model for coding a position of the last non-zero coefficient based on a reduced transform unit (TU) pixel size in order to reduce a total number of contexts used for coding last non-zero coefficient.
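One way to picture the context reduction above: if zero-out limits the coded region, the context for the last-coefficient position can be keyed on the reduced size instead of the full TU size. The zero-out size of 32 and the log2-based grouping below are assumptions for illustration only.

```python
def reduced_tu_size(size, zero_out_size=32):
    """TU dimension that remains after coefficient zero-out (assumed cap of 32)."""
    return min(size, zero_out_size)


def last_pos_context_group(width, height, zero_out_size=32):
    """Hypothetical context group keyed on log2 of the reduced TU dimensions."""
    w = reduced_tu_size(width, zero_out_size)
    h = reduced_tu_size(height, zero_out_size)
    # bit_length() - 1 equals log2 for power-of-two sizes.
    return (w.bit_length() - 1, h.bit_length() - 1)
```

Under these assumptions a 64x64 TU and a 32x32 TU share one group, so no separate contexts are needed for the larger size.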
  • a computing device may include one or more processors, a non-transitory computer-readable memory storing instructions executable by the one or more processors.
  • the one or more processors may be configured to obtain a plurality of CUs comprising a lossless CU.
  • the one or more processors may also be configured to acquire at least one partially reconstructed absolute level in a local neighborhood of the lossless CU.
  • the one or more processors may further be configured to select a context model independent of a scalar quantizer state and based on the at least one partially reconstructed absolute level.
  • a computing device is provided.
  • the computing device may include one or more processors, a non-transitory computer-readable memory storing instructions executable by the one or more processors.
  • the one or more processors may be configured to obtain a plurality of CUs comprising a lossless CU.
  • the one or more processors may also be configured to acquire a transform block (TB) based on the lossless CU.
  • the one or more processors may further be configured to acquire a maximum number of CCB for the TB.
  • the maximum number of CCB may be greater than a number of samples within the TB after coefficient zero-out times a preset value.
  • a computing device may include one or more processors, a non-transitory computer-readable memory storing instructions executable by the one or more processors.
  • the one or more processors may be configured to obtain a plurality of CUs comprising a lossless CU.
  • the one or more processors may also be configured to determine that a transform coefficient coding scheme is applied to code a residual block based on the lossless CU.
  • the one or more processors may further be configured to signal a sign flag of transform coefficients as a context-coded bin (CCB) using the transform coefficient coding scheme.
  • a non-transitory computer-readable storage medium having stored therein instructions. When the instructions are executed by one or more processors of the apparatus, the instructions may cause the apparatus to obtain a plurality of CUs. The instructions may also cause the apparatus to acquire a residual block based on the plurality of CUs. The instructions may further cause the apparatus to adaptively rotate the residual block based on predefined procedures. The predefined procedures may be followed by both an encoder and decoder.
  • a non-transitory computer-readable storage medium having stored therein instructions.
  • the instructions may cause the apparatus to obtain a plurality of CUs comprising a lossless CU.
  • the instructions may also cause the apparatus to determine that a transform coefficient coding scheme is applied based on the lossless CU.
  • the instructions may further cause the apparatus to set a scanning order of residual block samples in the transform coefficient coding scheme to a scanning order used in residual coding scheme under transform skip mode in order to align the scanning order of both coding schemes.
  • a non-transitory computer-readable storage medium having stored therein instructions.
  • the instructions may cause the apparatus to obtain a plurality of CUs.
  • the instructions may also cause the apparatus to obtain a last non-zero coefficient based on a coefficient zero-out operation applied to the plurality of CUs.
  • the instructions may further cause the apparatus to select a context model for coding a position of the last non-zero coefficient based on a reduced TU pixel size in order to reduce a total number of contexts used for coding last non-zero coefficient.
  • FIG. 1 is a block diagram of an encoder, according to an example of the present disclosure.
  • FIG. 2 is a block diagram of a decoder, according to an example of the present disclosure.
  • FIG. 3A is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 3B is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 3C is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 3D is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 3E is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 4 is a diagram illustration of a picture with 18 by 12 luma CTUs, according to an example of the present disclosure.
  • FIG. 5 is an illustration of a picture with 18 by 12 luma CTUs, according to an example of the present disclosure.
  • FIG. 6A is an illustration of an example of disallowed ternary tree (TT) and binary tree (BT) partitioning in VTM, according to an example of the present disclosure.
  • FIG. 6B is an illustration of an example of disallowed TT and BT partitioning in VTM, according to an example of the present disclosure.
  • FIG. 6C is an illustration of an example of disallowed TT and BT partitioning in VTM, according to an example of the present disclosure.
  • FIG. 6D is an illustration of an example of disallowed TT and BT partitioning in VTM, according to an example of the present disclosure.
  • FIG. 6E is an illustration of an example of disallowed TT and BT partitioning in VTM, according to an example of the present disclosure.
  • FIG. 6F is an illustration of an example of disallowed TT and BT partitioning in VTM, according to an example of the present disclosure.
  • FIG. 6G is an illustration of an example of disallowed TT and BT partitioning in VTM, according to an example of the present disclosure.
  • FIG. 6H is an illustration of an example of disallowed TT and BT partitioning in VTM, according to an example of the present disclosure.
  • FIG. 7 is an illustration of a residual coding structure for transform blocks, according to an example of the present disclosure.
  • FIG. 8 is an illustration of a residual coding structure for transform skip blocks, according to an example of the present disclosure.
  • FIG. 9 is an illustration of two scalar quantizers, according to an example of the present disclosure.
  • FIG. 10A is an illustration of state transition, according to an example of the present disclosure.
  • FIG. 10B is an illustration of quantizer selection, according to an example of the present disclosure.
  • FIG. 11 is an illustration of a template used for selecting probability models, according to the present disclosure.
  • FIG. 12 is an illustration of a decoding side motion vector refinement, according to the present disclosure.
  • FIG. 13 is a method for decoding a video signal, according to an example of the present disclosure.
  • FIG. 14 is a method for decoding a video signal, according to an example of the present disclosure.
  • FIG. 15 is a method for decoding a video signal, according to an example of the present disclosure.
  • FIG. 16 is a method for decoding a video signal, according to an example of the present disclosure.
  • FIG. 17 is a method for decoding a video signal, according to an example of the present disclosure.
  • FIG. 18 is a method for decoding a video signal, according to an example of the present disclosure.
  • FIG. 19 is a diagram illustrating a computing environment coupled with a user interface, according to an example of the present disclosure.
  • Although the terms “first,” “second,” “third,” etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, without departing from the scope of the present disclosure, first information may be termed as second information; and similarly, second information may also be termed as first information.
  • the term “if” may be understood to mean “when” or “upon” or “in response to a judgment” depending on the context.
  • The first version of the HEVC standard was finalized in October 2013 and offers approximately 50% bit-rate saving at equivalent perceptual quality compared to the prior-generation video coding standard H.264/MPEG AVC.
  • Although the HEVC standard provides significant coding improvements over its predecessor, there is evidence that superior coding efficiency can be achieved with additional coding tools over HEVC.
  • JVET: Joint Video Exploration Team
  • One reference software called the joint exploration model (JEM) was maintained by the JVET by integrating several additional coding tools on top of the HEVC test model (HM).
  • VTM: VVC test model
  • FIG. 1 shows a general diagram of a block-based video encoder for the VVC. Specifically, FIG. 1 shows a typical encoder 100.
  • the encoder 100 has video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block predictor 140, adder 128, transform 130, quantization 132, prediction related info 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.
  • a video frame is partitioned into a plurality of video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction approach or an intra prediction approach.
  • a prediction residual representing the difference between a current video block, part of video input 110, and its predictor, part of block predictor 140, is sent to a transform 130 from adder 128.
  • Transform coefficients are then sent from the Transform 130 to a Quantization 132 for entropy reduction.
  • Quantized coefficients are then fed to an Entropy Coding 138 to generate a compressed video bitstream.
  • prediction related information 142 from an intra/inter mode decision 116 such as video block partition info, motion vectors (MVs), reference picture index, and intra prediction mode, are also fed through the Entropy Coding 138 and saved into a compressed bitstream 144.
  • Compressed bitstream 144 includes a video bitstream.
  • decoder-related circuitries are also needed in order to reconstruct pixels for the purpose of prediction.
  • a prediction residual is reconstructed through an Inverse Quantization 134 and an Inverse Transform 136.
  • This reconstructed prediction residual is combined with a Block Predictor 140 to generate un-filtered reconstructed pixels for a current video block.
  • Spatial prediction uses pixels from samples of already coded neighboring blocks (which are called reference samples) in the same video frame as the current video block to predict the current video block.
  • Temporal prediction uses reconstructed pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
  • the temporal prediction signal for a given coding unit (CU) or coding block is usually signaled by one or more MVs, which indicate the amount and the direction of motion between the current CU and its temporal reference. Further, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture storage, the temporal prediction signal comes from.
  • Motion estimation 114 takes in video input 110 and a signal from picture buffer 120 and outputs a motion estimation signal to motion compensation 112.
  • Motion compensation 112 takes in video input 110, a signal from picture buffer 120, and the motion estimation signal from motion estimation 114, and outputs a motion compensation signal to intra/inter mode decision 116.
  • an intra/inter mode decision 116 in the encoder 100 chooses the best prediction mode, for example, based on the rate-distortion optimization method.
  • the block predictor 140 is then subtracted from the current video block, and the resulting prediction residual is de-correlated using the transform 130 and the quantization 132.
  • the resulting quantized residual coefficients are inverse quantized by the inverse quantization 134 and inverse transformed by the inverse transform 136 to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU.
  • in-loop filtering 122 such as a deblocking filter, a sample adaptive offset (SAO), and/or an adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture storage of the picture buffer 120 and used to code future video blocks.
  • The coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit 138 to be further compressed and packed to form the bitstream.
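The residual path of the encoder described above can be sketched as follows. This is a simplified illustration, not the VVC process: the transform is treated as an identity, `qstep` is a hypothetical uniform quantization step, and the `lossless` flag simply bypasses quantization so that reconstruction is exact.

```python
def encode_block(current, predictor, qstep=4, lossless=False):
    """Return (levels, reconstruction) for one block given its predictor."""
    # Prediction residual: current block minus block predictor.
    residual = [[c - p for c, p in zip(c_row, p_row)]
                for c_row, p_row in zip(current, predictor)]
    if lossless:
        # Lossless mode: no quantization, the residual is coded as is.
        levels = residual
        recon_residual = residual
    else:
        # Uniform scalar quantization followed by inverse quantization.
        levels = [[round(r / qstep) for r in row] for row in residual]
        recon_residual = [[lvl * qstep for lvl in row] for row in levels]
    # Reconstruction: add the reconstructed residual back to the predictor.
    recon = [[p + r for p, r in zip(p_row, r_row)]
             for p_row, r_row in zip(predictor, recon_residual)]
    return levels, recon
```

The `levels` would go on to entropy coding, while `recon` is what the encoder keeps for predicting later blocks, which is why the encoder contains the decoder-side reconstruction steps.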
  • FIG. 1 gives the block diagram of a generic block-based hybrid video encoding system.
  • The input video signal is processed block by block; the blocks are called coding units (CUs).
  • In VTM-1.0, a CU can be up to 128x128 pixels.
  • CTU: coding tree unit
  • A coding tree block (CTB) is an NxN block of samples for some value of N such that the division of a component into CTBs is a partitioning.
  • A CTU includes a CTB of luma samples, two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples. Additionally, the concept of multiple partition unit types in the HEVC is removed, i.e., the separation of CU, prediction unit (PU) and transform unit (TU) does not exist in the VVC anymore; instead, each CU is always used as the basic unit for both prediction and transform without further partitions. In the multi-type tree structure, one CTU is first partitioned by a quad-tree structure. Then, each quad-tree leaf node can be further partitioned by a binary and ternary tree structure.
  • As shown in FIGS. 3A, 3B, 3C, 3D, and 3E, there are five splitting types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
  • FIG. 3A shows a diagram illustrating block quaternary partition in a multi-type tree structure, in accordance with the present disclosure.
  • FIG. 3B shows a diagram illustrating block vertical binary partition in a multi-type tree structure, in accordance with the present disclosure.
  • FIG. 3C shows a diagram illustrating block horizontal binary partition in a multi-type tree structure, in accordance with the present disclosure.
  • FIG. 3D shows a diagram illustrating block vertical ternary partition in a multi-type tree structure, in accordance with the present disclosure.
  • FIG. 3E shows a diagram illustrating block horizontal ternary partition in a multi-type tree structure, in accordance with the present disclosure.
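The five splitting types can be sketched as a function that maps a block to its sub-blocks. The mode names are hypothetical labels, and the 1:2:1 ratio for ternary splits is an assumption based on common VVC practice; the text above names the split types without giving the ratios.

```python
def split_block(x, y, w, h, mode):
    """Return the sub-blocks (x, y, width, height) produced by one split."""
    if mode == "QT":      # quaternary: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_H":    # horizontal binary: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_V":    # vertical binary: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_H":    # horizontal ternary: quarter, half, quarter (assumed)
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_V":    # vertical ternary: quarter, half, quarter (assumed)
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")
```

Applying `split_block` recursively, starting with a quaternary split of the CTU and continuing with binary or ternary splits on the leaves, yields the multi-type tree described above.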
  • spatial prediction and/or temporal prediction may be performed.
  • Spatial prediction (or “intra prediction”) uses pixels from the samples of already coded neighboring blocks (which are called reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal.
  • Temporal prediction (also referred to as “inter prediction” or “motion compensated prediction”) uses reconstructed pixels from the already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
  • Temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs) which indicate the amount and the direction of motion between the current CU and its temporal reference.
  • one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture store the temporal prediction signal comes.
  • the mode decision block in the encoder chooses the best prediction mode, for example based on the rate-distortion optimization method.
  • the prediction block is then subtracted from the current video block; and the prediction residual is de-correlated using transform and quantized.
  • the quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU.
  • in-loop filtering such as deblocking filter, sample adaptive offset (SAO) and adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture store and used to code future video blocks.
  • The coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed to form the bitstream.
  • FIG. 2 shows a general block diagram of a video decoder for the VVC. Specifically, FIG. 2 shows a typical decoder 200 block diagram. Decoder 200 has bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related info 234, and video output 232.
  • Decoder 200 is similar to the reconstruction-related section residing in the encoder 100 of FIG. 1.
  • an incoming video bitstream 210 is first decoded through an Entropy Decoding 212 to derive quantized coefficient levels and prediction-related information.
  • the quantized coefficient levels are then processed through an Inverse Quantization 214 and an Inverse Transform 216 to obtain a reconstructed prediction residual.
  • A block predictor mechanism, implemented in an Intra/inter Mode Selector 220, is configured to perform either an Intra Prediction 222 or a Motion Compensation 224, based on decoded prediction information.
  • a set of unfiltered reconstructed pixels is obtained by summing up the reconstructed prediction residual from the Inverse Transform 216 and a predictive output generated by the block predictor mechanism, using a summer 218.
  • the reconstructed block may further go through an In-Loop Filter 228 before it is stored in a Picture Buffer 226, which functions as a reference picture store.
  • the reconstructed video in the Picture Buffer 226 may be sent to drive a display device, as well as used to predict future video blocks.
  • a filtering operation is performed on these reconstructed pixels to derive a final reconstructed Video Output 232.
  • FIG. 2 gives a general block diagram of a block-based video decoder.
  • the video bitstream is first entropy decoded at entropy decoding unit.
  • the coding mode and prediction information are sent to either the spatial prediction unit (if intra coded) or the temporal prediction unit (if inter coded) to form the prediction block.
  • the residual transform coefficients are sent to inverse quantization unit and inverse transform unit to reconstruct the residual block.
  • the prediction block and the residual block are then added together.
  • the reconstructed block may further go through in-loop filtering before it is stored in reference picture store.
  • the reconstructed video in reference picture store is then sent out to drive a display device, as well as used to predict future video blocks.
  • The basic intra prediction scheme applied in the VVC is kept the same as that of the HEVC, except that several modules are further extended and/or improved, e.g., intra sub-partition (ISP) coding mode, extended intra prediction with wide-angle intra directions, position-dependent intra prediction combination (PDPC) and 4-tap intra interpolation.
  • A tile is defined as a rectangular region of CTUs within a particular tile column and a particular tile row in a picture.
  • Tile group is a group of an integer number of tiles of a picture that are exclusively contained in a single NAL unit. Basically, the concept of tile group is the same as slice as defined in HEVC. For example, pictures are divided into tile groups and tiles.
  • a tile is a sequence of CTUs that cover a rectangular region of a picture.
  • a tile group contains a number of tiles of a picture. Two modes of tile groups are supported, namely the raster-scan tile group mode and the rectangular tile group mode.
  • a tile group contains a sequence of tiles in tile raster scan of a picture.
  • a tile group contains a number of tiles of a picture that collectively form a rectangular region of the picture. The tiles within a rectangular tile group are in the order of tile raster scan of the tile group.
  • FIG. 4 shows an example of raster-scan tile group partitioning of a picture, where the picture is divided into 12 tiles and 3 raster-scan tile groups.
  • FIG. 4 includes tiles 410, 412, 414, 416, 418, 420, 422, 424, 426, 428, 430, and 432. Each tile has 18 CTUs. More specifically, FIG. 4 shows a picture with 18 by 12 luma CTUs that is partitioned into 12 tiles and 3 tile groups (informative).
  • the three tile groups are as follows (1) the first tile group includes tiles 410 and 412, (2) the second tile group includes tiles 414, 416, 418, 420, and 422, and (3) the third tile group includes tiles 424, 426, 428, 430, and 432.
  • FIG. 5 shows an example of rectangular tile group partitioning of a picture, where the picture is divided into 24 tiles (6 tile columns and 4 tile rows) and 9 rectangular tile groups.
  • FIG. 5 includes tiles 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544, 546, 548, 550, 552, 554, and 556. More specifically, FIG. 5 shows a picture with 18 by 12 luma CTUs that is partitioned into 24 tiles and 9 tile groups (informative). A tile group contains tiles and a tile contains CTUs.
  • the 9 rectangular tile groups include (1) the two tiles 510 and 512, (2) the two tiles 514 and 516, (3) the two tiles 518 and 520, (4) the four tiles 522, 524, 534, and 536, (5) the four tiles 526, 528, 538, and 540, (6) the four tiles 530, 532, 542, and 544, (7) the two tiles 546 and 548, (8) the two tiles 550 and 552, and (9) the two tiles 554 and 556.
  • VTM4 Large Block-Size Transforms with High-Frequency Zeroing in VVC
  • In VTM4, large block-size transforms, up to 64x64 in size, are enabled, which is primarily useful for higher resolution video, e.g., 1080p and 4K sequences.
  • High frequency transform coefficients are zeroed out for the transform blocks with size (width or height, or both width and height) equal to 64, so that only the lower-frequency coefficients are retained.
  • MxN transform block with M as the block width and N as the block height
  • M is equal to 64
  • N when N is equal to 64, only the top 32 rows of transform coefficients are kept.
  • transform skip mode is used for a large block, the entire block is used without zeroing out any values.
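  • The zeroing rule above can be sketched as follows. This is a minimal illustration, not the VTM implementation: the function name and the list-of-rows block layout are hypothetical, and the rule that a width of 64 keeps only the left 32 columns is an assumption made by symmetry with the stated row rule and with the top-left 32x32 region described elsewhere in this document.

```python
def zero_out_high_frequencies(coeffs, transform_skip=False, keep=32, big=64):
    """Return a copy of a coefficient block (list of rows) with high frequencies zeroed.

    Hypothetical helper: a dimension equal to `big` (64) keeps only its first
    `keep` (32) rows or columns; other dimensions are kept in full.
    """
    height = len(coeffs)
    width = len(coeffs[0]) if height else 0
    if transform_skip:
        # Transform skip blocks are used in full, without zeroing any values.
        return [row[:] for row in coeffs]
    keep_rows = keep if height == big else height
    keep_cols = keep if width == big else width  # assumed by symmetry with the row rule
    return [
        [v if (r < keep_rows and c < keep_cols) else 0 for c, v in enumerate(row)]
        for r, row in enumerate(coeffs)
    ]


if __name__ == "__main__":
    block = [[1] * 64 for _ in range(64)]
    kept = sum(map(sum, zero_out_high_frequencies(block)))
    print(kept)  # number of coefficients that survive zeroing
```

  • For a 64x64 block of ones, only the top-left 32x32 entries remain non-zero, while the same block in transform skip mode is returned unchanged.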
  • VPDUs Virtual Pipeline Data Units
  • Virtual pipeline data units are defined as non-overlapping units in a picture.
  • successive VPDUs are processed by multiple pipeline stages at the same time.
  • the VPDU size is roughly proportional to the buffer size in most pipeline stages, so it is important to keep the VPDU size small.
  • the VPDU size can be set to maximum transform block (TB) size.
  • TT ternary tree
  • BT binary tree
  • In order to keep the VPDU size as 64x64 luma samples, the following normative partition restrictions (with syntax signaling modification) are applied in VTM5, as shown in FIGS. 6A, 6B, 6C, 6D, 6E, 6F, 6G, and 6H (described below):
  • TT split is not allowed for a CU with either width or height, or both width and height equal to 128.
  • FIGS. 6A, 6B, 6C, 6D, 6E, 6F, 6G, and 6H show examples of disallowed TT and BT partitioning in VTM.
  • Transform coefficient coding in VVC is similar to HEVC in the sense that they both use non-overlapped coefficient groups (also called CGs or subblocks). However, there are also some differences between them.
  • each CG of coefficients has a fixed size of 4x4.
  • the CG size becomes dependent on TB size.
  • various CG sizes (1x16, 2x8, 8x2, 2x4, 4x2 and 16x1) are available in VVC.
  • the CGs inside a coding block, and the transform coefficients within a CG, are coded according to pre-defined scan orders.
  • the area of the TB and the type of video component e.g. luma component vs. chroma component
  • the maximum number of context-coded bins is equal to TB_zosize*1.75.
  • TB_zosize indicates the number of samples within a TB after coefficient zero-out.
  • the coded_sub_block_flag, which is a flag indicating whether a CG contains a non-zero coefficient or not, is not considered for the CCB count.
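  • As a worked example of the budget above, the following sketch computes TB_zosize and the resulting context-coded bin limit. The function names are hypothetical, and modelling zero-out as a clamp of each dimension to 32 is an assumption that holds only for transform blocks no larger than 64x64.

```python
def tb_zosize(width, height, zero_out_limit=32):
    """Number of samples left in a TB after coefficient zero-out (assumed clamp to 32)."""
    return min(width, zero_out_limit) * min(height, zero_out_limit)


def max_context_coded_bins(width, height, numerator=7, denominator=4):
    """Context-coded bin budget TB_zosize * 1.75, kept in integer arithmetic (7/4)."""
    return tb_zosize(width, height) * numerator // denominator


if __name__ == "__main__":
    for size in [(4, 4), (32, 32), (64, 64)]:
        print(size, tb_zosize(*size), max_context_coded_bins(*size))
```

  • A 4x4 TB therefore gets a budget of 28 context-coded bins, and a 64x64 TB gets the same 1792-bin budget as a 32x32 TB because both retain 1024 samples after zero-out.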
  • Coefficient zero-out is an operation performed on a transform block to force coefficients located in a certain region of the transform block to be 0.
  • a 64x64 transform has an associated zero-out operation.
  • transform coefficients located outside the top-left 32x32 region inside a 64x64 transform block are all forced to be 0.
  • coefficient zero-out operation is performed along that dimension to force coefficients located beyond the top-left 32x32 region to be 0.
  • a variable, remBinsPass1, is first set to the maximum number of context-coded bins (MCCB) allowed. In the coding process, the variable is decreased by one each time a context-coded bin is signaled. While remBinsPass1 is larger than or equal to four, a coefficient is first signaled through the syntax elements sig_coeff_flag, abs_level_gt1_flag, par_level_flag, and abs_level_gt3_flag, all using context-coded bins in the first pass.
  • the remaining level information of the coefficient is coded with the syntax element abs_remainder using Golomb-Rice code and bypass-coded bins in the second pass.
  • when remBinsPass1 becomes smaller than four, a current coefficient is not coded in the first pass, but directly coded in the second pass with the syntax element dec_abs_level using Golomb-Rice code and bypass-coded bins.
  • the signs (sign_flag) for all scan positions with sig_coeff_flag equal to 1 are finally coded as bypass bins.
  • this process is depicted in FIG. 7 (described below).
  • the remBinsPass1 is reset for every TB.
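  • The budget-driven split between the two passes can be sketched as below. This is a simplified model rather than the normative parsing process: the function name is hypothetical, and the per-coefficient bin counts (one bin for a zero level, two for a magnitude of one, four otherwise) are an assumption about which of the four first-pass flags are present for a given level.

```python
def assign_coding_passes(levels, max_ccb):
    """Decide, per coefficient, whether it starts in the context-coded first pass.

    Returns (plan, remaining), where plan holds ("context", bins_used) or
    ("bypass", 0) for each level, in scan order.
    """
    rem_bins_pass1 = max_ccb  # reset for every TB
    plan = []
    for level in levels:
        if rem_bins_pass1 >= 4:
            magnitude = abs(level)
            used = 1              # sig_coeff_flag
            if magnitude >= 1:
                used += 1         # abs_level_gt1_flag (assumed present when significant)
            if magnitude >= 2:
                used += 2         # par_level_flag and abs_level_gt3_flag
            rem_bins_pass1 -= used
            plan.append(("context", used))
        else:
            # Budget exhausted: the level is coded directly as dec_abs_level.
            plan.append(("bypass", 0))
    return plan, rem_bins_pass1


if __name__ == "__main__":
    print(assign_coding_passes([0, 1, 5, 3, 2], max_ccb=8))
```

  • With a budget of 8 bins, the first three levels consume 1, 2 and 4 context-coded bins; the remaining budget of 1 is below four, so the last two levels fall back to bypass coding.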
  • FIG. 7 shows an illustration of residual coding structure for transform blocks.
  • the unified (same) Rice parameter (ricePar) derivation is used for signaling the syntax elements abs_remainder and dec_abs_level. The only difference is that baseLevel is set to 4 and 0 for coding abs_remainder and dec_abs_level, respectively.
  • Rice parameter is determined based on not only the sum of absolute levels of neighboring five transform coefficients in local template, but also the corresponding base level as follows:
  • RicePara = RiceParTable[ max( min( 31, sumAbs - 5 * baseLevel ), 0 ) ]
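  • The table lookup above amounts to clamping an index into the range 0 to 31. The sketch below illustrates only that clamping; the function name is hypothetical, and the table contents are a placeholder invented for the example, not the values of the actual RiceParTable, which this document does not reproduce.

```python
def rice_parameter(sum_abs, base_level, rice_par_table):
    """Look up the Rice parameter from a 32-entry table using the clamped index."""
    if len(rice_par_table) != 32:
        raise ValueError("the lookup table is expected to have 32 entries")
    index = max(min(31, sum_abs - 5 * base_level), 0)
    return rice_par_table[index]


# Placeholder table for illustration only (NOT the normative table values).
HYPOTHETICAL_TABLE = [i // 8 for i in range(32)]

if __name__ == "__main__":
    # baseLevel is 4 for abs_remainder and 0 for dec_abs_level.
    print(rice_parameter(30, 4, HYPOTHETICAL_TABLE))
    print(rice_parameter(30, 0, HYPOTHETICAL_TABLE))
```

  • With the same neighborhood sum, the larger base level used for abs_remainder shifts the index down by 20, so the two syntax elements can land on different Rice parameters.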
  • VVC Unlike HEVC where a single residual coding scheme is designed for coding both transform coefficients and transform skip coefficients, in VVC two separate residual coding schemes are employed for transform coefficients and transform skip coefficients (i.e. residual), respectively.
  • the statistical characteristics of residual signal are different from those of transform coefficients, and no energy compaction around low-frequency components is observed.
  • the residual coding is modified to account for the different signal characteristics of the (spatial) transform skip residual which includes:
  • context modeling for the sign flag is determined based on left and above neighboring coefficient values and sign flag is parsed after sig_coeff_flag to keep all context coded bins together;
  • syntax elements sig_coeff_flag, coeff_sign_flag, abs_level_gt1_flag, par_level_flag are coded in an interleaved manner residual sample by residual sample in the first pass, followed by abs_level_gtX_flag bitplanes in the second pass, and abs_remainder coding in the third pass.
  • FIG. 8 shows an illustration of residual coding structure for transform skip blocks.
  • dependent scalar quantization refers to an approach in which the set of admissible reconstruction values for a transform coefficient depends on the values of the transform coefficient levels that precede the current transform coefficient level in reconstruction order.
  • the main effect of this approach is that, in comparison to conventional independent scalar quantization as used in HEVC, the admissible reconstruction vectors are packed denser in the N-dimensional vector space (N represents the number of transform coefficients in a transform block). That means, for a given average number of admissible reconstruction vectors per N-dimensional unit volume, the average distortion between an input vector and the closest reconstruction vector is reduced.
  • the approach of dependent scalar quantization is realized by: (a) defining two scalar quantizers with different reconstruction levels and (b) defining a process for switching between the two scalar quantizers.
  • the two scalar quantizers used are illustrated in FIG. 9 (described below).
  • the location of the available reconstruction levels is uniquely specified by a quantization step size Δ.
  • the scalar quantizer used (Q0 or Q1) is not explicitly signalled in the bitstream. Instead, the quantizer used for a current transform coefficient is determined by the parities of the transform coefficient levels that precede the current transform coefficient in coding/reconstruction order.
  • FIG. 9 shows an illustration of the two scalar quantizers used in the proposed approach of dependent quantization.
  • the switching between the two scalar quantizers is realized via a state machine with four quantizer states (QState).
  • the QState can take four different values: 0, 1, 2, 3. It is uniquely determined by the parities of the transform coefficient levels preceding the current transform coefficient in coding/reconstruction order.
  • the state is set equal to 0.
  • the transform coefficients are reconstructed in scanning order (i.e., in the same order they are entropy decoded).
  • the state is updated as shown in FIG. 10A, where k denotes the value of the transform coefficient level.
  • FIG. 10A shows a transition diagram illustrating a state transition for the proposed dependent quantization.
  • FIG. 10B shows a table illustrating a quantizer selection for the proposed dependent quantization.
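  • The parity-driven switching described above can be sketched as a small state machine. The figures are not reproduced here, so both the transition table and the mapping of states 0 and 1 to Q0 and states 2 and 3 to Q1 are assumptions based on the commonly described VVC design; the function names are hypothetical.

```python
# Assumed transition table: next_state = STATE_TRANSITION[state][k & 1].
STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))


def quantizer_for_state(state):
    """Assumed selection rule: states 0 and 1 use Q0, states 2 and 3 use Q1."""
    return "Q0" if state < 2 else "Q1"


def trace_quantizers(levels):
    """Return the quantizer used for each level (in scan order) and the final state."""
    state = 0  # the state is set to 0 at the start of a transform block
    used = []
    for k in levels:
        used.append(quantizer_for_state(state))
        state = STATE_TRANSITION[state][k & 1]  # only the parity of k matters
    return used, state


if __name__ == "__main__":
    print(trace_quantizers([1, 0, 3, 2]))
```

  • Because the decoder replays the same transitions from the already decoded levels, no extra syntax is needed to tell it which quantizer applies to each coefficient.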
  • the DC values are separately coded for following scaling matrices: 16x16, 32x32, and 64x64.
  • the 8x8 base scaling matrix is up-sampled (by duplication of elements) to the corresponding square size (i.e. 16x16, 32x32, 64x64).
  • FIG. 11 shows an illustration of the template used for selecting probability models.
  • the black square specifies the current scan position and the squares with an “x” represent the local neighbourhood used.
  • the selected probability models depend on the sum of the absolute levels (or partially reconstructed absolute levels) in a local neighbourhood and the number of absolute levels greater than 0 (given by the number of sig_coeff_flags equal to 1) in the local neighbourhood.
  • the context modelling and binarization depends on the following measures for the local neighbourhood:
  • the probability models for coding sig_coeff_flag, abs_level_gt1_flag, par_level_flag, and abs_level_gt3_flag are selected.
  • the Rice parameter for binarizing abs_remainder and dec_abs_level is selected based on the values of sumAbs and numSig.
  • reduced 32-point MTS (also called RMTS32) is based on skipping high-frequency coefficients and is used to reduce the computational complexity of 32-point DST-7/DCT-8. It is accompanied by coefficient coding changes covering all types of zero-out (i.e., RMTS32 and the existing zero-out for high-frequency components in DCT2).
  • binarization of last non-zero coefficient position coding is coded based on reduced TU size, and the context model selection for the last non-zero coefficient position coding is determined by the original TU size.
  • 60 context models are used to code the sig_coeff_flag of transform coefficients.
  • the selection of context model index is based on a sum of a maximum of five previously partially reconstructed absolute levels called locSumAbsPass1 and the state of dependent quantization QState as follows:
  • ctxInc = 12 * Max( 0, QState - 1 ) +
  • ctxInc = 36 + 8 * Max( 0, QState - 1 ) +
  • Decoder-Side Motion Vector Refinement (DMVR) in VVC is a technique for blocks coded in bi-prediction Merge mode and is controlled by an SPS-level flag sps_dmvr_enabled_flag. Under this mode, the two motion vectors (MVs) of a block can be further refined using bilateral matching (BM) prediction. As shown in FIG. 12 (described below), the bilateral matching method is used to refine motion information of a current CU by searching for the closest match between its two reference blocks along the motion trajectory of the current CU in its two associated reference pictures.
  • in FIG. 12, the patterned black rectangular blocks (1222 and 1264) indicate the current CU and its two reference blocks based on the initial motion information from Merge mode.
  • the patterned rectangular blocks (1224 and 1262) indicate one pair of reference blocks based on an MV candidate used in the motion refinement search process.
  • the MV differences between the MV candidate and the initial MV are MVdiff and -MVdiff respectively, as indicated in FIG. 12.
  • a number of such MV candidates around the initial MV may be checked. Specifically, for each given MV candidate, its two associated reference blocks may be located from its reference pictures in List 0 and List 1 respectively, and the difference between them is calculated.
  • Such block difference is usually measured in SAD (or sum of absolute difference), or row-subsampled SAD (i.e. the SAD calculated with every other row of the block involved).
  • the MV candidate with the lowest SAD between its two reference blocks becomes the refined MV and used to generate the bi-predicted signal as the actual prediction for the current CU.
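  • The bilateral matching search described above can be sketched as follows. This is a minimal integer-sample illustration, not the VTM search: the function names, the picture layout (lists of rows), the search range, and the use of block top-left positions in place of motion vectors are all assumptions made for the example.

```python
def sad(block_a, block_b, row_step=1):
    """Sum of absolute differences; row_step=2 gives the row-subsampled variant."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a[::row_step], block_b[::row_step])
        for a, b in zip(row_a, row_b)
    )


def fetch_block(picture, x, y, width, height):
    """Return the block at (x, y), or None if it does not fit inside the picture."""
    if x < 0 or y < 0 or y + height > len(picture) or x + width > len(picture[0]):
        return None
    return [row[x:x + width] for row in picture[y:y + height]]


def refine_mv(ref0, ref1, pos0, pos1, size, search_range=2, row_step=1):
    """Test mirrored offsets (+d in list 0, -d in list 1) and keep the lowest SAD."""
    width, height = size
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            block0 = fetch_block(ref0, pos0[0] + dx, pos0[1] + dy, width, height)
            block1 = fetch_block(ref1, pos1[0] - dx, pos1[1] - dy, width, height)
            if block0 is None or block1 is None:
                continue  # candidate points outside a reference picture
            cost = sad(block0, block1, row_step)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


if __name__ == "__main__":
    ref0 = [[10 * y + x for x in range(10)] for y in range(10)]
    ref1 = [[ref0[y][x + 2] for x in range(8)] for y in range(10)]
    print(refine_mv(ref0, ref1, (3, 3), (3, 3), (2, 2), search_range=1))
```

  • In the usage example the second reference picture is the first one shifted by two samples, so the mirrored offset (1, 0) aligns both reference blocks exactly and yields a SAD of zero.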
  • FIG. 12 shows a decoding side motion vector refinement.
  • FIG. 12 includes 1220 refPic in list L0, 1240 current picture, and 1260 refPic in list L1.
  • 1220 refPic in list L0 is a reference picture of the first list and includes 1222 current CU, 1224 reference block, 1226 MVdiff, 1228 MV0, and 1230 MV0’.
  • 1226 MVdiff is the motion vector difference between 1222 current CU and 1224 reference block.
  • 1228 MV0 is the motion vector between blocks 1222 current CU and 1242 current CU.
  • 1230 MV0’ is the motion vector between blocks 1222 current CU and 1242 current CU.
  • 1240 current picture is a current picture of the video and includes 1242 current CU, 1244 MV1’, and 1246 MV1.
  • 1244 MV1’ is the motion vector between block 1242 current CU and 1262 reference block.
  • 1246 MV1 is the motion vector between blocks 1242 current CU and 1264 current CU.
  • 1260 refPic in list L1 is a reference picture in the second list and includes 1262 reference block, 1264 current CU, and 1266 -MVdiff.
  • 1266 -MVdiff is the motion vector difference between 1262 reference block and 1264 current CU.
  • In VVC, the DMVR is applied to a CU that satisfies the following conditions:
  • one reference picture of the CU is in the past (i.e. with a POC smaller than the current picture POC) and another reference picture is in the future (i.e. with a POC greater than the current picture POC);
  • CU has more than 64 luma samples in size and the CU height is more than 8 luma samples
  • the refined MV derived by the DMVR process is used to generate the inter prediction samples and is also used in temporal motion vector prediction for future picture coding, while the original MV is used in the deblocking process and in spatial motion vector prediction for future CU coding.
  • BDOF bi-directional optical flow
  • BIO was included in the JEM.
  • the BDOF in VTM5 is a simpler version that requires much less computation, especially in terms of number of multiplications and the size of the multiplier.
  • BDOF is controlled by an SPS flag sps_bdof_enabled_flag.
  • BDOF is used to refine the bi-prediction signal of a CU at the 4x4 sub-block level.
  • BDOF is applied to a CU if it satisfies the following conditions: 1) the CU’s height is not 4, and the CU is not 4x8 in size; 2) the CU is not coded using affine mode or the ATMVP merge mode; 3) the CU is coded using “true” bi-prediction mode, i.e., one of the two reference pictures is prior to the current picture in display order and the other is after the current picture in display order.
  • BDOF is only applied to the luma component.
  • the BDOF mode is based on the optical flow concept, which assumes that the motion of an object is smooth.
  • the BDOF adjusts the prediction sample value based on the gradient values of a current block to improve the coding efficiency.
  • BDOF and DMVR are each always applied if the corresponding SPS control flag is enabled and certain bi-prediction and size constraints are met for a regular merge candidate.
  • DMVR is applied to a regular merge mode when all the following conditions are true:
  • pic_width_in_luma_samples and pic_height_in_luma_samples of the reference picture refPicLX associated with refIdxLX are equal to pic_width_in_luma_samples and pic_height_in_luma_samples of the current picture, respectively
  • predFlagL0[ xSbIdx ][ ySbIdx ] and predFlagL1[ xSbIdx ][ ySbIdx ] are both equal to 1.
  • pic_width_in_luma_samples and pic_height_in_luma_samples of the reference picture refPicLX associated with refIdxLX are equal to pic_width_in_luma_samples and pic_height_in_luma_samples of the current picture, respectively
  • Lossless Coding in HEVC is achieved by simply bypassing transform, quantization, and in-loop filters (de-blocking filter, sample adaptive offset, and adaptive loop filter). The design is aimed to enable the lossless coding with minimum changes required to the regular HEVC encoder and decoder implementation for mainstream applications.
  • the lossless coding mode can be turned on or off at the individual CU level. This is done through a syntax element cu_transquant_bypass_flag signaled at the CU level.
  • the cu_transquant_bypass_flag syntax is not always signaled. It is signaled only when another syntax element called transquant_bypass_enabled_flag has a value of 1. In other words, transquant_bypass_enabled_flag is used to turn on the syntax signaling of cu_transquant_bypass_flag.
  • the syntax transquant_bypass_enabled_flag is signaled in the picture parameter set (PPS) to indicate whether the syntax cu_transquant_bypass_flag needs to be signaled for every CU inside a picture referring to the PPS. If this flag is set equal to 1, the syntax cu_transquant_bypass_flag is sent at the CU level to signal whether the current CU is coded with the lossless mode or not. If this flag is set equal to 0 in the PPS, cu_transquant_bypass_flag is not sent, and all the CUs in the picture are encoded with transform, quantization, and loop filters involved in the process, which will generally result in a certain level of video quality degradation.
  • transquant_bypass_enabled_flag equal to 1 specifies that cu_transquant_bypass_flag is present.
  • transquant_bypass_enabled_flag equal to 0 specifies that cu_transquant_bypass_flag is not present.
  • cu_transquant_bypass_flag equal to 1 specifies that the scaling and transform process as specified in clause 8.6 and the in-loop filter process as specified in clause 8.7 are bypassed. When cu_transquant_bypass_flag is not present, it is inferred to be equal to 0.
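  • The two-level signaling above can be sketched as a small parsing routine. This is an illustration only: the function names and the read_bit callback, which stands in for an entropy decoder, are hypothetical and not part of any real decoder API.

```python
def parse_cu_transquant_bypass_flag(transquant_bypass_enabled_flag, read_bit):
    """Return the CU-level lossless flag; read it only when the PPS flag enables it."""
    if transquant_bypass_enabled_flag == 1:
        return read_bit()
    return 0  # not present in the bitstream, so inferred to be equal to 0


def bypassed_stages(cu_transquant_bypass_flag):
    """Stages skipped for a CU, following the semantics of the flag."""
    if cu_transquant_bypass_flag == 1:
        return ["scaling and transform", "in-loop filters"]
    return []


if __name__ == "__main__":
    bits = iter([1, 0])
    for _ in range(2):
        flag = parse_cu_transquant_bypass_flag(1, lambda: next(bits))
        print(flag, bypassed_stages(flag))
```

  • When the PPS flag is 0 the routine never touches the bitstream, which mirrors the fact that no CU-level bit is spent on pictures that do not use the lossless mode.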
  • although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, without departing from the scope of the present disclosure, first information may be termed as second information; and similarly, second information may also be termed as first information. As used herein, the term “if” may be understood to mean “when” or “upon” or “in response to” depending on the context.
  • VVC Inefficiencies of Lossless Coding Modes in Video Coding
  • the maximum TU size is 64x64 and the VPDU is also set as 64x64.
  • the maximum block size for coefficient coding in VVC is 32x32 because of the coefficient zero-out mechanism for width/height greater than 32.
  • current transform skip only supports up to 32x32 TU so that the maximum block size for residual coding can be aligned with the maximum block size for coefficient coding, which is 32x32.
  • Another inefficiency associated with lossless coding support in VVC is how to choose the residual (or referred to as coefficient) coding scheme.
  • the selection of residual coding scheme is based on the transform skip flag of a given block (or CU). Therefore, if under lossless mode in VVC the transform skip flag is assumed to be 1 as in HEVC, the residual coding scheme used under transform skip mode would always be used for a lossless mode CU.
  • the current residual coding scheme used when the transform skip flag is true is designed mainly for screen content coding. It may not be optimal to be used for lossless coding of regular content (i.e., non-screen content). In this disclosure, several methods are proposed to select the residual coding for lossless coding mode.
  • a third inefficiency associated with lossless coding in the current VVC is that the selection of context model in transform coefficient coding is dependent on the scalar quantizer used. However, as the quantization process is disabled in lossless coding, it may not be optimal to select the context model according to the quantizer selection if the transform coefficient coding is applied for coding the residual block under lossless coding mode.
  • a fourth inefficiency is related to the maximum CCB for each TU under lossless coding mode.
  • current limitation under lossy coding mode is TB_zosize*1.75. It may not be optimal for lossless coding.
  • a fifth inefficiency is related to the coding of sign flag of transform coefficients.
  • the sign flag is signaled as context-coded bin in residual coding for transform skip block and as bypass coded bin in transform coefficient coding. This is because, in transform coefficient coding, it is assumed that the sign of transform coefficients has an almost equal probability of taking a value of 0 versus 1, and it is not so correlated with its neighboring transform coefficient values.
  • the sign of residual does show correlation with neighboring residual values.
  • the transform coefficient coding is applied to code the residual block, it can be expected that the sign of residual is also very likely to be correlated with neighboring residual values. In this case, coding them as bypass bins may not be optimal.
  • the transform skip mode can only be enabled for a residual block whose width and height are both smaller than or equal to 32, which means the maximum residual coding block size under transform skip mode is 32x32.
  • the maximum width and/or height of the residual block for a lossless CU is also set to be 32, with a maximum residual block size as 32x32.
  • the CU residual block is divided into multiple smaller residual blocks with a size of 32xN and/or Nx32 so that the width or height of the smaller residual blocks are not greater than 32.
  • a 128x32 lossless CU is divided into four 32x32 residual blocks for residual coding.
  • a 64x64 lossless CU is divided into four 32x32 residual blocks.
  • the width/height of maximum residual block for lossless CU is set to the VPDU size (e.g.
  • the CU residual block is divided into multiple smaller residual blocks with a size of 64xN and/or Nx64 so that the width or height of the smaller residual blocks are not greater than VPDU width and/or height.
  • a 128x128 lossless CU is divided into four 64x64 residual blocks for residual coding.
  • a 128x32 lossless CU is divided into two 64x32 residual blocks.
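  • The division of a lossless CU residual into size-limited sub-blocks can be sketched as below. The function name and the (x, y, width, height) tuple layout are hypothetical; the maximum size of 32 or 64 corresponds to the two alternatives described above.

```python
def split_residual_block(width, height, max_size):
    """Tile a CU residual into sub-blocks whose width and height do not exceed max_size.

    Returns (x, y, sub_width, sub_height) tuples in raster order.
    """
    blocks = []
    for y in range(0, height, max_size):
        for x in range(0, width, max_size):
            blocks.append((x, y, min(max_size, width - x), min(max_size, height - y)))
    return blocks


if __name__ == "__main__":
    print(split_residual_block(128, 32, 32))    # four 32x32 blocks
    print(split_residual_block(128, 128, 64))   # four 64x64 blocks
    print(split_residual_block(128, 32, 64))    # two 64x32 blocks
```

  • The three calls reproduce the examples given above for a 32x32 limit and for a VPDU-sized 64x64 limit.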
  • a lossless CU may use the same residual coding scheme as the one used by the transform skip mode CUs.
  • a lossless CU may use the same residual coding scheme as the one used by the non-transform skip mode CUs.
  • the residual coding scheme for lossless CUs is selected adaptively from the existing residual coding schemes based on certain conditions and/or pre-defined procedures. Such conditions and/or pre-defined procedures are followed by both the encoder and decoder, so that there is no signaling needed in the bitstream to indicate the selection.
  • a simple screen content detection scheme may be specified and utilized in both encoder and decoder. Based on the detection scheme, a current video block may be classified as screen content or regular content. In case it is screen content, the residual coding scheme used under transform skip mode is selected. Otherwise, the other residual coding scheme is selected.
  • a syntax is signaled in the bitstream to explicitly specify which residual coding scheme is used by a lossless CU.
  • Such a syntax may be a binary flag, with each binary value indicating the selection of one of the two residual coding schemes.
  • the syntax can be signaled at different levels. For example, it may be signaled in a sequence parameter set (SPS), picture parameter set (PPS), slice header, tile group header, or tile. It may also be signaled at CTU or CU level. When such a syntax is signaled, all the lossless CUs at the same or lower level would use the same residual coding scheme indicated by the syntax.
  • the syntax indicating residual coding scheme is conditionally signaled based on the lossless mode flag of the CU. For example, only when the lossless mode flag cu_transquant_bypass_flag indicates that the current CU is coded in lossless mode, the syntax indicating residual coding scheme is signaled for the CU.
  • TU level it may be signaled in TU level.
  • a syntax at CU level to indicate if a CU is coded in lossless mode, such as the cu_transquant_bypass_flag
  • a syntax for each TU of current lossless CU is signaled to indicate the selection of one of the two residual coding schemes.
  • a transform skip mode flag is signaled.
  • the selection of residual coding scheme for the CU is based on its transform skip mode flag.
  • the controlling of DMVR on/off is not defined for lossless coding mode.
  • it is proposed to turn DMVR on or off at the slice level using a 1-bit flag, slice_disable_dmvr_flag.
  • slice_disable_dmvr_flag needs to be signaled if sps_dmvr_enabled_flag is set equal to 1 and transquant_bypass_enabled_flag is set equal to 0. If slice_disable_dmvr_flag is not signaled, it is inferred to be 1. If slice_disable_dmvr_flag is equal to 1, DMVR is turned off. In this case, the signaling is as follows:
  • the cu level controlling for DMVR is as the following:
  • DMVR is applied to a regular merge mode when all the following conditions are true:
  • pic_width_in_luma_samples and pic_height_in_luma_samples of the reference picture refPicLX associated with refIdxLX are equal to pic_width_in_luma_samples and pic_height_in_luma_samples of the current picture, respectively
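  • The slice-level DMVR control proposed above can be sketched as follows. The function names are hypothetical, and the sketch covers only the presence and inference rule for slice_disable_dmvr_flag, not the CU-level conditions.

```python
def slice_disable_dmvr_flag_present(sps_dmvr_enabled_flag, transquant_bypass_enabled_flag):
    """The slice flag is signaled only if DMVR is enabled in the SPS and
    lossless (transquant bypass) signaling is disabled."""
    return sps_dmvr_enabled_flag == 1 and transquant_bypass_enabled_flag == 0


def dmvr_on_for_slice(sps_dmvr_enabled_flag, transquant_bypass_enabled_flag, signaled_value=None):
    """Resolve whether DMVR is on for a slice; signaled_value is the parsed bit, if any."""
    if slice_disable_dmvr_flag_present(sps_dmvr_enabled_flag, transquant_bypass_enabled_flag):
        slice_disable_dmvr_flag = signaled_value
    else:
        slice_disable_dmvr_flag = 1  # not signaled, so inferred to be 1
    return slice_disable_dmvr_flag == 0


if __name__ == "__main__":
    print(dmvr_on_for_slice(1, 0, signaled_value=0))  # DMVR stays on
    print(dmvr_on_for_slice(1, 1))                    # lossless enabled: DMVR off
```

  • One consequence of the inference rule is that DMVR is off by default whenever lossless signaling is enabled for the picture, without spending a slice-level bit.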
  • the controlling of BDOF on/off is not defined for lossless coding mode.
  • it is proposed to turn BDOF on or off using a 1-bit flag, slice_disable_bdof_flag.
  • slice_disable_bdof_flag is signaled if sps_bdof_enabled_flag is set equal to 1 or transquant_bypass_enabled_flag is set equal to 0. If slice_disable_bdof_flag is not signaled, it is inferred to be 1. If slice_disable_bdof_flag is equal to 1, BDOF is disabled.
  • the signaling is illustrated as follows:
  • the cu level controlling for BDOF is as the following:
  • predFlagL0[ xSbIdx ][ ySbIdx ] and predFlagL1[ xSbIdx ][ ySbIdx ] are both equal to 1.
  • pic_width_in_luma_samples and pic_height_in_luma_samples of the reference picture refPicLX associated with refIdxLX are equal to pic_width_in_luma_samples and pic_height_in_luma_samples of the current picture, respectively.
  • both BDOF and DMVR are always applied for decoder-side refinement to improve coding efficiency when the respective SPS flag is enabled and certain bi-prediction and size constraints are met for a regular merge candidate.
  • a constant QState value is always used in selecting the context model for coding residual block if the transform coefficient coding scheme is applied for coding the residual block under lossless coding.
  • Such a constant QState value may be chosen as 0.
  • such a constant QState value may be chosen as a non-zero value as well, e.g., 1, 2 or 3.
  • FIG. 13 shows a method of prediction refinement with optical flow (PROF) for decoding a video signal in accordance with the present disclosure.
  • the method may be, for example, applied to a decoder.
  • the decoder may obtain a plurality of CUs that may include a lossless CU.
  • the decoder may acquire at least one partially reconstructed absolute level in a local neighborhood of the lossless CU.
  • the decoder may select a context model independent of a scalar quantizer state and based on the at least one partially reconstructed absolute level.
  • a constant QState value is always used in selecting the context model for coding a residual block if the transform coefficient coding scheme is applied for coding the residual block under lossless coding.
  • a different QState constant value may be used in selecting the context model when coding a different block, or a different slice, or a different frame, etc. under lossless coding.
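  • The effect of using a constant QState under lossless coding can be sketched as below. The function names are hypothetical, and only the QState-dependent term 12 * Max( 0, QState - 1 ) quoted earlier in this document is modelled; the remaining terms of the context index are not reproduced here and are left out of the sketch.

```python
def effective_qstate(actual_qstate, lossless, constant_qstate=0):
    """QState fed to context selection: a fixed constant under lossless coding."""
    return constant_qstate if lossless else actual_qstate


def qstate_context_term(qstate, scale=12):
    """QState-dependent part of the context index, scale * max(0, QState - 1)."""
    return scale * max(0, qstate - 1)


if __name__ == "__main__":
    for actual in range(4):
        lossy = qstate_context_term(effective_qstate(actual, lossless=False))
        lossless = qstate_context_term(effective_qstate(actual, lossless=True))
        print(actual, lossy, lossless)
```

  • With the constant chosen as 0, the QState-dependent term is always 0 for lossless blocks, so context selection depends only on the local neighbourhood and no longer on a quantizer state that has no meaning when quantization is disabled.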
  • FIG. 14 shows a method of prediction refinement with optical flow (PROF) for decoding a video signal in accordance with the present disclosure. The method may be, for example, applied to a decoder.
  • the decoder may obtain a plurality of CUs that may include a lossless CU.
  • the decoder may acquire a transform block (TB) based on the lossless CU.
  • the decoder may acquire a maximum number of CCB for the TB. The maximum number of CCB may be greater than a number of samples within the TB after coefficient zero-out times a preset value.
  • the maximum number of context-coded bins of luma and chroma is set as TB_zosize*4 for lossless coding. In another example, the maximum number of context-coded bins of luma and chroma is set as TB_zosize*8 for lossless coding.
  • a new context model may be designed and added for coding the sign flag if the transform coefficient coding scheme is applied in lossless coding to code the residual block.
  • such a new context model may be designed and operated in the same way as the context model used for sign flag coding in the residual coding for transform skip mode.
  • an existing context model may be shared and used.
  • the current context model for sign flag coding in the residual coding for transform skip mode may be shared and used for coding the sign flag when the transform coefficient coding is applied in lossless coding to code residual block.
  • FIG. 15 shows a method of prediction refinement with optical flow (PROF) for decoding a video signal in accordance with the present disclosure.
  • the method may be, for example, applied to a decoder.
  • the decoder may obtain a plurality of CUs that may include a lossless CU.
  • the decoder may determine that a transform coefficient coding scheme is applied to code a residual block based on the lossless CU.
  • the decoder may signal a sign flag of transform coefficients as context- coded bin (CCB) using the transform coefficient coding scheme.
  • CCB context- coded bin
  • the sign flag of a residual in a transform skip block is signaled with a context-coded bin.
  • a residual block may be rotated only if its width or height is less than a pre-defined threshold.
  • a residual block may be rotated only if its width and height are equal.
  • a residual block may be rotated only if its width and height are not equal.
  • the residual block may be rotated for certain video components, i.e., luma component or chroma component.
  • FIG. 16 shows a method of prediction refinement with optical flow (PROF) for decoding a video signal in accordance with the present disclosure.
  • the method may be, for example, applied to a decoder.
  • the decoder may obtain a plurality of CUs.
  • the decoder may acquire a residual block based on the plurality of CUs.
  • the decoder may adaptively rotate the residual block based on predefined procedures.
  • the predefined procedures are followed by both an encoder and decoder.
  • the residuals of one coding block in one dimension (e.g., horizontal or vertical) based on whether the corresponding size of the dimension fulfills the pre-defined threshold. For instance, for coding blocks whose width is equal to or less than the pre-defined threshold while their height is larger than the threshold, the residuals of the block may be rotated only in the horizontal direction (i.e., horizontal flip). For coding blocks whose height is equal to or less than the threshold while their width is larger than the threshold, the residuals of the block may be rotated only in the vertical direction (i.e., vertical flip).
  • the residuals of the coding block may be rotated in both horizontal and vertical directions. Otherwise, i.e., both its width and its height are larger than the threshold, no rotation is applied to the residuals of the block.
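The per-dimension decision described above can be sketched as follows. This is an illustrative Python sketch, not the disclosed implementation: the function name and the list-of-rows block representation are hypothetical, and the case where both dimensions are at or below the threshold is assumed to trigger flips in both directions.

```python
def rotate_residual(block, threshold):
    """Flip a residual block per dimension based on a size threshold.

    block is a list of rows (height x width). A dimension whose size is
    equal to or less than the threshold is flipped; if neither dimension
    qualifies, the block is returned unchanged (as a copy).
    """
    height = len(block)
    width = len(block[0]) if height else 0
    flip_horizontal = width <= threshold   # mirror samples within each row
    flip_vertical = height <= threshold    # reverse the order of the rows

    rows = [list(reversed(row)) if flip_horizontal else list(row)
            for row in block]
    if flip_vertical:
        rows.reverse()
    return rows


if __name__ == "__main__":
    tall = [[1, 2], [3, 4], [5, 6], [7, 8]]   # width 2, height 4
    print(rotate_residual(tall, threshold=2))  # horizontal flip only
```

Because the same rule can be evaluated from the block dimensions alone, an encoder and a decoder following it would reach the same decision without extra signaling.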
  • a residual block may be rotated only if its prediction mode is intra or intra block copy mode. In another example, a residual block may be rotated only if its prediction mode is intra or inter mode.
  • the residual rotation flag may be signaled or not. For instance, in one embodiment of the disclosure, it is proposed to enable residual rotation only for coding blocks that contain N or fewer samples, where N is a pre-defined value. For coding blocks that contain more than N samples, the residual rotation flag is not signaled and is always inferred to be 0 (i.e., no residual rotation). Otherwise, for coding blocks that contain N or fewer samples, the flag is signaled to indicate whether the residuals need to be rotated.
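A minimal sketch of this conditional signaling on the decoder side is shown below. The function name, the `read_flag` callback standing in for bitstream parsing, and the parameter names are assumptions for illustration only.

```python
def parse_rotation_flag(width, height, max_samples, read_flag):
    """Return the residual rotation flag for one coding block.

    The flag is read from the bitstream (via the hypothetical read_flag
    callback) only when the block holds max_samples (N) or fewer samples;
    otherwise it is not parsed and is inferred to be 0.
    """
    if width * height > max_samples:
        return 0                      # not signaled: inferred as no rotation
    return 1 if read_flag() else 0    # signaled: take the parsed value


if __name__ == "__main__":
    bits = iter([1])
    # 4x4 block with N = 16: the flag is parsed from the bitstream.
    print(parse_rotation_flag(4, 4, 16, lambda: next(bits)))
    # 8x8 block with N = 16: the flag is inferred, nothing is parsed.
    print(parse_rotation_flag(8, 8, 16, lambda: next(bits)))
```

Skipping the parse for large blocks is what saves the signaling overhead, so the callback must not be invoked in that branch.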
  • syntax is signaled in the bitstream to explicitly specify if a residual block is rotated for a TU.
  • a syntax may be a binary flag.
  • the syntax can be signaled at different levels. For example, it may be signaled in sequence parameter set (SPS), picture parameter set (PPS), slice header, tiles group header, or tile. It may also be signaled at CTU, CU, or TU level. When such a syntax is signaled, for all the TUs at the same or lower level, residual rotation would be performed according to the indication of the syntax value.
  • the syntax is signaled at SPS level, residual rotation decision is shared among all the residual blocks of TUs in the sequence.
  • the syntax is signaled at PPS level, residual rotation decision is shared among all the residual blocks of TUs in a picture using that PPS.
  • the syntax is signaled at TU level, so each TU has its own decision about whether residual rotation should be performed.
  • both the residual coding used for transform skip mode and the transform coefficient coding scheme can be applied to code the residual blocks under lossless mode. If the residual coding designed for transform skip mode is applied, it codes the residual block samples in scan order from the top-left of the block to the bottom-right of the block. If the transform coefficient coding scheme is applied, it codes the residual block samples in the exactly reversed scan order, from the bottom-right of the block to the top-left of the block. In one or more embodiments, it is proposed to align the scanning order of both coding schemes under lossless coding. In one example, if the transform coefficient coding scheme is applied for lossless coding, the scanning and coding order of samples is the same as that used for residual coding under transform skip mode, i.e.
  • FIG. 17 shows a method of prediction refinement with optical flow (PROF) for decoding a video signal in accordance with the present disclosure.
  • the method may be, for example, applied to a decoder.
  • the decoder may obtain a plurality of CUs that may include a lossless CU.
  • the decoder may determine that a transform coefficient coding scheme is applied based on the lossless CU.
  • the decoder may set a scanning order of residual block samples in the transform coefficient coding scheme to a scanning order used in residual coding scheme under transform skip mode in order to align the scanning order of both coding schemes.
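The alignment of scanning orders can be illustrated with the short Python sketch below. It is only a sketch under stated assumptions: a plain raster scan stands in for the codec's actual scan pattern, and the function names and scheme labels are hypothetical.

```python
def forward_scan(width, height):
    """Top-left to bottom-right positions (raster order used as a stand-in)."""
    return [(x, y) for y in range(height) for x in range(width)]


def scan_positions(width, height, scheme, lossless):
    """Return the sample scan order for a residual block.

    scheme is "transform_skip" or "transform_coefficient". The transform
    coefficient scheme normally scans in reverse (bottom-right to top-left),
    but under lossless coding it reuses the transform-skip order.
    """
    if scheme not in ("transform_skip", "transform_coefficient"):
        raise ValueError("unknown scheme: " + scheme)
    order = forward_scan(width, height)
    if scheme == "transform_coefficient" and not lossless:
        order.reverse()
    return order


if __name__ == "__main__":
    print(scan_positions(2, 2, "transform_coefficient", lossless=False))
    print(scan_positions(2, 2, "transform_coefficient", lossless=True))
```

With the lossless flag set, both schemes return the same list, which is the alignment the embodiment proposes.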
  • binarization of last non-zero coefficient position coding is based on reduced TU size (i.e. the TU size after coefficient zero-out operation) while the context model selection for the last non-zero coefficient position coding is determined by the original TU size. Therefore, the context model selection and the binarization for last non-zero coefficient position depend on different control logics. In one or more embodiments, it is proposed to select the context model for coding the position of last non-zero coefficient based on the reduced TU size.
  • the context selection for signaling the position of its last non-zero coefficient is based on its reduced TU size of 32x32 instead of the original size of 64x64. It also means that it shares the context with actual 32x32 TUs which do not have coefficient zero-out operation performed.
  • FIG. 18 shows a method of prediction refinement with optical flow (PROF) for decoding a video signal in accordance with the present disclosure.
  • the method may be, for example, applied to a decoder.
  • the decoder may obtain a plurality of CUs.
  • the decoder may obtain a last non-zero coefficient based on a coefficient zero-out operation applied to the plurality of CUs.
  • the decoder may select a context model for coding a position of the last non-zero coefficient based on a reduced TU pixel size in order to reduce a total number of contexts used for coding last non-zero coefficient.
  • a 32x32 TU may be zeroed out to a reduced size of 16x16.
  • the context selection for signaling the position of its last non-zero coefficient is based on its reduced TU size of 16x16, and it also shares the context with actual 16x16 TUs.
  • the reduced TU size is min(TUWidth, 32)*min(TUHeight, 32) for the TUs where the DCT-II transform is applied, and min(TUWidth, 16)*min(TUHeight, 16) for the TUs where DCT-VIII and DST-VII are applied.
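The reduced-size rule above can be sketched in Python as follows. The function name and the string labels for the transforms are assumptions for illustration; the sketch returns the reduced width and height as a pair rather than their product, so that the result can serve directly as a key for context selection.

```python
def reduced_tu_size(tu_width, tu_height, transform):
    """Return (width, height) of a TU after coefficient zero-out.

    DCT-II keeps at most 32 samples per dimension; DCT-VIII and DST-VII
    keep at most 16. The transform labels are hypothetical identifiers.
    """
    if transform == "DCT-II":
        limit = 32
    elif transform in ("DCT-VIII", "DST-VII"):
        limit = 16
    else:
        raise ValueError("unsupported transform: " + transform)
    return min(tu_width, limit), min(tu_height, limit)


if __name__ == "__main__":
    # A 64x64 DCT-II TU selects the same context as an actual 32x32 TU.
    print(reduced_tu_size(64, 64, "DCT-II") == reduced_tu_size(32, 32, "DCT-II"))
```

Keying the context on the reduced size means zeroed-out TUs share contexts with TUs that actually have that size, which is how the total number of contexts is reduced.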
  • FIG. 19 shows a computing environment 1910 coupled with a user interface 1960.
  • the computing environment 1910 can be part of a data processing server.
  • the computing environment 1910 includes processor 1920, memory 1940, and I/O interface 1950.
  • the processor 1920 typically controls overall operations of the computing environment 1910, such as the operations associated with the display, data acquisition, data communications, and image processing.
  • the processor 1920 may include one or more processors to execute instructions to perform all or some of the steps in the above-described methods.
  • the processor 1920 may include one or more modules that facilitate the interaction between the processor 1920 and other components.
  • the processor may be a Central Processing Unit (CPU), a microprocessor, a single chip machine, a GPU, or the like.
  • the memory 1940 is configured to store various types of data to support the operation of the computing environment 1910.
  • the memory 1940 may include predetermined software 1942. Examples of such data include instructions for any applications or methods operated on the computing environment 1910, video datasets, image data, etc.
  • the memory 1940 may be implemented by using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk.
  • the I/O interface 1950 provides an interface between the processor 1920 and peripheral interface modules, such as a keyboard, a click wheel, buttons, and the like.
  • the buttons may include but are not limited to, a home button, a start scan button, and a stop scan button.
  • the I/O interface 1950 can be coupled with an encoder and decoder.
  • non-transitory computer-readable storage medium comprising a plurality of programs, such as comprised in the memory 1940, executable by the processor 1920 in the computing environment 1910, for performing the above-described methods.
  • the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device or the like.
  • the non-transitory computer-readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, where the plurality of programs when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
  • the computing environment 1910 may be implemented with one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), graphical processing units (GPUs), controllers, micro-controllers, microprocessors, or other electronic components, for performing the above methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to methods, apparatuses, and non-transitory computer-readable storage media for decoding a video signal. A decoder obtains a plurality of coding units (CUs) that includes a lossless CU. The decoder acquires at least one partially reconstructed absolute level in a local neighborhood of the lossless CU. The decoder also selects a context model that is independent of a scalar quantization state and dependent on the at least one partially reconstructed absolute level.
PCT/US2020/051326 2019-09-17 2020-09-17 Methods and apparatuses for lossless coding modes in video coding WO2021055640A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202080054161.2A CN114175653B (zh) 2019-09-17 2020-09-17 Methods and apparatuses for lossless coding modes in video coding

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962901768P 2019-09-17 2019-09-17
US62/901,768 2019-09-17
US201962902956P 2019-09-19 2019-09-19
US62/902,956 2019-09-19

Publications (1)

Publication Number Publication Date
WO2021055640A1 true WO2021055640A1 (fr) 2021-03-25

Family

ID=74883491

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/051326 WO2021055640A1 (fr) 2019-09-17 2020-09-17 Methods and apparatuses for lossless coding modes in video coding

Country Status (2)

Country Link
CN (1) CN114175653B (fr)
WO (1) WO2021055640A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120082231A1 (en) * 2010-10-01 2012-04-05 Qualcomm Incorporated Zero-out of high frequency coefficients and entropy coding retained coefficients using a joint context model
US20160330479A1 (en) * 2013-12-30 2016-11-10 Qualcomm Incorporated Simplification of delta dc residual coding in 3d video coding
US20180234681A1 (en) * 2017-02-10 2018-08-16 Intel Corporation Method and system of high throughput arithmetic entropy coding for video coding
WO2018194189A1 (fr) * 2017-04-18 2018-10-25 삼성전자 주식회사 Image encoding/decoding method and device therefor
US20190149816A1 (en) * 2017-01-19 2019-05-16 Google Llc Dc coefficient sign coding scheme
US20190281304A1 (en) * 2012-06-26 2019-09-12 Velos Media, Llc Modified Coding for Transform Skipping

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100488254C (zh) * 2005-11-30 2009-05-13 联合信源数字音视频技术(北京)有限公司 Context-based entropy encoding method and decoding method
US9025661B2 (en) * 2010-10-01 2015-05-05 Qualcomm Incorporated Indicating intra-prediction mode selection for video coding
US20130016789A1 (en) * 2011-07-15 2013-01-17 General Instrument Corporation Context modeling techniques for transform coefficient level coding
US9215464B2 (en) * 2013-09-19 2015-12-15 Blackberry Limited Coding position data for the last non-zero transform coefficient in a coefficient group

Also Published As

Publication number Publication date
CN114175653A (zh) 2022-03-11
CN114175653B (zh) 2023-07-25

Similar Documents

Publication Publication Date Title
US11770534B2 (en) Method and device for encoding/decoding images
KR102424240B1 (ko) Image encoding/decoding method and apparatus, and recording medium storing a bitstream
KR20220127801A (ko) Image encoding/decoding method and apparatus, and recording medium storing a bitstream
WO2018026887A1 (fr) Geometry transformation-based adaptive loop filtering
EP3804315A1 (fr) Systems and methods for partitioning video blocks in an inter prediction slice of video data
WO2015196322A1 (fr) Encoder decisions based on results of hash-based block matching
KR20210063258A (ko) Adaptive in-loop filtering method and apparatus
CN113545051A (zh) Reconstruction of blocks of video data using block size restriction
US20230291936A1 (en) Residual and coefficients coding for video coding
EP3922029A1 (fr) Video coding using an intra sub-partition coding mode
KR20210088697A (ko) Encoder, decoder, and corresponding method of deblocking filter adaptation
WO2020264529A1 (fr) Lossless coding modes for video coding
CA3203828A1 (fr) Residual and coefficients coding for video coding
EP4248651A1 (fr) Residual and coefficients coding for video coding
CN113728632A (zh) Video encoding and decoding method and system
CN114586347A (zh) System and method for reducing reconstruction error in video coding based on cross-component correlation
EP4346215A2 (fr) Video signal encoding/decoding method and device therefor
US20220248031A1 (en) Methods and devices for lossless coding modes in video coding
WO2021055640A1 (fr) Methods and apparatuses for lossless coding modes in video coding
WO2019233997A1 (fr) SAO parameter prediction
WO2019234000A1 (fr) SAO parameter prediction
WO2023158765A1 (fr) Methods and devices for reordering geometric partitioning mode split modes with a pre-defined mode order
WO2023154574A1 (fr) Methods and devices for geometric partitioning mode with adaptive blending
WO2023023174A1 (fr) Coding enhancement in cross-component sample adaptive offset
WO2019233998A1 (fr) Video coding and decoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20865601

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20865601

Country of ref document: EP

Kind code of ref document: A1