Context modeling for residual coding

Info

Publication number
CN113853785B
Authority
CN
China
Prior art keywords
codec
residual
codec mode
context
transform block
Prior art date
Legal status
Active
Application number
CN202080035935.7A
Other languages
Chinese (zh)
Other versions
CN113853785A (en)
Inventor
Li Zhang
Kai Zhang
Hongbin Liu
Yue Wang
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd, ByteDance Inc filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202311511035.7A priority Critical patent/CN117560489A/en
Publication of CN113853785A publication Critical patent/CN113853785A/en
Application granted granted Critical
Publication of CN113853785B publication Critical patent/CN113853785B/en


Classifications

    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/18: Adaptive coding characterised by the coding unit, the unit being a set of transform coefficients
    • H04N19/192: The adaptation method, adaptation tool or adaptation type being iterative or recursive
    • H04N19/70: Syntax aspects related to video coding, e.g. related to compression standards

All of the above fall under H04N19/00 (H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION): Methods or arrangements for coding, decoding, compressing or decompressing digital video signals.

Abstract

A method of video processing includes determining, during a conversion between a block of video and a bitstream representation of the video, whether to switch from a first residual coding technique to a second residual coding technique based on a number of context coded bins per unit used in the first residual coding technique. The unit is included in the block, and coefficients of the unit are coded in the bitstream representation in multiple passes. The method also includes performing the conversion based on the determination.

Description

Context modeling for residual coding
Cross Reference to Related Applications
This application claims the priority of and the benefit of International Patent Application No. PCT/CN2019/086814, filed on May 14, 2019, in accordance with applicable patent law and/or rules pursuant to the Paris Convention. The entire disclosure of the foregoing application is incorporated by reference as part of the disclosure of this application for all purposes under law.
Technical Field
This patent document relates to video codec technology, apparatus, and systems.
Background
Currently, efforts are underway to improve the performance of current video codec technology to provide better compression ratios, or to provide video coding and decoding schemes that allow for lower complexity or parallel implementations. Industry professionals have recently proposed several new video coding tools, which are currently being tested to determine their effectiveness.
Disclosure of Invention
Apparatuses, systems, and methods related to digital video coding, and in particular to context modeling for residual coding, are described. The described methods may be applied to existing video coding standards (e.g., High Efficiency Video Coding (HEVC) or Versatile Video Coding (VVC)) and future video coding standards or video codecs.
In one representative aspect, the disclosed techniques can be used to provide a method for video processing. The method includes determining, during a conversion between a block of video that includes one or more units and a bitstream representation of the video, whether to switch from a first residual coding technique to a second residual coding technique based on a number of context coded bins per unit used in the first residual coding technique. The coefficients of a unit are coded in the bitstream representation in multiple passes using either the first residual coding technique or the second residual coding technique. The method also includes performing the conversion based on the determination.
In another representative aspect, the disclosed techniques can be used to provide a method for video processing. The method includes performing a conversion between a block of video and a bitstream representation of the video. The block includes one or more coding groups, and the block is coded in the bitstream representation based on a constraint on a maximum number of context coded bins for each coding group.
In another representative aspect, the disclosed techniques can be used to provide a method for video processing. The method includes performing a conversion between a current block of video and a bitstream representation of the video. The current block is coded in the bitstream representation based on a constraint on a maximum number of context coded bins for each syntax element or each coding pass associated with the current block.
In one representative aspect, the disclosed techniques can be used to provide a method for video processing. The method includes performing a conversion between a current video unit and a bitstream representation of the current video unit, wherein the conversion includes performing context modeling on the current video unit based on applying a constraint on a maximum number of context coded bins for each coding group (CG) associated with the current video unit, wherein information from the context modeling is included in the bitstream representation of the current video unit.
In another representative aspect, the disclosed techniques can be used to provide another video processing method. The method includes performing a conversion between a current video unit and a bitstream representation of the current video unit, wherein the conversion includes performing context modeling on the current video unit based on applying a constraint on a maximum number of context coded bins for each syntax element or each coding pass associated with the current video unit, wherein information from the context modeling is included in the bitstream representation of the current video unit.
In another representative aspect, the disclosed techniques can be used to provide another video processing method. The method includes performing a conversion between a current video unit and a bitstream representation of the current video unit, wherein the conversion includes one or more residual coding steps such that each residual coding step is associated with a number of context coded bins per coding unit; and switching from a first residual coding step to a second residual coding step during the conversion based at least in part on a first number of context coded bins per coding unit in the first step and a second number of context coded bins per coding unit in the second step.
Further, in a representative aspect, an apparatus in a video system is disclosed that includes a processor and a non-transitory memory having instructions thereon. The instructions, when executed by a processor, cause the processor to implement any one or more of the disclosed methods.
Furthermore, a computer program product stored on a non-transitory computer readable medium is disclosed, the computer program product comprising program code for performing any one or more of the disclosed methods.
The above and other aspects and features of the disclosed technology are described in more detail in the accompanying drawings, description and claims.
Drawings
Fig. 1 shows an example of a block diagram of an encoder.
Fig. 2 shows an example of intra direction mode.
Fig. 3 shows an example of Affine Linear Weighted Intra Prediction (ALWIP) of a 4 x 4 block.
Fig. 4 shows an example of Affine Linear Weighted Intra Prediction (ALWIP) of 8×8 blocks.
Fig. 5 shows an example of Affine Linear Weighted Intra Prediction (ALWIP) for an 8×4 block.
Fig. 6 shows an example of Affine Linear Weighted Intra Prediction (ALWIP) for a 16 x 16 block.
Fig. 7 shows an example of reference lines adjacent to a prediction block.
Fig. 8 shows an example of division of blocks.
Fig. 9 shows an example of the division of blocks with exceptions.
Fig. 10 shows an example of the secondary transformation.
Fig. 11 shows an example of a reduced secondary transform (RST).
Fig. 12 shows an example of a forward reduced transform and an inverse reduced transform.
Fig. 13 shows an example of a forward RST.
Fig. 14 shows an example of RST scanning.
Fig. 15 shows an example of a sub-block transform mode.
Fig. 16 shows an example of a scanning sequence.
Fig. 17 shows another example of a scanning sequence.
FIG. 18 illustrates an example template for selecting a probability model.
Fig. 19 shows an example of a scalar quantizer.
Fig. 20 shows an example of a state transition machine associated with a scalar quantizer.
Fig. 21 is a block diagram of an example of a hardware platform for implementing the visual media decoding or visual media encoding techniques described in this document.
Fig. 22 shows a flowchart of an example method for video encoding and decoding.
FIG. 23 is a block diagram of an example video processing system in which the disclosed techniques may be implemented.
Fig. 24 shows a flowchart of an example method for video processing in accordance with the present technology.
Fig. 25 shows a flow chart of another example method for video processing in accordance with the present technology.
Fig. 26 shows a flow chart of yet another example method for video processing in accordance with the present technology.
Detailed Description
2. Video coding in HEVC/H.265
Video codec standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video codec standards have been based on a hybrid video coding structure, in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, many new methods have been adopted by JVET and put into reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC (Versatile Video Coding) standard, targeting a 50% bitrate reduction compared to HEVC.
2.1. Codec flow for a typical video codec
Fig. 1 shows an example of an encoder block diagram of VVC, which contains three in-loop filtering blocks: deblocking filter (DF), sample adaptive offset (SAO), and ALF. Unlike DF, which uses predefined filters, SAO and ALF utilize the original samples of the current picture to reduce the mean square error between the original samples and the reconstructed samples by adding an offset and by applying a finite impulse response (FIR) filter, respectively, with coded side information signaling the offsets and filter coefficients. ALF is located at the last processing stage of each picture and can be regarded as a tool that tries to catch and fix artifacts created by the previous stages.
2.2. Intra coding in VVC
2.2.1. Intra mode codec with 67 intra prediction modes
To capture any edge direction presented in natural video, the number of directional intra modes extends from 33 used in HEVC to 65. The additional direction mode is depicted as the red dashed arrow in fig. 2, and the planar mode and DC mode remain unchanged. These dense directional intra prediction modes are applicable to all block sizes as well as luminance and chrominance intra predictions.
The conventional angular intra prediction direction is defined as from 45 degrees to-135 degrees in the clockwise direction, as shown in fig. 2. In VTM2, for non-square blocks, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes. The alternative pattern is signaled using the original method and remapped to the index of the wide angle pattern after parsing. The total number of intra prediction modes is unchanged, e.g., 67, and the intra mode codec is unchanged.
In HEVC, every intra-coded block has a square shape and the length of each of its sides is a power of 2. Thus, no division operations are required to generate an intra prediction value using the DC mode. In VTM2, blocks can have a rectangular shape, which in the general case necessitates the use of a division operation per block. To avoid division operations for DC prediction, only the longer side is used to compute the average for non-square blocks.
In addition to 67 intra prediction modes, the Wide Angle Intra Prediction (WAIP) and position dependent intra prediction combining (PDPC) method of non-square blocks is enabled for a particular block. The PDPC is applied without signaling to the following intra modes: plane, DC, horizontal, vertical, left bottom angle mode, and eight adjacent angle modes thereof, and right top angle mode, and eight adjacent angle modes thereof.
2.2.2. Affine linear weighted intra prediction (ALWIP, also known as matrix-based intra prediction)
2.2.2.1. Generating a reduced prediction signal by matrix vector multiplication
Neighboring reference samples are first downsampled via averaging to generate the reduced reference signal bdry_red. Then, the reduced prediction signal pred_red is computed by calculating a matrix-vector product and adding an offset:

pred_red = A · bdry_red + b

Here, A is a matrix that has W_red · H_red rows, and 4 columns if W = H = 4 and 8 columns in all other cases. b is a vector of size W_red · H_red.
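To make the matrix-vector stage concrete, the following is a minimal C++ sketch of pred_red = A · bdry_red + b; the function name and integer types are illustrative assumptions, and the trained matrix A and offset b are simply passed in rather than taken from the pre-trained sets of the actual design.

```cpp
#include <vector>
#include <cstddef>

// Sketch of the ALWIP reduced-prediction stage: pred_red = A * bdry_red + b.
// A has W_red * H_red rows; its number of columns matches bdry_red's size
// (4 for 4x4 blocks, 8 otherwise).
std::vector<int> alwipReducedPrediction(const std::vector<std::vector<int>>& A,
                                        const std::vector<int>& b,
                                        const std::vector<int>& bdryRed) {
    std::vector<int> predRed(A.size(), 0);
    for (std::size_t r = 0; r < A.size(); ++r) {
        int acc = 0;
        for (std::size_t c = 0; c < bdryRed.size(); ++c)
            acc += A[r][c] * bdryRed[c];   // matrix-vector product
        predRed[r] = acc + b[r];           // add the offset vector
    }
    return predRed;                        // W_red * H_red reduced samples
}
```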
2.2.2.2. Description of the entire ALWIP Process
The overall process of averaging, matrix-vector multiplication and linear interpolation is shown for different shapes in figs. 3-6. Note that the remaining shapes are treated as in one of the depicted cases.
1. Given a 4×4 block, as shown in fig. 3, ALWIP takes two averages along each axis of the boundary. The resulting four input samples enter the matrix-vector multiplication. The matrices are taken from the set S_0. After adding the offset, this yields the 16 final prediction samples. Linear interpolation is not necessary for generating the prediction signal. Thus, a total of (4·16)/(4·4) = 4 multiplications per sample are performed.
2. Given an 8×8 block, as shown in fig. 4, ALWIP takes four averages along each axis of the boundary. The resulting eight input samples enter the matrix-vector multiplication. The matrices are taken from the set S_1. This yields 16 samples at the odd positions of the prediction block. Thus, a total of (8·16)/(8·8) = 2 multiplications per sample are performed. After adding the offset, these samples are interpolated vertically using the reduced top boundary. Horizontal interpolation follows using the original left boundary.
3. Given an 8×4 block, as shown in fig. 5, ALWIP takes four averages along the horizontal axis of the boundary and the four original boundary values on the left boundary. The resulting eight input samples enter the matrix-vector multiplication. The matrices are taken from the set S_1. This yields 16 samples at the odd horizontal positions and at each vertical position of the prediction block. Thus, a total of (8·16)/(8·4) = 4 multiplications per sample are performed. After adding the offset, these samples are interpolated horizontally using the original left boundary. The transposed case is treated accordingly.
4. Given a 16×16 block, as shown in fig. 6, ALWIP takes four averages along each axis of the boundary. The resulting eight input samples enter the matrix-vector multiplication. The matrices are taken from the set S_2. This yields 64 samples at the odd positions of the prediction block. Thus, a total of (8·64)/(16·16) = 2 multiplications per sample are performed. After adding the offset, these samples are interpolated vertically using eight averages of the top boundary. Horizontal interpolation follows using the original left boundary. In this case, the interpolation process does not add any multiplications. Therefore, in total, two multiplications per sample are required to calculate the ALWIP prediction.
For larger shapes, the procedure is essentially the same, and it is easy to check that the number of multiplications per sample is less than four.
For W×8 blocks with W > 8, only horizontal interpolation is necessary, as the samples are given at the odd horizontal positions and at each vertical position.
Finally, for W×4 blocks with W > 8, let A_k be the matrix that arises by leaving out every row that corresponds to an odd x-coordinate in the downsampled block. Thus, the output size is 32 and, again, only horizontal interpolation remains to be performed.
The transposed case is treated accordingly.
2.2.2.3. Grammar and semantics
The following bold and underlined sections indicate proposed modifications to the standard.
7.3.6.5 codec unit syntax
2.2.3. Multiple Reference Line (MRL)
Multiple Reference Line (MRL) intra prediction uses more reference lines for intra prediction. In fig. 7, an example of 4 reference lines is depicted, where the samples of segments A and F are not fetched from the reconstructed neighboring samples but padded with the closest samples from segments B and E, respectively. HEVC intra picture prediction uses the nearest reference line (e.g., reference line 0). In MRL, 2 additional lines (reference line 1 and reference line 3) are used.
The index (mrl _idx) of the selected reference line is signaled and used to generate an intra-prediction value. For reference line indexes greater than 0, only additional reference line modes are included in the MPM list, and only the MPM indexes are signaled without the remaining modes. The reference line index is signaled before the intra prediction mode, and in case the non-zero reference line index is signaled, the plane mode and the DC mode are excluded from the intra prediction mode.
The MRL is disabled for blocks of the first line within the CTU to prevent the use of extended reference samples outside the current CTU line. Furthermore, the PDPC is disabled when an additional line is used.
2.2.4. Intra-frame sub-block partitioning (ISP)
In some embodiments, the ISP is used to divide the luminance intra prediction block vertically or horizontally into 2 or 4 sub-partitions depending on the block size, as shown in Table 1. Figs. 8 and 9 show examples of the two possibilities. All sub-partitions fulfill the condition of having at least 16 samples. For block sizes 4×N or N×4 (with N > 8), if allowed, the 1×N or N×1 sub-partition may exist.

Table 1: Number of sub-partitions depending on the block size (maximum transform size denoted by maxTBSize)

Block size | Number of sub-partitions
4×4 | Not divided
4×8 and 8×4 | 2
All other cases | 4
For each of these sub-partitions, a residual signal is generated by entropy decoding the coefficients transmitted by the encoder, followed by inverse quantization and inverse transformation. Then, the sub-partitions are intra-predicted, and finally the corresponding reconstructed samples are obtained by adding the residual signal to the prediction signal. Thus, the reconstructed value of each sub-partition will be available to generate the next prediction, which will repeat the process, and so on. All sub-partitions share the same intra mode.
Table 2: according to the specifications of trTypeHor and trTypeVer of predModeintra
2.2.4.1. Grammar and semantics
The following bold and underlined sections indicate proposed modifications to the standard.
7.3.7.5 codec unit syntax
intra_subpartitions_mode_flag[x0][y0] equal to 1 specifies that the current intra coding unit is partitioned into NumIntraSubPartitions[x0][y0] rectangular transform block sub-partitions. intra_subpartitions_mode_flag[x0][y0] equal to 0 specifies that the current intra coding unit is not partitioned into rectangular transform block sub-partitions.
When intra_subpartitions_mode_flag[x0][y0] is not present, it is inferred to be equal to 0.
intra_subpartitions_split_flag[x0][y0] specifies whether the intra sub-partition split type is horizontal or vertical. When intra_subpartitions_split_flag[x0][y0] is not present, it is inferred as follows:
- If cbHeight is greater than MaxTbSizeY, intra_subpartitions_split_flag[x0][y0] is inferred to be equal to 0.
- Otherwise (cbWidth is greater than MaxTbSizeY), intra_subpartitions_split_flag[x0][y0] is inferred to be equal to 1.
The variable IntraSubPartitionsSplitType specifies the type of split used for the current luma coding block as shown in Table 3. IntraSubPartitionsSplitType is derived as follows:
- If intra_subpartitions_mode_flag[x0][y0] is equal to 0, IntraSubPartitionsSplitType is set equal to 0.
- Otherwise, IntraSubPartitionsSplitType is set equal to 1 + intra_subpartitions_split_flag[x0][y0].
Table 3: Name association to IntraSubPartitionsSplitType

IntraSubPartitionsSplitType | Name of IntraSubPartitionsSplitType
0 | ISP_NO_SPLIT
1 | ISP_HOR_SPLIT
2 | ISP_VER_SPLIT
The variable NumIntraSubPartitions specifies the number of transform block sub-partitions into which an intra luma coding block is divided. NumIntraSubPartitions is derived as follows:
- If IntraSubPartitionsSplitType is equal to ISP_NO_SPLIT, NumIntraSubPartitions is set equal to 1.
- Otherwise, NumIntraSubPartitions is set equal to 2 if one of the following conditions is true:
  - cbWidth is equal to 4 and cbHeight is equal to 8,
  - cbWidth is equal to 8 and cbHeight is equal to 4.
- Otherwise, NumIntraSubPartitions is set equal to 4.
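A direct C++ transcription of the derivation above (the function name is a hypothetical choice; ISP_NO_SPLIT corresponds to the value 0 per Table 3):

```cpp
// NumIntraSubPartitions derivation, following the text above.
// intraSubPartitionsSplitType == 0 corresponds to ISP_NO_SPLIT.
int numIntraSubPartitions(int intraSubPartitionsSplitType,
                          int cbWidth, int cbHeight) {
    if (intraSubPartitionsSplitType == 0)   // ISP_NO_SPLIT
        return 1;
    if ((cbWidth == 4 && cbHeight == 8) ||
        (cbWidth == 8 && cbHeight == 4))
        return 2;
    return 4;                               // all other cases
}
```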
2.3. Transform coding in VVC
2.3.1. Multiple Transform Selection (MTS) in VVC
2.3.1.1. Explicit Multiple Transform Set (MTS)
In some embodiments, large block-size transforms of up to 64×64 in size are enabled; these are primarily useful for higher resolution video, such as 1080p and 4K sequences. For transform blocks with size (width or height, or both width and height) equal to 64, the high frequency transform coefficients are zeroed out so that only the lower-frequency coefficients remain. For example, for an M×N transform block, with M the block width and N the block height, when M is equal to 64, only the left 32 columns of transform coefficients are kept. Similarly, when N is equal to 64, only the top 32 rows of transform coefficients are kept. When the transform skip mode is used for a large block, the entire block is used without zeroing out any values.
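The zero-out rule described above can be sketched as follows in C++ (row-major coefficient storage and the function name are assumptions of this illustration):

```cpp
// Zero out high-frequency coefficients for 64-point transform blocks:
// when the width M is 64 keep only the left 32 columns, and when the
// height N is 64 keep only the top 32 rows. coeff is stored row-major.
void zeroOutHighFrequency(int* coeff, int M, int N) {
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < M; ++x)
            if ((M == 64 && x >= 32) || (N == 64 && y >= 32))
                coeff[y * M + x] = 0;
}
```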
In addition to DCT-II, which has been employed in HEVC, a Multiple Transform Selection (MTS) scheme is used for residual coding of both inter and intra coded blocks. It uses multiple selected transforms from DCT8/DST7. The newly introduced transform matrices are DST-VII and DCT-VIII. Table 4 below shows the basis functions of the selected DST/DCT.
Table 4: basis functions of transformation matrix used in VVC
In order to maintain orthogonality of the transform matrix, the transform matrix is quantized more precisely than in HEVC. In order to keep the intermediate values of the transform coefficients within a 16-bit range, all coefficients have 10 bits after horizontal and vertical transforms.
To control the MTS scheme, separate enable flags are specified at the SPS level for intra and inter frames, respectively. When MTS is enabled at SPS, CU level flags are signaled to indicate whether MTS is applied. Here, MTS is applicable only to brightness. The MTS CU level flag is signaled when the following condition is satisfied:
-both width and height are less than or equal to 32
-CBF flag equal to 1
If the MTS CU flag is equal to zero, DCT2 is applied in both directions. However, if the MTS CU flag is equal to one, two other flags are additionally signaled to indicate the transform type for the horizontal and vertical directions, respectively. The transform and signaling mapping table is shown in Table 5. As for transform matrix precision, 8-bit primary transform cores are used. Therefore, all the transform cores used in HEVC are kept the same, including 4-point DCT-2 and DST-7, and 8-point, 16-point and 32-point DCT-2. Also, the other transform cores, including 64-point DCT-2, 4-point DCT-8, and 8-point, 16-point and 32-point DST-7 and DCT-8, use 8-bit primary transform cores.
Table 5: mapping of the decoded values of tu_mts_idx to corresponding transform matrices in the horizontal and vertical directions
To reduce the complexity of large-sized DST-7 and DCT-8, the high frequency transform coefficients are zeroed out for DST-7 and DCT-8 blocks of size (width or height, or both) equal to 32. Only coefficients in the 16x16 low frequency region are retained.
Besides the case of applying different transforms, VVC also supports a mode called Transform Skip (TS), similar to the concept of TS in HEVC. TS is considered a special case of MTS.
2.3.2. Reduced Secondary Transform (RST)
2.3.2.1. Non-Separable Secondary Transform (NSST)
In some embodiments, the secondary transform is applied between the forward primary transform and quantization (at the encoder) and between de-quantization and the inverse primary transform (at the decoder side). As shown in fig. 10, a 4x4 (or 8x8) secondary transform is performed depending on the block size. For example, per 8x8 block, the 4x4 secondary transform is applied for small blocks (e.g., min(width, height) < 8) and the 8x8 secondary transform is applied for larger blocks (e.g., min(width, height) > 4).
The application of a non-separable transform is described as follows using the input as an example. To apply the non-separable transform, the 4×4 input block X is first represented as a vector $\vec{X}$:

$$\vec{X} = [X_{00}\ X_{01}\ X_{02}\ X_{03}\ X_{10}\ X_{11}\ X_{12}\ X_{13}\ X_{20}\ X_{21}\ X_{22}\ X_{23}\ X_{30}\ X_{31}\ X_{32}\ X_{33}]^{T}$$

The non-separable transform is calculated as $\vec{F} = T \cdot \vec{X}$, where $\vec{F}$ indicates the transform coefficient vector and T is a 16×16 transform matrix. The 16×1 coefficient vector $\vec{F}$ is subsequently re-organized as a 4×4 block using the scanning order for that block (horizontal, vertical or diagonal). Coefficients with smaller indices are placed with smaller scanning indices in the 4×4 coefficient block. There are 35 transform sets in total, and each transform set uses 3 non-separable transform matrices (kernels). The mapping from intra prediction mode to transform set is predefined. For each transform set, the selected non-separable secondary transform (NSST) candidate is further specified by an explicitly signaled secondary transform index. The index is signaled in the bitstream once per intra CU after the transform coefficients.
2.3.2.2. Example Reduced Secondary Transform (RST)
RST (also known as Low Frequency Non-Separable Transform (LFNST)) uses 4 transform sets instead of 35. In some embodiments, 16×64 (further reduced to 16×48) and 16×16 matrices are employed. For notational convenience, the 16×64 (reduced to 16×48) transform is denoted as RST 8x8 and the 16×16 one as RST 4x4. Fig. 11 shows an example of RST.
2.3.2.2.1.RST calculation
The main idea of a Reduced Transform (RT) is to map an N-dimensional vector to an R-dimensional vector in a different space, where R/N (R < N) is the reduction factor.

The RT matrix is an R×N matrix as follows:

$$T_{R \times N} = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1N} \\ t_{21} & t_{22} & \cdots & t_{2N} \\ \vdots & & \ddots & \vdots \\ t_{R1} & t_{R2} & \cdots & t_{RN} \end{bmatrix}$$

where the R rows of the transform are R bases of the N-dimensional space. The inverse transform matrix of an RT is the transpose of its forward transform. The forward and inverse RT are depicted in fig. 12.
RST 8x8 with a reduction factor of 4 (1/4 size) may be applied. Thus, instead of a 64×64 matrix, which is the conventional 8x8 non-separable transform matrix size, a 16×64 direct matrix is used. In other words, the 64×16 inverse RST matrix is used at the decoder side to generate the core (primary) transform coefficients in the top-left 8×8 region. The forward RST 8x8 uses 16×64 (or 8×64 for an 8x8 block) matrices so that it produces non-zero coefficients only in the top-left 4×4 region within the given 8×8 region. In other words, if RST is applied, the 8×8 region except the top-left 4×4 region will have only zero coefficients. For RST 4x4, 16×16 (or 8×16 for a 4x4 block) direct matrix multiplication is applied.
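A minimal C++ sketch of the forward reduced transform described above; the R×N matrix T is passed in directly, and any clipping or right-shift normalization of the actual design is omitted:

```cpp
#include <vector>
#include <cstddef>

// Forward reduced transform: an R x N matrix maps an N-dimensional input
// vector to R output coefficients (R/N is the reduction factor; e.g.
// R = 16, N = 64 or 48 for RST 8x8, and R = 16, N = 16 for RST 4x4).
std::vector<int> forwardReducedTransform(const std::vector<std::vector<int>>& T,
                                         const std::vector<int>& x) {
    std::vector<int> out(T.size(), 0);
    for (std::size_t r = 0; r < T.size(); ++r)
        for (std::size_t n = 0; n < x.size(); ++n)
            out[r] += T[r][n] * x[n];
    return out;   // only these R coefficients can be non-zero after RST
}
```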
The inverse RST is conditionally applied when the following two conditions are met:
-the block size is greater than or equal to a given threshold (W > =4 & & H > =4)
-the transform skip mode flag equals zero
If both the width (W) and height (H) of a transform coefficient block are greater than 4, then RST 8x8 is applied to the top-left 8×8 region of the transform coefficient block. Otherwise, RST 4x4 is applied to the top-left min(8, W) × min(8, H) region of the transform coefficient block.
If the RST index is equal to 0, RST is not applied. Otherwise, RST is applied, and its kernel is chosen with the RST index. The RST selection method and the coding of the RST index are explained later.
Furthermore, RST is applied to intra CUs in both intra and inter slices, and for both luma and chroma. If a dual tree is enabled, the RST indices for luma and chroma are signaled separately. For inter slices (with the dual tree disabled), a single RST index is signaled and used for both luma and chroma.
2.3.2.2.2. restriction of RST
When ISP mode is selected, RST is disabled and the RST index is not signaled, because the performance improvement is marginal even if RST is applied to every feasible partition block. Furthermore, disabling RST for the ISP-predicted residual can reduce encoding complexity.
2.3.2.2.3.RST selection
The RST matrix is selected from four transform sets, each consisting of two transforms. Which transform set to apply is determined from the intra prediction mode as follows:
(1) If one of the three CCLM modes is indicated, transform set 0 is selected.
(2) Otherwise, the transform set selection is performed according to the following table:
Transform set selection table

IntraPredMode | Transform set index
IntraPredMode < 0 | 1
0 <= IntraPredMode <= 1 | 0
2 <= IntraPredMode <= 12 | 1
13 <= IntraPredMode <= 23 | 2
24 <= IntraPredMode <= 44 | 3
45 <= IntraPredMode <= 55 | 2
56 <= IntraPredMode | 1
The index used to access the table, denoted IntraPredMode, ranges over [-14, 83]; it is a transformed mode index used for wide-angle intra prediction.
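The table above translates directly into the following C++ helper (the function name is hypothetical; the CCLM special case, which always selects transform set 0, is assumed to be handled by the caller):

```cpp
// Transform set selection from IntraPredMode, per the table above.
// intraPredMode is the (possibly wide-angle remapped) mode in [-14, 83].
int rstTransformSetIndex(int intraPredMode) {
    if (intraPredMode < 0)   return 1;
    if (intraPredMode <= 1)  return 0;
    if (intraPredMode <= 12) return 1;
    if (intraPredMode <= 23) return 2;
    if (intraPredMode <= 44) return 3;
    if (intraPredMode <= 55) return 2;
    return 1;                // 56 <= intraPredMode
}
```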
2.3.2.2.4. RST matrix of simplified size
As a further simplification, 16×48 matrices are applied instead of 16×64 with the same transform set configuration, each of which takes 48 input data from three 4×4 blocks in a top-left 8×8 block excluding the bottom-right 4×4 block, as shown in fig. 13.
2.3.2.2.5. RST signaling
The forward RST 8x8 with R = 16 uses 16×64 matrices so that it produces non-zero coefficients only in the top-left 4×4 region of the given 8×8 region. In other words, if RST is applied, the 8×8 region except the top-left 4×4 region generates only zero coefficients. As a result, the RST index is not coded when any non-zero element is detected within the 8x8 block region other than the top-left 4x4 (which is depicted in fig. 14), because this implies that RST was not applied. In such a case, the RST index is inferred to be zero.
2.3.2.2.6. Zero return range
In general, any coefficient in a 4x4 sub-block may be non-zero before applying the inverse RST to the 4x4 sub-block. However, in some cases, it is constrained that some coefficients in the 4x4 sub-block are zero before the inverse RST is applied to the sub-block.
Let nonZeroSize be a variable. Any coefficient whose index is not smaller than nonZeroSize, when the coefficients are rearranged into a 1-D array prior to the inverse RST, must be zero.
When nonZeroSize is equal to 16, the coefficients in the top left 4 x 4 sub-block have no zeroing constraint.
In some embodiments, when the current block size is 4×4 or 8×8, the nonZeroSize is set equal to 8. For other block sizes, the nonZeroSize is set equal to 16.
2.3.2.2.7. Description of RST
The following bold and underlined sections indicate proposed modifications to the standard.
7.3.2.3 sequence parameter set RBSP syntax
7.3.7.11 residual coding syntax
7.3.7.5 codec unit syntax
sps_st_enabled_flag equal to 1 specifies that st_idx may be present in the residual coding syntax for intra coding units. sps_st_enabled_flag equal to 0 specifies that st_idx is not present in the residual coding syntax for intra coding units.
st_idx [ x0] [ y0] specifies which secondary transform core is applied between two candidate cores in the selected transform set. St_idx [ x0] [ y0] equal to 0 specifies that no secondary transform is applied. The array indices x0, y0 specify the position (x 0, y 0) of the transform block under consideration relative to the left top sample of the picture.
When st_idx [ x0] [ y0] is not present, st_idx [ x0] [ y0] is inferred to be equal to 0.
The bins of st_idx are context coded. More specifically, the following applies:
TABLE 5 syntax elements and associated binarization
TABLE 6 assignment of ctxInc to syntax element with context codec bits
Derivation process of ctxInc of 9.5.4.2.8 syntax element st_idx
The inputs to this process are the colour component index cIdx, the luma or chroma location (x0, y0) specifying the top-left sample of the current luma or chroma coding block relative to the top-left sample of the current picture depending on cIdx, the tree type treeType, the luma intra prediction mode IntraPredModeY[x0][y0] as specified in clause 8.4.2, the syntax element intra_chroma_pred_mode[x0][y0] specifying the intra prediction mode for chroma samples as specified in clause 7.4.7.5, and the multiple transform selection index tu_mts_idx[x0][y0].
The output of this process is the variable ctxInc.
The variable intraModeCtx is derived as follows:
If cIdx is equal to 0, intraModeCtx is derived as follows:
intraModeCtx = (IntraPredModeY[x0][y0] <= 1) ? 1 : 0
Otherwise (cIdx is greater than 0), intraModeCtx is derived as follows:
intraModeCtx = (intra_chroma_pred_mode[x0][y0] >= 4) ? 1 : 0
the variable mtsCtx is derived as follows:
mtsCtx = (tu_mts_idx[x0][y0] == 0 && treeType != SINGLE_TREE) ? 1 : 0
the variable ctxInc is derived as follows:
ctxInc=(binIdx<<1)+intraModeCtx+(mtsCtx<<2)
overview of RST use
The following bold and underlined sections indicate proposed modifications to the standard.
RST is enabled only when the number of non-zero coefficients in one block is greater than 2 and greater than 1 for the single tree and separate tree cases, respectively. Furthermore, when RST is enabled, the following restriction on the positions of the non-zero coefficients of the coding groups (CGs) to which RST is applied may also be required.
Table 7: RST use
2.3.3. Sub-block transform
For an inter-predicted CU with cu_cbf equal to 1, cu_sbt_flag may be signaled to indicate whether the whole residual block or only a sub-part of the residual block is decoded. In the former case, inter MTS information is further parsed to determine the transform type of the CU. In the latter case (e.g., SBT enabled), a part of the residual block is coded with an inferred adaptive transform, and the other part of the residual block is zeroed out. SBT is not applied to the combined inter-intra mode or the triangle prediction mode.
In the sub-block transform, a position dependent transform is applied to the luminance transform blocks in SBT-V and SBT-H (chrominance TB always uses DCT-2). The two positions of SBT-H and SBT-V are associated with different core transformations. More specifically, the horizontal and vertical transforms for each SBT position are specified in fig. 15. For example, the horizontal and vertical transforms for SBT-V position 0 are DCT-8 and DST-7, respectively. When one side of the residual TU is greater than 32, the corresponding transform is set to DCT-2. Thus, the sub-block transform jointly specifies the TU tiling (tiling), cbf, and horizontal and vertical transforms of the residual block, which can be considered as a syntax shortcut for the case where the main residual of the block is on one side of the block.
2.3.3.1. Syntax element
The following bold and underlined sections indicate proposed modifications to the standard.
7.3.7.5 codec unit syntax
A cu_sbt_flag equal to 1 specifies that for the current codec unit, a sub-block transform is used. A cu_sbt_flag equal to 0 indicates that no sub-block transform is used for the current coding unit.
When the cu_sbt_flag does not exist, its value is inferred to be equal to 0.
Note that when sub-block transforms are used, the codec unit is divided into two transform units; one transform unit has residual data and the other has no residual data.
The cu_sbt_quad_flag equal to 1 specifies that for the current codec unit, the sub-block transform includes a 1/4-sized transform unit of the current codec unit. The cu_sbt_quad_flag equal to 0 specifies that for the current codec unit, the sub-block transform includes a transform unit of 1/2 size of the current codec unit.
When the cu_sbt_quad_flag does not exist, its value is inferred to be equal to 0.
The cu_sbt_horizontal_flag equal to 1 specifies that the current codec unit is horizontally divided into 2 transform units. The cu_sbt_horizontal_flag [ x0] [ y0] equal to 0 specifies that the current codec unit is vertically divided into 2 transform units.
When the cu_sbt_horizontal_flag is not present, its value is deduced as follows:
-if the cu_sbt_quad_flag is equal to 1, the cu_sbt_horizontal_flag is set equal to allowSbtHorQ.
Otherwise (cu_sbt_quad_flag is equal to 0), cu_sbt_horizontal_flag is set equal to allowSbtHorH.
The cu_sbt_pos_flag equal to 1 specifies that tu_cbf_luma, tu_cbf_cb, and tu_cbf_cr of the first transform unit in the current codec unit are not present in the bitstream. The cu_sbt_pos_flag equal to 0 specifies that tu_cbf_luma, tu_cbf_cb, and tu_cbf_cr of the second transform unit in the current codec unit are not present in the bitstream.
The variable SbtNumFourthsTb0 is derived as follows:
sbtMinNumFourths = cu_sbt_quad_flag ? 1 : 2 (7-117)
SbtNumFourthsTb0=cu_sbt_pos_flag?(4-sbtMinNumFourths):sbtMinNumFourths (7-118)
sps_sbt_max_size_64_flag equal to 0 specifies that the maximum CU width and height that allows sub-block transforms is 32 luma samples. The sps_sbt_max_size_64_flag equal to 1 specifies that the maximum CU width and height that allows sub-block transforms is 64 luma samples.
MaxSbtSize = sps_sbt_max_size_64_flag ? 64 : 32 (7-33)
2.3.4. Quantized residual block differential pulse-code modulation coding (QR-BDPCM)
In some embodiments, quantized residual domain BDPCM (denoted QR-BDPCM hereinafter) is used. Unlike BDPCM, intra prediction is performed on the entire block by sample copying in the prediction direction (horizontal or vertical prediction), similar to intra prediction. The residual is quantized, and the delta between the quantized residual and its predictor's (horizontal or vertical) quantized value is coded.
For a block of size M (rows) × N (columns), let $r_{i,j}$, $0 \le i \le M-1$, $0 \le j \le N-1$, be the prediction residual after performing intra prediction horizontally (copying the left neighbor pixel value across the predicted block line by line) or vertically (copying the top neighbor line to each line in the predicted block) using unfiltered samples from the above or left block boundary samples. Let $Q(r_{i,j})$, $0 \le i \le M-1$, $0 \le j \le N-1$, denote the quantized version of the residual $r_{i,j}$, where the residual is the difference between the original block and the predicted block values. Block DPCM is then applied to the quantized residual samples, resulting in a modified M × N array $\tilde{R}$ with elements $\tilde{r}_{i,j}$. When vertical BDPCM is signaled:

$$\tilde{r}_{i,j} = \begin{cases} Q(r_{i,j}), & i = 0,\ 0 \le j \le N-1 \\ Q(r_{i,j}) - Q(r_{i-1,j}), & 1 \le i \le M-1,\ 0 \le j \le N-1 \end{cases}$$

For horizontal prediction, similar rules apply, and the residual quantized samples are obtained by

$$\tilde{r}_{i,j} = \begin{cases} Q(r_{i,j}), & 0 \le i \le M-1,\ j = 0 \\ Q(r_{i,j}) - Q(r_{i,j-1}), & 0 \le i \le M-1,\ 1 \le j \le N-1 \end{cases}$$

The residual quantized samples $\tilde{r}_{i,j}$ are sent to the decoder.

On the decoder side, the above calculations are reversed to produce $Q(r_{i,j})$, $0 \le i \le M-1$, $0 \le j \le N-1$. For the vertical prediction case,

$$Q(r_{i,j}) = \sum_{k=0}^{i} \tilde{r}_{k,j},\quad 0 \le i \le M-1,\ 0 \le j \le N-1$$

For the horizontal case,

$$Q(r_{i,j}) = \sum_{k=0}^{j} \tilde{r}_{i,k},\quad 0 \le i \le M-1,\ 0 \le j \le N-1$$

The inverse quantized residuals, $Q^{-1}(Q(r_{i,j}))$, are added to the intra block prediction values to produce the reconstructed sample values.
Transform skipping is always used in QR-BDPCM.
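The encoder-side DPCM mapping of the quantized residuals above can be sketched as follows in C++ (the function name and the use of a 2-D vector for the block are assumptions of this illustration):

```cpp
#include <vector>

// QR-BDPCM on the encoder side: q[i][j] holds the quantized residuals
// Q(r[i][j]) of an M x N block; the returned array holds the DPCM
// differences sent to the decoder. vertical == true is the vertically
// signaled BDPCM case; the decoder inverts this with prefix sums.
std::vector<std::vector<int>> qrBdpcmResidual(
        const std::vector<std::vector<int>>& q, bool vertical) {
    const int M = static_cast<int>(q.size());
    const int N = static_cast<int>(q[0].size());
    std::vector<std::vector<int>> rt = q;   // first row/column kept as-is
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            if (vertical && i > 0)       rt[i][j] = q[i][j] - q[i - 1][j];
            else if (!vertical && j > 0) rt[i][j] = q[i][j] - q[i][j - 1];
        }
    return rt;
}
```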
2.4. Entropy coding and decoding of coefficients
2.4.1. Coefficient coding for transform-applied blocks
In HEVC, transform coefficients of a codec block are encoded using non-overlapping sets of coefficients (CG, or sub-blocks), and each CG contains coefficients of a 4x4 block of the codec block. The CG within the codec block and the transform coefficients within the CG are encoded and decoded according to a predefined scan order.
Both the CGs and the coefficients within a CG follow a diagonal up-right scan order. Examples of the scanning order for a 4x4 block and an 8x8 block are depicted in figs. 16 and 17, respectively.
Note that the coding order is the reversed scanning order (e.g., decoding from CG3 to CG0 in fig. 17); when decoding one block, the coordinates of the last non-zero coefficient are decoded first.
The encoding and decoding of transform coefficient levels of a CG having at least one non-zero transform coefficient may be separated into multiple scan passes (pass). In VVC3, for each CG, a normal codec binary bit and a bypass (bypass) codec binary bit are separated in codec order; first, all regular codec bits of the sub-block are transmitted, after which bypass codec bits are transmitted. The transform coefficient level of the sub-block is encoded and decoded in five passes at the scan position as follows:
- Pass 1: coding of significance (sig_flag), greater-than-1 flag (gt1_flag), parity (par_level_flag) and greater-than-2 flag (gt2_flag) is processed in coding order. If sig_flag is equal to 1, first the gt1_flag is coded (which specifies whether the absolute level is greater than 1). If gt1_flag is equal to 1, the par_flag is additionally coded (it specifies the parity of the absolute level minus 2).
- Pass 2: coding of the remaining absolute level (remainder) is processed for all scan positions with gt2_flag equal to 1 or gt1_flag equal to 1. The non-binary syntax element is binarized with a Golomb-Rice code, and the resulting bins are coded in the bypass mode of the arithmetic coding engine.
- Pass 3: the absolute levels (absLevel) of coefficients for which no sig_flag was coded in the first pass (due to reaching the limit of regular coded bins) are completely coded in the bypass mode of the arithmetic coding engine using a Golomb-Rice code.
- Pass 4: coding of the signs (sign_flag) for all scan positions with sig_coeff_flag equal to 1. It is guaranteed that no more than 32 regular coded bins (sig_flag, par_flag, gt1_flag and gt2_flag) are coded or decoded for a 4x4 sub-block. For 2x2 chroma sub-blocks, the number of regular coded bins is limited to 8.
The Rice parameter (ricePar) for coding the non-binary syntax element remainder (in pass 2) is derived similarly to HEVC. At the start of each sub-block, ricePar is set equal to 0. After coding a syntax element remainder, the Rice parameter is modified according to a predefined equation. For coding the non-binary syntax element absLevel (in pass 3), the sum of absolute values sumAbs in a local template is determined. The variables ricePar and posZero are determined based on dependent quantization and sumAbs by a table look-up. The intermediate variable codeValue is derived as follows:
-if absLevel [ k ] is equal to 0, codeValue is set equal to posZero;
-otherwise, if absLevel [ k ] is less than or equal to posZero, codeValue is set equal to absLevel [ k ] -1;
otherwise (absLevel [ k ] is greater than posZero), codeValue is set equal to absLevel [ k ].
The value of codeValue is encoded using a Golomb-Rice code with Rice parameter ricePar.
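A direct C++ transcription of the three-case derivation above (the function name is hypothetical; ricePar and posZero are assumed to come from the look-up described in the text):

```cpp
// codeValue derivation for absLevel coding, per the three cases above.
int deriveCodeValue(int absLevel, int posZero) {
    if (absLevel == 0)       return posZero;
    if (absLevel <= posZero) return absLevel - 1;
    return absLevel;         // absLevel > posZero
}
```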
2.4.1.1. Context modeling for coefficient codec
The selection of the probabilistic model of the syntax element related to the absolute value of the transform coefficient level depends on the value of the absolute level in the local neighborhood or of the absolute level of the partial reconstruction. The template used is shown in fig. 18.
The probability models selected depend on the sum of the absolute levels (or partially reconstructed absolute levels) in a local neighborhood and on the number of absolute levels greater than 0 in that neighborhood (given by the number of sig_coeff_flags equal to 1). Context modelling and binarization depend on the following measures of the local neighborhood:
- numSig: the number of non-zero levels in the local neighborhood,
- sumAbs1: the sum of partially reconstructed absolute levels (absLevel1) after the first pass in the local neighborhood,
- sumAbs: the sum of reconstructed absolute levels in the local neighborhood, and
- diagonal position (d): the sum of the horizontal and vertical coordinates of the current scan position within the transform block.
Based on the values of numSig, sumAbs1 and d, probability models for encoding and decoding sig_flag, par_flag, gt1_flag, and gt2_flag are selected. The Rice parameter used to binarize abs_remain is selected based on the values of sumAbs and numSig.
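The following C++ sketch gathers these neighborhood measures for one scan position; the five-position template (two neighbors to the right, two below, and one diagonal), as well as all names, are assumptions based on the template of fig. 18.

```cpp
// Local-template measures used for probability model selection.
// absLevel1 holds partially reconstructed absolute levels after pass 1,
// stored row-major for a width x height transform block.
struct TemplateMeasures { int numSig; int sumAbs1; int d; };

TemplateMeasures measureTemplate(const int* absLevel1,
                                 int width, int height, int x, int y) {
    static const int dx[5] = {1, 2, 0, 0, 1};
    static const int dy[5] = {0, 0, 1, 2, 1};
    TemplateMeasures m{0, 0, x + y};   // d: diagonal position of (x, y)
    for (int k = 0; k < 5; ++k) {
        const int nx = x + dx[k], ny = y + dy[k];
        if (nx < width && ny < height) {
            const int lvl = absLevel1[ny * width + nx];
            m.sumAbs1 += lvl;
            m.numSig  += (lvl != 0);
        }
    }
    return m;
}
```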
2.4.1.2. Dependent Quantization (DQ)
Furthermore, the same HEVC scalar quantization is used together with a new concept called dependent scalar quantization. Dependent scalar quantization refers to an approach in which the set of admissible reconstruction values for a transform coefficient depends on the values of the transform coefficient levels that precede the current transform coefficient level in reconstruction order. The main effect of this approach is that, in comparison to the conventional independent scalar quantization used in HEVC, the admissible reconstruction vectors are packed more densely in the N-dimensional vector space (N represents the number of transform coefficients in a transform block). This means that, for a given average number of admissible reconstruction vectors per N-dimensional unit volume, the average distortion between an input vector and the closest reconstruction vector is reduced. The approach of dependent scalar quantization is realized by: (a) defining two scalar quantizers with different reconstruction levels, and (b) defining a process for switching between the two scalar quantizers.
The two scalar quantizers used, denoted Q0 and Q1, are shown in fig. 19. The position of the available reconstruction level is uniquely specified by the quantization step size delta. The scalar quantizer (Q0 or Q1) used is not explicitly signaled in the bitstream. Instead, the quantizer for the current transform coefficient is determined by the parity of the transform coefficient level preceding the current transform coefficient in the codec/reconstruction order.
As shown in fig. 20, switching between the two scalar quantizers (Q0 and Q1) is implemented via a state machine having four states. The state may take four different values: 0. 1, 2 and 3. It is uniquely determined by the parity of the transform coefficient level preceding the current transform coefficient in the codec/reconstruction order. At the beginning of the inverse quantization of the transform block, the state is set equal to 0. The transform coefficients are reconstructed in scan order (e.g., in the same order in which they were entropy decoded). After the current transform coefficients are reconstructed, the state is updated, as shown in fig. 20, where k represents the value of the transform coefficient level.
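As an illustration of the four-state machine of fig. 20, the following C++ sketch updates the state from the parity of the coded level k; the transition table shown follows the usual dependent-quantization design and should be read as an assumption of this sketch rather than a quotation of the figure.

```cpp
// Four-state machine for dependent quantization: the next state depends
// only on the current state and the parity (k & 1) of the level k.
// States 0 and 1 select quantizer Q0; states 2 and 3 select Q1.
static const int kStateTrans[4][2] = { {0, 2}, {2, 0}, {1, 3}, {3, 1} };

int nextState(int state, int k)  { return kStateTrans[state][k & 1]; }
bool usesQ1(int state)           { return state > 1; }   // else Q0
```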
2.4.1.3. Grammar and semantics
The following section presents the syntax design of residual (transform coefficient) codec.
7.3.7.11 residual coding syntax
2.4.2. Coefficient coding for TS-coded blocks and QR-BDPCM coded blocks
2.4.2.1. Coding and decoding pass
Transform coefficient level coding is modified for the TS residual. Multiple passes are applied to code each CG if the CG contains non-zero coefficients:
pass 1: if necessary, coding and decoding a flag greater than 0 (sig_coeff_flag), a symbol flag (coeff_sign_flag), a flag greater than 1 (abs_level_gtx_flag [0 ]), and a parity flag (par_level_flag);
pass 2-5: for the j-th pass, encoding and decoding flags (abs_level_gtx_flag [ j-1 ]) greater than (2*j);
pass 6: encoding and decoding the remainder (abs_remainders) of the magnitudes of the coefficients
2.4.2.2. Overview of changes compared to non-TS residual codec
TS residual coding includes the following changes with respect to the regular residual coding case:
(1) no signaling of the last x/y position;
(2) coded_sub_block_flag coded for every sub-block except the last sub-block when all previous flags are equal to 0;
(3) sig_coeff_flag context modelling with a reduced template;
(4) a single context model for abs_level_gtx_flag[0] and par_level_flag;
(5) context modelling for the sign flag, and additional greater-than-5, greater-than-7 and greater-than-9 flags;
(6) modified Rice parameter derivation for the binarization of the remainder;
(7) a limit on the number of context coded bins per sample: 2 bins per sample within one block.
2.4.2.3. Syntax and semantics
The following bold and underlined sections indicate proposed modifications to the standard.
7.3.6.10 transform unit syntax
For each CG, the number of context codec binary bits is limited to no more than 2 bins per sample.
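In other words, the context bin budget scales with the CG size. A one-line Python sketch, assuming the 2-bins-per-sample figure above:

```python
def ctx_bin_budget(cg_width, cg_height, bins_per_sample=2):
    """Context codec bin budget of one CG at 2 bins per sample (see above)."""
    return bins_per_sample * cg_width * cg_height  # e.g. a 4x4 CG -> 32 bins
```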
TABLE 8. Assignment of ctxInc to syntax elements with context codec binary bits
2.4.2.4. Context modeling
The context modeling is defined as follows:
coded_sub_block_flag: uses the two neighboring CGs above and to the left of the current CG (rather than those to the right and below).
3. Disadvantages of the prior embodiments
The current design has the following problems:
1. To meet the throughput requirements associated with context-adaptive binary arithmetic coding (CABAC), a maximum number of context codec bits per block/per sample is set, and a counter is used to record how many context codec bits have been used within one block. As a result, the first few CGs (from top-left to bottom-right) can be coded efficiently with context, while the last few CGs, which typically carry higher energy (larger prediction errors), may have to be coded in bypass mode. Performance may therefore be suboptimal.
2. For the coefficients in some CGs, a bypass codec mode may be applied. However, several syntax elements still need to be transmitted for those coefficients, which may be inefficient.
4. Example techniques and embodiments
The detailed embodiments described below should be considered as examples explaining the general concepts. These embodiments should not be construed narrowly. Furthermore, the embodiments may be combined in any manner.
1. Instead of constraining the maximum number of context codec bits per codec block or per transform block, it is proposed to constrain the maximum number of context codec bits per coding group (CG).
a. In one example, a counter is used to record the number of context codec bits per CG (see the sketch following this item).
i. Further, alternatively, the counter is reset to 0 when a new CG is encoded/decoded.
in one example, if the counter is equal to or greater than the maximum number, all binary bits in the CG that have not yet been encoded/parsed will be encoded/parsed using bypass encoding/parsing.
b. In one example, the maximum number of codec bits for different CGs may be different.
c. In one example, the maximum number of context codec bits per CG may depend on the CG's position relative to the entire block.
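A minimal sketch of the per-CG counter of item 1, assuming a hypothetical codec object with encode_context and encode_bypass methods:

```python
def code_cg(bins, codec, max_ctx_bins_cg):
    """Cap the context codec bins of one CG (items 1.a.i and 1.a.ii)."""
    counter = 0  # reset to 0 whenever a new CG is encoded/decoded
    for value, ctx in bins:  # the CG's bins in coding order
        if counter < max_ctx_bins_cg:
            codec.encode_context(value, ctx)  # context coded while budget remains
            counter += 1
        else:
            codec.encode_bypass(value)  # budget spent: bypass code the rest
```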
2. The maximum number of context codec bits per block (denoted maxCbinB) may be controlled in two steps. First, which CGs contain non-zero coefficients may be signaled (e.g., by coded_sub_block_flag). Second, for each signaled non-zero CG, the maximum number of context codec bits (denoted maxCbinCG) may be constrained.
a. In one example, assuming that N context codec bits are used to code the coded_sub_block_flag of the CGs whose flag needs to be signaled, and that there are K non-zero CGs, maxCbinCG may be set equal to floor((maxCbinB-N)/K) (a worked sketch follows this item).
i. The division here may be implemented by a look-up table.
b. In one example, maxCbinCG may be different for different CGs. For example, maxCbinCG may increase as the CG index increases.
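Item 2.a amounts to the small computation below; the lookup-table variant of the division (item 2.a.i) is sketched with an assumed table precision and range of K.

```python
def max_cbin_cg(max_cbin_b, n, k):
    """maxCbinCG = floor((maxCbinB - N) / K), as in item 2.a."""
    return (max_cbin_b - n) // k

# Division-free variant via a reciprocal lookup table (item 2.a.i). The
# 16-bit precision and the range of K are assumptions of this sketch; the
# result approximates the floor division above.
RECIP = {k: (1 << 16) // k for k in range(1, 65)}

def max_cbin_cg_lut(max_cbin_b, n, k):
    return ((max_cbin_b - n) * RECIP[k]) >> 16

# e.g. with maxCbinB = 256, N = 16 flag bins and K = 4 non-zero CGs,
# each non-zero CG may use at most (256 - 16) // 4 = 60 context codec bins.
```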
3. Instead of constraining the maximum number of context codec bits per codec block or per transform block, it is proposed to constrain the maximum number of context codec bits per syntax element or per codec pass.
a. In one example, multiple variables may be assigned.
i. In one example, one variable may correspond to a counter recording the number of context codec bits used for a particular syntax element/codec pass (see the sketch following this item).
Further alternatively, the counter may be reset to 0 when encoding/decoding the block.
in one example, if the counter is equal to or greater than the maximum number, all binary bits in the syntax element or codec pass that have not yet been encoded/parsed will be encoded/parsed with bypass encoding/parsing.
b. In one example, the maximum value of different syntax elements/codec passes may be different.
c. In one example, the maximum number of contextual codec bits per syntax element may depend on the syntax element.
d. In one example, the maximum number of codec bits per codec pass may depend on the codec pass.
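A sketch of item 3's per-syntax-element budgets; the element names and limits below are illustrative assumptions only.

```python
class PerElementBudget:
    """One counter per syntax element (item 3.a); the limits are illustrative."""

    LIMITS = {"sig_coeff_flag": 64, "abs_level_gtx_flag": 48, "par_level_flag": 32}

    def __init__(self):
        # counters are reset when a new block is encoded/decoded (item 3.a.ii)
        self.counters = dict.fromkeys(self.LIMITS, 0)

    def use_context(self, element):
        """Return True while bins of this element may still be context coded."""
        if self.counters[element] >= self.LIMITS[element]:
            return False  # remaining bins of this element go to bypass mode
        self.counters[element] += 1
        return True
```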
4. Multiple residual codec methods (e.g., with different syntax elements, different binarization methods, or different numbers of codec passes) may be used to code a block, and switching between the different methods may depend on the number of context codec bits. Denote by MaxCcBins the maximum number of context codec binary bits per unit (e.g., codec unit/transform unit/coding group). Sketches of the switching rule and of the alternative coefficient codings follow this item.
a. In one example, the current design may be used as long as, after the first syntax elements (e.g., a sign flag such as coeff_sign_flag) are coded, the number of context codec bits per unit is not greater than (MaxCcBins-TH1). Otherwise, a different codec method may be applied to code the following syntax elements, the next CG, or the next sub-region. TH1 is an integer, for example 1, 2, 3 or 4.
b. In one example, the current design may be used when, after the m-th pass is coded, the number of context codec bits per unit is not greater than (MaxCcBins-TH1). Otherwise, a different codec method may be applied to code the (m+n)-th pass, the next CG, or the next sub-region, where n is an integer variable. TH1 is an integer, for example 1, 2, 3 or 4.
i. In one example, m is set to 1, 2, 3, 4, 5.
c. In one example, the current design may be used when, after a CG is coded, the number of context codec bits per unit is not greater than (MaxCcBins-TH1). Otherwise, a different codec method may be applied to code the next CG or the next sub-region. TH1 is an integer, for example 1, 2, 3 or 4.
d. In one example, a codec method with L (L != 6) passes may be applied.
i. In one example, L is set to 1, 2, 3, 4, 5.
ii. In one example, passes 2-5 may be skipped.
iii. In one example, passes 1-5 may be skipped.
iv. In one example, all coefficients in the CG or block are scanned and/or coded in each pass, if needed.
e. In one example, a codec method with different syntax elements may be applied.
i. In one example, the parity flag is not coded in the k-th pass. For example, k=1.
ii. In one example, the sign flag is not coded in the k-th pass. For example, k=1.
iii. In one example, the coefficients may be directly binarized and coded.
1) Denoting the value of one coefficient by x, the value of (x > 0 ? 2x : -2x+1) or (x >= 0 ? 2x : -2x+1) may be binarized and coded (sketched following this list).
iv. In one example, the magnitude of a coefficient may be directly binarized, and its sign value may be coded separately.
v. In one example, run-level coding may be applied, where a "run" indicates how many consecutive zero coefficients occur in a given scan order and a "level" indicates the magnitude of a non-zero coefficient (also sketched following this list).
f. In one example, codec methods with different binarization methods may be applied.
i. In one example, the Rice parameters and/or EG codes and/or Golomb-Rice codes used to code the residual magnitudes may be derived differently (see the Golomb-Rice sketch following this list).
g. In one example, TH1 in items 4a, 4b, and 4c may be a predefined integer value.
i. In one example, TH1 may be zero or a positive value, such as 0, 1, 2, 3, 4.
in one example, TH1 may be zero or negative, such as 0, -1, -2, -3, -4.
iii. In one example, TH1 may depend on the quantization parameter/codec block mode/block size/slice type/picture type, etc.
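The switching rule of items 4.a-4.c reduces to a simple comparison. A minimal sketch, with an illustrative TH1 and the two codec methods kept abstract:

```python
def residual_mode(ctx_bins_used, max_cc_bins, th1=2):
    """Items 4.a-4.c: keep the current design while the per-unit count of
    context codec bins is within (MaxCcBins - TH1); otherwise switch for
    the following syntax elements / pass / CG / sub-region."""
    return "current" if ctx_bins_used <= max_cc_bins - th1 else "alternative"
```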
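Items 4.e.iii.1 and 4.e.v can be made concrete as follows. The folding in signed_to_unsigned matches the (x > 0 ? 2x : -2x+1) form given above, and run_level is a generic run-level pairing; both are illustrative sketches rather than normative procedures.

```python
def signed_to_unsigned(x, zero_in_first_branch=False):
    """Fold a signed coefficient into a non-negative value (item 4.e.iii.1):
    (x > 0 ? 2x : -2x+1), or (x >= 0 ? 2x : -2x+1) when the flag is set."""
    positive = x >= 0 if zero_in_first_branch else x > 0
    return 2 * x if positive else -2 * x + 1

def run_level(coeffs):
    """Run-level coding (item 4.e.v): a 'run' counts the consecutive zero
    coefficients in scan order, a 'level' is the next non-zero value."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))  # magnitude and sign kept in the level
            run = 0
    return pairs  # e.g. [0, 0, 3, 0, -1] -> [(2, 3), (1, -1)]
```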
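Item 4.f concerns how the binarization itself is derived. As background, a plain Golomb-Rice binarization of a non-negative remainder looks like the sketch below; what the item proposes to vary is the derivation of the Rice parameter (and/or the EG code), which is not modeled here, and the escape codes of a real codec are omitted.

```python
def golomb_rice(value, rice_param):
    """Binarize a non-negative value with a Golomb-Rice code: a unary prefix
    of (value >> rice_param) ones plus a terminating zero, followed by
    rice_param fixed-length suffix bits."""
    prefix = "1" * (value >> rice_param) + "0"
    if rice_param == 0:
        return prefix
    suffix = format(value & ((1 << rice_param) - 1), "0%db" % rice_param)
    return prefix + suffix

# e.g. golomb_rice(5, 1) -> "1101": quotient 2 coded in unary, remainder 1.
```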
5. The above methods may be applied to TS codec blocks.
a. Further, they may alternatively be applied to QR-BDPCM codec blocks.
b. Furthermore, they may alternatively be applied to other blocks that are coded without applying a transform.
c. Furthermore, they may alternatively be applied to other blocks that are coded with a transform applied.
d. Alternatively, the above methods may be applied to blocks coded in a lossless codec mode, e.g., where the transform and quantization processes are not applied to the block.
6. Enabling or disabling the above method may be signaled in DPS/SPS/PPS/APS/VPS/sequence header/picture header/slice group header/slice/CTU group, etc.
a. Alternatively, which method to use may be signaled in DPS/SPS/PPS/APS/VPS/sequence header/picture header/slice group header/slice/CTU group, etc.
b. Alternatively, whether the above method is enabled or disabled and/or which method is applied may depend on the block size, the Virtual Pipeline Data Unit (VPDU), the picture type, or the low-latency check flag.
c. Alternatively, whether to enable or disable the above method and/or which method to apply may depend on color components, color formats, and the like.
d. Alternatively, whether the above methods are enabled or disabled and/or which method is applied may depend on whether QR-BDPCM or TS is applied.
7. In one example, the maximum number of context codec bits per block, per CG, per codec pass, or per syntax element may depend on the color component.
8. In one example, the maximum number of context codec bits per block, per CG, per codec pass, or per syntax element may depend on the slice/slice group/picture type.
9. In one example, the maximum number of context codec bits per block, per CG, per codec pass, or per syntax element may be different for different profiles/levels/tiers in a standard.
10. In one example, the maximum number of context codec bits per block, per CG, per codec pass, or per syntax element may be signaled in a video unit (such as in the DPS/SPS/PPS/APS/VPS/sequence header/picture header/slice group header/slice/CTU group, etc.).
11. In one example, the maximum number of context codec bits per block (such as per CTU or CU or TU or CG) may depend on the size of the block (such as width and/or height of CTU or CU or TU or CG).
a. Further, alternatively, the maximum number of context codec bits per block (such as per CTU or CU or TU or CG) may depend on the resolution of the picture, such as the width and/or height of the picture.
5. Example implementations of the disclosed technology
Fig. 21 is a block diagram of a video processing apparatus 2100. The apparatus 2100 may be used to implement one or more methods described herein. The apparatus 2100 may be embodied in a smart phone, tablet, computer, internet of things (IoT) receiver, or the like. The apparatus 2100 may include one or more processors 2102, one or more memories 2104, and video processing hardware 2106. The processor(s) 2102 may be configured to implement one or more methods described in this document. The memory(s) 2104 may be used to store data and code for implementing the methods and techniques described herein. Video processing hardware 2106 may be used to implement some of the techniques described in this document in hardware circuitry and may be part or all of processor 2102 (e.g., a graphics processor core GPU or other signal processing circuitry).
In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during the transition from a pixel representation of the video to a corresponding bit stream representation, and vice versa. The bitstream representation of the current video block may, for example, correspond to bits collocated or scattered in different places within the bitstream, as defined by syntax. For example, a video block may be encoded according to transform and codec error residual values and also using bits in the header and other fields in the bitstream. Here, the video block may be a logical unit corresponding to the processing operation to be performed, such as a codec unit, a transform unit, a prediction unit, or the like.
It should be appreciated that by allowing use of the techniques disclosed in this document, the disclosed methods and techniques will be beneficial to video encoder and/or decoder embodiments incorporated within video processing devices such as smartphones, laptops, desktops, and similar devices.
Fig. 22 is a flow chart of an example method 2200 of visual media processing. The method 2200 includes, at 2210, performing a transition between the current video unit and a bitstream representation of the current video unit, wherein the transition includes contextually modeling the current video unit based on applying a constraint on a maximum number of context codec bits for each Coding Group (CG) associated with the current video unit, wherein information of the contextually modeling is included in the bitstream representation of the current video unit.
Some embodiments may be described using the following clause-based format.
1. A method of visual media processing, comprising:
performing a transition between the current video unit and a bitstream representation of the current video unit, wherein the transition includes contextually modeling the current video unit based on applying a constraint on a maximum number of contextual codec bits for each Coding Group (CG) associated with the current video unit, wherein information of the contextually modeling is included in the bitstream representation of the current video unit.
2. The method of clause 1, wherein a counter is used to record the number of context codec bits per CG.
3. The method of clause 2, wherein the counter is reset to zero when a new CG is encoded or decoded.
4. The method of clause 2, further comprising:
when it is determined that the value of the counter is greater than the maximum number of context codec bits per CG, a bypass codec step is applied to one or more context codec bits that have not yet been coded.
5. The method of clause 1, wherein the maximum number of contextual codec bits per CG is different for different Coding Groups (CGs).
6. The method of clause 1, wherein the maximum number of contextual codec bits per CG is based at least in part on the location of the CG relative to the current video unit.
7. The method of clause 1, further comprising:
identifying one or more CGs having non-zero coefficients, wherein a constraint on a maximum number of context codec bits for each coding set (CG) is applied to the one or more CGs having non-zero coefficients; and
one or more CGs with non-zero coefficients are signaled in the bitstream representation.
8. A method of visual media processing, comprising:
performing a transition between the current video unit and a bitstream representation of the current video unit, wherein the transition includes contextually modeling the current video unit based on applying a constraint on a maximum number of contextual codec bits for each syntax element or each codec pass associated with the current video unit, wherein contextually modeled information is included in the bitstream representation of the current video unit.
9. The method of clause 8, wherein a counter is used to record the number of contextual codec binary bits per syntax element or per codec pass.
10. The method of clause 9, wherein the counter is reset to zero when a new syntax element or codec pass is encoded or decoded.
11. The method of clause 8, wherein the maximum number of contextual codec bits per syntax element or per codec pass is different for different syntax elements or codec passes.
12. The method of clause 8, wherein the maximum number of contextual codec bits per CG is based at least in part on a syntax element or codec pass associated with the current video unit.
13. A method of visual media processing, comprising:
performing a conversion between the current video unit and the bitstream representation of the current video unit, wherein the conversion comprises one or more residual codec steps such that each residual codec step is associated with a number of context codec bits for each codec unit; and
during the converting, switching from the first residual codec step to the second residual codec step based at least in part on the first number of context codec bits for each codec unit in the first step and the second number of context codec bits for each codec unit in the second step.
14. The method of clause 13, wherein in each residual codec step, the maximum number of contextual codec bits for each codec unit is constrained to an upper limit indicated in the bitstream representation.
15. The method of clause 14, further comprising:
applying a first residual coding step to the current video unit when a first number is determined to be less than or equal to a maximum number of context coding binary bits per coding unit after coding the syntax element of the current video unit; and
when it is determined that the first number exceeds the maximum number of context codec bits per codec unit, switching from the first residual codec step to the second residual codec step for other video units.
16. The method of clause 14, further comprising:
applying a first residual coding step to the current video unit when a first number is determined to be less than or equal to a maximum number of context coding binary bits per coding unit after the plurality of passes of the current video unit are coded; and
when it is determined that the first number exceeds the maximum number of context codec bits per codec unit, switching from the first residual codec step to the second residual codec step for other video units.
17. The method of clause 14, wherein each residual codec step is associated with a syntax element.
18. The method of clause 14, wherein the syntax elements of the one or more residual codec steps are different from each other.
19. The method of clause 18, wherein the syntax element comprises a parity flag, a sign flag, or a codec coefficient.
20. The method of clause 19, wherein the magnitudes and/or signs of the codec coefficients are expressed in binarized form.
21. The method of clause 20, wherein the one or more residual codec steps include run-level coding, such that a run indicates a number of consecutive zero coefficients in scan order and a level indicates the magnitude of a non-zero coefficient.
22. The method of clause 19, wherein the one or more residual codec steps comprise a binarization process.
23. The method of clause 19, wherein the one or more residual codec steps include Rice parameters for coding the non-binary syntax elements.
24. The method of any one or more of clauses 1-23, wherein the current video unit is any one of: a TS codec block, a QR-BDPCM codec block, a block lacking a transformation step, a block associated with a transformation step, or a block associated with a lossless codec process.
25. The method of clause 24, wherein the lossless codec process comprises a lack of a transform step and/or a lack of a quantization step.
26. The method of any one or more of clauses 1-25, wherein the context modeling and/or one or more residual codec steps are selectively enabled or disabled.
27. The method of clause 26, wherein information about selectively enabling or disabling context modeling and/or one or more residual codec steps is included in the bitstream representation.
28. The method of clause 26, wherein selectively enabling or disabling the context modeling and/or the one or more residual codec steps is based at least in part on a condition.
29. The method of clause 28, wherein the condition is associated with any one or more of: the size of the current video unit, the picture type of the current video unit, the Virtual Pipeline Data Unit (VPDU) of the current video unit, or the low latency check flag of the current video unit, the color component or color format of the current video unit, or the codec step associated with the current video unit.
30. The method of clause 29, wherein the codec step associated with the current video unit is QR-BDPCM or TS.
31. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits per CG is dependent on the color component of the current video unit.
32. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits per syntax element or per codec pass depends on the color component of the current video unit.
33. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits per codec unit depends on the color component of the current video unit.
34. The method of any one or more of clauses 1-30, wherein a maximum number of contextual codec bits per CG depends on a slice/group/picture type associated with the current video unit.
35. The method of any one or more of clauses 1-30, wherein a maximum number of contextual codec bits per syntax element or per codec pass depends on a slice/group/picture type associated with the current video unit.
36. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits for each codec unit depends on a slice/group/picture type associated with the current video unit.
37. The method of any one or more of clauses 1-30, wherein the maximum number of context codec bits per CG is dependent on the size of the current video unit or the resolution of the current video unit.
38. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits per syntax element or per codec pass depends on the size of the current video unit or the resolution of the current video unit.
39. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits per codec unit depends on the size of the current video unit or the resolution of the current video unit.
40. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits per CG is dependent on the profile/level/hierarchy of the current video unit.
41. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits per syntax element or per codec pass depends on the profile/level/hierarchy of the current video unit.
42. The method of any one or more of clauses 1-30, wherein the maximum number of contextual codec bits per codec unit depends on the profile/level/hierarchy of the current video unit.
43. The method of any one or more of clauses 1-30, wherein the bitstream representation is associated with a DPS/SPS/PPS/APS/VPS/sequence header/picture header/slice group header/slice/Codec Tree Unit (CTU) group.
44. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of clauses 1-42.
45. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of any one of clauses 1 to 42.
Fig. 23 is a block diagram illustrating an example video processing system 2300 in which various techniques disclosed herein may be implemented. Various embodiments may include some or all of the components of system 2300. The system 2300 may include an input 2302 for receiving video content. The video content may be received in an original or uncompressed format, such as 8 or 10 bit multi-component pixel values, or may be in a compressed or encoded format. Input 2302 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as ethernet, passive Optical Network (PON), etc., and wireless interfaces such as Wi-Fi or cellular interfaces.
The system 2300 can include a codec component 2304 that can implement the various codec or encoding methods described in this document. Codec component 2304 may reduce the average bit rate of video from input 2302 to the output of codec component 2304 to produce a codec representation of the video. Codec techniques are therefore sometimes referred to as video compression or video transcoding techniques. The output of codec component 2304 can be stored or transmitted via a communication connection as represented by component 2306. The stored or communicatively transmitted bit stream (or codec) representation of the video received at input 2302 may be used by component 2308 to generate pixel values or to transmit displayable video to display interface 2310. The process of generating user-viewable video from a bitstream representation is sometimes referred to as video decompression. Further, while a particular video processing operation is referred to as a "codec" operation or tool, it will be appreciated that a codec tool or operation is used at the encoder, and that a corresponding decoding tool or operation that inverts the codec results will be performed by the decoder.
Examples of the peripheral bus interface or the display interface may include a Universal Serial Bus (USB), or a High Definition Multimedia Interface (HDMI), or a display port (Displayport), or the like. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and the like. The techniques described in this document may be embodied in various electronic devices such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and/or video display.
Fig. 24 shows a flowchart of an example method 2400 of video processing in accordance with the present technique. The method 2400 includes, at operation 2410, determining whether to switch from a first residual codec technique to a second residual codec technique based on a number of context codec bits for each unit used in the first residual codec technique during a transition between a block of video including one or more units and a bitstream representation of the video. The coefficients of the unit are encoded in the bitstream representation in multiple passes using either a first residual codec technique or a second residual codec technique. The method 2400 also includes, at operation 2420, performing a conversion based on the determination.
In some embodiments, the unit includes the block, a transform block within the block, a coding group of the block, or a syntax element of the block. In some embodiments, a first residual codec technique is used for the m-th pass, m being a positive integer. In some embodiments, m is 1, 2, 3, 4, or 5. In some embodiments, in the event that the number of context codec bits per unit used in the first residual codec technique is equal to or less than a threshold, no switch is made from the first residual codec technique to the second residual codec technique. In some embodiments, where the number of context codec bits per unit used in the first residual codec technique is greater than a threshold, a switch to the second residual codec technique occurs in the (m+n)-th pass, a subsequent coding group, or a subsequent sub-region, n being a positive integer.
In some embodiments, a first residual codec technique is used for binary bits decoded in a context codec mode and a second residual codec technique is used for binary bits decoded in a bypass codec mode. In some embodiments, in the context codec mode, the binary bits are processed based on at least one context, and in the bypass codec mode, the binary bits are processed without using any context.
In some embodiments, converting includes processing a plurality of syntax elements of the block, the plurality of syntax elements including at least one sign flag. In some embodiments, in the event that the number of context codec bits per unit used in the first residual codec technique is equal to or less than a threshold, no switch is made from the first residual codec technique to the second residual codec technique. In some embodiments, the second residual codec technique is used for one or more of the plurality of syntax elements, a subsequent coding group, or a subsequent sub-region in the event that the number of context codec bits per unit is greater than a threshold. In some embodiments, the first residual codec technique is used to process at least the sign flag if the number of context codec bits per unit used in the first residual codec technique is equal to or less than a threshold, and the second residual codec technique is used to process at least the sign flag if that number is greater than the threshold.
In some embodiments, the threshold corresponds to a maximum number of context codec binary bits per unit constrained to an upper limit indicated in the bitstream representation. In some embodiments, the maximum number of context codec binary bits per unit is denoted MaxCcBins, and the threshold is (MaxCcBins-TH1), with TH1 being an integer. In some embodiments, TH1 is zero or a positive integer. In some embodiments, TH1 is zero or a negative integer. In some embodiments, TH1 is based on codec characteristics associated with the block. In some embodiments, the codec characteristics include at least a quantization parameter, a codec block mode, a block size, a slice type, or a picture type.
In some embodiments, the first residual codec technique or the second residual codec technique includes L passes, L being a positive integer not equal to 6. In some embodiments, L is 1, 2, 3, 4, or 5. In some embodiments, passes 2 through 5 of the first or second residual codec technique are skipped for coding the coefficients of the unit. In some embodiments, passes 1 through 5 of the first or second residual codec technique are skipped for coding the coefficients of the unit. In some embodiments, all coefficients in a unit are scanned.
In some embodiments, the first residual codec technique or the second residual codec technique includes processing a plurality of syntax elements of the block, the plurality of syntax elements including at least one of a parity flag, a sign flag, a significant flag indicating whether a codec coefficient is zero, an absolute level flag indicating whether an absolute value of the coefficient is greater than a threshold, or one or more binary bits of the codec coefficient. In some embodiments, the parity flag is not encoded in the kth pass. In some embodiments, the symbol flag is not encoded in the kth pass. In some embodiments, k=1.
In some embodiments, the coefficients of the unit are binarized in the bitstream representation. In some embodiments, the value of a coefficient is denoted as x, and the coefficient is binarized in the bitstream representation as (x > 0 ? 2x : -2x+1) or (x >= 0 ? 2x : -2x+1). In some embodiments, the absolute values of the coefficients of the unit are binarized in the bitstream representation, and the sign values of the coefficients are coded separately in the bitstream representation. In some embodiments, the bitstream representation includes a first indicator indicating a number of consecutive zero coefficients in scan order and a second indicator indicating an absolute value of a non-zero coefficient. In some embodiments, the first residual codec technique or the second residual codec technique uses different binarization techniques for coding the coefficients of the unit. In some embodiments, the parameters used to code the coefficients of the unit are derived differently for different binarization techniques. In some embodiments, the parameters include Rice parameters or Golomb-Rice codes.
Fig. 25 illustrates a flow chart of an example method 2500 of video processing in accordance with the present technique. Method 2500 includes, at operation 2510, performing a transition between a block of video and a bitstream representation of the video. The block includes one or more coding groups, and the current block is coded in the bitstream representation based on a constraint on the maximum number of context codec binary bits for each coding group.
In some embodiments, the method includes recording the number of context codec bits of a coding group using a counter. In some embodiments, the method includes resetting the counter before processing a subsequent coding group. In some embodiments, the method includes applying a bypass step to one or more not-yet-coded context codec bits in the coding group if the counter is equal to or greater than the maximum number of context codec bits of the coding group.
In some embodiments, the maximum number of context codec bits per coding group is different for different coding groups. In some embodiments, the coding groups are associated with coding group indices, and the maximum number of context codec bits per coding group increases as the value of the corresponding coding group index increases.
In some embodiments, the maximum number of context codec bits per coding group is based on the position of the corresponding coding group relative to the current block. In some embodiments, the bitstream representation includes syntax flags indicating the coding groups that each have at least one non-zero coefficient. The bitstream representation further comprises information of the coding groups that each have at least one non-zero coefficient. The maximum number of context codec bits per coding group is constrained for the coding groups that each have at least one non-zero coefficient. In some embodiments, N context codec bits are used to code the syntax flags. The syntax flags indicate K coding groups that each have at least one non-zero coefficient, and the maximum number of context codec bits per coding group is constrained based on (maxCbinB-N)/K, where maxCbinB is the maximum number of context codec bits of the current block. In some embodiments, the maximum number of context codec bits per coding group is determined using a lookup table.
Fig. 26 shows a flowchart of an example method 2600 of video processing in accordance with the present technology. The method 2600 includes, at operation 2610, performing a transition between a current block of video and a bitstream representation of the video. The current block is encoded in the bitstream representation based on a constraint on a maximum number of context codec bits for each syntax element or each codec pass associated with the current block.
In some embodiments, the method includes recording a plurality of counters that each correspond to the number of context codec bits of a syntax element or codec pass. In some embodiments, the method includes resetting the plurality of counters before performing the conversion on a subsequent block of the video. In some embodiments, the method includes applying a bypass step to one or more not-yet-coded context codec bits associated with the syntax element or the codec pass if the counter corresponding to that syntax element or codec pass is equal to or greater than the maximum number of context codec bits for each syntax element or codec pass.
In some embodiments, the maximum number of context codec bits is different for different syntax elements or codec passes. In some embodiments, the maximum number of contextual codec bits per syntax element is based on characteristics of the syntax element. In some embodiments, the maximum number of contextual codec bits per codec pass is based on the characteristics of the codec pass.
In some embodiments, the block is encoded using transform skip residual encoding techniques. In some embodiments, the block is encoded using quantized residual domain block differential pulse code modulation encoding and decoding techniques. In some embodiments, a block is encoded without applying any transform to the block. In some embodiments, a block is encoded without applying at least one transform to the block. In some embodiments, the blocks are encoded using lossless encoding and decoding techniques that do not apply transform or quantization procedures.
In some embodiments, the bitstream representation includes an indicator indicating a manner in which the first residual codec technique or the second residual codec technique is applied. In some embodiments, the bitstream representation includes an indicator indicating which of the first residual codec technique or the second residual codec technique is applied. In some embodiments, the indicator is included in a dependency parameter set, a sequence parameter set, a picture parameter set, an adaptive parameter set, a video parameter set, a sequence header, a slice group header, a slice, or a codec tree unit group.
In some embodiments, which of the first residual codec technique or the second residual codec technique is applied or the manner in which the residual codec technique is applied is determined based on the characteristics of the block. In some embodiments, the characteristics include a block size, a virtual pipeline data unit, a picture type, or a low latency check flag for the block. In some embodiments, the characteristics include a color component or color format of the block. In some embodiments, the characteristics include whether the block is encoded using a transform skip residual codec technique or a quantized residual domain block differential pulse code modulation codec technique.
In some embodiments, the maximum number of contextual codec bits per cell is based on the color component of the block. In some embodiments, the maximum number of contextual codec bits per unit is based on a slice, group of slices, or picture type of the block. In some embodiments, the maximum number of contextual codec bits per cell is different for different profiles, levels, hierarchies associated with the video. In some embodiments, the maximum number of context codec bits per unit is signaled in the bitstream representation in a dependency parameter set, a sequence parameter set, a picture parameter set, an adaptive parameter set, a video parameter set, a sequence header, a picture header, a slice group header, a slice, or a codec tree unit group. In some embodiments, the blocks are codec tree units, codec units, transform units, or code groups, and the maximum number of context codec bits per block is based on the size of the block. In some embodiments, the blocks are a codec tree unit, a codec unit, a transform unit, or a coding set, and the maximum number of context codec bits for each block is based on the resolution of the picture in which the block is located.
In some embodiments, performing the conversion includes generating a bitstream representation based on the blocks of video. In some embodiments, performing the conversion includes generating blocks of video from the bitstream representation.
Some embodiments of the disclosed technology include making decisions or determinations to enable video processing tools or modes. In an example, when a video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of video blocks, but not necessarily modify the generated bitstream based on the use of the tool or mode. That is, when a video processing tool or mode is enabled based on a decision or determination, the video processing tool or mode will be used by the transition from the block of video to the bitstream representation of the video. In another example, when the video processing tool or mode is enabled, the decoder will process the bitstream with knowledge that the bitstream has been modified based on the video processing tool or mode. That is, the conversion from the bitstream representation of the video to the blocks of the video will be performed using a video processing tool or mode that is enabled based on the decision or the determination.
Some embodiments of the disclosed technology include making a decision or determination to disable a video processing tool or mode. In an example, when a video processing tool or mode is disabled, the encoder will not use the tool or mode in the conversion of blocks of video into a bitstream representation of video. In another example, when the video processing tool or mode is disabled, the decoder will process the bitstream with the knowledge that the bitstream has not been modified using the video processing tool or mode enabled based on the decision or the determination.
The disclosed and other solutions, examples, embodiments, modules, and functional operations described in this document may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, such as one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a combination of materials affecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, an apparatus may include code that creates a runtime environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not require such a device. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disk; CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Furthermore, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only some embodiments and examples are described and other embodiments, enhancements, and variations may be made based on what is described and shown in this patent document.

Claims (20)

1. A method of processing video data, comprising:
during a transition between a current transform block of a video and a bitstream of the video, determining whether to switch from a first residual codec mode to a second residual codec mode based on a number of context codec bits of the current transform block used in the first residual codec mode, wherein the current transform block is encoded using a transform skip mode in which a transform process is skipped for a prediction residual, and wherein coefficients of the current transform block are encoded in the bitstream in multiple passes using the first residual codec mode or the second residual codec mode; and
performing the conversion based on the determination,
wherein the first residual codec mode is used for the first m passes, and in case the number of context codec bits of the current transform block used in the first residual codec mode is greater than a threshold, a switch to the second residual codec mode occurs in an (m+n) th pass, m and n being positive integers, and
wherein the threshold depends on the width and height of the current transform block.
2. The method of claim 1, wherein m is equal to 5.
3. The method of claim 1, wherein in case the number of context codec bits of the current transform block used in the first residual codec mode is equal to or less than a threshold value, no switch is made from the first residual codec mode to the second residual codec mode.
4. The method of claim 1, wherein the first residual codec mode is for binary bits decoded in a context codec mode where the binary bits are processed based on at least one context, and the second residual codec mode is for binary bits decoded in a bypass codec mode where the binary bits are processed without using any context.
5. The method of claim 1, wherein the converting comprises processing a plurality of syntax elements of the current transform block, the plurality of syntax elements including at least one sign flag indicating the sign of a transform coefficient level of a scan position.
6. The method of claim 5, wherein the first residual codec mode is used for processing a portion of the at least one sign flag in case the number of context codec binary bits of the current transform block used in the first residual codec mode is equal to or less than a threshold; and
wherein the second residual codec mode is used for processing another portion of the at least one sign flag in case the number of context codec binary bits of the current transform block used in the first residual codec mode is greater than the threshold.
7. The method of claim 1, wherein the first residual codec mode or the second residual codec mode comprises a plurality of syntax elements that process the current transform block, the plurality of syntax elements comprising at least one of a parity flag indicating parity of a transform coefficient level of a scan position, a sign flag indicating sign of the transform coefficient level of the scan position, a significant flag indicating whether the transform coefficient of the scan position is zero, or an absolute level flag indicating whether an absolute value of the transform coefficient of the scan position is greater than a threshold.
8. The method of claim 7, wherein the plurality of syntax elements comprises at least one syntax element encoded with a Golomb-Rice code.
9. The method of claim 1, wherein a codec unit including the current transform block uses a differential codec mode in which a difference between a quantized residual derived in an intra prediction mode of the codec unit and a prediction of the quantized residual is included in the bitstream.
10. The method of claim 1, wherein the converting comprises encoding the current transform block into the bitstream.
11. The method of claim 1, wherein the converting comprises decoding the current transform block from the bitstream.
12. A device for processing video data, comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform operations comprising:
during a transition between a current transform block of a video and a bitstream of the video, determining whether to switch from a first residual codec mode to a second residual codec mode based on a number of context codec bits of the current transform block used in the first residual codec mode, wherein the current transform block is encoded using a transform skip mode in which a transform process is skipped for a prediction residual, and wherein coefficients of the current transform block are encoded in the bitstream in multiple passes using the first residual codec mode or the second residual codec mode; and
performing the conversion based on the determination,
wherein the first residual codec mode is used for the first m passes, and in case the number of context codec bits of the current transform block used in the first residual codec mode is greater than a threshold, a switch to the second residual codec mode occurs in an (m+n) th pass, m and n being positive integers, and
wherein the threshold depends on the width and height of the current transform block.
13. The apparatus of claim 12, wherein m is equal to 5.
14. The apparatus of claim 12, wherein the first residual codec mode is for binary bits decoded in a context codec mode where the binary bits are processed based on at least one context, and the second residual codec mode is for binary bits decoded in a bypass codec mode where the binary bits are processed without using any context.
15. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform operations comprising:
during a transition between a current transform block of a video and a bitstream of the video, determining whether to switch from a first residual codec mode to a second residual codec mode based on a number of context codec bits of the current transform block used in the first residual codec mode, wherein the current transform block is encoded using a transform skip mode in which a transform process is skipped for a prediction residual, and wherein coefficients of the current transform block are encoded in the bitstream in multiple passes using the first residual codec mode or the second residual codec mode; and
performing the conversion based on the determination,
wherein the first residual codec mode is used for the first m passes, and in case the number of context codec bits of the current transform block used in the first residual codec mode is greater than a threshold, a switch to the second residual codec mode occurs in an (m+n) th pass, m and n being positive integers, and
wherein the threshold depends on the width and height of the current transform block.
16. The non-transitory computer-readable storage medium of claim 15, wherein m is equal to 5, and
wherein the first residual codec mode is for binary bits decoded in a context codec mode where the binary bits are processed based on at least one context, and the second residual codec mode is for binary bits decoded in a bypass codec mode where the binary bits are processed without using any context.
17. A method for storing a bitstream of video, comprising:
determining whether to switch from a first residual codec mode to a second residual codec mode based on a number of context codec binary bits of a current transform block of the video used in the first residual codec mode, wherein the current transform block is encoded using a transform skip mode in which a transform process is skipped for prediction residuals, and wherein coefficients of the current transform block are encoded in the bitstream in multiple passes using the first residual codec mode or the second residual codec mode;
generating the bitstream based on the determination; and
storing the bitstream in a non-transitory computer readable recording medium,
wherein the first residual codec mode is used for the first m passes, and in case the number of context codec bits of the current transform block used in the first residual codec mode is greater than a threshold, a switch to the second residual codec mode occurs in an (m+n) th pass, m and n being positive integers, and
wherein the threshold depends on the width and height of the current transform block.
18. The method of claim 17, wherein m is equal to 5, and
wherein the first residual codec mode is for binary bits decoded in a context codec mode where the binary bits are processed based on at least one context, and the second residual codec mode is for binary bits decoded in a bypass codec mode where the binary bits are processed without using any context.
19. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 3, 5-11, 17-18.
20. A computer readable medium storing code which, when executed by a processor, causes the processor to carry out the method according to any one of claims 3, 5-11, 17-18.
CN202080035935.7A 2019-05-14 2020-05-14 Context modeling for residual coding Active CN113853785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311511035.7A CN117560489A (en) 2019-05-14 2020-05-14 Context modeling for residual coding

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2019086814 2019-05-14
CNPCT/CN2019/086814 2019-05-14
PCT/CN2020/090194 WO2020228762A1 (en) 2019-05-14 2020-05-14 Context modeling for residual coding

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202311511035.7A Division CN117560489A (en) 2019-05-14 2020-05-14 Context modeling for residual coding

Publications (2)

Publication Number Publication Date
CN113853785A CN113853785A (en) 2021-12-28
CN113853785B true CN113853785B (en) 2024-04-16

Family

ID=73289799

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202080035935.7A Active CN113853785B (en) 2019-05-14 2020-05-14 Context modeling for residual coding
CN202311511035.7A Pending CN117560489A (en) 2019-05-14 2020-05-14 Context modeling for residual coding

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202311511035.7A Pending CN117560489A (en) 2019-05-14 2020-05-14 Context modeling for residual coding

Country Status (3)

Country Link
US (2) US11425380B2 (en)
CN (2) CN113853785B (en)
WO (1) WO2020228762A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW202106014A (en) * 2019-06-20 2021-02-01 日商索尼股份有限公司 Image processing device and image processing method
WO2020262992A1 (en) * 2019-06-25 2020-12-30 Electronics and Telecommunications Research Institute Image encoding/decoding method and apparatus
US20220264118A1 (en) * 2019-08-31 2022-08-18 Lg Electronics Inc. Video or image coding method and device therefor
US20220377345A1 (en) * 2019-10-07 2022-11-24 Lg Electronics Inc. Method and apparatus for deriving rice parameter in video/image coding system
US11785219B2 (en) * 2020-04-13 2023-10-10 Qualcomm Incorporated Coefficient coding for support of different color formats in video coding
CN116803077A (zh) * 2021-01-04 2023-09-22 Beijing Dajia Internet Information Technology Co., Ltd. Residual and coefficient coding for video coding
WO2023092019A1 (en) * 2021-11-19 2023-05-25 Bytedance Inc. Method, apparatus, and medium for video processing

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1214649C 2003-09-18 2005-08-10 Institute of Computing Technology, Chinese Academy of Sciences Entropy coding method for coding video prediction residual coefficients
US9215470B2 (en) 2010-07-09 2015-12-15 Qualcomm Incorporated Signaling selected directional transform for video coding
US10499059B2 (en) * 2011-03-08 2019-12-03 Velos Media, Llc Coding of transform coefficients for video coding
US9756353B2 (en) 2012-01-09 2017-09-05 Dolby Laboratories Licensing Corporation Hybrid reference picture reconstruction method for single and multiple layered video coding systems
US9363510B2 (en) * 2012-03-02 2016-06-07 Qualcomm Incorporated Scan-based sliding window in context derivation for transform coefficient coding
US9088770B2 (en) 2012-08-15 2015-07-21 Intel Corporation Size based transform unit context derivation
CA2908115C (en) 2013-07-08 2019-02-12 Mediatek Singapore Pte. Ltd. Method of simplified cabac coding in 3d video coding
US9456210B2 (en) * 2013-10-11 2016-09-27 Blackberry Limited Sign coding for blocks with transform skipped
KR20160102067A (en) 2013-12-30 2016-08-26 Qualcomm Incorporated Simplification of delta DC residual coding in 3D video coding
WO2015100522A1 (en) 2013-12-30 2015-07-09 Mediatek Singapore Pte. Ltd. Methods for inter-component residual prediction
WO2015194185A1 (en) * 2014-06-20 2015-12-23 Sharp Kabushiki Kaisha Efficient palette coding for screen content coding
US9936201B2 (en) 2015-01-27 2018-04-03 Qualcomm Incorporated Contexts for large coding tree units
KR102060871B1 (en) 2015-04-08 2019-12-30 HFI Innovation Inc. Palette Mode Context Coding and Binarization in Video Coding
US10574993B2 (en) * 2015-05-29 2020-02-25 Qualcomm Incorporated Coding data using an enhanced context-adaptive binary arithmetic coding (CABAC) design
US10368072B2 (en) 2015-05-29 2019-07-30 Qualcomm Incorporated Advanced arithmetic coder
US10246348B2 (en) 2015-06-08 2019-04-02 Rayvio Corporation Ultraviolet disinfection system
US10142627B2 (en) 2015-06-18 2018-11-27 Qualcomm Incorporated Intra prediction and intra mode coding
WO2017041271A1 (en) 2015-09-10 2017-03-16 Mediatek Singapore Pte. Ltd. Efficient context modeling for coding a block of data
US10440399B2 (en) 2015-11-13 2019-10-08 Qualcomm Incorporated Coding sign information of video data
US10827186B2 (en) 2016-08-25 2020-11-03 Intel Corporation Method and system of video coding with context decoding and reconstruction bypass
US10721489B2 (en) * 2016-09-06 2020-07-21 Qualcomm Incorporated Geometry-based priority for the construction of candidate lists
US10616582B2 (en) 2016-09-30 2020-04-07 Qualcomm Incorporated Memory and bandwidth reduction of stored data in image/video coding
US10609414B2 (en) 2017-05-08 2020-03-31 Qualcomm Incorporated Context modeling for transform coefficient coding
JP2020522960A (en) * 2017-06-09 2020-07-30 Electronics and Telecommunications Research Institute Image encoding/decoding method, device, and recording medium storing bitstream
US11831910B2 (en) * 2017-08-21 2023-11-28 Electronics And Telecommunications Research Institute Method and apparatus for encoding/decoding video, and recording medium storing bit stream
KR102595689B1 (en) * 2017-09-29 2023-10-30 Intellectual Discovery Co., Ltd. Method and apparatus for encoding/decoding image and recording medium for storing bitstream
CN116866561A (zh) * 2017-09-29 2023-10-10 LX Semicon Co., Ltd. Image encoding/decoding method, storage medium, and image data transmission method
US10791341B2 (en) * 2017-10-10 2020-09-29 Qualcomm Incorporated Binary arithmetic coding with progressive modification of adaptation parameters
US11039143B2 (en) 2017-11-20 2021-06-15 Qualcomm Incorporated Memory reduction for context initialization with temporal prediction
WO2020108591A1 (en) 2018-12-01 2020-06-04 Beijing Bytedance Network Technology Co., Ltd. Parameter derivation for intra prediction
EP3895429A4 (en) 2019-01-31 2022-08-10 Beijing Bytedance Network Technology Co., Ltd. Context for coding affine mode adaptive motion vector resolution
EP3932058A4 (en) 2019-04-01 2022-06-08 Beijing Bytedance Network Technology Co., Ltd. Using interpolation filters for history based motion vector prediction
WO2020221372A1 (en) 2019-05-01 2020-11-05 Beijing Bytedance Network Technology Co., Ltd. Context coding for matrix-based intra prediction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107211136A (zh) * 2015-01-30 2017-09-26 MediaTek Inc. Method and apparatus for entropy coding of source samples with a large alphabet

Also Published As

Publication number Publication date
CN113853785A (en) 2021-12-28
US11425380B2 (en) 2022-08-23
WO2020228762A1 (en) 2020-11-19
CN117560489A (en) 2024-02-13
US20220070459A1 (en) 2022-03-03
US20220377337A1 (en) 2022-11-24

Similar Documents

Publication Publication Date Title
CN113812162B Context modeling for reduced secondary transforms in video
CN113853785B (en) Context modeling for residual coding
CN114208190B Matrix selection for reduced secondary transform in video coding
WO2020244656A1 (en) Conditional signaling of reduced secondary transform in video bitstreams
CN113785576B Use of secondary transforms in coded video
CN114223208A Context modeling of side information for reduced secondary transforms in video
CN113853787B (en) Using transform skip mode based on sub-blocks
CN113966611B Significant coefficient signaling in video coding and decoding
CN113841410B (en) Coding and decoding of multiple intra prediction methods
CN113826398B (en) Interaction between transform skip mode and other codec tools
WO2020253874A1 (en) Restriction on number of context coded bins
CN113728631B Intra sub-block partitioning and multiple transform selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant