CN115066899A - Scalable secondary transform processing of coded video - Google Patents


Info

Publication number
CN115066899A
Authority
CN
China
Prior art keywords: block, video, transform, matrix, SST
Prior art date
Legal status
Pending
Application number
CN202080083999.4A
Other languages
Chinese (zh)
Inventor
张凯 (Kai Zhang)
张莉 (Li Zhang)
刘鸿彬 (Hongbin Liu)
许继征 (Jizheng Xu)
王悦 (Yue Wang)
Current Assignee
Douyin Vision Co Ltd
ByteDance Inc
Original Assignee
Douyin Vision Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Douyin Vision Co Ltd, ByteDance Inc
Publication of CN115066899A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/12: Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: the unit being an image region, e.g. an object
    • H04N 19/176: the region being a block, e.g. a macroblock
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Abstract

A video processing method includes determining, for a conversion between a video unit of a video and a bitstream representation of the video, whether a separable secondary transform (SST) tool is enabled or disabled for the video unit. The method also includes performing the conversion based on the determination. In some embodiments, the determination is based on a syntax structure associated with the video unit. In some embodiments, the determination is based on characteristics of the video unit.

Description

Scalable secondary transform processing of coded video
Cross Reference to Related Applications
Under the applicable patent laws and/or rules pursuant to the Paris Convention, this application is made to timely claim the priority and benefit of International Patent Application PCT/CN2019/122366, filed on December 2, 2019. The entire disclosure of the above application is incorporated by reference as part of the disclosure of this application for all purposes under the law.
Technical Field
This patent document relates to video encoding and decoding techniques, devices, and systems.
Background
Despite advances in video compression, digital video still accounts for the largest bandwidth use on the Internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video rises, it is expected that the bandwidth demand for digital video usage will continue to grow.
Disclosure of Invention
This document describes various embodiments and techniques for using a secondary transform (also referred to as a low-frequency non-separable transform, LFNST) during decoding or encoding of video or images.
In one example aspect, a video processing method is disclosed. The method includes determining, for a conversion between a video unit of a video and a bitstream representation of the video, whether a separable secondary transform (SST) tool is enabled for the video unit. The method also includes performing the conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a conversion between a video unit of a video and a bitstream representation of the video, a manner of indicating usage of a transform tool or a transform matrix used by the transform tool based on the bottom-right position (SRx, SRy) of the scan region. The method also includes performing the conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a conversion between a block of a video and a bitstream representation of the video, a transform matrix to be used in a separable secondary transform (SST) tool based on characteristics of the block. The SST tool provides a set of available transform matrices. The method also includes performing the conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining a constraint rule for selectively applying a secondary transform with reduced dimensions during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the secondary transform with reduced dimensions in accordance with the constraint rule. The secondary transform with reduced dimensions has dimensions reduced from those of the current video block. The secondary transform with reduced dimensions is applied together with a primary transform in a specific order during the conversion.
In another example aspect, another video processing method is disclosed. The method includes determining a constraint rule for selectively applying a secondary transform with reduced dimensions during a conversion between a bitstream representation of a current video block and a neighboring video region and pixels of the current video block and pixels of the neighboring region, and performing the conversion by applying the secondary transform with reduced dimensions in accordance with the constraint rule. The secondary transform with reduced dimensions has dimensions reduced from those of the current video block and the neighboring video region. The secondary transform with reduced dimensions is applied together with a primary transform in a specific order during the conversion.
In yet another example aspect, another video processing method is disclosed. The method includes determining a zeroing rule for selectively applying a secondary transform with reduced dimensions during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the secondary transform with reduced dimensions according to the zeroing rule. The secondary transform with reduced dimensions has dimensions reduced from those of the current video block. The zeroing rule specifies a maximum number of coefficients used by the secondary transform with reduced dimensions.
In yet another example aspect, another video processing method is disclosed. The method includes determining a condition for selectively applying a secondary transform with reduced dimensions during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the secondary transform with reduced dimensions according to the condition. The secondary transform with reduced dimensions has dimensions reduced from those of the current video block. The condition is signaled in the bitstream representation.
In yet another example aspect, another video processing method is disclosed. The method includes selectively applying a secondary transform with reduced dimensions during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the secondary transform with reduced dimensions according to a condition. The secondary transform with reduced dimensions has dimensions reduced from those of the current video block. The conversion includes selectively applying a position-dependent intra prediction combination (PDPC) based on a coexistence rule.
In yet another example aspect, another video processing method is disclosed. The method includes applying a secondary transform with reduced dimensions during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the secondary transform with reduced dimensions according to a condition. The secondary transform with reduced dimensions has dimensions reduced from those of the current video block. The applying controls the use of neighboring samples for intra prediction during the conversion.
In yet another example aspect, another video processing method is disclosed. The method includes selectively applying a secondary transform with reduced dimensions during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the secondary transform with reduced dimensions according to a condition. The secondary transform with reduced dimensions has dimensions reduced from those of the current video block. The selective application controls the use of quantization matrices during the conversion.
In yet another example aspect, another video processing method is disclosed. The method includes determining, for a conversion between a current video block of a video and a bitstream representation of the video, whether to use a separable secondary transform (SST) for the conversion based on a coding condition, and performing the conversion in accordance with the determination.
In yet another example aspect, a video encoder is disclosed. The video encoder includes a processor configured to implement one or more of the above-described methods.
In yet another example aspect, a video decoder is disclosed. The video decoder includes a processor configured to implement one or more of the above-described methods.
In yet another example aspect, a computer-readable medium is disclosed. The medium includes code stored thereon for implementing one or more of the above-described methods.
These and other aspects are described in this document.
Drawings
Fig. 1 shows an example of a block diagram of an encoder.
Fig. 2 shows an example of 67 intra prediction modes.
Fig. 3A-3B illustrate examples of reference samples for wide-angle intra prediction.
FIG. 4 is an exemplary illustration of the discontinuity problem in the case of directions exceeding 45 degrees.
Figs. 5A-5D show an example illustration of samples used by PDPC applied to diagonal and adjacent angular intra modes.
Fig. 6 is an example of 4x8 and 8x4 block splitting.
Fig. 7 is an example of the splitting of all blocks except 4 × 8, 8 × 4, and 4 × 4.
Fig. 8 splits a block of 4x8 samples into two separate decodable regions.
Fig. 9 illustrates an example order of processing pixel rows to maximize throughput for a 4xN block with a vertical predictor.
Fig. 10 shows an example of a secondary transform.
Fig. 11 shows an example of the proposed reduced secondary transform (RST).
Fig. 12 shows examples of the forward and inverse reduced transforms.
Fig. 13 shows an example of a forward RST8x8 process with a 16x48 matrix.
Fig. 14 shows an example of non-zero elements of scan positions 17 to 64.
FIG. 15 is an illustration of sub-block transform modes SBT-V and SBT-H.
FIG. 16 is a block diagram of an example hardware platform for implementing the techniques described in this document.
FIG. 17 is a flow diagram of an example method of video processing.
Fig. 18A illustrates an example of coefficient coding and decoding based on a scan region.
Fig. 18B illustrates another example of scanning area-based coefficient coding and decoding.
FIG. 19 is a block diagram of an example video processing system in which the disclosed technology may be implemented.
FIG. 20 is a flow diagram of an example method of video processing in accordance with the present technology.
FIG. 21 is a flow diagram of another example method of video processing in accordance with the present technology.
FIG. 22 is a flow diagram of another example method of video processing in accordance with the present technology.
Fig. 23 is a block diagram illustrating an exemplary video codec system.
Fig. 24 is a block diagram illustrating an encoder in accordance with some embodiments of the present disclosure.
Fig. 25 is a block diagram illustrating a decoder according to some embodiments of the present disclosure.
Detailed Description
Section headings are used in this document to facilitate understanding and do not limit the embodiments disclosed in a section to only that section. Furthermore, although certain embodiments are described with reference to particular video codecs, the disclosed techniques are also applicable to other video coding technologies. Furthermore, while some embodiments describe video encoding steps in detail, it will be understood that the corresponding decoding steps that reverse the encoding will be implemented by a decoder. Furthermore, the term video processing encompasses video encoding or compression, video decoding or decompression, and video transcoding, in which video is represented from one compression format into another compression format or at a different compressed bitrate.
Summary of the invention
This patent document relates to video coding and decoding technologies. Specifically, it relates to transforms in video coding. It may be applied to existing video coding standards (such as HEVC) or to standards to be finalized (Versatile Video Coding). It may also be applicable to future video coding standards or video codecs.
2. Preliminary discussion
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on the hybrid video coding structure, in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded jointly by VCEG and MPEG in 2015. Since then, many new methods have been adopted by JVET and put into the reference software named Joint Exploration Model (JEM). In April 2018, the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
2.1 color space and chroma subsampling
A color space, also known as a color model (or color system), is an abstract mathematical model that describes a range of colors as tuples of numbers, typically as 3 or 4 values or color components (e.g., RGB). Basically, a color space is an elaboration of a coordinate system and a subspace.
For video compression, the most common color spaces are YCbCr and RGB.
YCbCr, Y'CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y'CBCR, is a family of color spaces used as part of the color image pipeline in video and digital photography systems. Y' is the luma component, and CB and CR are the blue-difference and red-difference chroma components. Y' (with prime) is distinguished from Y, which is luminance: light intensity is non-linearly encoded based on gamma-corrected RGB primaries.
Chroma subsampling is the practice of encoding images by implementing less resolution for chroma information than for luma information, taking advantage of the human visual system's lower acuity for color differences than for luminance.
2.1.1 Format 4:4:4
Each of the three Y' CbCr components has the same sampling rate and therefore no chrominance subsampling. This solution is sometimes used in high-end film scanners and in post-production of motion pictures.
2.1.2 Format 4:2:2
The two chroma components are sampled at half the sampling rate of luma: the horizontal chroma resolution is halved. This reduces the bandwidth of an uncompressed video signal by one-third with little to no visual difference.
2.1.3 Format 4:2:0
In 4:2:0, the horizontal sampling is doubled compared to 4:1:1, but as the Cb and Cr channels are only sampled on each alternate line in this scheme, the vertical resolution is halved. The data rate is thus the same. Cb and Cr are each subsampled by a factor of 2 both horizontally and vertically. There are three variants of 4:2:0 schemes, having different horizontal and vertical siting.
In MPEG-2, Cb and Cr are co-sited horizontally. Cb and Cr are sited between pixels in the vertical direction (sited interstitially).
In JPEG/JFIF, H.261, and MPEG-1, Cb and Cr are sited interstitially, halfway between alternate luma samples.
In 4:2:0 DV, Cb and Cr are co-sited in the horizontal direction. In the vertical direction, they are co-sited on alternating lines.
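The plane-size arithmetic implied by the three formats can be sketched as follows (a minimal illustration; the helper name is my own, not part of any standard):

```python
def chroma_plane_size(luma_w, luma_h, fmt):
    """Return (width, height) of each chroma plane for a given
    subsampling format. Hypothetical helper for illustration."""
    if fmt == "4:4:4":                 # no chroma subsampling
        return luma_w, luma_h
    if fmt == "4:2:2":                 # horizontal resolution halved
        return luma_w // 2, luma_h
    if fmt == "4:2:0":                 # halved horizontally and vertically
        return luma_w // 2, luma_h // 2
    raise ValueError(f"unknown format: {fmt}")

print(chroma_plane_size(1920, 1080, "4:2:0"))  # (960, 540)
```

For a 1080p frame, 4:2:0 thus carries a quarter as many samples per chroma plane as the luma plane.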
2.2 codec flow for a typical video codec
Fig. 1 shows an example of an encoder block diagram for VVC, which contains three in-loop filtering blocks: deblocking filter (DF), sample adaptive offset (SAO), and adaptive loop filter (ALF). Unlike DF, which uses predefined filters, SAO and ALF utilize the original samples of the current picture to reduce the mean square error between the original and reconstructed samples, by adding an offset and by applying a finite impulse response (FIR) filter, respectively, with coded side information signaling the offsets and filter coefficients. ALF is located at the last processing stage of each picture and can be regarded as a tool trying to catch and fix artifacts created by the previous stages.
2.3 Intra mode codec with 67 Intra prediction modes
To capture arbitrary edge directions present in natural video, the number of directional intra modes is extended from 33, as used in HEVC, to 65. The additional directional modes are depicted as dashed arrows in Fig. 2, and the planar and DC modes remain the same. These denser directional intra prediction modes apply to all block sizes and to both luma and chroma intra prediction.
The conventional angular intra prediction directions are defined from 45 degrees to -135 degrees in the clockwise direction, as shown in Fig. 2. In VTM2, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes for non-square blocks. The replaced modes are signaled using the original method and remapped to the indices of the wide-angle modes after parsing. The total number of intra prediction modes is unchanged (67), and the intra mode coding is unchanged.
In HEVC, every intra coded block has a square shape and the length of each of its sides is a power of 2. Thus, no division operations are required to generate an intra predictor using the DC mode. In VTM2, blocks can have a rectangular shape, which would in the general case necessitate the use of a division operation per block. To avoid division operations for DC prediction, only the longer side is used to compute the average for non-square blocks.
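The division-avoiding DC average can be illustrated with a short sketch (a hypothetical helper, not the VVC reference code; the divisor stays a power of two, so hardware can use a shift):

```python
def dc_predictor(top, left):
    """DC prediction value for a possibly non-square block.
    Only the longer reference edge is averaged, so the divisor
    is a power of two. Illustrative sketch."""
    if len(top) == len(left):
        samples = top + left       # square block: use both edges
    elif len(top) > len(left):
        samples = top              # wider than tall: top edge only
    else:
        samples = left             # taller than wide: left edge only
    n = len(samples)
    return (sum(samples) + n // 2) // n  # rounded average, n a power of 2

# 8x4 block (W=8 > H=4): only the 8 top reference samples are used
print(dc_predictor(top=[10] * 8, left=[90] * 4))  # 10
```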
2.4 Wide Angle Intra prediction of non-Square blocks
The conventional angular intra prediction directions are defined from 45 degrees to -135 degrees in the clockwise direction. In VTM2, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes for non-square blocks. The replaced modes are signaled using the original method and remapped to the indices of the wide-angle modes after parsing. The total number of intra prediction modes for a block is unchanged (67), and the intra mode coding is unchanged.
To support these prediction directions, a top reference of length 2W +1 and a left reference of length 2H +1 are defined as shown in FIGS. 3A-3B.
The number of replaced modes among the wide-angle direction modes depends on the aspect ratio of the block. The replaced intra prediction modes are shown in Table 1.
Table 1: Intra prediction modes replaced by wide-angle modes

Condition    Replaced intra prediction modes
W/H == 2     Modes 2, 3, 4, 5, 6, 7
W/H > 2      Modes 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
W/H == 1     None
H/W == 2     Modes 61, 62, 63, 64, 65, 66
H/W > 2      Modes 57, 58, 59, 60, 61, 62, 63, 64, 65, 66
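The aspect-ratio cases of Table 1 can be sketched as a small helper (hypothetical illustration; only the tabulated cases are handled, and the mode numbering follows the table):

```python
def replaced_modes(w, h):
    """Conventional angular modes replaced by wide-angle modes
    for a WxH block, following Table 1. Illustrative sketch."""
    if w == h:
        return []                                   # square: none replaced
    if w > h:                                       # wide block
        return list(range(2, 8)) if w == 2 * h else list(range(2, 12))
    # tall block
    return list(range(61, 67)) if h == 2 * w else list(range(57, 67))

print(replaced_modes(8, 4))   # [2, 3, 4, 5, 6, 7]
```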
As shown in Fig. 4, two vertically adjacent predicted samples may use two non-adjacent reference samples in the case of wide-angle intra prediction. Hence, a low-pass reference sample filter and side smoothing are applied to wide-angle prediction to reduce the negative effects of the increased gap Δp_α.
2.5 location dependent intra prediction combining
In VTM2, the results of intra prediction of planar mode are further modified by a position-dependent intra prediction combination (PDPC) method. PDPC is an intra prediction method which invokes a combination of the unfiltered boundary reference samples and HEVC-style intra prediction with filtered boundary reference samples. PDPC is applied to the following intra modes without signaling: planar, DC, horizontal, vertical, the bottom-left angular mode and its eight adjacent angular modes, and the top-right angular mode and its eight adjacent angular modes.
The prediction sample pred(x, y) is predicted using a linear combination of reference samples and the intra prediction mode (DC, planar, angular) according to the following formula:

    pred(x,y) = ( wL × R_{-1,y} + wT × R_{x,-1} - wTL × R_{-1,-1} + (64 - wL - wT + wTL) × pred(x,y) + 32 ) >> 6

where R_{x,-1} and R_{-1,y} represent the reference samples located at the top and to the left of the current sample (x, y), respectively, and R_{-1,-1} represents the reference sample located at the top-left corner of the current block.
If PDPC is applied to DC, planar, horizontal and vertical intra modes, no additional boundary filter is needed, which is necessary in case of HEVC DC mode boundary filter or horizontal/vertical mode edge filter.
Figs. 5A-5D illustrate the definition of the reference samples (R_{x,-1}, R_{-1,y}, and R_{-1,-1}) for PDPC applied to various prediction modes. The prediction sample pred(x', y') is located at (x', y') within the prediction block. The coordinate x of the reference sample R_{x,-1} is given by x = x' + y' + 1, and the coordinate y of the reference sample R_{-1,y} is similarly given by y = x' + y' + 1.
Fig. 5A to 5D provide definitions of samples used by PDPC applied to diagonal and adjacent angular intra modes.
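The weighted combination can be written directly from the PDPC formula (an illustrative sketch; the mode-dependent weights wL, wT, wTL from Table 2 are supplied by the caller):

```python
def pdpc_sample(pred, r_left, r_top, r_topleft, wL, wT, wTL):
    """One PDPC-combined sample: blend the intra prediction value
    with the left, top, and top-left reference samples using
    mode-dependent weights that sum with the residual weight to 64."""
    return (wL * r_left + wT * r_top - wTL * r_topleft
            + (64 - wL - wT + wTL) * pred + 32) >> 6

# With zero weights, the intra prediction value passes through unchanged
print(pdpc_sample(100, 120, 110, 105, 0, 0, 0))  # 100
```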
The PDPC weights depend on the prediction mode, as shown in table 2.
Table 2 example of PDPC weights according to prediction mode
(Table 2 is reproduced only as an image in the original publication; it lists the mode-dependent weights wT, wL, and wTL.)
2.6 Intra sub-block partitioning (ISP)
In some embodiments, the ISP is proposed, which splits luma intra-predicted blocks vertically or horizontally into 2 or 4 sub-partitions depending on the block size, as shown in Table 3. Fig. 6 and Fig. 7 show examples of the two possibilities. All sub-partitions fulfill the condition of having at least 16 samples.
TABLE 3 number of sub-partitions depending on block size
Block size          Number of sub-partitions
4×4                 Not divided
4×8 and 8×4         2
All other cases     4
Fig. 6 shows examples of the splitting of 4x8 and 8x4 blocks.
Fig. 7 shows an example of splitting of all blocks except 4 × 8, 8 × 4, and 4 × 4.
For each of these sub-partitions, a residual signal is generated by entropy decoding the coefficients sent by the encoder and then inverse quantizing and inverse transforming them. Then, the sub-partition is intra predicted, and finally the residual signal is added to the prediction signal to obtain the corresponding reconstructed samples. Therefore, the reconstructed values of each sub-partition will be available to generate the prediction of the next one, which repeats the process, and so on. All sub-partitions share the same intra mode.
Based on the intra mode and the split utilized, two different classes of processing orders are used, referred to as normal and reversed order. In the normal order, the first sub-partition to be processed is the one containing the top-left sample of the CU, then continuing downwards (horizontal split) or rightwards (vertical split). As a result, the reference samples used to generate the sub-partition prediction signals are located only to the left of and above the lines. On the other hand, the reverse processing order either starts with the sub-partition containing the bottom-left sample of the CU and continues upwards, or starts with the sub-partition containing the top-right sample of the CU and continues leftwards.
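The sub-partition count of Table 3 reduces to three cases, since every sub-partition must keep at least 16 samples (hypothetical helper for illustration):

```python
def isp_sub_partitions(w, h):
    """Number of ISP sub-partitions for a WxH luma block, per Table 3:
    4x4 is not divided, 4x8 and 8x4 give 2, all other sizes give 4.
    Each resulting sub-partition keeps at least 16 samples."""
    if (w, h) == (4, 4):
        return 1                      # not divided
    if (w, h) in ((4, 8), (8, 4)):
        return 2
    return 4

print(isp_sub_partitions(8, 4))  # 2
```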
2.7 Block differential pulse-code modulation coding (BDPCM)
Since the left (A) (resp. top (B)) pixels are used to predict the current pixel under a horizontal (resp. vertical) predictor, the most throughput-efficient way of processing the block is to process all the pixels of one column (resp. line) in parallel and to process these columns (resp. lines) sequentially. To increase throughput, the following process is introduced: a block of width 4 is split into two halves with a horizontal frontier when the predictor chosen on this block is vertical; a block of height 4 is split into two halves with a vertical frontier when the predictor chosen on this block is horizontal.
When a block is split, samples of one region are not allowed to use pixels from the other region to compute the prediction: if this situation occurs, the prediction pixel is replaced by the reference pixel in the prediction direction. This is shown in Fig. 8 for different positions of the current pixel X in a vertically predicted 4x8 block.
Fig. 8 shows an example of splitting a block of 4x8 samples into two independently decodable regions.
Due to this property, it is now possible to process one 4x4 block in 2 cycles, one 4x8 or 8x4 block in 4 cycles, and so on, as shown in fig. 9.
Fig. 9 shows an example of an order in which pixel rows are processed to maximize throughput for a 4xN block with a vertical predictor.
Table 4 summarizes the number of cycles required to process a block, depending on the block size. It is trivial to show that any block having both dimensions greater than or equal to 8 can be processed at 8 pixels per cycle or more.
TABLE 4 worst case throughput for 4xN, Nx4 sized blocks
(Table 4 is reproduced only as an image in the original publication.)
2.8 quantized residual Domain BDPCM
In some embodiments, a quantized residual domain BDPCM (denoted RBDPCM below) is proposed. As in intra prediction, the whole block is predicted by sample copying in the prediction direction (horizontal or vertical prediction). The residual is quantized, and the delta between the quantized residual and its predictor (horizontal or vertical) quantized value is coded.
For a block of size M (rows) × N (columns), let r_{i,j}, 0 ≤ i ≤ M-1, 0 ≤ j ≤ N-1, be the prediction residual after performing intra prediction horizontally (copying the left neighbor pixel value across the predicted block line by line) or vertically (copying the top neighbor line to each line of the predicted block) using unfiltered samples from the above or left block boundary. Let Q(r_{i,j}), 0 ≤ i ≤ M-1, 0 ≤ j ≤ N-1, denote the quantized version of the residual r_{i,j}, where the residual is the difference between the original block and the predicted block values. Block DPCM is then applied to the quantized residual samples, resulting in a modified M×N array with elements r̃_{i,j}. When vertical BDPCM is signaled:

    r̃_{i,j} = Q(r_{i,j})                     for i = 0, 0 ≤ j ≤ N-1
    r̃_{i,j} = Q(r_{i,j}) - Q(r_{i-1,j})      for 1 ≤ i ≤ M-1, 0 ≤ j ≤ N-1

For horizontal prediction, a similar rule applies, and the residual quantized samples are obtained by:

    r̃_{i,j} = Q(r_{i,j})                     for 0 ≤ i ≤ M-1, j = 0
    r̃_{i,j} = Q(r_{i,j}) - Q(r_{i,j-1})      for 0 ≤ i ≤ M-1, 1 ≤ j ≤ N-1

The residual quantized samples r̃_{i,j} are sent to the decoder.

On the decoder side, the above calculations are reversed to produce Q(r_{i,j}), 0 ≤ i ≤ M-1, 0 ≤ j ≤ N-1. For the vertical case,

    Q(r_{i,j}) = Σ_{k=0..i} r̃_{k,j},    0 ≤ i ≤ M-1, 0 ≤ j ≤ N-1

For the horizontal case,

    Q(r_{i,j}) = Σ_{k=0..j} r̃_{i,k},    0 ≤ i ≤ M-1, 0 ≤ j ≤ N-1

The inverse quantized residuals, Q^{-1}(Q(r_{i,j})), are added to the intra block prediction values to produce the reconstructed sample values.
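The forward differencing and the decoder-side accumulation of quantized-residual DPCM described in this section can be sketched as follows (illustrative helpers; a list of lists stands in for the MxN arrays):

```python
def rbdpcm_forward(q, vertical=True):
    """Differences of quantized residuals along the prediction
    direction: first row/column is kept, later entries store deltas."""
    M, N = len(q), len(q[0])
    out = [row[:] for row in q]
    for i in range(M):
        for j in range(N):
            if vertical and i > 0:
                out[i][j] = q[i][j] - q[i - 1][j]
            elif not vertical and j > 0:
                out[i][j] = q[i][j] - q[i][j - 1]
    return out

def rbdpcm_inverse(d, vertical=True):
    """Decoder side: cumulative sums restore the quantized residuals."""
    M, N = len(d), len(d[0])
    out = [row[:] for row in d]
    for i in range(M):
        for j in range(N):
            if vertical and i > 0:
                out[i][j] += out[i - 1][j]
            elif not vertical and j > 0:
                out[i][j] += out[i][j - 1]
    return out

q = [[3, 1], [5, 2], [4, 2]]
assert rbdpcm_inverse(rbdpcm_forward(q)) == q  # lossless round trip
```

Note that the inverse can be computed on the fly during coefficient parsing, as the text above observes, since each entry needs only the previously accumulated neighbor.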
The main benefit of this scheme is that the inverse DPCM can be done on-the-fly during coefficient parsing by adding the predictor only when the coefficients are parsed (which can also be performed after parsing).
Transform skip is always used in the quantized residual domain BDPCM.
2.9 Multiple Transform Set (MTS) in VVC
In VTM4, large block-size transforms, up to 64×64 in size, are enabled, which are primarily useful for higher-resolution video, e.g., 1080p and 4K sequences. For transform blocks with size (width or height, or both) equal to 64, the high-frequency transform coefficients are zeroed out so that only the lower-frequency coefficients are retained. For example, for an M×N transform block, with M as the block width and N as the block height, when M is equal to 64, only the left 32 columns of transform coefficients are kept. Similarly, when N is equal to 64, only the top 32 rows of transform coefficients are kept. When the transform skip mode is used for a large block, the entire block is used without zeroing out any values.
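The zeroing for 64-length dimensions can be sketched as follows (a hypothetical helper over a list-of-lists coefficient block; the retained count of 32 follows the description above):

```python
def zero_out_high_freq(coeffs, keep=32):
    """Zero high-frequency coefficients of an MxN transform block:
    when a dimension equals 64, only the first `keep` columns (width)
    or rows (height) of coefficients survive. Illustrative sketch."""
    M = len(coeffs[0])   # block width
    N = len(coeffs)      # block height
    out = [row[:] for row in coeffs]
    if M == 64:
        for row in out:
            for x in range(keep, M):
                row[x] = 0           # drop right-side (high-freq) columns
    if N == 64:
        for y in range(keep, N):
            out[y] = [0] * M         # drop bottom (high-freq) rows
    return out
```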
In addition to DCT-II, which has been adopted in HEVC, a Multiple Transform Selection (MTS) scheme is used for residual coding of both inter and intra coded blocks. It uses multiple selected transforms from DCT-VIII/DST-VII. The newly introduced transform matrices are DST-VII and DCT-VIII.
The table below shows the basis functions of the selected DST/DCT.
Transform type | Basis function T_i(j), i, j = 0, 1, …, N−1
DCT-II | T_i(j) = ω0 · sqrt(2/N) · cos( π·i·(2j+1) / (2N) ), where ω0 = sqrt(1/2) if i = 0, and ω0 = 1 otherwise
DCT-VIII | T_i(j) = sqrt(4/(2N+1)) · cos( π·(2i+1)·(2j+1) / (4N+2) )
DST-VII | T_i(j) = sqrt(4/(2N+1)) · sin( π·(2i+1)·(j+1) / (2N+1) )
To preserve the orthogonality of the transform matrices, the transform matrices are quantized more accurately than the transform matrices in HEVC. To keep the intermediate values of the transformed coefficients within the 16-bit range, after the horizontal and the vertical transform, all the coefficients are kept to 10 bits.
To control the MTS scheme, separate enabling flags are specified at the SPS level for intra and inter, respectively. When MTS is enabled at the SPS, a CU-level flag is signaled to indicate whether MTS is applied. Here, MTS is applied only to luma. The MTS CU-level flag is signaled when the following conditions are satisfied:
- both width and height are less than or equal to 32
- the CBF flag is equal to 1
If the MTS CU flag is equal to 0, DCT2 is applied in both directions. However, if the MTS CU flag is equal to 1, two other flags are additionally signaled to indicate the transform type for the horizontal and vertical direction, respectively. The transform and signaling mapping table is shown in the table below. As for transform matrix precision, 8-bit primary transform cores are used. Therefore, all the transform cores used in HEVC are kept the same, including 4-point DCT-2 and DST-7, and 8-point, 16-point and 32-point DCT-2. Also, the other transform cores, including 64-point DCT-2, 4-point DCT-8, and 8-point, 16-point, 32-point DST-7 and DCT-8, use 8-bit primary transform cores.
MTS_CU_flag | MTS_Hor_flag | MTS_Ver_flag | Horizontal transform | Vertical transform
0 | - | - | DCT2 | DCT2
1 | 0 | 0 | DST7 | DST7
1 | 1 | 0 | DCT8 | DST7
1 | 0 | 1 | DST7 | DCT8
1 | 1 | 1 | DCT8 | DCT8
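The flag-to-transform mapping can be expressed as a small lookup. This is a sketch assuming the VTM convention that a flag value of 0 selects DST-7 and 1 selects DCT-8 in the corresponding direction; the function name is hypothetical:

```python
def mts_transforms(mts_cu_flag, mts_hor_flag=0, mts_ver_flag=0):
    """Map the CU-level MTS flags to the (horizontal, vertical)
    transform types. mts_cu_flag == 0 means DCT2 in both directions;
    otherwise each direction flag chooses DST-7 (0) or DCT-8 (1)."""
    if mts_cu_flag == 0:
        return ("DCT2", "DCT2")
    hor = "DCT8" if mts_hor_flag else "DST7"
    ver = "DCT8" if mts_ver_flag else "DST7"
    return (hor, ver)
```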
To reduce the complexity of large sizes of DST-7 and DCT-8, the high frequency transform coefficients are zeroed out for DST-7 and DCT-8 blocks of size (width or height, or both width and height) equal to 32. Only the coefficients in the lower frequency region of 16x16 are retained.
As in HEVC, the residual of a block can be coded with the transform skip mode. To avoid redundancy of syntax coding, the transform skip flag is not signaled when the CU-level MTS_CU_flag is not equal to 0. The block size restriction for transform skip is the same as for MTS in JEM4, indicating that transform skip is applicable to a CU when both block width and height are equal to or less than 32.
2.10 Example Reduced Secondary Transform (RST)
2.10.1 Example Non-Separable Secondary Transform (NSST)
In some embodiments, a secondary transform, also referred to as a non-separable transform, is applied between the forward primary transform and quantization (at the encoder) and between de-quantization and the inverse primary transform (at the decoder side). As shown in fig. 10, whether a 4x4 or an 8x8 secondary transform is performed depends on the block size. For example, the 4x4 secondary transform is applied to small blocks (e.g., min(width, height) < 8) and the 8x8 secondary transform is applied per 8x8 block to larger blocks (e.g., min(width, height) > 4).

Fig. 10 shows an example of the secondary transform in JEM.
Application of the non-separable transform is described as follows, using a 4x4 input block as an example. To apply the non-separable transform, the 4x4 input block X

X = [ X00 X01 X02 X03
      X10 X11 X12 X13
      X20 X21 X22 X23
      X30 X31 X32 X33 ]

is first represented as a vector:

vec(X) = [ X00 X01 X02 X03 X10 X11 X12 X13 X20 X21 X22 X23 X30 X31 X32 X33 ]^T

The non-separable transform is calculated as F = T · vec(X), where F indicates the transform coefficient vector and T is a 16x16 transform matrix. The 16x1 coefficient vector F is subsequently reorganized into a 4x4 block using the scanning order (horizontal, vertical or diagonal) for that block. Coefficients with smaller indices are placed with the smaller scanning indices in the 4x4 coefficient block. There are 35 transform sets in total, and each transform set uses 3 non-separable transform matrices (kernels). The mapping from the intra prediction mode to the transform set is predefined. For each transform set, the selected non-separable secondary transform candidate is further specified by an explicitly signaled secondary transform index. The index is signaled in the bitstream once per intra CU, after the transform coefficients.
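The forward non-separable transform above is simply a 16x16 matrix applied to the row-by-row flattening of the 4x4 block. The sketch below illustrates that computation in plain Python; the kernel T passed in is a stand-in, not an actual NSST matrix:

```python
def nsst_forward(X, T):
    """Non-separable transform of a 4x4 block X: flatten X row by row
    into a length-16 vector vec(X) and multiply by the 16x16 kernel T,
    giving the 16 coefficients F = T * vec(X)."""
    x = [v for row in X for v in row]          # vec(X), length 16
    return [sum(T[r][c] * x[c] for c in range(16)) for r in range(16)]
```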
2.10.2 Example Reduced Secondary Transform (RST) / Low-Frequency Non-Separable Transform (LFNST)
The Reduced Secondary Transform (RST), also known as the Low-Frequency Non-Separable Transform (LFNST), was introduced with a mapping to 4 transform sets (instead of 35 transform sets). In some embodiments, 16x64 (which may further be reduced to 16x48) and 16x16 matrices are employed for 8x8 and 4x4 blocks, respectively. For notational convenience, the 16x64 (which may further be reduced to 16x48) transform is denoted as RST8x8 and the 16x16 one as RST4x4. Fig. 11 shows an example of RST.

Fig. 11 shows an example of the proposed Reduced Secondary Transform (RST).
RST calculation
The main idea of a Reduced Transform (RT) is to map an N-dimensional vector to an R-dimensional vector in a different space, where R/N (R < N) is the reduction factor.

The RT matrix is an R×N matrix as follows:

T_(R×N) = [ t_11 t_12 t_13 … t_1N
            t_21 t_22 t_23 … t_2N
            …
            t_R1 t_R2 t_R3 … t_RN ]
where the R rows of the transform are R bases of the N-dimensional space. The inverse transform matrix of RT is the transpose of its forward transform. Examples of forward and inverse RT are depicted in fig. 12.
Fig. 12 shows examples of forward and reverse downscaling transforms.
In some embodiments, RST8x8 with a reduction factor of 4 (1/4 size) is applied. Hence, instead of 64x64, which is the conventional 8x8 non-separable transform matrix size, a 16x64 direct matrix is used. In other words, the 64x16 inverse RST matrix is used at the decoder side to generate the core (primary) transform coefficients in the top-left 8x8 region. The forward RST8x8 uses 16x64 (or 8x64 for an 8x8 block) matrices so that it produces non-zero coefficients only in the top-left 4x4 region within the given 8x8 region. In other words, if RST is applied, the 8x8 region except the top-left 4x4 region will have only zero coefficients. For RST4x4, 16x16 (or 8x16 for a 4x4 block) direct matrix multiplication is applied.
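The reduced transform and its inverse (the transpose of the forward matrix, per the RT description above) amount to two matrix-vector products. The sketch below uses a tiny toy matrix (R = 2, N = 3) rather than a real 16x64 LFNST kernel, purely to show the shapes involved:

```python
def rst_forward(T, x):
    """Forward reduced transform: an R x N matrix T maps an N-point
    vector to R coefficients (R < N, e.g. 16 x 64 for RST8x8)."""
    R, N = len(T), len(T[0])
    return [sum(T[r][n] * x[n] for n in range(N)) for r in range(R)]

def rst_inverse(T, y):
    """Inverse reduced transform: the transpose of T (N x R) maps the R
    coefficients back into the N-point primary-coefficient region."""
    R, N = len(T), len(T[0])
    return [sum(T[r][n] * y[r] for r in range(R)) for n in range(N)]
```

With a real orthonormal kernel, inverse(forward(x)) is the projection of x onto the R retained basis vectors; coefficients outside the retained region come back as zero, matching the zero-coefficient regions described above.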
The reverse RST is conditionally applied when the following two conditions are met:
a. The block size is greater than or equal to a given threshold (W >= 4 && H >= 4)
b. The transform skip mode flag is equal to zero
If both the width (W) and height (H) of the transform coefficient block are greater than 4, RST8x8 is applied to the top left 8x8 region of the transform coefficient block. Otherwise, RST4x4 is applied to the top left min (8, W) × min (8, H) region of the transform coefficient block.
If the RST index is equal to 0, then RST is not applied. Otherwise, the RST is applied and the RST index is used to select the core. The RST selection method and codec of the RST index will be explained later.
Furthermore, RST is applied to intra CUs in both intra and inter slices, and for both luma and chroma. If a dual tree is enabled, RST indices for luma and chroma are signaled separately. For inter slices (dual tree disabled), a single RST index is signaled and used for both luma and chroma.
In some embodiments, Intra Sub-Partitioning (ISP) is adopted as a new intra prediction mode. When ISP mode is selected, RST is disabled and the RST index is not signaled, because the performance improvement is marginal even if RST is applied to every feasible partition block. Furthermore, disabling RST for the residual of ISP prediction can reduce encoding complexity.
RST selection
The RST matrix is selected from four sets of transforms, each set of transforms consisting of two transforms. Which transform set to apply is determined by the intra prediction mode, as follows:
(1) If one of the three CCLM modes is indicated, transform set 0 is selected.
(2) Otherwise, transform set selection is performed according to the following table:
transformation set selection table
IntraPredMode | Transform set index
IntraPredMode < 0 | 1
0 <= IntraPredMode <= 1 | 0
2 <= IntraPredMode <= 12 | 1
13 <= IntraPredMode <= 23 | 2
24 <= IntraPredMode <= 44 | 3
45 <= IntraPredMode <= 55 | 2
56 <= IntraPredMode | 1
The index used to access the table, denoted as IntraPredMode, has a range of [−14, 83], which is a transformed mode index used for wide-angle intra prediction.
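The mode-dependent selection can be written as a threshold lookup. This sketch assumes the LFNST mapping thresholds (which fold the angular modes symmetrically around the diagonal, so that sets 2 and 3 are reused on both sides); the function name is hypothetical:

```python
def rst_transform_set(intra_pred_mode, is_cclm=False):
    """Select one of the four RST/LFNST transform sets from the intra
    prediction mode (range [-14, 83] with wide-angle modes). CCLM
    modes always map to set 0."""
    if is_cclm:
        return 0
    m = intra_pred_mode
    if m <= 1:
        return 1 if m < 0 else 0   # wide angles below 0 -> set 1; planar/DC -> set 0
    if m <= 12:
        return 1
    if m <= 23:
        return 2
    if m <= 44:
        return 3
    if m <= 55:
        return 2
    return 1                        # modes 56..83, including upper wide angles
```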
Reduced-dimension RST matrix
As a further simplification, 16x48 matrices are applied with the same transform set configuration instead of 16x64, each of which takes 48 input data from three 4x4 blocks (excluding the bottom-right 4x4 block) in the top-left 8x8 block (fig. 13).
Fig. 13 shows an example of a forward RST8x8 process with a 16x48 matrix.
RST signaling
The forward RST8x8 (where R = 16) uses 16x64 matrices so that it produces non-zero coefficients only in the top-left 4x4 region within the given 8x8 region. In other words, if RST is applied, the 8x8 region except the top-left 4x4 region generates only zero coefficients. As a result, the RST index is not coded when any non-zero element is detected within the 8x8 block region other than the top-left 4x4 (depicted in fig. 14), since this implies that RST was not applied. In such a case, the RST index is inferred to be zero.
Fig. 14 shows an example of non-zero elements of scan positions 17 to 64.
Zero-out range
Generally, before the inverse RST is applied to a 4x4 sub-block, any coefficient in the 4x4 sub-block may be non-zero. However, in some cases, it is constrained that some coefficients in the 4x4 sub-block must be zero before the inverse RST is applied to the sub-block.

Let nonZeroSize be a variable. It is required that, when the coefficients are rearranged into a one-dimensional array before the inverse RST, any coefficient with an index not smaller than nonZeroSize must be zero.
When nonZeroSize equals 16, the coefficients in the top left 4x4 sub-block have no zeroing constraint.
In some embodiments, when the current block size is 4 × 4 or 8 × 8, nonZeroSize is set equal to 8. For other block dimensions, nonZeroSize is set equal to 16.
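The nonZeroSize rule above can be stated as two small helpers; this is an illustrative sketch with hypothetical function names, directly mirroring the constraint in the text:

```python
def non_zero_size(width, height):
    """Zero-out constraint for the inverse RST: when the block is 4x4
    or 8x8, only the first 8 coefficients of the rearranged 1-D array
    may be non-zero; for other dimensions, all 16 may be."""
    return 8 if (width, height) in ((4, 4), (8, 8)) else 16

def satisfies_zero_out(coeffs_1d, width, height):
    """Check that every coefficient with index >= nonZeroSize is zero."""
    nz = non_zero_size(width, height)
    return all(c == 0 for c in coeffs_1d[nz:])
```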
Example description of RST
In the tables and in the description below, bold italicized text indicates changes that can be made to the current syntax to accommodate certain embodiments described in this document.
Sequence parameter set RBSP syntax
[Syntax table shown as an image in the original document.]
Residual coding syntax
[Syntax table shown as an image in the original document.]
Coding/decoding unit syntax
[Syntax table shown as an image in the original document.]
Sequence parameter set RBSP semantics
……
[Semantics text shown as an image in the original document.]
……
Coding and decoding unit semantics
……
[Semantics text shown as an image in the original document.]
Transformation process for scaled transform coefficients
In general
The inputs to this process are:
-a luminance position (xTbY, yTbY) specifying a top left luma sample of the current luma transform block relative to a top left luma sample of the current picture,
a variable nTbW specifying the width of the current transform block,
a variable nTbH specifying the height of the current transform block,
a variable cIdx specifying the color component of the current block,
- an (nTbW)x(nTbH) array d[ x ][ y ] of scaled transform coefficients, with x = 0..nTbW−1, y = 0..nTbH−1.
The output of this process is an (nTbW) x (nTbH) array r [ x ] [ y ] of residual samples, where x ═ 0.. nTbW-1, y ═ 0.. nTbH-1.
[Specification text shown as an image in the original document.]
Secondary transform process

[Specification text shown as an image in the original document.]
Secondary transform matrix derivation process

[Specification text shown as an image in the original document.]
2.11 Clipping for dequantization in HEVC

In HEVC, the scaled transform coefficient d' is calculated as d' = Clip3(coeffMin, coeffMax, d), where d is the scaled transform coefficient before clipping.

For the luma component, coeffMin = CoeffMinY and coeffMax = CoeffMaxY. For the chroma components, coeffMin = CoeffMinC and coeffMax = CoeffMaxC, where

CoeffMinY = −(1 << (extended_precision_processing_flag ? Max(15, BitDepthY + 6) : 15))
CoeffMinC = −(1 << (extended_precision_processing_flag ? Max(15, BitDepthC + 6) : 15))
CoeffMaxY = (1 << (extended_precision_processing_flag ? Max(15, BitDepthY + 6) : 15)) − 1
CoeffMaxC = (1 << (extended_precision_processing_flag ? Max(15, BitDepthC + 6) : 15)) − 1

extended_precision_processing_flag is a syntax element signaled in the SPS.
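The clipping bounds above reduce to a one-line computation. A minimal sketch (hypothetical function names, mirroring the HEVC formulas for either color component):

```python
def coeff_range(bit_depth, extended_precision_processing_flag):
    """Clipping bounds for scaled transform coefficients in HEVC:
    15 bits normally, Max(15, BitDepth + 6) bits when extended
    precision processing is enabled."""
    bits = max(15, bit_depth + 6) if extended_precision_processing_flag else 15
    return -(1 << bits), (1 << bits) - 1

def clip_coeff(d, bit_depth, extended_precision_processing_flag):
    """d' = Clip3(coeffMin, coeffMax, d)."""
    lo, hi = coeff_range(bit_depth, extended_precision_processing_flag)
    return min(max(d, lo), hi)
```

For 8-bit video without extended precision this gives the familiar signed 16-bit range [−32768, 32767].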
2.12 Affine linear weighted intra prediction (ALWIP, also known as Matrix-based intra prediction (MIP))
In some embodiments, two tests are conducted. In test 1, ALWIP is designed with a memory restriction of 8 Kbytes and at most 4 multiplications per sample. Test 2 is similar to test 1, but the design is further simplified in terms of memory requirement and model architecture:
- A single set of matrices and offset vectors for all block shapes.
- The number of modes is reduced to 19 for all block shapes.
- The memory requirement is reduced to 5760 10-bit values, i.e., 7.20 kilobytes.
- Linear interpolation of the predicted samples is carried out in a single step per direction, replacing the iterative interpolation of the first test.
2.13 sub-block transformations
For an inter-predicted CU with cu_cbf equal to 1, a cu_sbt_flag may be signaled to indicate whether the whole residual block or only a sub-part of the residual block is decoded. In the former case, inter MTS information is further parsed to determine the transform type of the CU. In the latter case, a part of the residual block is coded with an inferred adaptive transform and the other part of the residual block is zeroed out. SBT is not applied to the combined inter-intra mode.

In the sub-block transform, position-dependent transforms are applied to the luma transform blocks in SBT-V and SBT-H (the chroma TB always uses DCT-2). The two positions of SBT-H and SBT-V are associated with different core transforms. More specifically, the horizontal and vertical transforms for each SBT position are specified in fig. 15. For example, the horizontal and vertical transforms for SBT-V position 0 are DCT-8 and DST-7, respectively. When one side of the residual TU is greater than 32, the corresponding transform is set to DCT-2. Therefore, the sub-block transform jointly specifies the TU tiling, the cbf, and the horizontal and vertical transforms of the residual block, which may be considered a syntax shortcut for the case where the major residual of the block is at one side of the block.
FIG. 15 is an illustration of sub-block transform modes SBT-V and SBT-H.
2.14 Separable secondary transform in AVS

In some embodiments, if the primary transform is DCT2, a 4x4 Separable Secondary Transform (SST) is applied after the primary transform to all luma blocks coded in intra mode.

When SST is applied to a block at the encoder, the top-left 4x4 sub-block (denoted as L) of the transform block after the primary transform is further transformed as

L' = T' × L × T,

where T is the secondary transform matrix and T' denotes its transpose.

L' is then quantized together with the other parts of the transform block.

When SST is applied to a block at the decoder, the top-left 4x4 sub-block (denoted as M) of the transform block after dequantization is further inverse transformed as

M' = S' × M × S,

where S is the inverse secondary transform matrix. Specifically, S' = T.

M' is then input to the primary inverse transform together with the other parts of the transform block.
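The separable forward/inverse pair above can be sketched directly from the two equations; with an orthogonal T (so S = T', S' = T) the decoder step exactly undoes the encoder step. The helper names below are hypothetical, and a 2x2 permutation matrix stands in for a real 4x4 secondary transform kernel:

```python
def mat_mul(A, B):
    """Plain matrix product of A (n x k) and B (k x m)."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sst_forward(L, T):
    """Encoder side: L' = T' x L x T on the top-left sub-block."""
    return mat_mul(mat_mul(transpose(T), L), T)

def sst_inverse(Lp, T):
    """Decoder side: M' = S' x M x S with S = T' (hence S' = T),
    i.e. M' = T x M x T', which inverts the forward step when T is
    orthogonal."""
    return mat_mul(mat_mul(T, Lp), transpose(T))
```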
2.15 Scan region-based coefficient coding (SRCC)

SRCC has been adopted in AVS-3. With SRCC, a bottom-right position (SRx, SRy), as shown in figs. 18A-18B, is signaled, and only the coefficients inside the rectangle (e.g., the scan region) with the four corners (0, 0), (SRx, 0), (0, SRy), (SRx, SRy) are scanned and signaled. All coefficients outside the rectangle are zero.
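The scan-region constraint can be sketched as a mask over the coefficient block (hypothetical function name; coordinates follow the (x, y) convention of the text, with x as the column index):

```python
def srcc_scan_region(coeffs, sr_x, sr_y):
    """Keep only coefficients inside the signaled scan region with
    corners (0,0), (sr_x,0), (0,sr_y), (sr_x,sr_y); everything outside
    the rectangle is zero and need not be coded."""
    h, w = len(coeffs), len(coeffs[0])
    return [[coeffs[y][x] if x <= sr_x and y <= sr_y else 0
             for x in range(w)] for y in range(h)]
```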
3. Examples of problems addressed by embodiments
The current designs have the following problems:
(1) the clipping and shift/rounding operations in the MTS/RST may not be optimal.
(2) The RST applied to two adjacent 4x4 blocks may be costly.
(3) RST may be performed in different ways for different color components.
(4) RST may not be applicable to screen content codec.
(5) The interaction of RST with other codec tools is unclear.
(6) The transformation matrix of the RST can be stored more efficiently.
(7) How to apply the quantization matrix on the RST is unclear.
4. Example embodiments and techniques
The embodiments listed below should be considered as examples to explain the general concept. These examples should not be construed in a narrow manner. Furthermore, the embodiments may be combined in any manner.
In the following description, the coding information may include a prediction mode (e.g., intra/inter/IBC mode), a motion vector, a reference picture, an inter prediction direction, an intra prediction mode, a combined intra prediction (CIIP) mode, an ISP mode, an affine intra mode, a transform core employed, a transform skip flag, and the like, for example, information required when coding a block.
In the discussion that follows, SatShift(x, n) is defined as

SatShift(x, n) = (x + offset0) >> n, if x >= 0
SatShift(x, n) = −((−x + offset1) >> n), if x < 0

Shift(x, n) is defined as Shift(x, n) = (x + offset0) >> n.

In one example, offset0 and/or offset1 are set to (1 << n) >> 1 or 1 << (n − 1). In another example, offset0 and/or offset1 are set to 0.

In another example, offset0 = offset1 = ((1 << n) >> 1) − 1 or (1 << (n − 1)) − 1.

Clip3(min, max, x) is defined as

Clip3(min, max, x) = min, if x < min
Clip3(min, max, x) = max, if x > max
Clip3(min, max, x) = x, otherwise
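The three primitives can be implemented directly from the definitions above (using the first example choice of offsets, (1 << n) >> 1, as the default; note that SatShift rounds symmetrically for positive and negative inputs, whereas Shift uses the same offset regardless of sign):

```python
def sat_shift(x, n, offset0=None, offset1=None):
    """Sign-symmetric rounding right shift, as defined above."""
    if offset0 is None:
        offset0 = (1 << n) >> 1
    if offset1 is None:
        offset1 = (1 << n) >> 1
    return (x + offset0) >> n if x >= 0 else -((-x + offset1) >> n)

def shift(x, n, offset0=None):
    """Plain rounding right shift: (x + offset0) >> n."""
    if offset0 is None:
        offset0 = (1 << n) >> 1
    return (x + offset0) >> n

def clip3(lo, hi, x):
    """Clamp x into [lo, hi]."""
    return lo if x < lo else hi if x > hi else x
```

For example, sat_shift(-5, 1) gives −3 while shift(-5, 1) gives −2, which is exactly the asymmetry the SatShift definition removes.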
1. After the inverse RST, the output value should be clipped to the range [MinCoef, MaxCoef] (inclusive), where MinCoef and/or MaxCoef are two integer values that may be variable.
a. In one example, supposing the coefficients after dequantization are clipped to the range [QMinCoef, QMaxCoef] (inclusive), MinCoef may be set equal to QMinCoef and/or MaxCoef may be set equal to QMaxCoef.
b. In one example, MinCoef and/or MaxCoef may depend on the color component.
i. In one example, MinCoef and/or MaxCoef may depend on the bit depth of the corresponding color component.
c. In one example, MinCoef and/or MaxCoef may depend on block shape (e.g., square or non-square) and/or block dimensions.
d. In one example, the selection of the values or candidate values for MinCoef and/or MaxCoef may be signaled, such as in SPS, PPS, slice header/slice group header/CTU/CU.
e. In one example, for the luma component, MinCoef and/or MaxCoef may be derived as:
MinCoef = −(1 << (extended_precision_processing_flag ? Max(15, BitDepthY + 6) : 15))
MaxCoef = (1 << (extended_precision_processing_flag ? Max(15, BitDepthY + 6) : 15)) − 1
where BitDepthY is the bit depth of the luma component and extended_precision_processing_flag may be signaled, such as in the SPS.
f. In one example, for a chroma component, MinCoef and/or MaxCoef may be derived as:
MinCoef = −(1 << (extended_precision_processing_flag ? Max(15, BitDepthC + 6) : 15))
MaxCoef = (1 << (extended_precision_processing_flag ? Max(15, BitDepthC + 6) : 15)) − 1
where BitDepthC is the bit depth of the chroma components and extended_precision_processing_flag may be signaled, such as in the SPS.
g. In some embodiments, MinCoef = −(1 << 15) and MaxCoef = (1 << 15) − 1.
h. In one example, the consistent bitstream should satisfy that the transform coefficients after the forward RST should be within a given range.
2. It is proposed that the way forward RST and/or inverse RST are applied on an M×N sub-block of coefficients may depend on the number of sub-blocks to which forward RST and/or inverse RST are applied, e.g., with M = N = 4.
a. In one example, the zeroing range may depend on the subblock index to which RST is applied.
i. Alternatively, the return-to-zero range may depend on the number of sub-blocks to which RST is applied.
b. In one example, when there are S sub-blocks to which forward RST and/or inverse RST are applied, where S > 1, e.g., S = 2, the way forward RST and/or inverse RST are applied on the first and the second M×N sub-blocks of coefficients in the whole coefficient block may be different.
For example, the first mxn sub-block may be the top left mxn sub-block.
i. In one example, the nonZeroSize described in section 2.10 may be different for the first M×N sub-block of coefficients (denoted as nonZeroSize0) and the second M×N sub-block of coefficients (denoted as nonZeroSize1).
1) In one example, nonZeroSize0 may be larger than nonZeroSize1. For example, nonZeroSize0 = 16 and nonZeroSize1 = 8.
ii. In one example, the nonZeroSize described in section 2.10 may be different depending on whether forward RST and/or inverse RST are to be applied to only one M×N sub-block or to more than one M×N sub-block.
1) In one example, nonZeroSize may be equal to 8 if there is more than one M×N sub-block to which forward RST and/or inverse RST are to be applied.
3. It is proposed that if the current block size is 4×H or W×4 (where H > 8 and W > 8), forward RST and/or inverse RST are applied to only one M×N sub-block of coefficients (e.g., the top-left M×N sub-block). For example, M = N = 4.
a. In one example, forward RST and/or inverse RST are applied to only one M×N sub-block of coefficients if H > T1 and/or W > T2. For example, T1 = T2 = 16.
b. In one example, forward RST and/or inverse RST are applied to only one M×N sub-block of coefficients if H < T1 and/or W < T2. For example, T1 = T2 = 32.
c. In one example, forward RST and/or inverse RST are applied to only one M×N sub-block of coefficients for all H > 8 and/or W > 8.
d. In one example, if the current block size is M×H or W×N (where H >= N and W >= M), forward RST and/or inverse RST are applied to only one M×N sub-block (e.g., the top-left M×N sub-block). For example, M = N = 4.
4. RST may be applied to a non-square region. Suppose the region size is K×L, where K is not equal to L.
a. Alternatively, in addition, zeroing may be applied to the transform coefficients after the forward RST so that the maximum number of non-zero coefficients is met.
i. In one example, a transform coefficient may be set to 0 if the transform coefficient is located outside the top-left MxM region, where M is not greater than K and M is not greater than L.
5. It is proposed that the coefficients in two adjacent M×N sub-blocks may be involved in a single forward RST and/or inverse RST. For example, M = N = 4.
a. In one example, one or more of the following operations may be performed at an encoder. The operations may be performed in sequence.
i. The coefficients in two adjacent M × N sub-blocks are rearranged into a one-dimensional vector having 2 × M × N elements.
Apply forward RST of a transform matrix with 2 × M × N columns and M × N rows (or M × N columns and 2 × M × N rows) to the one-dimensional vector.
Rearranging the transformed one-dimensional vector having M x N elements into a first M x N sub-block (such as the top left sub-block).
All coefficients in the second mxn sub-block may be set to zero.
b. In one example, one or more of the following operations may be performed at a decoder.
The operations may be performed in sequence.
i. The coefficients in the first mxn sub-block (such as the top left sub-block) are rearranged into a one-dimensional vector having mxn elements.
Applying the inverse RST of a transformation matrix having M × N columns and 2 × M × N rows (or 2 × M × N columns and M × N rows) to the one-dimensional vector.
Rearranging the transformed one-dimensional vector having 2 x M x N elements into two adjacent M x N sub-blocks.
c. In one example, a block may be divided into K (K >1) sub-blocks, and primary and secondary transforms may be performed at the sub-block level.
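The encoder-side steps of bullet 5 can be sketched as follows. This is a toy illustration (M = N = 2 and a trivial selection matrix instead of a real M·N x 2·M·N kernel; the function name is hypothetical): the two sub-blocks are flattened into one vector of 2·M·N elements, reduced to M·N coefficients, stored in the first sub-block, and the second sub-block is set to all zero:

```python
def joint_rst_encode(sub0, sub1, T):
    """Bullet 5, encoder side: apply one forward RST to the
    concatenated coefficients of two adjacent sub-blocks.
    T has M*N rows and 2*M*N columns."""
    # Step i: rearrange the two sub-blocks into a 1-D vector of 2*M*N elements.
    x = [v for row in sub0 for v in row] + [v for row in sub1 for v in row]
    # Step ii: forward RST with the M*N x 2*M*N matrix.
    mn = len(T)
    y = [sum(T[r][c] * x[c] for c in range(len(x))) for r in range(mn)]
    # Step iii: rearrange the M*N outputs into the first sub-block.
    n = len(sub0[0])
    first = [y[i * n:(i + 1) * n] for i in range(len(sub0))]
    # Step iv: all coefficients of the second sub-block are set to zero.
    second = [[0] * n for _ in range(len(sub1))]
    return first, second
```

The decoder mirrors this with the transposed matrix, expanding the M·N coefficients of the first sub-block back into both sub-blocks.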
6. The zeroing range (e.g., nonZeroSize described in section 2.10) may depend on the color components.
a. In one example, the range of luma and chroma components may be different for the same block dimension.
7. The zeroing range (e.g., nonZeroSize described in section 2.10) may depend on the codec information.
a. In one example, it may depend on the codec mode, such as intra or non-intra mode.
b. In one example, it may depend on the codec mode, such as intra or inter or IBC mode.
c. In one example, it may depend on reference picture/motion information.
8. The zeroing range (e.g., nonZeroSize described in section 2.10) that suggests a particular block dimension may depend on the Quantization Parameter (QP).
a. In one example, suppose nonZeroSize is equal to nonZeroSizeA when QP is equal to QPA, and nonZeroSize is equal to nonZeroSizeB when QP is equal to QPB. If QPA is not smaller than QPB, nonZeroSizeA is not larger than nonZeroSizeB.
b. Different transform/inverse transform matrices may be used for different nonZeroSize.
9. It is proposed that the zeroing range (e.g., nonZeroSize described in section 2.10) may be signaled, such as in SPS, PPS, picture header, slice header, CTU row, CTU, CU, or any video data unit.
a. Alternatively, multiple ranges may be defined. And an indication of which candidate nonZeroSize to select may be signaled, such as in SPS, PPS, picture header, slice header, CTU row, CTU, and CU.
10. Whether and/or how to apply RST may depend on the color format and/or the use of separate planar codecs and/or color components.
a. In one example, RST may not be applied to chroma components (such as Cb and/or Cr).
b. In one example, RST may not be applied to the chroma components if the color format is 4:0: 0.
c. In one example, RST may not be applied to the chroma components if a separate planar codec is used.
d. In one example, the nonZeroSize of a particular block dimension may depend on the color component.
i. In one example, for the same block dimensions, nonZeroSize for a chroma component may be smaller than nonZeroSize for the luma component.
11. It is proposed that RST control information (such as whether RST is applied and/or which set of transform matrices is selected) can be signaled separately for luma and chroma components when the components are coded using a single coding structure tree.
12. Whether and how RST is applied may depend on the coding information (such as the coding mode) of the current block and/or the neighbor block.
a. In one example, the RST cannot be used for one or more specific intra prediction modes.
i. For example, RST cannot be used in LM mode.
RST cannot be used in LM-T mode, for example.
RST cannot be used in LM-A mode, for example.
RST cannot be used for wide-angle intra prediction mode, for example.
v. for example, RST cannot be used for BDPCM mode or/and DPCM mode or/and RBDPCM mode.
For example, RST cannot be used in the ALWIP mode.
RST cannot be used for certain specific angular intra prediction modes (such as DC, planar, vertical, horizontal, etc.), for example.
RST can be used for the luma component but not the chroma component in LM mode or/and LM-T mode or/and LM-a mode, for example.
For example, RST may not be used for the chroma component when applying joint chroma residual coding.
b. If the RST cannot be applied, a syntax element indicating information related to the RST in the current block may not be signaled.
13. It is proposed that RST can be applied to non-intra coded blocks.
a. In one example, RST may be applied to inter-coded blocks.
b. In one example, RST may be applied to a block of an Intra Block Copy (IBC) codec.
c. In one example, the RST may be applied to a block that is coded with combined inter-frame intra prediction (CIIP).
14. It is proposed that RST can be controlled at different levels.
a. For example, information indicating whether RST (such as a control flag) is applicable may be signaled in PPS, slice header, picture header, slice, CTU row, CTU.
b. Whether RST is applicable may depend on the profile/level/tier of the standard.
15. Proposing whether to apply location-dependent intra prediction combining (PDPC) may depend on whether RST is applied.
a. In one example, if the current block applies RST, PDPC may not be applied.
b. In one example, if the current block applies the RST, then PDPC may be applied.
c. Alternatively, whether RST is applied may depend on whether PDPC is applied.
i. In one example, RST is not applied when PDPC is applied.
if the RST cannot be applied, then a syntax element indicating information related to the RST in the current block may not be signaled.
16. The proposition of whether to filter neighborhood samples for intra prediction may depend on whether RST is applied.
a. In one example, if the current block applies RST, the neighborhood samples may not be filtered.
b. In one example, if the current block applies the RST, the neighborhood samples may be filtered.
c. Alternatively, whether RST is applied may depend on whether neighborhood samples used for intra prediction are filtered.
i. In one example, RST is not applied when neighborhood samples for intra prediction are filtered.
in one example, RST is not applied when neighborhood samples for intra prediction are not filtered.
if the RST cannot be applied, then a syntax element indicating information related to the RST in the current block may not be signaled.
17. It is proposed that RST can be applied when the current block is coded with transform skip.
a. For example, the primary transform is skipped, but the secondary transform may still be applied.
b. The quadratic transform matrix used in the transform skip mode may be different from that used in the non-transform skip mode.
18. It is proposed that the transformation matrix for RST can be stored with a bit width of less than 8. For example, the transformation matrix for RST may be stored with a bit width of 6 or 4.
19. It is proposed that the transformation matrix for RST can be stored in a predictive manner.
a. In one example, a first element in the first transform matrix for RST may be predicted by a second element in the first transform matrix for RST.
i. For example, the difference between two elements may be stored.
For example, the difference may be stored with a bit width of less than 8 (such as 6 or 4).
b. In one example, a first element in a first transformation matrix for RST may be predicted by a second element in a second transformation matrix for RST.
i. For example, the difference between two elements may be stored.
For example, the difference may be stored with a bit width of less than 8 (such as 6 or 4).
20. It is proposed that a first transformation matrix for RST can be derived from a second transformation matrix for RST.
a. In one example, a portion of elements of a second transformation matrix for RST can be picked to construct a first transformation matrix for RST.
b. In one example, the first transformation matrix for RST is derived by rotating or flipping all or a portion of the second transformation matrix for RST.
c. In one example, a first transformation matrix for RST is derived by downsampling or upsampling a second transformation matrix for RST.
21. It is proposed that a syntax element indicating RST-related information in the current block can be signaled before signaling the residual (which can be transformed).
a. In one example, the signaling of information related to the RST may not depend on non-zero or zero coefficients counted when the residual is resolved.
b. In one example, non-zero or zero coefficients may not be calculated when the residual is resolved.
c. In one example, a codec block flag (cbf) flag for a sub-block set to all zeros by RST may not be signaled and inferred to be 0.
d. In one example, a valid flag for a coefficient set to zero by RST may not be signaled and is inferred to be 0.
e. The order of scanning to resolve the residual block may depend on whether and how RST is applied.
i. In one example, the coefficients set to zero by RST may not be scanned.
f. The arithmetic codec context of resolving the residual block may depend on whether and how RST is applied.
22. It is proposed that whether and how to apply a quantization matrix may depend on whether and how RST is applied.
a. In one example, different quantization matrices may be applied depending on whether RST is applied.
b. Alternatively, whether and how RST is applied may depend on whether and how the quantization matrix is applied.
i. In one example, RST may not be applied when applying the quantization matrix to the block.
23. It is proposed that RST can be applied to the quantized coefficients/residuals.
a. In one example, when using transform skipping, RST can be applied to the residual.
b. In one example, RST may be applied to the quantized transform coefficients of the block.
24. It is proposed that RST can be applied to the sub-block transform block.
a. In one example, RST may be applied to the top left coefficient generated by the sub-block transform.
25. It is proposed that how and/or whether RST is applied may depend on the number of TUs in the CU.
a. For example, how and/or whether RST is applied may depend on whether the number of TUs in the CU is greater than 1.
i. In one example, RST is not applied if the number of TUs in the CU is greater than 1.
ii. In one example, RST is applied to only one of the plurality of TUs in the CU if the number of TUs in the CU is greater than 1.
1) In one example, RST is only applied to the first TU in the CU if the number of TUs in the CU is greater than 1.
2) In one example, RST is only applied to the last TU in the CU if the number of TUs in the CU is greater than 1.
iii. In one example, RST is applied independently to each TU of the CU if the number of TUs in the CU is greater than 1.
1) Alternatively, when the number of TUs in the CU is greater than 1, whether to apply the RST to the first TU of the CU may be determined independently of whether to apply the RST to the second TU of the CU.
2) In one example, when the number of TUs in a CU is greater than 1, whether to apply RST to the TUs of the CU may depend on the number of non-zero coefficients (denoted as NZ) of the TUs, but not on the number of non-zero coefficients of other TUs of the CU.
a) In one example, if NZ is less than a threshold T (e.g., T = 2), then RST is not applied to the TU.
b) If it is determined that RST is not applied to a TU, syntax element(s) indicating whether RST is applied may not be signaled for that TU of the CU.
b. For example, how and/or whether RST is applied may depend on whether TU size is equal to CU size.
i. In one example, RST is disabled when the CU size is greater than the TU size.
c. It is proposed to use decoding information of the first TU or the last TU in the decoding order of a CU to decide the use of RST and/or signaling of syntax elements related to RST.
i. In one example, RST is not applied to the CU if the number of non-zero coefficients of the first or last TU is less than a threshold T (e.g., T = 2).
ii. In one example, RST is not applied to the CU if the number of non-zero coefficients of a sub-region (e.g., the top-left 4x4) within the first or last TU is less than a threshold T (e.g., T = 2).
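One possible reading of bullet 25's TU-count rules is sketched below; the mode names and the default threshold T = 2 are assumptions for illustration, not normative behavior:

```python
def rst_allowed_for_tu(num_tus, tu_index, nz, t=2, mode="first"):
    """Decide whether RST may be applied to a TU of a CU.

    mode="first": when the CU has multiple TUs, only the first TU may use RST;
    mode="last": only the last TU may use RST;
    mode="independent": each TU is decided from its own non-zero count alone.
    In all cases, a TU with fewer than `t` non-zero coefficients does not use RST.
    """
    if num_tus > 1:
        if mode == "first" and tu_index != 0:
            return False
        if mode == "last" and tu_index != num_tus - 1:
            return False
    return nz >= t
```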
26. It is proposed that a flag associated with the TU controls whether RST is applied.
a. Whether RST is applied to the TU may depend on a flag of the TU.
i. When the flag is not present or not derived, it may be inferred to be false.
ii. Alternatively, when the flag is not present or not derived, it may be inferred to be true.
b. When a CU contains only one TU, the flag for the TU may be equal to the CU RST flag, which may be derived on the fly (e.g., based on coefficient information).
c. When the number of TUs in a CU is greater than 1, the flag for the last TU in the CU may be derived from the CU RST flag, which may be derived on the fly (e.g., based on coefficient information), and the flags for all other TUs may be set to false.
i. Alternatively, when the number of TUs in a CU is greater than 1, the flag for the last TU in the CU may be derived from the CU RST flag, and the flags for all other TUs may be set to true.
27. It is proposed that, when the number of components is greater than 1 and a single coding tree is used, whether and/or how to apply RST to a first component of the block may be different from whether and/or how to apply RST to a second component of the block. That is, separate control of RST is applied to the different color components.
a. It is proposed that when the number of components is greater than 1 and a single coding tree is used, whether to apply RST to a first component of the block can be determined independently of whether to apply RST to a second component of the block.
i. In one example, when the number of components is greater than 1 and a single codec tree is used, whether to apply RST to a component of the block may depend on the decoding information (e.g., the number of non-zero coefficients (denoted as NZ)) of that component of the block, but not on the decoding information of any other component of the block.
1) In one example, if NZ is less than a threshold T (e.g., T = 2), RST is not applied to that component of the block.
2) If it is determined not to apply RST to a component of the block, syntax element(s) indicating whether RST is to be applied may not be signaled for that component of the block.
b. In one example, for the case of a single tree, whether to enable RST and/or how to apply RST may be determined independently for the luma and chroma components.
28. It is proposed that, when the number of components is greater than 1 and a single coding tree is used, whether to apply RST to a first component of the block can be determined from a second component of the block.
a. In one example, when the number of components is greater than 1 and a single codec tree is used, whether to apply RST to the first component of the block may be determined by the number of non-zero coefficients of the second component of the block.
i. In one example, if NZ (e.g., the number of non-zero coefficients of the second component of the block, or of a sub-region of it (e.g., the top-left 4x4)) is less than a threshold T (e.g., T = 2), RST is not applied to the first component of the block.
ii. If it is determined not to apply RST to the first component of the block, syntax element(s) indicating whether RST is applied may not be signaled for that component of the block.
iii. In one example, the first component is Cb or Cr and the second component is Y.
iv. In one example, the first component is R or B and the second component is G.
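The single-tree component rules of bullets 27 and 28 can be sketched as follows; the function names and the default threshold are illustrative assumptions:

```python
def rst_per_component(nz_counts, t=2):
    """Bullet 27 style: decide RST independently for each component from that
    component's own non-zero coefficient count, e.g. {'Y': 5, 'Cb': 1, 'Cr': 3}."""
    return {comp: nz >= t for comp, nz in nz_counts.items()}

def rst_from_second_component(nz_second, t=2):
    """Bullet 28 style: gate RST for the first component (e.g. Cb/Cr) on the
    non-zero coefficient count of the second component (e.g. Y)."""
    return nz_second >= t
```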
29. In one example, whether to apply bullet 25 and/or bullet 26 and/or bullet 27 may depend on the width and height (denoted W and H) of the CU and/or TU and/or block, and/or on the maximum transform block size.
a. In one example, bullets 25 and/or 26 and/or 27 are only applied when W > T or H > T. In one example, T may be equal to 64. In an alternative example, T may be equal to the maximum transform size.
b. In one example, bullets 25 and/or 26 and/or 27 are applied only when W > T and H > T. In one example, T may be equal to 64. In an alternative example, T may be equal to the maximum transform size.
Improvements related to the scalable secondary transform (SST)
30. In one example, SST may be determined to be enabled or disabled for a video unit.
a. For example, the determination may be made based on signaling in a video syntax structure associated with the video unit.
i. In one example, the signaled syntax element (such as a flag) may be coded with at least one context in arithmetic coding.
ii. In one example, signaling may be conditionally skipped based on coded information such as the block dimensions, the coded block flag (cbf), and the coding mode of the current block.
1) In one example, signaling may be skipped when cbf is equal to 0.
b. For example, the determination may be made based on inference without signaling associated with the video unit.
i. The inference may depend on information of the video unit, such as the coding mode, intra prediction mode, type of primary transform, and dimensions or size of the video unit.
c. For example, a video unit may be a block, such as a codec block or a transform block. The video syntax structure may be a Coding Unit (CU) or a Transform Unit (TU).
d. For example, a video unit may be a picture. The video syntax structure may be a picture header or a PPS.
e. For example, the video unit may be a slice. The video syntax structure may be a slice header.
f. For example, the video unit may be a sequence. The video syntax structure may be a sequence header or an SPS.
g. The video syntax structure may be a VPS/DPS/APS/slice header/CTU row/CTU.
31. In one example, whether SST is disabled or enabled may be based on block dimensions.
h. For example, SST may be disabled if at least one of the block width or height is less than (or not greater than) Tmin.
i. For example, SST may be disabled if both the block width and height are less than Tmin.
j. For example, SST may be disabled if at least one of the block width or height is greater than (or not less than) Tmax.
k. For example, SST may be disabled if both the block width and height are greater than (or not less than) Tmax.
l. For example, Tmin may be 2 or 4.
m. For example, Tmax may be 32, 64 or 128.
n. In one example, SST may be disabled based on the block width and/or height of the first color component.
i. For example, the first color component may be the luma color component.
ii. For example, the first color component may be the R color component.
o. In one example, SST may be disabled based on the block width and/or height of all color components.
p. Alternatively, in addition, when SST is disabled, signaling of the indication of SST usage and/or other related side information is omitted.
q. In one example, based on the block dimensions, SST may be enabled for a first color component and disabled for a second color component.
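A minimal sketch of the dimension-based gating above, assuming the variant where SST is disabled when any side falls below Tmin or exceeds Tmax; Tmin = 4 and Tmax = 64 are example values drawn from the text, not mandated ones:

```python
def sst_enabled(width, height, t_min=4, t_max=64):
    """SST is disabled if either side is below t_min or above t_max."""
    if min(width, height) < t_min:
        return False
    if max(width, height) > t_max:
        return False
    return True
```

When this returns False, per sub-bullet p, the SST-usage indication and related side information would also be omitted from the bitstream.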
32. In one example, a set of SSTs may be employed, and the selection of an SST matrix for a block may depend on the decoded information, such as the block dimension.
r. Alternatively, in addition, the same decoded/signaled SST index or the same on/off control flag may be interpreted differently, such as corresponding to different matrices for different block dimensions.
s. For example, different SSTs in a set may have different dimensions, such as 4x4 SST, 8x8 SST or 16x16 SST.
t. For example, 4x4 SST may be applied to the block under condition C4, and 8x8 SST may be applied to the block under condition C8.
i. Alternatively, in addition, 4x4 SST may be applied to the block under condition C4, and 8x8 SST may be applied to the block under condition C8.
u. In one example, condition C4 is that at least one of the block width and height is equal to 4.
v. In one example, condition C4 is that both the block width and height are equal to 4.
w. In one example, condition C4 is that the smaller of the block width and height is equal to 4.
x. In one example, condition C8 is that the smaller of the block width and height is not less than 8.
y. In one example, condition C8 is that at least one of the block width and height is equal to 8.
z. In one example, condition C8 is that both the block width and height are equal to 8.
aa. In one example, condition C8 is that at least one of the block width and height is greater than or equal to 8.
bb. In one example, condition C8 is that both the block width and height are greater than or equal to 8.
cc. In one example, condition CN is that at least one of the block width and height is equal to N.
dd. In one example, condition CN is that both the block width and height are equal to N.
ee. In one example, condition CN is that at least one of the block width and height is greater than or equal to N.
ff. In one example, condition CN is that both the block width and height are greater than or equal to N.
gg. In one example, NxN SST may be applied to the top-left NxN sub-block of a transform block.
hh. In one example, SST can be applied horizontally or vertically, or both horizontally and vertically, depending on the block dimensions.
ii. In one example, different SST matrices may be selected for different color components.
i. For example, the above rules may be applied independently to different color components.
jj. In one example, the same SST matrix may be selected for all color components.
i. For example, the above rule may apply to the first color component, while the selected SST matrix may apply to all color components.
1) In one example, the first color component may be a luminance component.
2) In one example, the first color component may be a Cb or Cr component.
3) Alternatively, further, if the selected SST matrix is not applicable to the second color component, SST is disabled for the second color component.
kk. In one example, SST may be allowed only if the SST matrices selected for all color components (by applying the above rules independently to each color component) are the same.
i. Alternatively, in addition, if SST is not allowed, signaling of the indication of SST usage and/or other related side information is omitted.
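One reading of conditions C4 and C8 (C4: the smaller side equals 4; C8: the smaller side is at least 8) can be sketched as a size selector. This picks one of the several condition variants listed above purely for illustration:

```python
def select_sst_size(width, height):
    """Select an SST dimension from the block dimensions.

    Uses the variant where C4 means the smaller side equals 4 and
    C8 means the smaller side is not less than 8.
    """
    smaller = min(width, height)
    if smaller == 4:
        return 4   # 4x4 SST under condition C4
    if smaller >= 8:
        return 8   # 8x8 SST under condition C8
    return None    # e.g. a 2xN block: no SST under this rule
```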
33. In one example, NxN SST may be applied to at least one NxN sub-block that is not the same as the top left NxN sub-block.
ll. For example, NxN SST may be applied to the NxN sub-block adjacent to the right of the top-left NxN sub-block.
mm. For example, NxN SST may be applied to the NxN sub-block adjacent to the bottom of the top-left NxN sub-block.
34. In one example, a first SST may be applied as a horizontal transform and a second SST may be applied as a vertical transform to a transform block, where the first SST and the second SST may be different.
nn. For example, the first SST and the second SST may have different dimensions.
oo. Assuming the first SST is an N-point SST, the second SST is an M-point SST, and the transform block dimension is WxH, the following rules may be applied:
i. If W is equal to W1, then N is set equal to W1, where W1 is an integer, such as 4 or 8.
ii. If W is greater than (or not less than) W2, then N is set equal to W2, where W2 is an integer, such as 4 or 8.
iii. If H is equal to H1, then M is set equal to H1, where H1 is an integer, such as 4 or 8.
iv. If H is greater than (or not less than) H2, then M is set equal to H2, where H2 is an integer, such as 4 or 8.
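The N/M selection rules of bullet 34 can be sketched directly, taking W1 = H1 = 4 and W2 = H2 = 8 as the example integers mentioned above (these defaults are illustrative, not normative):

```python
def separable_sst_dims(w, h, w1=4, w2=8, h1=4, h2=8):
    """Choose the horizontal SST size N and the vertical SST size M
    for a WxH transform block, per the rules of bullet 34."""
    n = w1 if w == w1 else (w2 if w >= w2 else None)
    m = h1 if h == h1 else (h2 if h >= h2 else None)
    return n, m
```

For a 4x16 block this yields a 4-point horizontal SST and an 8-point vertical SST, i.e. the two SSTs may differ, as bullet 34 allows.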
35. In one example, one SST in a set of SSTs may be used for a block, where there are multiple SSTs in the set that have the same dimension.
pp. In one example, an indication is signaled to indicate which SST is selected for use.
qq. In one example, which SST is selected is inferred without signaling. The inference may depend on:
i. The block dimensions.
ii. The intra prediction mode.
iii. The quantized/unquantized transform coefficients.
iv. The color component.
v. The type of primary transform.
36. In one example, different SSTs may be applied for different primary transforms.
rr. For example, the SST used in association with DCT2 may be different from the SST used in association with DST7.
37. In one example, SST may be applied to the chroma components.
ss. In one example, different SST matrices may be applied to different color components, such as Y, Cb and Cr.
tt. In one example, different rules on whether and how SST is applied may be followed for different color components.
uu. In one example, separate control of the two color components may be applied.
i. In one example, an indication of the use of SST and/or an indication of a matrix may be signaled for each of the two color components.
An indication of the use of SST and/or an indication of the SST matrix may be signaled according to a conditional check on the bottom-right position of the scan region. The bottom-right position is denoted (SRx, SRy), as depicted in Figs. 18A-B.
vv. In one example, the indication of use of SST and/or the indication of the SST matrix may be omitted when SRx is greater than (or not less than) Kx and/or when SRy is greater than (or not less than) Ky.
ww. In one example, the indication of use of SST and/or the indication of the SST matrix may be omitted when SRx is less than (or not greater than) K'x and/or when SRy is less than (or not greater than) K'y.
xx. Alternatively, in addition, when no indication is signaled, it may be inferred that SST is disabled.
yy. Alternatively, in addition, when no indication is signaled, a default SST may be inferred.
i. In one example, the default SST may be set to the KxL transform.
ii. In one example, the default SST may be determined from decoded information (such as the block dimensions).
zz. Alternatively, in addition, the above methods may be applied to other non-separable secondary/primary transforms.
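The conditional signaling based on the bottom-right scan position (SRx, SRy) can be sketched as below, using the variant where the indication is omitted once either coordinate reaches a threshold; the Kx and Ky defaults are invented for illustration:

```python
def sst_indication_signaled(srx, sry, kx=16, ky=16):
    """Whether the SST-usage/matrix indication is present in the bitstream,
    given the bottom-right position (srx, sry) of the scan region.
    The indication is omitted when either coordinate is >= its threshold;
    per the text, SST may then be inferred as disabled (or as a default SST)."""
    return srx < kx and sry < ky
```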
Fig. 16 is a block diagram of a video processing apparatus 1600. The apparatus 1600 may be used to implement one or more of the methods described herein. The apparatus 1600 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and/or the like. The apparatus 1600 may include one or more processors 1602, one or more memories 1604, and video processing hardware 1606. The processor(s) 1602 may be configured to implement one or more of the methods described in this document. The memory(s) 1604 may be used for storing data and code for implementing the methods and techniques described herein. The video processing hardware 1606 may be used to implement some of the techniques described in this document in hardware circuitry.
Fig. 17 is a flow diagram of an example method 1700 of video processing. The method 1700 includes determining (1702) a constraint rule for selectively applying a quadratic transform having a reduced dimension during a transition between a bitstream representation of a current video block and pixels of the current video block. The method 1700 includes performing (1704) the transformation by applying a quadratic transformation with reduced dimensions according to a constraint rule. The quadratic transform with reduced dimensions has dimensions reduced from the dimensions of the current video block. The quadratic transform with reduced dimensions is applied in a specific order along with the primary transform during the conversion.
Additional embodiments and techniques are described in the following examples.
1. A video processing method, comprising: determining a constraint rule for selectively applying a quadratic transform having a reduced dimension during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the quadratic transform having the reduced dimension in accordance with the constraint rule; wherein the quadratic transform having a reduced dimension has a dimension reduced from the dimension of the current video block, and wherein the quadratic transform having a reduced dimension is applied in a specific order along with the primary transform during the conversion.
2. The method of example 1, wherein converting comprises encoding the current video block into a bitstream representation, and wherein the specific order comprises first applying a primary transform in a forward direction, and then selectively applying a quadratic transform with reduced dimensions in the forward direction, and then quantizing an output of the quadratic transform with reduced dimensions in the forward direction.
3. The method of example 1, wherein converting comprises decoding the current video block from the bitstream representation, and wherein the particular order comprises first applying dequantization to the bitstream representation, then selectively applying a quadratic transform having a reduced dimension in an inverse direction, and then applying a primary transform to an output of the quadratic transform having the reduced dimension in the inverse direction.
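The encode/decode ordering of examples 2 and 3 can be sketched as nested function application. The toy transforms below are invertible stand-ins, not real video transforms; only the order of operations mirrors the examples:

```python
def encode(residual, primary_fwd, secondary_fwd, quantize):
    # Example 2: primary transform, then the reduced secondary transform,
    # then quantization of its output, all in the forward direction.
    return quantize(secondary_fwd(primary_fwd(residual)))

def decode(levels, dequantize, secondary_inv, primary_inv):
    # Example 3: dequantization first, then the inverse secondary transform,
    # then the inverse primary transform.
    return primary_inv(secondary_inv(dequantize(levels)))

# Toy invertible stand-ins for the real transforms and (de)quantizer.
primary_fwd = lambda xs: [4 * x for x in xs]
primary_inv = lambda xs: [x // 4 for x in xs]
secondary_fwd = lambda xs: list(reversed(xs))
secondary_inv = lambda xs: list(reversed(xs))
quantize = lambda xs: [x // 2 for x in xs]
dequantize = lambda xs: [x * 2 for x in xs]

levels = encode([1, 2, 3], primary_fwd, secondary_fwd, quantize)  # → [6, 4, 2]
```

Running `decode(levels, dequantize, secondary_inv, primary_inv)` recovers `[1, 2, 3]`, confirming the two orderings mirror each other.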
4. The method of any of examples 1-3, wherein the constraint rule specifies clipping a range of an output of the quadratic transform with reduced dimensionality in the inverse direction to a range [MinCoef, MaxCoef] (inclusive), wherein MinCoef and/or MaxCoef are two integer values that are a function of a condition of the current video block.
5. The method of example 4, wherein the condition of the current video block is a type of color or luma component represented by the current video block.
6. The method of example 1, wherein the constraint rule specifies applying a quadratic transform with a reduced dimension to one or more MxN sub-blocks of the current video block and zeroing remaining sub-blocks of the current video block.
7. The method of example 1, wherein the constraint rule specifies that a quadratic transform with a reduced dimension is to be applied differently to different sub-blocks of the current video block.
8. The method of any of examples 1-5, wherein the constraint rule specifies that, when the size of the current video block is 4xH or Wx4, a quadratic transform with a reduced dimension is applied to exactly one MxN sub-block of the current video block, where H is a height in integer pixels and W is a width in integer pixels.
9. The method of example 8, wherein H >8 or W > 8.
10. The method of any of examples 1 to 9, wherein the current video block is a non-square region of the video.
11. The method of example 2 or 3, wherein the constraint rule specifies zeroing transform coefficients of the primary transform in a forward direction or padding zero coefficients to outputs of the secondary transform in a reverse direction.
Other embodiments of examples 1-5 are described in item 1 of section 4. Other embodiments of examples 6-7 are described in item 2 of section 4. Other embodiments of examples 8-9 are described in item 3 of section 4. Other embodiments of examples 10-11 are described in item 4 of section 4.
12. A video processing method, comprising: determining a constraint rule for selectively applying a quadratic transform having a reduced dimension during a transition between a bitstream representation of a current video block and a neighborhood video region and pixels of the current video block and pixels of the neighborhood region, and performing the transition by applying the quadratic transform having the reduced dimension in accordance with the constraint rule; wherein the quadratic transform with reduced dimensions has dimensions reduced from the dimensions of the current video block and the neighborhood video region, and wherein the quadratic transform with reduced dimensions is applied in a specific order together with the main transform during the conversion.
13. The method of example 12, wherein the neighborhood video region comprises a top left block of the current video block.
14. The method of example 12, wherein the current video block and the neighborhood video region correspond to sub-blocks of a parent video block.
Other embodiments of examples 12-14 are described in item 5 of section 4.
15. A video processing method, comprising: determining a zeroing rule for selectively applying a quadratic transform having a reduced dimension during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the quadratic transform having the reduced dimension according to the zeroing rule; wherein the quadratic transform having a reduced dimension has a dimension reduced from the dimension of the current video block; wherein the zeroing rule specifies a maximum number of coefficients used by a quadratic transform with reduced dimensionality.
16. The method of example 15, wherein the maximum number of coefficients is a function of a component identification of the current video block.
17. The method of example 16, wherein the maximum number of coefficients is different for a luminance video block and a chrominance video block.
18. The method of any of examples 15-17, wherein the zeroing rule specifies a range of zeroing as a function of codec information of the current video block.
19. The method of any of examples 15-17, wherein the zeroing rule specifies a range of zeroing as a function of a quantization parameter of the current video block.
20. The method according to any of examples 15 to 19, wherein the zeroing range is indicated in the bitstream representation by a field included at a sequence parameter set level, or a picture header or a slice header, or a codec tree unit row, or a codec tree unit, or a codec unit, or at a video data unit level.
Other embodiments of examples 15-17 are described in item 6 of section 4. Other embodiments of example 18 are described in item 7 of section 4. Other embodiments of example 19 are described in item 8 of section 4. Other embodiments of example 20 are described in item 9 of section 4.
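Examples 15-17 (a component-dependent cap on the number of secondary-transform coefficients, with the remainder zeroed) can be sketched as follows; the default limits are illustrative assumptions, not values from the claims:

```python
def max_sst_coeffs(component, luma_max=16, chroma_max=8):
    """Zeroing-rule sketch: the number of secondary-transform coefficients
    kept may differ between luma ('Y') and chroma components."""
    return luma_max if component == "Y" else chroma_max

def zero_out(coeffs, keep):
    """Keep the first `keep` coefficients in scan order; zero the rest."""
    return coeffs[:keep] + [0] * max(0, len(coeffs) - keep)
```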
21. A video processing method, comprising: determining a condition for selectively applying a quadratic transform having a reduced dimension during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the quadratic transform having the reduced dimension according to the condition; wherein the quadratic transform having a reduced dimension has a dimension reduced from the dimension of the current video block; and wherein the condition is signaled in a bit stream representation.
22. The method of example 21, wherein the condition is based on a color format, on the use of separate plane coding, or on a color identification of the current video block.
Other embodiments of examples 21-22 are described in item 10 of section 4.
23. The method of any of examples 21 to 22, wherein the condition is signaled separately for chroma and luma components in the bitstream representation.
Other embodiments of example 23 are described in item 11 of section 4.
24. The method of any of examples 21 to 23, wherein the condition depends on codec information of the current video block and a neighborhood video region.
25. The method of example 24, wherein the condition excludes application to a current video block that is coded using a particular intra-prediction mode.
Other embodiments of examples 24-25 are described in item 12 of section 4.
26. The method of example 24, wherein the condition specifies an application to the inter-coded current video block.
27. The method of example 24, wherein the condition specifies an application to a current video block that is coded using intra block copy mode.
Other embodiments of examples 25-26 are described in item 13 of section 4.
28. The method of example 21, wherein the condition is signaled in the bitstream representation at a level such that all blocks within the level meet the condition, wherein the level is a sequence parameter set level, or a picture header, or a slice header, or a codec tree element row, or a codec tree element, or a codec element, or at a video data element level.
Other embodiments of example 28 are described in item 14 of section 4.
29. The method of example 21, wherein the condition is that the current video block is coded using a transform skip mode.
Other embodiments of example 29 are described in item 17 of section 4.
30. A video processing method, comprising: selectively applying a quadratic transform having a reduced dimension during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the quadratic transform having the reduced dimension according to a condition; wherein the quadratic transform having a reduced dimension has a dimension reduced from the dimension of the current video block; and wherein the converting comprises selectively applying a position-dependent intra prediction combination (PDPC) based on a coexistence rule.
31. The method of example 30, wherein the coexistence rule excludes applying PDPC to the current video block due to applying a quadratic transform.
32. The method of example 30, wherein the coexistence rule specifies applying PDPC to the current video block as a result of applying a quadratic transform.
33. The method of example 30, wherein selectively applying the quadratic transform is performed on a current video block using the PDPC.
Other embodiments of examples 30-33 are described in item 15 of section 4.
34. A video processing method, comprising: applying a quadratic transform having a reduced dimension during a conversion between a bitstream representation of a current video block and pixels of the current video block, and performing the conversion by applying the quadratic transform having the reduced dimension according to a condition; wherein the quadratic transform having a reduced dimension has a dimension reduced from the dimension of the current video block; and wherein the application controls the use of neighborhood samples for intra prediction during the transition.
Other embodiments of example 34 are described in item 16 of section 4.
35. A video processing method, comprising: selectively applying a quadratic transform having a reduced dimension during a transition between a bitstream representation of a current video block and pixels of the current video block, and performing the transition by applying the quadratic transform having the reduced dimension in accordance with a condition; wherein the quadratic transform having a reduced dimension has a dimension reduced from the dimension of the current video block; and wherein selectively applying controls uses of the quantization matrix during the conversion.
36. The method of example 35, wherein the use of the quantization matrix occurs only as a result of applying a quadratic transform.
Other embodiments of examples 35-36 are described in item 22 of section 4.
37. The method of any of examples 1-36, wherein the primary and secondary transforms are stored as transform matrices having bit widths less than 8.
38. The method of any of examples 1-36, wherein the primary transform and the secondary transform are stored as a predictive transform matrix.
39. The method of any of examples 1-36, wherein the primary transform is derivable from the secondary transform using a first rule, or wherein the secondary transform is derivable from the primary transform using a second rule.
40. The method of any of examples 1-36, wherein the bitstream representation includes information about a quadratic transform or a primary transform prior to residual information of the current video block.
Other embodiments of examples 37-40 are described in items 18, 19, 20, and 21 of section 4.
41. The method of example 1, wherein a constraint rule for selectively applying a quadratic transform depends on a number of transform units in a codec unit of a current video block.
42. The method of example 41, wherein the constraint rule specifies that the quadratic transform is applied when the number of transform units in the codec unit is greater than one.
43. The method of example 1, wherein a flag in the bitstream representation indicates whether a quadratic transform with reduced dimensions is to be applied to the conversion.
44. The method of example 1, wherein the current video block comprises more than one component video block, and wherein the constraint rule specifies that a quadratic transform with a reduced dimension is applicable differently for different component video blocks.
45. The method of example 44, wherein the constraint rule specifies applicability of a quadratic transform having a reduced dimension to a first component video block based on how the constraint rule applies to a second component video block.
46. The method of any of examples 44-45, wherein the constraint rule is further dependent on a dimension of the current video block.
Other embodiments of examples 47-53 are described in, for example, items 30 through 38 of section 4.
47. A video processing method, comprising: determining, based on a codec condition, whether to use a scalable secondary transform (SST) for a conversion between a current video block of a video and a bitstream representation of the video; and performing the conversion in accordance with the determination.
48. The method of example 47, wherein the codec condition corresponds to a syntax element in the bitstream representation.
49. The method of example 48, wherein the codec condition comprises a size of the current video block.
50. The method of any one of examples 47-49, wherein, in case it is determined to use the SST, the conversion uses a selected SST, the selected SST being selected from a set of SSTs based on another codec condition.
51. The method of example 50, wherein the other codec condition comprises a dimension of the current video block.
52. The method of any of examples 1-51, wherein the conversion comprises parsing and decoding the bitstream representation to generate the video.
53. The method of any of examples 1-51, wherein the conversion comprises encoding the video into the bitstream representation.
54. A video processing apparatus comprising a processor configured to implement the method of any one or more of examples 1 to 53.
55. A computer-readable medium having code stored thereon, the code, when executed by a processor, causing the processor to implement the method of any one or more of examples 1 to 53.
It should be appreciated that the disclosed techniques may be embodied in a video encoder or decoder to improve compression efficiency using techniques that include using a secondary transform with reduced dimensionality.
Fig. 19 is a block diagram illustrating an example video processing system 1900 in which various techniques disclosed herein may be implemented. Various implementations may include some or all of the components of system 1900. The system 1900 may include an input 1902 for receiving video content. The video content may be received in a raw or uncompressed format (e.g., 8- or 10-bit multi-component pixel values), or may be received in a compressed or encoded format. The input 1902 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as Ethernet, passive optical network (PON), etc., and wireless interfaces such as Wi-Fi or cellular interfaces.
The system 1900 may include a codec component 1904 that may implement the various codec or encoding methods described in this document. The codec component 1904 may reduce the average bit rate of the video from the input 1902 to the output of the codec component 1904 to produce a codec representation of the video. The codec techniques are therefore sometimes called video compression or video transcoding techniques. The output of the codec component 1904 may be either stored, or transmitted via a communication connection, as represented by the component 1906. The stored or communicated bitstream (or codec) representation of the video received at the input 1902 may be used by the component 1908 for generating pixel values or displayable video that is sent to a display interface 1910. The process of generating user-viewable video from the bitstream representation is sometimes called video decompression. Furthermore, while certain video processing operations are referred to as "codec" operations or tools, it will be appreciated that the codec tools or operations are used at an encoder, and corresponding decoding tools or operations that reverse the results of the codec will be performed by a decoder.
Examples of a peripheral bus interface or display interface may include Universal Serial Bus (USB) or High Definition Multimedia Interface (HDMI) or Displayport, among others. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and the like. The techniques described in this document may be embodied in various electronic devices such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and/or video display.
FIG. 20 is a flow diagram of another example method of video processing in accordance with the present technology. The method 2000 includes, at operation 2010, determining, for a conversion between a video unit of a video and a bitstream representation of the video, whether a separable secondary transform (SST) tool is enabled for the video unit. The method 2000 includes, at operation 2020, performing the conversion based on the determination.
In some embodiments, the determination is based on a syntax structure associated with the video unit. In some embodiments, the syntax structure is included in the bitstream representation, and the syntax structure is coded with at least one context used in arithmetic coding.
In some embodiments, the syntax structure is omitted from the bitstream representation based on characteristics of the video unit. In some embodiments, the characteristics of the video unit include at least a dimension of the video unit, a codec mode of the video unit, or a syntax flag associated with the video unit. In some embodiments, the syntax structure is omitted from the bitstream representation in case a codec block flag associated with the video unit is equal to zero.
In some embodiments, the video unit comprises a codec block or a transform block, and the syntax structure comprises a codec unit or a transform unit. In some embodiments, the video unit comprises a picture and the syntax structure comprises a picture header or a picture parameter set. In some embodiments, the video unit comprises a slice, and the syntax structure comprises a slice header, a sequence header, or a sequence parameter set. In some embodiments, the syntax structure comprises a video parameter set, a decoder parameter set, an adaptive parameter set, a slice group, a slice, a row of coding tree units, or a coding tree unit.
In some embodiments, the determination is based on characteristics of the video unit. In some embodiments, the characteristics of the video unit include at least a codec mode of the video unit, an intra prediction mode of the video unit, a type of a primary transform of the video unit, or a dimension of the video unit. In some embodiments, the video unit comprises a block of the video, and the characteristics of the video unit comprise a dimension of the block. In some embodiments, the SST tool is disabled in case at least one of the width or the height of the block is less than or equal to a threshold Tmin. In some embodiments, Tmin is equal to 2 or 4. In some embodiments, the SST tool is disabled in case at least one of the width or the height of the block is greater than or equal to a threshold Tmax. In some embodiments, Tmax is equal to 32, 64, or 128. In some embodiments, the determination is based on the width and the height of a color component of the block. In some embodiments, the color component comprises a luma component or an R color component. In some embodiments, the determination is based on the width and the height of all color components of the block.
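As an illustrative, non-normative sketch, the dimension-based gating described above can be expressed as follows. The default values Tmin = 4 and Tmax = 64 are merely two of the listed options, chosen here for illustration:

```python
def sst_enabled(width: int, height: int, t_min: int = 4, t_max: int = 64) -> bool:
    """Return whether the SST tool is enabled for a block of the given size.

    The SST tool is disabled when either dimension is <= Tmin (block too
    small) or >= Tmax (block too large); threshold values are assumptions.
    """
    if width <= t_min or height <= t_min:
        return False
    if width >= t_max or height >= t_max:
        return False
    return True

# For example, a 16x16 block passes both checks, while a 4-wide or
# 64-wide block is gated out.
```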
In some embodiments, signaling of the use of, or information related to, the SST tool is omitted from the bitstream representation in case the SST tool is determined to be disabled. In some embodiments, the SST tool is enabled for a first color component of the video unit and disabled for a second color component of the video unit.
FIG. 21 is a flow diagram of another example method of video processing in accordance with the present technology. The method 2100 includes, at operation 2110, determining, for a conversion between a video unit of a video and a bitstream representation of the video, a manner of indicating use of a transform tool or a transform matrix used by the transform tool based on a bottom-right position (SRx, SRy) of a scan area. The method 2100 further includes, at operation 2120, performing the conversion based on the determination.
In some embodiments, the transform tool comprises at least a separable secondary transform (SST), a non-separable secondary transform, or a primary transform. In some embodiments, the indication of the use of the transform tool or the indication of the transform matrix is omitted in the bitstream representation in case SRx is greater than or equal to a first threshold and/or SRy is greater than or equal to a second threshold. In some embodiments, the indication of the use of the transform tool or the indication of the transform matrix is omitted in the bitstream representation in case SRx is less than or equal to a third threshold and/or SRy is less than or equal to a fourth threshold. In some embodiments, the transform tool is considered disabled in case the indication of the use of the transform tool or the indication of the transform matrix is omitted in the bitstream representation. In some embodiments, a default transform matrix is used in case the indication of the use of the transform tool or the indication of the transform matrix is omitted in the bitstream representation. In some embodiments, coefficients located outside the scan area are zero.
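A hedged sketch of the scan-area gating of signaling: the transform-tool indication is skipped when the bottom-right nonzero-coefficient position falls outside configured bounds. All threshold values and the exact and/or combination are illustrative assumptions, not normative:

```python
def signal_transform_indication(srx: int, sry: int,
                                th_high: int = 16,
                                th_low: int = 0) -> bool:
    """Return True if the transform-tool/matrix indication is coded.

    Omitted (False) when the bottom-right scan position is too far out
    (srx or sry >= th_high) or trivially small (both <= th_low, e.g. a
    DC-only block). Thresholds are hypothetical example values.
    """
    if srx >= th_high or sry >= th_high:
        return False
    if srx <= th_low and sry <= th_low:
        return False
    return True

# When the indication is omitted, a decoder would treat the tool as
# disabled or fall back to a default transform matrix, per the text above.
```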
FIG. 22 is a flow diagram of another example method of video processing in accordance with the present technology. The method 2200 includes, at operation 2210, determining, for a conversion between a block of a video and a bitstream representation of the video, a transform matrix for use in a separable secondary transform (SST) tool based on characteristics of the block, the SST tool providing a set of available transform matrices. The method 2200 also includes, at operation 2220, performing the conversion based on the determination.
In some embodiments, the characteristics of the block include at least a dimension of the block, an intra prediction mode of the block, quantized/unquantized coefficients after applying the transform, a color component of the block, or a type of a primary transform of the block. In some embodiments, the same syntax element associated with the use of the SST tool indicates different transform matrices for blocks of different dimensions. In some embodiments, the set of available transform matrices includes at least a 4×4 matrix, an 8×8 matrix, …, or an N×N matrix, where N is an integer, and the transform matrix is determined according to a condition on the characteristics of the block. In some embodiments, in case the condition specifies that at least one of the width or the height of the block is equal to 4, the transform matrix is determined to be a 4×4 matrix. In some embodiments, in case the condition specifies that at least one of the width or the height of the block is equal to 8, the transform matrix is determined to be an 8×8 matrix.
In some embodiments, in case the condition specifies that at least one of the width or the height of the block is greater than or equal to 8, the transform matrix is determined to be an 8×8 matrix. In some embodiments, in case the condition specifies that at least one of the width or the height of the block is equal to N, the transform matrix is determined to be an N×N matrix. In some embodiments, in case the condition specifies that at least one of the width or the height of the block is greater than or equal to N, the transform matrix is determined to be an N×N matrix. In some embodiments, the transform matrix is determined to be an N×N matrix to be applied to a top-left N×N sub-block of the block. In some embodiments, the transform matrix is determined to be an N×N matrix to be applied to an N×N sub-block of the block that is different from the top-left N×N sub-block. In some embodiments, the N×N sub-block is right-adjacent or bottom-adjacent to the top-left N×N sub-block.
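One possible reading of the size-selection conditions above, written as a minimal sketch. The rule "4×4 when a dimension equals 4, otherwise 8×8 when both dimensions are at least 8" is just one of the embodiments listed, chosen here for concreteness:

```python
def select_sst_matrix_size(width: int, height: int) -> int:
    """Pick the secondary-transform matrix size for a block (sketch).

    Returns 4 (a 4x4 matrix, applied to the top-left 4x4 sub-block) when
    the narrow dimension equals 4, and 8 (an 8x8 matrix) when both
    dimensions are at least 8. Other sizes are out of scope here.
    """
    narrow = min(width, height)
    if narrow == 4:
        return 4
    if narrow >= 8:
        return 8
    raise ValueError("unsupported block dimension for this sketch")
```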
In some embodiments, the manner in which the SST tool is applied is based on the characteristics of the block, including whether the SST tool is applied horizontally and/or vertically. In some embodiments, a first transform matrix is determined to be usable as a horizontal transform on the block and a second transform matrix is determined to be usable as a vertical transform on the block, and the first transform matrix and the second transform matrix are different. In some embodiments, the first transform matrix and the second transform matrix have different dimensions. In some embodiments, the first transform matrix is of size N×N, the second transform matrix is of size M×M, the dimension of the block is W×H, and M or N is determined based on at least one of W or H. In some embodiments, in case W is equal to W1, N is determined to be W1, where W1 is equal to 4 or 8. In some embodiments, in case W is greater than or equal to W2, N is determined to be W2, where W2 is equal to 4 or 8. In some embodiments, in case H is equal to H1, M is determined to be H1, where H1 is equal to 4 or 8. In some embodiments, in case H is greater than or equal to H2, M is determined to be H2, where H2 is equal to 4 or 8. In some embodiments, the set of available transform matrices includes at least two matrices having the same dimensions.
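The separable case above derives the horizontal transform size N from W and the vertical size M from H independently. A hedged sketch, assuming the thresholds W2 = H2 = 8 (one of the listed options):

```python
def separable_sst_sizes(w: int, h: int, w2: int = 8, h2: int = 8):
    """Return (N, M): horizontal transform size from W, vertical from H.

    N is W itself when W is 4 or 8; otherwise N = w2 when W >= w2.
    M is derived from H the same way. Thresholds are assumptions.
    """
    n = w if w in (4, 8) else (w2 if w >= w2 else None)
    m = h if h in (4, 8) else (h2 if h >= h2 else None)
    return n, m

# A 4x16 block would thus use a 4x4 horizontal and an 8x8 vertical
# matrix under these assumptions.
```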
In some embodiments, the transform matrix is indicated using a syntax element in the bitstream representation. In some embodiments, the transform matrix is derived without any indication in the bitstream representation. In some embodiments, different transform matrices are applied to different color components of the block. In some embodiments, for each color component of the block, the transform matrix is determined independently based on the characteristics of the block. In some embodiments, the use of the transform matrix for each component of the block is signaled separately in the bitstream. In some embodiments, the same transform matrix is applied to all color components of the block. In some embodiments, the transform matrix to be applied to a first color component of the block is first determined based on the characteristics of the block, and the same transform matrix is then determined to be applied to all remaining color components of the block. In some embodiments, the first color component comprises a luma component, a Cb component, or a Cr component. In some embodiments, in case the transform matrix is not applied to a second color component among the remaining color components, the SST tool is disabled for the second color component.
In some embodiments, the SST tool is enabled in case the transform matrices applicable to different color components of the block are the same. In some embodiments, signaling of the use of, or information related to, the SST tool is omitted from the bitstream representation in case the SST tool is determined to be disabled. In some embodiments, different transform matrices are applied for different primary transform types. In some embodiments, the different primary transform types include at least DCT2 or DST7.
In some embodiments, performing the conversion includes generating the bitstream representation based on the block of the video. In some embodiments, performing the conversion includes generating the block of the video from the bitstream representation.
Fig. 23 is a block diagram illustrating an example video codec system 100 that may utilize techniques of this disclosure.
As shown in fig. 23, the video codec system 100 may include a source device 110 and a destination device 120. Source device 110 (which may be referred to as a video encoding device) generates encoded video data. Destination device 120 (which may be referred to as a video decoding device) may decode the encoded video data generated by source device 110.
The source device 110 may include a video source 112, a video encoder 114, and an input/output (I/O) interface 116.
The video source 112 may include a source such as a video capture device, an interface that receives video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources. The video data may include one or more pictures. The video encoder 114 encodes the video data from the video source 112 to generate a bitstream. The bitstream may comprise a sequence of bits forming a codec representation of the video data. The bitstream may include coded pictures and associated data. A coded picture is a coded representation of a picture. The associated data may include sequence parameter sets, picture parameter sets, and other syntax structures. The I/O interface 116 may include a modulator/demodulator (modem) and/or a transmitter. The encoded video data may be sent directly to the destination device 120 over the network 130a via the I/O interface 116. The encoded video data may also be stored on storage medium/server 130b for access by destination device 120.
Destination device 120 may include I/O interface 126, video decoder 124, and display device 122.
I/O interface 126 may include a receiver and/or a modem. I/O interface 126 may retrieve encoded video data from source device 110 or storage medium/server 130b. The video decoder 124 may decode the encoded video data. The display device 122 may display the decoded video data to a user. Display device 122 may be integrated with destination device 120, or may be external to destination device 120, which is configured to interface with an external display device.
The video encoder 114 and the video decoder 124 may operate in accordance with video compression standards such as the High Efficiency Video Coding (HEVC) standard, the Versatile Video Coding (VVC) standard, and other current and/or future standards.
Fig. 24 is a block diagram illustrating an example of a video encoder 200, which may be the video encoder 114 in the system 100 illustrated in fig. 23.
Video encoder 200 may be configured to perform any or all of the techniques of this disclosure. In the example of fig. 24, the video encoder 200 includes a number of functional components. The techniques described in this disclosure may be shared among various components of the video encoder 200. In some examples, the processor may be configured to perform any or all of the techniques described in this disclosure.
The functional components of the video encoder 200 may include a partition unit 201, a prediction unit 202, a residual generation unit 207, a transform processing unit 208, a quantization unit 209, an inverse quantization unit 210, an inverse transform unit 211, a reconstruction unit 212, a buffer 213, and an entropy encoding unit 214; the prediction unit 202 may include a mode selection unit 203, a motion estimation unit 204, a motion compensation unit 205, and an intra prediction unit 206.
In other examples, video encoder 200 may include more, fewer, or different functional components. In one example, the prediction unit 202 may include an Intra Block Copy (IBC) unit. The IBC unit may perform prediction in an IBC mode, where the at least one reference picture is a picture in which the current video block is located.
Furthermore, some components, such as the motion estimation unit 204 and the motion compensation unit 205, may be highly integrated, but are represented separately in the example of fig. 24 for purposes of explanation.
Partition unit 201 may partition a picture into one or more video blocks. The video encoder 200 and the video decoder 300 may support various video block sizes.
The mode selection unit 203 may, for example, select one of the codec modes (intra or inter) based on the error results, and provide the resulting intra-coded or inter-coded block to the residual generation unit 207 to generate residual block data, and to the reconstruction unit 212 to reconstruct the coded block for use as a reference picture. In some examples, mode selection unit 203 may select a combination of intra and inter prediction (CIIP) mode, where the prediction is based on an inter prediction signal and an intra prediction signal. In the case of inter prediction, mode selection unit 203 may also select a resolution for the motion vector of the block (e.g., sub-pixel or integer-pixel precision).
To perform inter prediction on the current video block, motion estimation unit 204 may generate motion information for the current video block by comparing one or more reference frames from buffer 213 to the current video block. Motion compensation unit 205 may determine a predictive video block for the current video block based on motion information and decoded samples for pictures from buffer 213 other than the picture associated with the current video block.
The motion estimation unit 204 and the motion compensation unit 205 may perform different operations on the current video block, e.g., depending on whether the current video block is in an I-slice, a P-slice, or a B-slice.
In some examples, motion estimation unit 204 may perform uni-directional prediction on the current video block, and motion estimation unit 204 may search list 0 or list 1 reference pictures for a reference video block for the current video block. Motion estimation unit 204 may then generate a reference index indicating a reference picture in list 0 or list 1 containing a reference video block and a motion vector indicating the spatial displacement between the current video block and the reference video block. Motion estimation unit 204 may output the reference index, the prediction direction indicator, and the motion vector as motion information of the current video block. The motion compensation unit 205 may generate a prediction video block of the current block based on a reference video block indicated by motion information of the current video block.
In other examples, motion estimation unit 204 may perform bi-prediction for the current video block, motion estimation unit 204 may search the reference pictures in list 0 for a reference video block for the current video block, and may also search the reference pictures in list 1 for another reference video block for the current video block. Motion estimation unit 204 may then generate reference indices indicating the reference pictures in list 0 and list 1 containing the reference video block and motion vectors indicating the spatial displacement between the reference video block and the current video block. Motion estimation unit 204 may output the reference index and the motion vector of the current video block as motion information for the current video block. Motion compensation unit 205 may generate a prediction video block for the current video block based on the reference video block indicated by the motion information for the current video block.
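In bi-prediction the final prediction combines the two reference blocks. As a minimal sketch (equal weights with rounding, a common default; weighted prediction is out of scope here):

```python
def bi_predict(block0, block1):
    """Average two reference blocks sample by sample (equal weights).

    Each block is a list of rows of integer sample values; (a + b + 1) >> 1
    rounds to nearest. This is an illustrative sketch, not the normative
    weighted-prediction process.
    """
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]
```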
In some examples, motion estimation unit 204 may output the complete set of motion information for decoding processing by a decoder.
In some examples, motion estimation unit 204 may not output a full set of motion information for the current video block. Instead, motion estimation unit 204 may signal the motion information of the current video block with reference to the motion information of another video block. For example, motion estimation unit 204 may determine that the motion information of the current video block is sufficiently similar to the motion information of a neighboring video block.
In one example, motion estimation unit 204 may indicate a value in a syntax structure associated with the current video block that indicates to video decoder 300 that the current video block has the same motion information as another video block.
In another example, motion estimation unit 204 may identify another video block and a Motion Vector Difference (MVD) in a syntax structure associated with the current video block. The motion vector difference indicates a difference between a motion vector of the current video block and a motion vector of the indicated video block. The video decoder 300 may determine a motion vector for the current video block using the motion vector and the motion vector difference for the indicated video block.
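The MVD relationship described above is simply a per-component addition at the decoder side; a one-line sketch:

```python
def reconstruct_mv(predictor_mv, mvd):
    """MV of the current block = MV of the indicated block + signaled MVD.

    Motion vectors are (x, y) integer pairs in this sketch; real codecs
    also apply precision scaling and range clipping, omitted here.
    """
    return (predictor_mv[0] + mvd[0], predictor_mv[1] + mvd[1])
```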
As discussed above, the video encoder 200 may predictively signal the motion vectors. Two examples of prediction signaling techniques that may be implemented by video encoder 200 include Advanced Motion Vector Prediction (AMVP) and Merge mode signaling.
The intra prediction unit 206 may perform intra prediction on the current video block. When intra prediction unit 206 performs intra prediction on a current video block, intra prediction unit 206 may generate prediction data for the current video block based on decoded samples of other video blocks in the same picture. The prediction data for the current video block may include a prediction video block and various syntax elements.
Residual generation unit 207 may generate residual data for the current video block by subtracting (e.g., as indicated by a minus sign) the predicted video block(s) of the current video block from the current video block. The residual data for the current video block may include residual video blocks corresponding to different sample components of samples in the current video block.
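The residual generation step is a sample-wise subtraction; a minimal sketch on 2-D integer blocks:

```python
def residual_block(current, prediction):
    """Residual = current samples minus predicted samples, element-wise.

    Blocks are lists of rows of integer sample values. Illustrative only;
    a real encoder does this per color component.
    """
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]
```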
In other examples, there may be no residual data for the current video block, e.g., in skip mode, and residual generation unit 207 may not perform the subtraction operation.
Transform processing unit 208 may generate one or more transform coefficient video blocks for the current video block by applying one or more transforms to a residual video block associated with the current video block.
After transform processing unit 208 generates a transform coefficient video block associated with the current video block, quantization unit 209 may quantize the transform coefficient video block associated with the current video block based on one or more Quantization Parameter (QP) values associated with the current video.
Inverse quantization unit 210 and inverse transform unit 211 may apply inverse quantization and inverse transform, respectively, to the transform coefficient video blocks to reconstruct residual video blocks from the transform coefficient video blocks. Reconstruction unit 212 may add the reconstructed residual video block to corresponding sample points from one or more prediction video blocks generated by prediction unit 202 to produce a reconstructed video block associated with the current block for storage in buffer 213.
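The reconstruction step adds the decoded residual back to the prediction and clips the result to the valid sample range. A hedged sketch, assuming 8-bit samples:

```python
def reconstruct(residual, prediction, bit_depth: int = 8):
    """Reconstructed sample = clip(residual + prediction, 0, 2^bitdepth - 1).

    Blocks are lists of rows of integers; bit depth of 8 is an assumption.
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(r + p, 0), max_val) for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]
```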
After reconstruction unit 212 reconstructs the video blocks, a loop filtering operation may be performed to reduce video block artifacts in the video blocks.
Entropy encoding unit 214 may receive data from other functional components of video encoder 200. When entropy encoding unit 214 receives the data, entropy encoding unit 214 may perform one or more entropy encoding operations to generate entropy encoded data and output a bitstream that includes the entropy encoded data.
Fig. 25 is a block diagram illustrating an example of a video decoder 300, which may be the video decoder 124 in the system 100 illustrated in fig. 23.
Video decoder 300 may be configured to perform any or all of the techniques of this disclosure. In the example of fig. 25, the video decoder 300 includes a plurality of functional components. The techniques described in this disclosure may be shared among various components of the video decoder 300. In some examples, the processor may be configured to perform any or all of the techniques described in this disclosure.
In the example of fig. 25, the video decoder 300 includes an entropy decoding unit 301, a motion compensation unit 302, an intra prediction unit 303, an inverse quantization unit 304, an inverse transform unit 305, and a reconstruction unit 306 and a buffer 307. In some examples, video decoder 300 may perform a decoding process that is generally the inverse of the encoding process described with respect to video encoder 200 (fig. 24).
The entropy decoding unit 301 may retrieve the encoded bitstream. The encoded bitstream may include entropy coded video data (e.g., encoded blocks of video data). The entropy decoding unit 301 may decode the entropy-coded video data, and from the entropy-decoded video data, the motion compensation unit 302 may determine motion information including a motion vector, a motion vector precision, a reference picture list index, and other motion information. For example, the motion compensation unit 302 may determine such information by performing AMVP and Merge modes.
The motion compensation unit 302 may generate motion compensated blocks, possibly performing interpolation based on interpolation filters. Identifiers of the interpolation filters to be used with sub-pixel precision may be included in the syntax elements.
Motion compensation unit 302 may use the interpolation filters as used by video encoder 200 during encoding of the video block to calculate interpolated values for sub-integer pixels of a reference block. The motion compensation unit 302 may determine the interpolation filters used by the video encoder 200 according to the received syntax information and use the interpolation filters to generate the prediction block.
The motion compensation unit 302 may use some syntax information to determine the following: a block size for encoding frame(s) and/or slice(s) of an encoded video sequence, information describing how each macroblock of a picture of the encoded video sequence is partitioned, a mode indicating how each partition is encoded, one or more reference frames (and reference frame lists) for each inter-coded block, and other information for decoding the encoded video sequence.
The intra prediction unit 303 may form a prediction block from spatially neighboring blocks using, for example, an intra prediction mode received in the bitstream. The inverse quantization unit 304 inversely quantizes, i.e., dequantizes, the quantized video block coefficients provided in the bitstream and decoded by the entropy decoding unit 301. The inverse transform unit 305 applies an inverse transform.
The reconstruction unit 306 may add the residual block to the corresponding prediction block generated by the motion compensation unit 302 or the intra prediction unit 303 to form a decoded block. If desired, a deblocking filter may also be applied to filter the decoded blocks to remove blockiness artifacts. The decoded video blocks are then stored in the buffer 307, which provides reference blocks for subsequent motion compensation/intra prediction and also produces decoded video for presentation on a display device.
Some embodiments of the disclosed technology include making a decision or determination to enable a video processing tool or mode. In one example, when a video processing tool or mode is enabled, the encoder will use or implement the tool or mode in processing the video block, but may not necessarily modify the resulting bitstream based on the use of the tool or mode. That is, when a video processing tool or mode is enabled based on the decision or determination, the conversion from the video block to the bitstream representation of the video will use that video processing tool or mode. In another example, when a video processing tool or mode is enabled, the decoder will process the bitstream with the knowledge that the bitstream has been modified based on the video processing tool or mode. That is, the conversion from a bitstream representation of the video to video blocks will be performed using a video processing tool or mode that is enabled based on the decision or determination.
Some embodiments of the disclosed technology include making a decision or determination to disable a video processing tool or mode. In one example, when a video processing tool or mode is disabled, the encoder will not use that tool or mode at the time of the conversion of the video block to the bitstream representation of the video. In another example, when a video processing tool or mode is disabled, the decoder will process the bitstream knowing that the bitstream has not been modified using the video processing tool or mode that was enabled based on the decision or determination.
In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during a transition from a pixel representation of a video to a corresponding bitstream representation (or vice versa). The bitstream representation of the current video block may, for example, correspond to bits that are co-located within the bitstream or distributed at different locations within the bitstream, as defined by the syntax. For example, a macroblock may be encoded according to the transformed and encoded error residual values, and may also be encoded using bits in the header and bits in other fields in the bitstream.
The disclosure and other solutions, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standard stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such a device. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few embodiments and examples are described and other embodiments, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (68)

1. A video processing method, comprising:
determining, for a conversion between a video unit of a video and a bitstream representation of the video, whether a Separable Secondary Transform (SST) tool is enabled for the video unit; and
performing the conversion based on the determination.
2. The method of claim 1, wherein the determining is based on a syntax structure associated with the video unit.
3. The method of claim 2, wherein the syntax structure is included in the bitstream representation, and wherein the syntax structure is coded with at least one context used in arithmetic coding.
4. The method of claim 2, wherein the syntax structure is omitted from the bitstream representation based on characteristics of the video unit.
5. The method of claim 4, wherein the characteristic of the video unit comprises at least a dimension of the video unit, a codec mode of the video unit, or a syntax flag associated with the video unit.
6. The method of claim 4, wherein the syntax structure is omitted from the bitstream representation in the event that a codec block flag associated with the video unit is equal to 0.
7. The method of any of claims 2-6, wherein the video unit comprises a codec block or a transform block, and wherein the syntax structure comprises a codec unit or a transform unit.
8. The method of any of claims 2-6, wherein the video unit comprises a picture, and wherein the syntax structure comprises a picture header or a picture parameter set.
9. The method of any of claims 2-6, wherein the video unit comprises a slice, and wherein the syntax structure comprises a slice header, a sequence header, or a sequence parameter set.
10. The method of any of claims 2-9, wherein the syntax structure comprises a video parameter set, a decoder parameter set, an adaptive parameter set, a slice group, a slice, a row of coding tree elements, or a coding tree element.
11. The method of claim 1, wherein the determining is based on a characteristic of the video unit.
12. The method of claim 11, wherein the characteristic of the video unit comprises at least a codec mode of the video unit, an intra prediction mode of the video unit, a type of a primary transform of the video unit, or a dimension of the video unit.
13. The method of claim 11, wherein the video unit comprises a block of the video, and wherein the characteristic of the video unit comprises a dimension of the block.
14. The method of claim 13, wherein the SST tool is disabled if at least one of the width or height of the block is less than or equal to a threshold Tmin.
15. The method of claim 14, wherein Tmin is equal to 2 or 4.
16. The method of claim 13, wherein the SST tool is disabled if at least one of the width or height of the block is greater than or equal to a threshold Tmax.
17. The method of claim 16, wherein Tmax is equal to 32, 64, or 128.
18. The method of any of claims 13 to 17, wherein the determining is based on the width and the height of one color component of the block.
19. The method of claim 18, wherein the color component comprises a luminance component or an R color component.
20. The method of any of claims 13 to 17, wherein the determining is based on the widths and the heights of all color components of the block.
21. The method of any one of claims 1 to 20, wherein, in the event that the SST tool is determined to be disabled, signaling of the use of the SST tool or information related to the SST tool is omitted from the bitstream representation.
22. The method of any one of claims 1-21, wherein the SST tool is enabled for a first color component of the video unit and disabled for a second color component of the video unit.
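The dimension-based gating of claims 13-17 can be sketched as follows. This is a minimal illustration only, assuming the example threshold values Tmin = 4 (one alternative in claim 15) and Tmax = 32 (one alternative in claim 17); the function name is made up and is not part of any codec specification.

```python
T_MIN = 4   # claim 15: Tmin may be 2 or 4
T_MAX = 32  # claim 17: Tmax may be 32, 64, or 128

def sst_enabled(width, height):
    """Return whether the SST tool may be enabled for a width x height block."""
    # Claim 14: disabled when at least one dimension is <= Tmin.
    if min(width, height) <= T_MIN:
        return False
    # Claim 16: disabled when at least one dimension is >= Tmax.
    if max(width, height) >= T_MAX:
        return False
    return True
```

Per claim 21, when this check returns False, the related SST syntax would simply be omitted from the bitstream representation.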
23. A video processing method, comprising:
determining, for a conversion between a video unit of a video and a bitstream representation of the video, a manner of indicating the use of a transformation tool or a transformation matrix used by the transformation tool based on a bottom-right position (SRx, SRy) of a scan region; and
performing the conversion based on the determination.
24. The method of claim 23, wherein the transformation tool comprises at least a Separable Secondary Transform (SST), a non-separable secondary transform, or a primary transform.
25. The method of claim 23 or 24, wherein the indication of the use of the transformation tool or the indication of the transformation matrix is omitted in the bitstream representation in case SRx is greater than or equal to a first threshold and/or SRy is greater than or equal to a second threshold.
26. The method according to any of claims 23 to 25, wherein the indication of the use of the transformation tool or the indication of the transformation matrix is omitted in the bitstream representation in case SRx is less than or equal to a third threshold and/or SRy is less than or equal to a fourth threshold.
27. The method of any of claims 23 to 26, wherein the transformation tool is considered disabled in the event that the use of the transformation tool or the indication of the transformation matrix is omitted in the bitstream representation.
28. The method of any of claims 23 to 26, wherein a default transformation matrix is used in case the use of the transformation tool or the indication of the transformation matrix is omitted in the bitstream representation.
29. The method of any of claims 23 to 28, wherein coefficients outside the scan region are zero.
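The scan-region gating of claims 25-27 can be sketched as a signaling decision. All four threshold values below are illustrative placeholders (the claims do not fix them), and the function name is invented for this sketch.

```python
def signal_transform_indication(srx, sry,
                                th1=16, th2=16,  # claim 25 thresholds (assumed)
                                th3=0, th4=0):   # claim 26 thresholds (assumed)
    """Return whether the transformation-tool/matrix indication is signaled
    for a scan region whose bottom-right position is (srx, sry)."""
    # Claim 25: omit the indication when SRx >= th1 and/or SRy >= th2.
    if srx >= th1 or sry >= th2:
        return False
    # Claim 26: omit the indication when SRx <= th3 and/or SRy <= th4.
    if srx <= th3 or sry <= th4:
        return False
    return True
```

Per claims 27-28, when the indication is omitted the decoder either treats the tool as disabled or falls back to a default transform matrix.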
30. A video processing method, comprising:
determining, for a conversion between a block of a video and a bitstream representation of the video, a transform matrix for use in a Separable Secondary Transform (SST) tool based on a characteristic of the block, wherein the SST tool provides a set of available transform matrices; and
performing the conversion based on the determination.
31. The method of claim 30, wherein the characteristics of the block comprise at least a dimension of the block, an intra prediction mode of the block, quantized/unquantized coefficients after applying a transform, a color component of the block, or a type of a primary transform of the block.
32. The method of claim 30 or 31, wherein the same syntax element associated with the use of the SST tool indicates different transform matrices for blocks of different dimensions.
33. The method of any of claims 30 to 32, wherein the set of available transform matrices includes at least a 4x4 matrix, an 8x8 matrix, or an NxN matrix.
34. The method of claim 33, wherein the transform matrix is determined to be the 4x4 matrix in case at least one of a width or a height of the block is equal to 4.
35. The method of claim 33, wherein the transform matrix is determined to be the 8x8 matrix in case at least one of a width or a height of the block is equal to 8.
36. The method of claim 33, wherein the transform matrix is determined to be the 8x8 matrix in case at least one of a width or a height of the block is greater than or equal to 8.
37. The method of claim 33, wherein the transform matrix is determined to be the NxN matrix in case at least one of a width or a height of the block is equal to N.
38. The method of claim 33, wherein the transform matrix is determined to be the NxN matrix in case at least one of a width or a height of the block is greater than or equal to N.
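The matrix-size selection of claims 34 and 36 can be sketched as below, assuming block dimensions are powers of two no smaller than 4; the function name and string return values are illustrative only.

```python
def select_matrix_size(width, height):
    """Pick a secondary-transform matrix size for a width x height block."""
    # Claim 34: the 4x4 matrix when at least one dimension equals 4.
    if min(width, height) == 4:
        return "4x4"
    # Claim 36: the 8x8 matrix when at least one dimension is >= 8
    # (under the assumption above, both dimensions are >= 8 here).
    return "8x8"
```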
39. The method of any of claims 30 to 38, wherein the transform matrix is determined to be an NxN matrix to be applied to a top-left NxN sub-block of the block.
40. The method of any of claims 30 to 38, wherein the transform matrix is determined to be an NxN matrix to be applied to an NxN sub-block of the block other than the top-left NxN sub-block.
41. The method of claim 40, wherein the NxN sub-block is right-adjacent or bottom-adjacent to the top-left NxN sub-block.
42. The method of any one of claims 30 to 41, wherein the manner in which the SST tool is applied is based on the characteristics of the block, including horizontal and/or vertical application of the SST tool.
43. The method of claim 42, wherein a first transform matrix is determined to be available as a horizontal transform and a second transform matrix is determined to be available as a vertical transform for the block, and wherein the first transform matrix and the second transform matrix are different.
44. The method of claim 43, wherein the first transform matrix and the second transform matrix have different dimensions.
45. The method of any of claims 42-44, wherein the first transform matrix is of size NxN, the second transform matrix is of size MxM, and the block has dimensions WxH, and wherein determining M or N is based on at least one of W or H.
46. The method of claim 45, wherein, in the event W is equal to W1, N is determined to be W1, wherein W1 is equal to 4 or 8.
47. The method of claim 45, wherein, in the event W is greater than or equal to W2, N is determined to be W2, wherein W2 is equal to 4 or 8.
48. The method of claim 45, wherein, in the event H is equal to H1, M is determined to be H1, wherein H1 is equal to 4 or 8.
49. The method of claim 45, wherein, in the event H is greater than or equal to H2, M is determined to be H2, wherein H2 is equal to 4 or 8.
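The separable size selection of claims 45-49 can be sketched for a WxH block as choosing an NxN horizontal transform and an MxM vertical transform. The cap of 8 reflects one of the "W2 / H2 equal to 4 or 8" alternatives in the claims; the function name is made up for illustration.

```python
def select_sst_sizes(width, height, cap=8):
    """Return (N, M): horizontal transform size NxN, vertical size MxM."""
    n = min(width, cap)   # claims 46-47: N = W for small W, else capped at W2
    m = min(height, cap)  # claims 48-49: M = H for small H, else capped at H2
    return n, m
```

Per claims 43-44, this allows the horizontal and vertical transform matrices to differ, including in dimension.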
50. The method of any one of claims 30 to 49, wherein the set of available transformation matrices comprises at least two matrices having the same dimensions.
51. The method of claim 50, wherein the transform matrix is indicated using a syntax element in the bitstream representation.
52. The method of claim 50, wherein the transform matrix is derived without any indication in the bitstream representation.
53. The method of any of claims 30 to 52, wherein different transform matrices are applied to different color components of the block.
54. The method of claim 53, wherein for each color component of the block, a transformation matrix is independently determined based on characteristics of the block.
55. The method of claim 54, wherein the use of the transform matrix for each component of the block is signaled separately in the bitstream.
56. The method of any of claims 30 to 52, wherein the same transformation matrix is applied to all color components of the block.
57. The method of claim 56, wherein the transform matrix to be applied to a first color component of the block is first determined based on characteristics of the block, and wherein the same transform matrix is then determined to be applied to all remaining color components of the block.
58. The method of claim 56, wherein the first color component comprises a luma component, a Cb component, or a Cr component.
59. The method of claim 57 or 58, wherein the SST tool is disabled for a second color component of the remaining color components if the transformation matrix is not applied to the second color component.
60. The method of any one of claims 30 to 59, wherein the SST tool is enabled in case the transform matrices applicable to the different color components of the block are the same.
61. The method of any one of claims 30 to 60, wherein, in the event that the SST tool is determined to be disabled, signaling of the use of the SST tool or information related to the SST tool is omitted from the bitstream representation.
62. The method of any one of claims 30 to 61, wherein different transform matrices are applied for different primary transform types.
63. The method of claim 62, wherein the different primary transform types include at least DCT2 or DST7.
64. The method of any one or more of claims 1-63, wherein performing the conversion comprises generating the bitstream representation based on the video.
65. The method of any one or more of claims 1-63, wherein performing the conversion comprises generating the video from the bitstream representation.
66. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1-65.
67. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for implementing a method according to any one of claims 1 to 65.
68. A computer readable medium having stored thereon a bitstream representation of a video, the bitstream representation generated according to the method of any one or more of claims 1 to 65.
CN202080083999.4A 2019-12-02 2020-12-02 Scalable secondary transform processing of coded video Pending CN115066899A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2019122366 2019-12-02
CNPCT/CN2019/122366 2019-12-02
PCT/CN2020/133273 WO2021110018A1 (en) 2019-12-02 2020-12-02 Separable secondary transform processing of coded video

Publications (1)

Publication Number Publication Date
CN115066899A true CN115066899A (en) 2022-09-16

Family

ID=76221456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080083999.4A Pending CN115066899A (en) 2019-12-02 2020-12-02 Scalable secondary transform processing of coded video

Country Status (2)

Country Link
CN (1) CN115066899A (en)
WO (1) WO2021110018A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499641A (en) 2019-05-10 2024-02-02 北京字节跳动网络技术有限公司 Conditional use of simplified quadratic transforms for video processing
KR20220016844A (en) * 2019-06-07 2022-02-10 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Conditional signaling of reduced quadratic transform in video bitstreams
KR20220038682A (en) 2019-08-03 2022-03-29 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Selection of matrices for reduced quadratic transformation in video coding
CN114223208B (en) 2019-08-17 2023-12-29 北京字节跳动网络技术有限公司 Context modeling for side information of reduced secondary transforms in video

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109076242B (en) * 2016-05-13 2023-01-03 索尼公司 Image processing apparatus and method
CN109644269B (en) * 2016-08-24 2022-02-11 索尼公司 Image processing apparatus, image processing method, and storage medium
EP3567858A4 (en) * 2017-01-03 2020-06-17 LG Electronics Inc. -1- Method and device for encoding/decoding video signal using secondary transform
EP3349451A1 (en) * 2017-01-11 2018-07-18 Thomson Licensing Method and apparatus for selecting a coding mode used for encoding/decoding a residual block

Also Published As

Publication number Publication date
WO2021110018A1 (en) 2021-06-10

Similar Documents

Publication Publication Date Title
WO2021083257A1 (en) Cross-component adaptive loop filter
CN113728636B (en) Selective use of quadratic transforms in codec video
WO2020228717A1 (en) Block dimension settings of transform skip mode
WO2021083376A1 (en) Derivation of linear parameter in cross-component video coding
WO2021110018A1 (en) Separable secondary transform processing of coded video
US20220078424A1 (en) Sub-block based use of transform skip mode
WO2021104409A1 (en) Cross-component adaptive filtering and subblock coding
CN117319649A (en) Residual coding of transform skipped blocks
CN114270838A (en) Signaling of transition skip mode
WO2020228718A1 (en) Interaction between transform skip mode and other coding tools
WO2021190593A1 (en) Coded video processing using enhanced secondary transform
WO2021190440A1 (en) Using neighboring samples in cross-component video coding
WO2020253642A1 (en) Block size dependent use of secondary transforms in coded video
WO2021209065A1 (en) Use restrictions for cross-component prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination