CN117981306A - Independent history-based rice parameter derivation for video coding - Google Patents


Info

  • Publication number: CN117981306A
  • Application number: CN202280057673.3A
  • Authority: CN (China)
  • Other languages: Chinese (zh)
  • Inventors: 余越, 于浩平
  • Current and original assignee: Innopeak Technology Inc
  • Priority claimed from: PCT/US2022/075453 (WO2023028555A1)
  • Prior art keywords: histvalue, statcoeff, partition, updating, ctu
  • Legal status: Pending (the legal status is an assumption by Google and is not a legal conclusion)

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

In some embodiments, a video decoder decodes video from a bitstream using history-based Rice parameter derivation. The video decoder accesses a binary string representing a partition of the video and processes each coding tree unit (CTU) in the partition to generate decoded coefficient values for the CTU. The processing includes updating a replacement variable used to calculate the Rice parameter for a transform unit (TU) in the CTU, where the update is performed independently of any previous TU or CTU. The processing further includes calculating the Rice parameter of the TU based on the value of the replacement variable and, based on the calculated Rice parameter, decoding the binary string corresponding to the TU into coefficient values. Pixel values of the TU may be determined from the decoded coefficient values for output.

Description

Independent history-based rice parameter derivation for video coding
Cross Reference to Related Applications
The present application claims priority to U.S. Provisional Application No. 63/260,604, entitled "Independent History Based Rice Parameter Derivations for Video Coding," filed on August 26, 2021; U.S. Provisional Application No. 63/248,289, entitled "Independent History Based Rice Parameter Derivations for Video Coding," filed in September 2021; U.S. Provisional Application No. 63/248,819, entitled "Independent History Based Rice Parameter Derivations for Video Coding," filed on September 27, 2021; and U.S. Provisional Application No. 63/250,964, entitled "Independent History Based Rice Parameter Derivations for Video Coding," filed on September 30, 2021, all of which are incorporated herein by reference in their entirety.
Technical Field
The present disclosure relates generally to computer-implemented methods and systems for video processing. In particular, the present disclosure relates to independent history-based Rice parameter derivation for video coding.
Background
Ubiquitous camera-enabled devices, such as smartphones, tablets, and computers, make capturing video or images easier than ever. However, even a short video may have a very large data size. Video coding techniques (including video encoding and decoding) compress video data to smaller sizes, enabling the storage and transmission of various videos. Video coding has been used in a wide range of applications, such as digital television broadcasting, video transmission over the internet and mobile networks, real-time applications such as video chat and video conferencing, DVD, Blu-ray disc, and so on. To reduce the storage space for storing video and/or the network bandwidth consumed in transmitting video, it is desirable to increase the efficiency of the video coding scheme.
Disclosure of Invention
Some embodiments relate to independent history-based Rice parameter derivation for video coding. In one example, a method for decoding video includes: accessing a binary string representing a partition of the video, the partition comprising a plurality of coding tree units (CTUs); decoding each CTU of the plurality of CTUs in the partition, the decoding of a CTU including decoding a transform unit (TU) of the CTU by: updating a replacement variable HistValue used for calculating a Rice parameter of the TU, wherein updating the replacement variable HistValue is performed independently of (a) another TU of the CTU that precedes the TU and (b) another CTU of the plurality of CTUs that precedes the CTU; calculating the Rice parameter of the TU based on the updated replacement variable HistValue; decoding the binary string corresponding to the TU into coefficient values of the TU based on the calculated Rice parameter; and determining pixel values of the TU from the coefficient values; and outputting a decoded partition of the video, the decoded partition including the plurality of decoded CTUs.
In another example, a non-transitory computer-readable medium has program code stored thereon that is executable by one or more processing devices to perform operations. The operations include: accessing a binary string representing a partition of the video, the partition comprising a plurality of coding tree units (CTUs); decoding each CTU of the plurality of CTUs in the partition, the decoding of a CTU including decoding a transform unit (TU) of the CTU by: updating a replacement variable HistValue used for calculating a Rice parameter of the TU, wherein updating the replacement variable HistValue is performed independently of (a) another TU of the CTU that precedes the TU and (b) another CTU of the plurality of CTUs that precedes the CTU; calculating the Rice parameter of the TU based on the updated replacement variable HistValue; decoding the binary string corresponding to the TU into coefficient values of the TU based on the calculated Rice parameter; and determining pixel values of the TU from the coefficient values; and outputting a decoded partition of the video, the decoded partition including the plurality of decoded CTUs.
In another example, a system includes a processing device and a non-transitory computer-readable medium communicatively coupled to the processing device. The processing device is configured to execute program code stored in the non-transitory computer-readable medium to perform operations comprising: accessing a binary string representing a partition of the video, the partition comprising a plurality of coding tree units (CTUs); decoding each CTU of the plurality of CTUs in the partition, the decoding of a CTU including decoding a transform unit (TU) of the CTU by: updating a replacement variable HistValue used for calculating a Rice parameter of the TU, wherein updating the replacement variable HistValue is performed independently of (a) another TU of the CTU that precedes the TU and (b) another CTU of the plurality of CTUs that precedes the CTU; calculating the Rice parameter of the TU based on the updated replacement variable HistValue; decoding the binary string corresponding to the TU into coefficient values of the TU based on the calculated Rice parameter; and determining pixel values of the TU from the coefficient values; and outputting a decoded partition of the video, the decoded partition including the plurality of decoded CTUs.
In another example, a method for encoding video includes: accessing a partition of the video, the partition comprising a plurality of coding tree units (CTUs); processing the partition to generate a binary representation of the partition, the processing comprising encoding each CTU of the plurality of CTUs in the partition, the encoding of a CTU including encoding a transform unit (TU) of the CTU by: updating a replacement variable HistValue used for calculating a Rice parameter of the TU, wherein updating the replacement variable HistValue is performed independently of (a) another TU of the CTU that precedes the TU and (b) another CTU of the plurality of CTUs that precedes the CTU; calculating the Rice parameter of the TU based on the updated replacement variable HistValue; and encoding coefficient values of the TU into the binary representation corresponding to the TU based on the calculated Rice parameter; and encoding the binary representation of the partition into a bitstream of the video.
In another example, a non-transitory computer-readable medium has program code stored thereon that is executable by one or more processing devices to perform operations comprising: accessing a partition of the video, the partition comprising a plurality of coding tree units (CTUs); processing the partition to generate a binary representation of the partition, the processing comprising encoding each CTU of the plurality of CTUs in the partition, the encoding of a CTU including encoding a transform unit (TU) of the CTU by: updating a replacement variable HistValue used for calculating a Rice parameter of the TU, wherein updating the replacement variable HistValue is performed independently of (a) another TU of the CTU that precedes the TU and (b) another CTU of the plurality of CTUs that precedes the CTU; calculating the Rice parameter of the TU based on the updated replacement variable HistValue; and encoding coefficient values of the TU into the binary representation corresponding to the TU based on the calculated Rice parameter; and encoding the binary representation of the partition into a bitstream of the video.
In another example, a system includes a processing device and a non-transitory computer-readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer-readable medium to perform operations comprising: accessing a partition of the video, the partition comprising a plurality of coding tree units (CTUs); processing the partition to generate a binary representation of the partition, the processing comprising encoding each CTU of the plurality of CTUs in the partition, the encoding of a CTU including encoding a transform unit (TU) of the CTU by: updating a replacement variable HistValue used for calculating a Rice parameter of the TU, wherein updating the replacement variable HistValue is performed independently of (a) another TU of the CTU that precedes the TU and (b) another CTU of the plurality of CTUs that precedes the CTU; calculating the Rice parameter of the TU based on the updated replacement variable HistValue; and encoding coefficient values of the TU into the binary representation corresponding to the TU based on the calculated Rice parameter; and encoding the binary representation of the partition into a bitstream of the video.
These illustrative embodiments are mentioned not to limit or define the disclosure, but to provide examples that aid understanding of it. Additional embodiments are discussed, and further description is provided, in the detailed description.
Drawings
The features, embodiments and advantages of the present disclosure will be better understood when the following detailed description is read with reference to the accompanying drawings.
Fig. 1 is a block diagram illustrating an example of a video encoder configured to implement embodiments presented herein.
Fig. 2 is a block diagram illustrating an example of a video decoder configured to implement embodiments presented herein.
Fig. 3 depicts an example of coding tree unit partitioning of pictures in video according to some embodiments of the present disclosure.
Fig. 4 depicts an example of coding unit partitioning of coding tree units according to some embodiments of the present disclosure.
Fig. 5 depicts an example of a coding block in which the elements of the block are processed in a predetermined order.
Fig. 6 depicts an example of a template pattern for calculating local summation variables for coefficients located near transform unit boundaries.
Fig. 7 depicts an example of a process for encoding a partition of video according to some embodiments of the present disclosure.
Fig. 8 depicts an example of a process for decoding partitions of video according to some embodiments of the present disclosure.
FIG. 9 depicts an example of a computing system that may be used to implement some embodiments of the present disclosure.
Detailed Description
Various embodiments provide independent history-based Rice parameter derivation for video coding. As discussed above, more and more video data is generated, stored, and transmitted. It would be advantageous to increase the efficiency of video coding techniques so that video can be represented using less data without compromising the visual quality of the decoded video. One way to increase coding efficiency is to compress the processed video samples into a binary stream using as few bits as possible through entropy coding.
In entropy coding, transform coefficient data is binarized into binary bins, and a coding algorithm such as context-adaptive binary arithmetic coding (CABAC) may further compress the bins into bits. Binarization requires calculating binarization parameters, such as the Rice parameter used in the combination of truncated Rice (TR) and limited k-th order Exponential-Golomb (EGk) binarization processes specified in the Versatile Video Coding (VVC) specification. To improve coding efficiency, a history-based Rice parameter derivation method is used. In this method, the Rice parameter of a transform unit (TU) in the current coding tree unit (CTU) of a partition (e.g., a picture, slice, or tile) is derived based on a history counter (denoted StatCoeff) that is calculated from coefficients in previous CTUs in the partition and previous TUs in the current CTU. The history counter is then used to derive a replacement variable (denoted HistValue) for deriving the Rice parameter. The history counter may be updated as TUs are processed, and the update is based on the StatCoeff derived from previous CTUs or previous TUs in the current CTU. As a result, the history value used to derive the Rice parameters for all scan positions in the current TU depends on information from previous TUs. This establishes a dependency between the current TU and previous TUs in the current CTU, or, in some cases, even previous CTUs. Such a design is not hardware friendly and may prevent parallel processing of multiple CTUs or TUs.
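The relationship between the history counter and the replacement variable described above can be sketched in a few lines. The following Python is illustrative only: the function names are not from the specification, and the exact update formula in VVC/ECM differs per syntax element; the sketch assumes a single color component and a power-of-two mapping from StatCoeff to HistValue.

```python
import math

def update_stat_coeff(stat_coeff: int, coeff_level: int) -> int:
    # Simplified history-counter update: blend the previous counter with
    # the log2 magnitude of a newly coded non-zero coefficient level.
    return (stat_coeff + int(math.log2(coeff_level)) + 2) >> 1

def hist_value(stat_coeff: int) -> int:
    # The replacement variable is derived as a power of two of the counter.
    return 1 << stat_coeff
```

Because `update_stat_coeff` feeds on levels coded in earlier TUs, every call chains the current TU's state to its predecessors, which is the dependency the embodiments below remove.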
Various embodiments described herein address these issues by providing independent history-based Rice parameter derivation, which makes the Rice parameter derivation for each TU independent of the Rice parameter derivation for other TUs. The following non-limiting examples are provided to introduce some embodiments.
In one embodiment, the history counter StatCoeff is updated for each TU that requires history-based Rice parameter derivation. An initial value of the history counter StatCoeff may be established, and the history counter StatCoeff for each TU may be calculated from that initial value rather than from the StatCoeff of previous TUs. Once the history counter StatCoeff has been updated, the replacement variable HistValue is updated based on the value of the history counter StatCoeff. The remaining positions within the current TU are then coded using the updated history counter StatCoeff and the corresponding updated replacement variable HistValue until both are updated again. As a result, the history counter StatCoeff is independent of the StatCoeff of previous TUs. Likewise, the derived replacement variable HistValue is independent of the HistValue from any previous TU or CTU.
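The per-TU reset in this embodiment can be sketched as follows. This is an illustrative Python sketch: `INIT_STAT_COEFF` and the exact blending rule are hypothetical placeholders standing in for whatever initial value and update formula a real codec would specify.

```python
INIT_STAT_COEFF = 0  # hypothetical initial value; a real codec may
                     # derive it from, e.g., the bit depth

def derive_tu_stat_coeff(first_coded_level: int) -> int:
    # Each TU restarts from the initial value instead of carrying over
    # the counter from a previous TU or CTU.
    stat = INIT_STAT_COEFF
    if first_coded_level > 0:
        # floor(log2(level)) via bit_length, blended as in the earlier sketch
        stat = (stat + first_coded_level.bit_length() - 1 + 2) >> 1
    return stat
```

Because the function takes no state from outside the current TU, two TUs can evaluate it in parallel with identical results regardless of processing order.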
In another embodiment, the replacement variable HistValue for each TU is updated based on the quantization level of the first non-zero Golomb-Rice-coded transform coefficient in the TU that is coded as abs_remainder or dec_abs_level. Since the quantization levels of the first non-zero Golomb-Rice-coded transform coefficients coded as abs_remainder or dec_abs_level in different TUs are independent of each other, the derived replacement variables HistValue are also independent of each other. As a result, the history-based Rice parameter derivation for each TU is independent of the Rice parameter derivation for other TUs.
In a further embodiment, the replacement variable HistValue for each TU is updated based on the first non-zero quantization level in the TU. Because the first non-zero quantization levels in different TUs are independent of each other, the derived replacement variables HistValue are also independent of each other. As a result, the history-based Rice parameter derivation for each TU is independent of the Rice parameter derivation for other TUs. Furthermore, updating the replacement variable HistValue based on the first non-zero quantization level involves much less computational complexity than the existing Rice parameter derivation process, so the computational effort may be reduced. The updated replacement variable HistValue is used to derive the Rice parameters of the remaining abs_remainder and dec_abs_level syntax elements within the current TU.
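A minimal sketch of this last embodiment, assuming the simplest possible mapping: the absolute value of the first non-zero level is taken directly as HistValue. The identity mapping is a placeholder of my own; a real design may instead apply a log2/shift mapping as in the history-counter variants above.

```python
def hist_value_from_first_level(tu_levels):
    # Scan the TU's quantization levels in coding order and derive
    # HistValue from the first non-zero one; no state from other TUs.
    for level in tu_levels:
        if level != 0:
            return abs(level)  # placeholder mapping, see lead-in
    return 0  # no non-zero level in the TU: fall back to a default
```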
Using the Rice parameters determined as discussed above, the video encoder may binarize prediction residual data (e.g., quantized transform coefficients of residuals) into binary bins and further compress the bins into bits to be included in the video bitstream using an entropy coding algorithm. On the decoder side, the decoder may decode the bitstream into binary bins, determine the Rice parameters using any of the methods described above or any combination thereof, and then determine the coefficients from the binary bins. The coefficients may be further inverse quantized and inverse transformed to reconstruct the video block for display.
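The role of the Rice parameter in binarization can be seen from a plain Golomb-Rice code, of which the TR prefix used in VVC is a truncated variant. The sketch below is illustrative, not spec text: with Rice parameter k, a value is split into a unary-coded quotient and a k-bit binary remainder, so a larger k shortens the codes for large values.

```python
def rice_encode(value: int, k: int) -> str:
    # Golomb-Rice binarization: unary quotient (q ones, terminating zero)
    # followed by the k-bit binary remainder.
    q = value >> k
    prefix = "1" * q + "0"
    suffix = format(value & ((1 << k) - 1), "b").zfill(k) if k > 0 else ""
    return prefix + suffix
```

For example, the value 5 with k = 1 yields quotient 2 and remainder 1, giving the bin string "1101".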
As described herein, some embodiments improve video coding efficiency and computational efficiency by removing the dependencies between the current TU and previous TUs or CTUs in history-based Rice parameter derivation. In so doing, different CTUs or TUs may be processed in parallel, improving encoding and decoding speeds without compromising the stability of the coding process. Furthermore, the computational complexity of deriving history-based Rice parameters may be reduced, which further speeds up the process. These techniques can serve as efficient coding tools in future video coding standards.
Referring now to the drawings, FIG. 1 is a block diagram illustrating an example of a video encoder 100 configured to implement embodiments presented herein. In the example shown in fig. 1, video encoder 100 includes a segmentation module 112, a transformation module 114, a quantization module 115, an inverse quantization module 118, an inverse transformation module 119, a loop filter module 120, an intra prediction module 126, an inter prediction module 124, a motion estimation module 122, a decoded picture buffer 130, and an entropy encoding module 116.
The input to the video encoder 100 is an input video 102 that contains a sequence of pictures (also referred to as frames or images). In a block-based video encoder, for each picture, video encoder 100 employs a segmentation module 112 to segment the picture into blocks 104, each block containing a plurality of pixels. The block may be a macroblock, a coding tree unit, a coding unit, a prediction unit, and/or a prediction block. A picture may include blocks of different sizes, and the block partitions of different pictures of the video may also be different. Each block may be encoded using different predictions (e.g., intra-prediction or inter-prediction or a hybrid of intra-and inter-prediction).
Typically, the first picture of a video signal is an intra-predicted picture, which is encoded using only intra-prediction. In intra prediction mode, only data from the same picture is used to predict a block of a picture. Intra-predicted pictures may be decoded without information from other pictures. To perform intra prediction, the video encoder 100 shown in fig. 1 may employ an intra prediction module 126. The intra prediction module 126 is configured to generate an intra prediction block (prediction block 134) using reconstructed samples in a reconstructed block 136 of neighboring blocks of the same picture. Intra prediction is performed according to an intra prediction mode selected for the block. Then, the video encoder 100 calculates the difference between the block 104 and the intra prediction block 134. This difference is referred to as residual block 106.
To further remove redundancy from the block, the transform module 114 transforms the residual block 106 into the transform domain by applying a transform to the samples in the block. Examples of transforms may include, but are not limited to, discrete Cosine Transforms (DCTs) or Discrete Sine Transforms (DSTs). The transformed values may be referred to as transform coefficients, which represent residual blocks in the transform domain. In some examples, the residual block may be quantized directly without transformation by transformation module 114. This is referred to as a transform skip mode.
Video encoder 100 may further quantize the transform coefficients using quantization module 115 to obtain quantized coefficients. Quantization involves dividing the samples by a quantization step followed by rounding, while inverse quantization involves multiplying the quantization value by the quantization step. This quantization process is known as scalar quantization. Quantization is used to reduce the dynamic range of video samples (transformed or untransformed) so that fewer bits are used to represent the video samples.
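The forward and inverse scalar quantization just described can be stated directly in code. This sketch is illustrative (real codecs implement this with integer arithmetic and QP-dependent scaling tables rather than floating point):

```python
def quantize(coeff: float, step: float) -> int:
    # Forward scalar quantization: divide by the step size, then round.
    return round(coeff / step)

def dequantize(level: int, step: float) -> float:
    # Inverse quantization: multiply the level by the step size.
    return level * step
```

Note that the round trip `dequantize(quantize(x, step), step)` only approximates `x`; the approximation error is the quantization distortion traded against the bit savings from the reduced dynamic range.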
Quantization of coefficients/samples within a block can be performed independently, as is done in some existing video compression standards (e.g., H.264 and HEVC). For an N×M block, the 2-D coefficients of the block may be converted into a 1-D array following a particular scan order for coefficient quantization and coding. Quantization of coefficients within a block may use scan-order information; for example, the quantization of a given coefficient in the block may depend on the state of previously quantized values along the scan order. To further increase coding efficiency, more than one quantizer may be used, and which quantizer is used for the current coefficient depends on the information preceding the current coefficient in the encoding/decoding scan order. This quantization method is called dependent quantization.
Quantization step sizes may be used to adjust the quantization level. For example, for scalar quantization, different quantization steps may be applied to achieve finer or coarser quantization. A smaller quantization step corresponds to finer quantization, while a larger quantization step corresponds to coarser quantization. The quantization step size may be indicated by a Quantization Parameter (QP). Quantization parameters are provided in the encoded bitstream of video so that a video decoder can apply the same quantization parameters for decoding.
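The QP-to-step-size relationship mentioned above follows a well-known exponential rule in HEVC/VVC-style codecs: the step size roughly doubles every 6 QP values. The sketch below uses the commonly cited approximation step ≈ 2^((QP − 4)/6); actual codecs realize it with integer scaling tables.

```python
def qp_to_step(qp: int) -> float:
    # Quantization step doubles every 6 QP values; QP 4 maps to step 1.
    return 2.0 ** ((qp - 4) / 6)
```

So increasing QP by 6 (e.g., from 22 to 28) doubles the step size, giving coarser quantization and fewer bits.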
The quantized samples are then encoded by the entropy encoding module 116 to further reduce the size of the video signal. The entropy encoding module 116 is configured to apply an entropy encoding algorithm to the quantized samples. In some examples, the quantized samples are binarized into binary bins, and the encoding algorithm further compresses the binary bins into bits. Examples of binarization methods include, but are not limited to, truncated Rice (TR) and limited k-th order Exponential-Golomb (EGk) binarization. To improve coding efficiency, a history-based Rice parameter derivation method is used, in which the Rice parameter derived for a transform unit (TU) is based on variables obtained or updated from a previous TU. Examples of entropy encoding algorithms include, but are not limited to, variable length coding (VLC) schemes, context-adaptive VLC (CAVLC) schemes, arithmetic coding schemes, binarization, context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, and other entropy coding techniques. The entropy-encoded data is added to the bitstream of the output encoded video 132.
As discussed above, reconstructed blocks 136 from neighboring blocks are used in intra prediction of a block of a picture. Generating a reconstructed block 136 of a block involves calculating a reconstructed residual for the block. The reconstructed residual may be determined by applying inverse quantization and inverse transform to the quantized residual of the block. The inverse quantization module 118 is configured to apply inverse quantization to the quantized samples to obtain inverse quantized coefficients. The inverse quantization module 118 applies an inverse of the quantization scheme applied by the quantization module 115 by using the same quantization step size as the quantization module 115. The inverse transform module 119 is configured to apply an inverse transform of the transform applied by the transform module 114 to the dequantized samples, such as an inverse DCT or an inverse DST. The output of inverse transform module 119 is the reconstructed residual of the block in the pixel domain. The reconstructed residual may be added to a prediction block 134 of the block to obtain a reconstructed block 136 in the pixel domain. For blocks for which the transform has been skipped, the inverse transform module 119 is not applied to those blocks. The dequantized samples are the reconstructed residuals of the block.
Inter-prediction or intra-prediction may be used to encode blocks in subsequent pictures after the first intra-predicted picture. In inter prediction, the prediction of a block in a picture is from one or more previously encoded video pictures. To perform inter prediction, the video encoder 100 uses an inter prediction module 124. The inter prediction module 124 is configured to perform motion compensation on the block based on the motion estimation provided by the motion estimation module 122.
Motion estimation module 122 compares current block 104 of the current picture with decoded reference picture 108 for motion estimation. The decoded reference picture 108 is stored in a decoded picture buffer 130. The motion estimation module 122 selects a reference block from the decoded reference pictures 108 that best matches the current block. The motion estimation module 122 further identifies an offset between the location of the reference block (e.g., x, y coordinates) and the location of the current block. This offset is referred to as a Motion Vector (MV) and is provided to the inter prediction module 124. In some cases, a plurality of reference blocks are identified for blocks in the plurality of decoded reference pictures 108. Accordingly, a plurality of motion vectors are generated and provided to the inter prediction module 124.
The inter prediction module 124 performs motion compensation using the motion vector and other inter prediction parameters to generate a prediction of the current block (i.e., the inter prediction block 134). For example, based on the motion vector, the inter prediction module 124 may locate the prediction block to which the motion vector points in the corresponding reference picture. If there is more than one prediction block, these prediction blocks are combined with some weights to generate the prediction block 134 of the current block.
For inter-prediction blocks, video encoder 100 may subtract inter-prediction block 134 from block 104 to generate residual block 106. The residual block 106 may be transformed, quantized, and entropy encoded in the same manner as the residual of the intra-prediction block discussed above. Likewise, the reconstructed block 136 of the inter prediction block may be obtained by inverse quantizing, inverse transforming the residual, and then combining with the corresponding prediction block 134.
To obtain the decoded picture 108 for motion estimation, the reconstruction block 136 is processed by the loop filter module 120. The loop filter module 120 is configured to smooth the pixel transitions, thereby improving video quality. Loop filter module 120 may be configured to implement one or more loop filters, such as a deblocking filter, or a Sample Adaptive Offset (SAO) filter, or an Adaptive Loop Filter (ALF), or the like.
Fig. 2 depicts an example of a video decoder 200 configured to implement embodiments presented herein. The video decoder 200 processes the encoded video 202 in the bitstream and generates decoded pictures 208. In the example shown in fig. 2, video decoder 200 includes entropy decoding module 216, inverse quantization module 218, inverse transform module 219, loop filter module 220, intra prediction module 226, inter prediction module 224, and decoded picture buffer 230.
The entropy decoding module 216 is configured to perform entropy decoding of the encoded video 202. The entropy decoding module 216 decodes the quantized coefficients, encoding parameters (including intra-prediction and inter-prediction parameters), and other information. In some examples, the entropy decoding module 216 decodes the bitstream of the encoded video 202 into a binary representation and then converts the binary representation into quantization levels of the coefficients. The entropy-decoded coefficients are then inverse quantized by the inverse quantization module 218 and inverse transformed to the pixel domain by the inverse transform module 219. The inverse quantization module 218 and the inverse transform module 219 function similarly to the inverse quantization module 118 and the inverse transform module 119 described above with reference to fig. 1, respectively. The inverse-transformed residual block may be added to the corresponding prediction block 234 to generate a reconstructed block 236. For blocks for which the transform has been skipped, the inverse transform module 219 is not applied; the dequantized samples generated by the inverse quantization module 218 are used directly to generate the reconstructed block 236.
The prediction block 234 of a particular block is generated based on the prediction mode of the block. If the encoding parameters of the block indicate intra prediction of the block, a reconstructed block 236 of a reference block in the same picture may be fed into the intra prediction module 226 to generate a predicted block 234 of the block. If the coding parameters of the block indicate that the block is inter predicted, a prediction block 234 is generated by the inter prediction module 224. The intra-prediction module 226 and the inter-prediction module 224 function similarly to the intra-prediction module 126 and the inter-prediction module 124, respectively, of fig. 1.
As discussed above with respect to fig. 1, inter prediction involves one or more reference pictures. The video decoder 200 generates the decoded picture 208 of the reference picture by applying the loop filter module 220 to the reconstructed block of the reference picture. The decoded pictures 208 are stored in a decoded picture buffer 230 for use by the inter prediction module 224 and also for output.
Referring now to fig. 3, fig. 3 depicts an example of coding tree unit partitioning of pictures in video according to some embodiments of the present disclosure. As discussed above with respect to fig. 1 and 2, to encode a picture of video, the picture is divided into blocks, such as CTUs (coding tree units) 302 in VVC, as shown in fig. 3. For example, CTU 302 may be a 128 x 128 pixel block. The CTUs are processed in a certain order, such as the order shown in fig. 3. In some examples, as shown in fig. 4, each CTU 302 in a picture may be divided into one or more CUs (coding units) 402, and the CUs 402 may be further divided into prediction units or Transform Units (TUs) for prediction and transformation. CTU 302 may be partitioned into CUs 402 in different ways depending on the coding scheme. For example, in VVC, CU 402 may be rectangular or square, and may be encoded without being further divided into prediction units or transform units. Each CU 402 may be as large as its root CTU 302 or may be a subdivision of the root CTU 302, as small as a 4 x 4 block. As shown in fig. 4, CTU 302 is divided into CUs 402 in VVC using quadtree, binary tree, or ternary tree partitioning. In fig. 4, the solid lines indicate quadtree partitioning, and the dashed lines indicate binary tree or ternary tree partitioning.
As discussed above with respect to fig. 1 and 2, quantization is used to reduce the dynamic range of elements of blocks in a video signal so that fewer bits are used to represent the video signal. In some examples, elements located at a particular position of a block are referred to as coefficients prior to quantization. After quantization, the quantized values of the coefficients are referred to as quantization levels or levels. Quantization typically involves dividing by a quantization step followed by rounding, while inverse quantization involves multiplying by the quantization step. This quantization process is also known as scalar quantization. Quantization of coefficients within a block may be performed independently, and such independent quantization methods are used in some existing video compression standards (e.g., h.264, HEVC, etc.). In other examples, such as in VVC, dependent quantization is employed.
For an N×M block, the 2-D coefficients of the block may be converted into a 1-D array using a particular scan order for coefficient quantization and encoding, with the same scan order used for encoding and decoding. Fig. 5 shows an example of a coding block, such as a Transform Unit (TU), in which the coefficients of the coding block are processed using a predetermined scan order. In this example, the size of the coding block 500 is 8×8, and the process starts at the lower-right corner at position L0 and ends at the upper-left corner L63. If block 500 is a transform block, the predetermined order shown in fig. 5 begins at the highest frequency and proceeds to the lowest frequency. In some examples, processing (e.g., quantization and binarization) of the block begins with the first non-zero element of the block according to the predetermined scan order. For example, if the coefficients at positions L0 to L17 are all zero and the coefficient at L18 is not zero, the process starts with the coefficient at L18 and is performed in scan order for each coefficient after L18.
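As a small, hedged illustration of the scan-order behavior described above (the function name is hypothetical, not from the VVC specification), the starting position of block processing along the scan order can be found as follows:

```python
def first_coded_position(levels_in_scan_order):
    """Return the index of the first non-zero level along the scan order.

    Processing (e.g., quantization and binarization) of the block starts
    at this position and continues for every later scan position.
    Returns None if all levels are zero.
    """
    for n, level in enumerate(levels_in_scan_order):
        if level != 0:
            return n
    return None

# Example: coefficients at L0..L17 are zero, L18 is the first non-zero one.
levels = [0] * 18 + [3, 0, -1, 5]
assert first_coded_position(levels) == 18
```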
Residual coding
In video coding, residual coding is used to convert quantization levels into a bitstream. After quantization, there are N×M quantization levels for an N×M Transform Unit (TU) coded block. The N×M levels may be zero or non-zero values. If the levels are not in binary form, the non-zero levels are further binarized into binary bins. Context-adaptive binary arithmetic coding (CABAC) can further compress bins into bits. There are two coding methods based on context modeling. One method adaptively updates a context model based on neighboring coding information; this is called the context coding method, and a bin coded in this way is called a context-coded bin. The other method assumes that the probability of 1 or 0 is always 50% and therefore always uses fixed context modeling without adaptation; this is called the bypass method, and a bin coded by this method is called a bypass bin.
For a Regular Residual Coding (RRC) block in VVC, the last non-zero level is defined as the final non-zero level along the coding scan order. The 2-D coordinates of the last non-zero level (last_sig_coeff_x and last_sig_coeff_y) are represented by a total of four prefix and suffix syntax elements: last_sig_coeff_x_prefix, last_sig_coeff_y_prefix, last_sig_coeff_x_suffix, and last_sig_coeff_y_suffix. The syntax elements last_sig_coeff_x_prefix and last_sig_coeff_y_prefix are encoded first using a context coding method. If last_sig_coeff_x_suffix and last_sig_coeff_y_suffix are present, they are encoded using the bypass method. An RRC block may be composed of several predefined sub-blocks. The syntax element sb_coded_flag indicates whether all levels of the current sub-block are equal to zero. If sb_coded_flag is equal to 1, there is at least one non-zero coefficient in the current sub-block. If sb_coded_flag is equal to 0, all coefficients in the current sub-block are zero. However, for the last non-zero sub-block (the sub-block containing the last non-zero level, located from last_sig_coeff_x and last_sig_coeff_y according to the coding scan order), sb_coded_flag is inferred to be 1 rather than encoded into the bitstream. Likewise, sb_coded_flag of the upper-left sub-block containing the DC position is inferred to be 1 rather than encoded into the bitstream. The sb_coded_flag syntax elements present in the bitstream are encoded by a context coding method. RRC starts with the last non-zero sub-block and encodes sub-block by sub-block in reverse coding scan order, as discussed above with respect to fig. 5.
To guarantee worst-case throughput, a predefined value remBinsPass1 is used to limit the maximum number of context-coded bins. Within one sub-block, RRC encodes the level of each location in reverse coding scan order. If remBinsPass1 is greater than 4, when encoding the current level, a flag named sig_coeff_flag is first encoded into the bitstream to indicate whether the level is zero. If the level is not zero, abs_level_gtx_flag[n][0] is encoded to indicate whether the absolute level is 1 or greater than 1, where n is the index of the current position within the sub-block along the scan order. If the absolute level is greater than 1, par_level_flag is encoded to indicate whether the level is odd or even in VVC, and then abs_level_gtx_flag[n][1] is encoded. The flags par_level_flag and abs_level_gtx_flag[n][1] are used together to indicate a level of 2, 3, or greater than 3. After each of the above syntax elements is encoded into a context-coded bin, the value remBinsPass1 is decremented by 1.
If the absolute level is greater than 3 or the value remBinsPass1 is not greater than 4, after the aforementioned bins are encoded by the context coding method, the two syntax elements abs_remainder and dec_abs_level may be encoded as bypass-coded bins for the remaining levels. Furthermore, the sign of each level within the block is also encoded to represent the quantization level, and is encoded as bypass-coded bins.
Another residual coding method uses abs_level_gtxX_flag and the remaining level to allow conditional parsing of the syntax elements for level coding of residual blocks; the corresponding binarization of the absolute value of the level is shown in Table 1. Here, abs_level_gtxX_flag indicates whether the absolute value of the level is greater than X, where X is an integer, e.g., 0, 1, 2, ..., N. If abs_level_gtxY_flag is 0, where Y is an integer between 0 and N-1, the flag abs_level_gtx(Y+1)_flag is not present. If abs_level_gtxY_flag is 1, the flag abs_level_gtx(Y+1)_flag is present. Further, if abs_level_gtxN_flag is 0, no remaining level is present. When abs_level_gtxN_flag is 1, a remaining level is present, which represents the value after (N+1) is removed from the level. Typically, abs_level_gtxX_flag is encoded using a context coding method and the remaining level is encoded using the bypass method.
Table 1. Residual coding based on abs_level_gtxX_flag and remainder

abs(level)           0 1 2 3 4 5 6 7 8 9 10 11 12
abs_level_gtx0_flag  0 1 1 1 1 1 1 1 1 1 1  1  1
abs_level_gtx1_flag  - 0 1 1 1 1 1 1 1 1 1  1  1
abs_level_gtx2_flag  - - 0 1 1 1 1 1 1 1 1  1  1
abs_level_gtx3_flag  - - - 0 1 1 1 1 1 1 1  1  1
remainder            - - - - 0 1 2 3 4 5 6  7  8

("-" indicates that the syntax element is not present.)
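The conditional flag-and-remainder scheme of Table 1 can be sketched in Python as follows. This is an illustrative rendering (names are not from the specification); it emits abs_level_gtxX_flag values for X = 0..3 and, when all are 1, a remainder equal to the level minus (N+1) = 4:

```python
def binarize_level(abs_level, n_flags=4):
    """Binarize abs(level) into abs_level_gtxX_flag values and a remainder,
    following the conditional scheme of Table 1 (X = 0..3 here).

    Returns (flags, remainder), where flags is the list of coded
    abs_level_gtxX_flag values and remainder is None when not present.
    """
    flags = []
    for x in range(n_flags):
        flag = 1 if abs_level > x else 0
        flags.append(flag)
        if flag == 0:          # abs_level_gtx(X+1)_flag is not present
            return flags, None
    # abs_level_gtx3_flag == 1: remainder codes the value with (N+1) = 4 removed
    return flags, abs_level - n_flags
```

For example, abs(level) = 2 yields flags [1, 1, 0] and no remainder, matching the column for 2 in Table 1; abs(level) = 5 yields flags [1, 1, 1, 1] and remainder 1.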
For blocks encoded in transform skip residual coding (TSRC) mode, TSRC starts from the upper-left sub-block and encodes sub-block by sub-block along the coding scan order. Similarly, a syntax element sb_coded_flag indicates whether all residuals of the current sub-block are equal to zero. Under certain conditions, the sb_coded_flag syntax elements of all sub-blocks except the last sub-block are encoded into the bitstream. If no sb_coded_flag of the sub-blocks before the last sub-block is equal to 1, the sb_coded_flag of the last sub-block is inferred to be 1 and is not encoded into the bitstream. To guarantee worst-case throughput, a predefined value RemCcbs is used to limit the maximum number of context-coded bins. If the current sub-block has a non-zero level, TSRC encodes the level of each location in coding scan order. If RemCcbs is greater than 4, the subsequent syntax elements are encoded using the context coding method. For each level, sig_coeff_flag is first encoded into the bitstream to indicate whether the level is zero. If the level is not zero, coeff_sign_flag is encoded to indicate whether the level is positive or negative. Then abs_level_gtx_flag[n][0] is encoded to indicate whether the absolute level of the current position is greater than 1, where n is the index of the current position within the sub-block along the scan order. If abs_level_gtx_flag[n][0] is not zero, par_level_flag is encoded. After each of the above syntax elements is encoded using the context coding method, RemCcbs is decremented by 1.
After encoding the above syntax elements for all positions within the current sub-block, if RemCcbs is still greater than 4, up to four more abs_level_gtx_flag[n][j] are encoded using the context coding method, where n is the index of the current position within the sub-block along the scan order and j ranges from 1 to 4. After each abs_level_gtx_flag[n][j] is encoded, RemCcbs is decremented by 1. If RemCcbs is not greater than 4, the syntax element abs_remainder is encoded using the bypass method for the current position within the sub-block, if necessary. For positions where the absolute level is encoded entirely by the bypass method using the syntax element abs_remainder, coeff_sign_flag is also encoded by the bypass method. In summary, a predefined counter, remBinsPass1 in RRC or RemCcbs in TSRC, limits the total number of context-coded bins and ensures worst-case throughput.
Rice parameter derivation
In the current RRC design in VVC, two syntax elements encoded as bypass bins, abs_remainder and dec_abs_level, may be present in the bitstream for the remaining levels. abs_remainder and dec_abs_level are binarized by a combination of the Truncated Rice (TR) and finite k-th order exponential-Golomb (EGk) binarization processes specified in the VVC specification, which requires a rice parameter to binarize a given level. To obtain the optimal rice parameter, a local summation method is employed, as described below.
The array AbsLevel[xC][yC] represents the absolute values of the transform coefficient levels of the current transform block. Given the array AbsLevel[x][y] of a transform block with color component index cIdx and upper-left luminance position (x0, y0), the local summation variable locSumAbs is derived as specified by the following pseudocode process:
where log2TbWidth and log2TbHeight are the base-2 logarithms of the width and height of the transform block, respectively. For abs_remainder and dec_abs_level, the variable baseLevel is 4 and 0, respectively. Given the local summation variable locSumAbs, the rice parameter cRiceParam is derived as specified in Table 2.
Table 2. Specification of cRiceParam based on locSumAbs
locSumAbs 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
cRiceParam 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 2
locSumAbs 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
cRiceParam 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3
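The local summation and the Table 2 lookup can be sketched in Python as follows. This is a hedged reconstruction: the template positions (two neighbors to the right, two below, and one diagonal neighbor) and the clipping to the range 0..31 follow the locSumAbs derivation in VVC clause 9.3.3.2, but the function and variable names here are illustrative:

```python
def clip3(lo, hi, x):
    """Clip x to the inclusive range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x

def loc_sum_abs(abs_level, xC, yC, log2_w, log2_h, base_level):
    """Local sum over the template positions to the right of and below
    (xC, yC); template positions outside the TU contribute 0.
    abs_level is indexed as abs_level[x][y]."""
    w, h = 1 << log2_w, 1 << log2_h
    s = 0
    if xC < w - 1:
        s += abs_level[xC + 1][yC]
        if xC < w - 2:
            s += abs_level[xC + 2][yC]
        if yC < h - 1:
            s += abs_level[xC + 1][yC + 1]
    if yC < h - 1:
        s += abs_level[xC][yC + 1]
        if yC < h - 2:
            s += abs_level[xC][yC + 2]
    return clip3(0, 31, s - base_level * 5)

# Table 2: locSumAbs 0..6 -> 0, 7..13 -> 1, 14..27 -> 2, 28..31 -> 3
RICE_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4

def rice_param(loc_sum):
    """Map locSumAbs (0..31) to cRiceParam per Table 2."""
    return RICE_TABLE[loc_sum]
```

For a 4x4 block with template neighbors 2, 1, 3, 4, and 5 around position (0, 0), the local sum is 15 and the rice parameter is 2; with baseLevel = 4 the clipped sum is 0 and the rice parameter is 0.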
History-based Rice parameter derivation
If the coefficients lie at TU boundaries, or are among the first coefficients to be decoded, the template calculation for rice parameter derivation may produce inaccurate estimates. For these coefficients, the template calculation is biased toward 0, because some template positions may lie outside the TU and are interpreted or initialized to a value of 0. Fig. 6 shows an example of a template pattern for calculating locSumAbs for coefficients located near a TU boundary. Fig. 6 shows a CTU 602 divided into multiple CUs, each comprising multiple TUs. For TU 604, the position of the current coefficient is shown as a solid block, with the positions of its neighboring samples in the template pattern shown as patterned blocks. The patterned blocks indicate a predetermined neighborhood of the current coefficient used for calculating the local summation variable locSumAbs.
In fig. 6, because the current coefficient 606 is close to the boundary of the TU 604, some neighboring samples of the current coefficient 606 in the template pattern, such as neighboring samples 608B and 608E, are located outside the TU boundary. In the rice parameter derivation described above, when the local summation variable locSumAbs is calculated, these neighboring samples that lie outside the boundary are set to 0, resulting in inaccurate rice parameter derivation. For high bit depth samples (e.g., greater than 10 bits), neighboring samples that lie outside of the TU boundary may be large numbers. Setting these large numbers to 0 introduces more errors in the rice parameter derivation.
In order to improve the accuracy of rice parameter estimation from the calculation template, it is suggested to update the local summation variable locSumAbs with a history-derived value for template positions outside the current TU, instead of initializing with 0. The implementation of the method is described below by the VVC specification text extracted from clause 9.3.3.2, with the suggested text underlined.
To maintain a history of neighboring coefficient/sample values, a history counter StatCoeff[cIdx] is maintained for each of the three color components Y, U, and V, where cIdx = 0, 1, 2. If the CTU is the first CTU in a partition (e.g., picture, slice, or tile), StatCoeff[cIdx] is initialized as follows:
StatCoeff[cIdx] = 2 * Floor(Log2(BitDepth - 10)) (1)
Here, BitDepth specifies the bit depth of the samples of the luminance and chrominance arrays of the video, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x. Prior to TU decoding and history counter updating, the replacement variable HistValue is initialized to:
HistValue[cIdx] = 1 << StatCoeff[cIdx] (2)
The replacement variable HistValue is used as an estimate of neighboring samples that lie outside the TU boundary (e.g., neighboring samples whose horizontal or vertical coordinates lie outside the TU). The local summation variable locSumAbs is re-derived as specified by the following pseudocode process, with modifications underlined:
Prior to encoding each TU, the history value HistValue calculated according to equation (2) is used to derive the rice parameter for encoding the first syntax element abs_remainder or dec_abs_level (if present) of each TU.
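The initialization of equations (1) and (2) can be sketched as follows. This is an illustrative Python rendering (function names are not from the disclosure); it assumes BitDepth > 10 so that the logarithm is defined:

```python
from math import floor, log2

def init_stat_coeff(bit_depth):
    """Equation (1): StatCoeff = 2 * Floor(Log2(BitDepth - 10)).
    Assumes bit_depth > 10."""
    return 2 * floor(log2(bit_depth - 10))

def hist_value(stat_coeff):
    """Equation (2): HistValue = 1 << StatCoeff."""
    return 1 << stat_coeff

# Example: 12-bit video gives StatCoeff = 2 * Floor(Log2(2)) = 2,
# so the initial HistValue is 1 << 2 = 4.
assert init_stat_coeff(12) == 2
assert hist_value(init_stat_coeff(12)) == 4
```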
For each TU, the history counter StatCoeff is updated once by an exponential moving average process from the first non-zero Golomb-Rice coded transform coefficient (abs_remainder[cIdx] or dec_abs_level[cIdx]). When the first non-zero Golomb-Rice coded transform coefficient in the TU is encoded as abs_remainder, the history counter StatCoeff for the color component cIdx is updated as follows:
StatCoeff[cIdx] = (StatCoeff[cIdx]+Floor(Log2(abs_remainder[cIdx]))+2)>>1 (3)
When the first non-zero Golomb-Rice coded transform coefficient in the TU is encoded as dec_abs_level, the history counter StatCoeff for the color component cIdx is updated as follows:
StatCoeff[cIdx] = (StatCoeff[cIdx]+Floor(Log2(dec_abs_level[cIdx])))>>1 (4)
The updated StatCoeff may be used to calculate the replacement variable HistValue for the next TU according to equation (2) before decoding the next TU.
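The exponential-moving-average updates of equations (3) and (4) can be sketched as follows (illustrative function names; the first Golomb-Rice coded coefficient value is assumed to be non-zero so the logarithm is defined):

```python
from math import floor, log2

def update_stat_coeff_abs_remainder(stat_coeff, abs_remainder):
    """Equation (3): update when the first non-zero Golomb-Rice coded
    coefficient in the TU is coded as abs_remainder."""
    return (stat_coeff + floor(log2(abs_remainder)) + 2) >> 1

def update_stat_coeff_dec_abs_level(stat_coeff, dec_abs_level):
    """Equation (4): update when the first non-zero Golomb-Rice coded
    coefficient in the TU is coded as dec_abs_level."""
    return (stat_coeff + floor(log2(dec_abs_level))) >> 1

# With StatCoeff = 2 and a first coefficient value of 4:
assert update_stat_coeff_abs_remainder(2, 4) == 3   # (2 + 2 + 2) >> 1
assert update_stat_coeff_dec_abs_level(2, 4) == 2   # (2 + 2) >> 1
```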
Although the history counter is updated as TUs are processed, each update is based on the previous StatCoeff derived from a previous CTU or a previous TU in the current CTU. As a result, the history value used in deriving rice parameters for all scan positions in the current TU is based on information from previous TUs. This establishes a dependency between the current TU and previous TUs in the current CTU, or even previous CTUs in some cases. Such a design is not hardware friendly and may prevent parallel processing of multiple CTUs or TUs. In this disclosure, embodiments are described that provide independent history-based rice parameter derivation, such that the rice parameter derivation for each TU is independent of the rice parameter derivation for other TUs.
StatCoeff initialization
During encoding, an initial value of the history counter StatCoeff for each color component cIdx is defined, denoted StatCoeff_init[cIdx]. Such values may be fixed for all frames. Alternatively or additionally, different types of frames (e.g., I-frames, P-frames, B-frames) may have different initial values StatCoeff_init[cIdx]. As an example, StatCoeff_init[cIdx] may be defined as:
StatCoeff_init[cIdx] = 2 * Floor(Log2(BitDepth - 10)) (5)
where BitDepth specifies the bit depth of the samples of the luma or chroma array, and Floor(x) represents the largest integer less than or equal to x.
In another example, StatCoeff_init[cIdx] may be defined as:
StatCoeff_init[cIdx] = Clip(MIN_Stat, MAX_Stat, (int)((19 - QP) / 6)) - 1 (6)
where MIN_Stat and MAX_Stat are two predefined integers, QP is the initial quantization parameter for each slice, and Clip() is an operation defined as follows:
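Equation (6) and the Clip() operation can be sketched as follows. This is an illustrative rendering: Clip() is assumed to behave like the standard clipping operation (VVC's Clip3), and the MIN_Stat/MAX_Stat values below are placeholders, not values from the disclosure:

```python
def clip(lo, hi, x):
    """Assumed Clip semantics: return lo if x < lo, hi if x > hi, else x."""
    return lo if x < lo else hi if x > hi else x

# Placeholder bounds; MIN_Stat and MAX_Stat are predefined integers
# whose actual values are not given here.
MIN_STAT, MAX_STAT = 0, 4

def stat_coeff_init_from_qp(qp):
    """Equation (6): Clip(MIN_Stat, MAX_Stat, (int)((19 - QP)/6)) - 1.
    int() truncates toward zero, matching a C-style (int) cast."""
    return clip(MIN_STAT, MAX_STAT, int((19 - qp) / 6)) - 1

# QP = 1: (18/6) = 3 -> clip -> 3, minus 1 gives 2.
assert stat_coeff_init_from_qp(1) == 2
```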
Each slice may also have a StatCoeff_init for each color component separately, with StatCoeff_init specified in the slice header. Exemplary syntax modifications are shown below, with additions underlined:
7.3.7 slice header syntax
StatCoeff_init specifies the value of HistValue used in parsing the residual_coding() syntax structures in the current slice. When StatCoeff_init is not present, its value is inferred to be equal to 0.
Before encoding each TU, a history value HistValue is calculated according to equation (8). The calculated history value is used to derive the rice parameter for encoding the first syntax element abs_remainder or dec_abs_level (if present) of each TU. In another example, if the last significant coefficient (e.g., the last non-zero coefficient) in the TU has an abs_remainder[] portion in its level coding, the calculated history value may be used to derive the rice parameter used to encode the remaining level of the last significant coefficient.
HistValue=1<<StatCoeff_init[cIdx] (8)
Although StatCoeff_init[cIdx] is defined for each slice in the slice header in the above example, StatCoeff_init[cIdx] may be defined for other types of partitions (e.g., frames or tiles).
StatCoeff update
In one embodiment, for each TU that requires history-based rice parameter derivation, the history counter StatCoeff is updated based on the initial history counter value StatCoeff_init, rather than based on the history counter StatCoeff from a previous TU. Once the history counter StatCoeff has been updated, the replacement variable HistValue is updated based on the value of the history counter StatCoeff. In this manner, the remaining locations within the current TU are encoded using the updated history counter StatCoeff and the corresponding updated replacement variable HistValue until the replacement variable HistValue and history counter StatCoeff are updated again. As a result, the history counter StatCoeff is independent of the history counters StatCoeff of previous TUs. Likewise, the derived replacement variable HistValue is independent of the replacement variable HistValue from a previous TU or CTU.
In this embodiment, when the first non-zero Golomb-Rice coded transform coefficient in a TU is encoded as abs_remainder, the history counter for the color component cIdx is updated as follows:
StatCoeff[cIdx]=(StatCoeff_init[cIdx]+Floor(Log2(abs_remainder[cIdx]))+2)>>1 (9)
When the first non-zero Golomb-Rice coded transform coefficient in the TU is encoded as dec_abs_level, the history counter for the color component cIdx is updated as follows:
StatCoeff[cIdx]=(StatCoeff_init[cIdx]+Floor(Log2(dec_abs_level[cIdx])))>>1 (10)
Once the history counter StatCoeff[cIdx] has been updated, HistValue is updated according to equation (2), and the updated HistValue is used to derive the rice parameters of the remaining syntax elements abs_remainder and dec_abs_level until StatCoeff[cIdx] and HistValue are updated again.
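The independent update of equations (9) and (10), followed by the HistValue derivation of equation (2), can be sketched as follows (illustrative Python; the helper names are not from the disclosure):

```python
from math import floor, log2

def update_stat_coeff_independent(stat_coeff_init, first_coeff_value,
                                  coded_as_abs_remainder):
    """Equations (9)/(10): update StatCoeff from the partition-level
    StatCoeff_init rather than from the previous TU's StatCoeff, so each
    TU's derivation is independent of other TUs."""
    if coded_as_abs_remainder:
        # Equation (9): first coefficient coded as abs_remainder
        return (stat_coeff_init + floor(log2(first_coeff_value)) + 2) >> 1
    # Equation (10): first coefficient coded as dec_abs_level
    return (stat_coeff_init + floor(log2(first_coeff_value))) >> 1

def hist_value(stat_coeff):
    """Equation (2): HistValue = 1 << StatCoeff."""
    return 1 << stat_coeff

# Two TUs with the same first coefficient get the same HistValue,
# regardless of what earlier TUs contained.
assert update_stat_coeff_independent(2, 4, True) == 3
assert hist_value(update_stat_coeff_independent(2, 4, True)) == 8
```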
Exemplary modifications to the VVC specification are specified below.
Clause 7.3.11.11 (residual coding syntax) is modified as follows (the added content is underlined):
In another embodiment, the replacement variable HistValue for each TU is updated based on the quantization level of the first non-zero Golomb-Rice coded transform coefficient in the TU that is encoded as abs_remainder or dec_abs_level. Since these quantization levels in different TUs are independent of each other, the derived replacement variables HistValue are also independent of each other. As a result, the history-based rice parameter derivation for each TU is independent of the rice parameter derivation for other TUs.
In this embodiment, when the first non-zero Golomb-Rice coded transform coefficient in the TU is encoded as abs_remainder or dec_abs_level, HistValue is updated based on the quantization level of that transform coefficient according to equation (11), and the updated HistValue is used to derive the rice parameters of the remaining syntax elements abs_remainder and dec_abs_level:
HistValue=abs(level) (11)
where abs(x) represents the absolute value of x, and level is the quantization level at the current position (i.e., the quantization level of the first non-zero Golomb-Rice coded transform coefficient in the TU that is encoded as abs_remainder or dec_abs_level).
Exemplary modifications to the VVC specification are specified below.
Clause 7.3.11.11 (residual coding syntax) is modified as follows (added content is underlined, deleted content is shown in strikethrough):
In a further embodiment, the replacement variable HistValue for each TU is updated based on the first non-zero quantization level in the TU. Since the first non-zero quantization levels in different TUs are independent of each other, the derived replacement variables HistValue are also independent of each other. As a result, the history-based rice parameter derivation for each TU is independent of the rice parameter derivation for other TUs. Furthermore, since the last significant coefficient in the TU (the first non-zero coefficient to be encoded) is closer to the TU boundary, that coefficient is a better estimate of the neighboring coefficient values lying outside the TU boundary, as shown in fig. 6. Thus, using the quantization level of the last significant coefficient as HistValue provides a better estimate for neighboring coefficients outside the boundary, making the rice parameter estimation more accurate.
In this embodiment, after encoding the last significant coefficient (the first non-zero quantization level) in the TU, HistValue is updated using the absolute level of the last significant coefficient, as shown in equation (12), and the updated HistValue is used to derive the rice parameters of the remaining syntax elements abs_remainder and dec_abs_level within the current TU:
HistValue=abs(level) (12)
where abs(x) represents the absolute value of x.
Exemplary modifications to the VVC specification are specified below.
Clause 7.3.11.11 (residual coding syntax) is modified as follows (added content is underlined, deleted content is shown in strikethrough):
In this example, the replacement variable HistValue is updated in response to determining that history-based rice parameter derivation is enabled (indicated by sps_persistent_rice_adaptation_enabled_flag) and that the current position is the last significant coefficient. The flag updateHist is no longer needed and can be removed.
In another example, the modification to the VVC specification may be made as follows.
Clause 7.3.11.11 (residual coding syntax) is modified as follows (added content is underlined, deleted content is shown in strikethrough):
In this example, in response to determining that flag updateHist indicates that replacement variable HistValue has not been updated (e.g., has a value of 1), replacement variable HistValue is updated. After HistValue has been updated, flag updateHist is changed to 0 to indicate that the replacement variable HistValue for that TU has been updated so that HistValue need not be updated again for the TU.
In a further example, the modification to the VVC specification may be made as follows.
Clause 7.3.11.11 (residual coding syntax) is modified as follows (added content is underlined, deleted content is shown in strikethrough):
In this example, for an entire partition (e.g., slice, frame, or tile), the initial value of the replacement variable HistValue is calculated as follows:
HistValue = sps_persistent_rice_adaptation_enabled_flag ? 1 << StatCoeff_init[cIdx] : 0 (13)
Before encoding each TU, the initial history value may be used to derive the rice parameter for encoding the first syntax element abs_remainder or dec_abs_level (if present) of each TU. In another case, if the last significant coefficient in the TU has an abs_remainder[] portion in its level coding, the calculated history value may be used to derive the rice parameter used to encode the remaining level of the last significant coefficient. Since the initial value of the replacement variable HistValue is defined at the partition level, initialization of HistValue for each TU is not required. Thus, the initialization of HistValue in the above table is removed.
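Equation (13) can be sketched as follows (illustrative Python; the SPS flag is treated as a 0/1 integer):

```python
def init_hist_value_partition(sps_persistent_rice_adaptation_enabled_flag,
                              stat_coeff_init):
    """Equation (13): partition-level initial HistValue. When the SPS flag
    is enabled, HistValue = 1 << StatCoeff_init; otherwise it is 0, and
    per-TU initialization is no longer required."""
    if sps_persistent_rice_adaptation_enabled_flag:
        return 1 << stat_coeff_init
    return 0

assert init_hist_value_partition(1, 2) == 4
assert init_hist_value_partition(0, 2) == 0
```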
As can be seen from the above examples of this embodiment, by updating HistValue based on the first non-zero quantization level in the TU, the computation and updating of the history counter StatCoeff can be eliminated, thereby reducing the computational complexity of the encoding process.
Fig. 7 depicts an example of a process 700 for encoding a partition of video according to some embodiments of the present disclosure. One or more computing devices (e.g., a computing device implementing the video encoder 100) implement the operations depicted in fig. 7 by executing suitable program code (e.g., program code implementing the entropy encoding module 116). For purposes of illustration, process 700 is described with reference to some examples depicted in the figures. However, other implementations are possible.
At block 702, process 700 involves accessing a partition of a video signal. A partition may be a video frame, slice, or tile, or any type of partition that is processed as a unit by a video encoder when performing encoding. The partition includes a set of CTUs. Each CTU includes one or more CUs, and each CU includes a plurality of TUs for encoding, as shown in the example of fig. 6.
At block 704, process 700 involves processing each CTU in the set of CTUs in the partition to encode the partition into bits; block 704 includes blocks 706 through 712. At block 706, process 700 involves updating a replacement variable HistValue for a TU in a CTU independently of a previous TU or CTU. As discussed above, in some embodiments, updating the replacement variable HistValue may be performed each time the history counter StatCoeff is updated. For example, when the first non-zero Golomb-Rice coded transform coefficient in the TU is encoded as abs_remainder or dec_abs_level, the history counter StatCoeff is updated according to equation (9) or equation (10), respectively.
In another embodiment, the replacement variable HistValue may be updated based on the first non-zero Golomb-Rice coded transform coefficient in the TU encoded as abs_remainder or dec_abs_level. HistValue may be updated based on the quantization level of that transform coefficient according to equation (11), and the updated HistValue is used to derive the rice parameters for the remaining syntax elements abs_remainder and dec_abs_level. In a further embodiment, as shown in equation (12), HistValue is updated using the absolute level of the last significant coefficient of the TU, and the updated HistValue is used to derive the rice parameters for the remaining syntax elements abs_remainder and dec_abs_level within the current TU.
At block 708, rice parameters for TUs in the CTU are calculated based on the updated replacement variable HistValue, as discussed above. At block 710, process 700 involves encoding TUs in the CTU into a binary representation based on the calculated rice parameters, e.g., by a combination of the Truncated Rice (TR) and finite k-th order exponential-Golomb (EGk) binarization processes specified in the VVC specification. At block 712, process 700 involves encoding the binary representation of the CTU into bits for inclusion in the video bitstream. For example, the encoding may be performed using context-adaptive binary arithmetic coding (CABAC) as discussed above. At block 714, process 700 involves outputting the encoded video bitstream.
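As context for the binarization at block 710, a plain (non-truncated) Golomb-Rice binarization can be sketched as follows. Note this is a simplification: the VVC specification actually uses Truncated Rice with a cutoff plus an EGk escape for large values, and the names here are illustrative:

```python
def rice_encode(value, k):
    """Plain Golomb-Rice binarization of a non-negative value with rice
    parameter k: a unary prefix of (value >> k) ones terminated by a zero,
    followed by the k low-order bits of the value."""
    prefix = "1" * (value >> k) + "0"
    suffix = format(value & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return prefix + suffix

# value = 5, k = 1: quotient 2 -> "110", suffix bit 1 -> "1101"
assert rice_encode(5, 1) == "1101"
```

A larger rice parameter shortens the prefix for large values at the cost of a longer suffix, which is why the parameter is adapted to the local coefficient magnitudes.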
Fig. 8 depicts an example of a process 800 for decoding partitions of video according to some embodiments of the present disclosure. One or more computing devices implement the operations depicted in fig. 8 by executing suitable program code. For example, a computing device implementing the video decoder 200 may implement the operations depicted in fig. 8 by executing program code for the entropy decoding module 216, the inverse quantization module 218, and the inverse transform module 219. For purposes of illustration, the process 800 is described with reference to some examples depicted in the figures. However, other implementations are possible.
At block 802, process 800 involves accessing a binary string or binary representation representing a partition of a video signal. A partition may be a video frame, slice, or tile, or any type of partition that is processed as a unit by a video encoder when performing encoding. The partition includes a set of CTUs. Each CTU includes one or more CUs, and each CU includes a plurality of TUs for encoding, as shown in the example of fig. 6.
At block 804, process 800 involves processing the binary string of each CTU in the set of CTUs in the partition to generate decoded samples for the partition. Block 804 includes blocks 806 through 812. At block 806, process 800 involves updating the replacement variable HistValue for a TU in a CTU independently of any previous TU or CTU. As discussed above, in some embodiments, updating the replacement variable HistValue may be performed each time the history counter StatCoeff is updated. For example, when the first non-zero Golomb-Rice coded transform coefficient in the TU is coded as abs_remainder or dec_abs_level, the history counter StatCoeff is updated according to equation (9) or equation (10), respectively.
In another embodiment, the replacement variable HistValue may be updated based on the first non-zero Golomb-Rice coded transform coefficient in the TU that is coded as abs_remainder or dec_abs_level. HistValue may be updated based on the quantization level of that transform coefficient according to equation (11), and the updated HistValue is then used to derive the Rice parameters for the remaining syntax elements abs_remainder and dec_abs_level. In a further embodiment, as shown in equation (12), HistValue is updated using the absolute level of the last significant coefficient of the TU, and the updated HistValue is used to derive the Rice parameters for the remaining syntax elements abs_remainder and dec_abs_level within the current TU.
At block 808, process 800 involves calculating the Rice parameter for a TU in the CTU based on the updated replacement variable HistValue, as discussed above. At block 810, process 800 involves decoding the binary string or binary representation of the TUs in the CTU into coefficient values based on the calculated Rice parameters, e.g., through a combination of the truncated Rice (TR) and limited k-th order Exp-Golomb (EGk) binarization processes specified in the VVC specification. At block 812, process 800 involves reconstructing the pixel values of the TUs in the CTU, such as by inverse quantization and inverse transformation as discussed above with respect to fig. 2. At block 814, process 800 involves outputting the decoded partition of the video.
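The "independent of a previous TU or CTU" property of block 806 can be illustrated with a schematic per-TU routine. The initial HistValue of 1 << StatCoeff_init and the update-from-first-non-zero-level rule follow the claims below; the flat coefficient list and the returned per-coefficient HistValue sequence are illustrative simplifications only.

```python
from math import floor, log2

def tu_hist_values(levels, statcoeff_init_c):
    """Return the HistValue in effect for each coefficient of one TU.
    HistValue starts from the partition-level initial counter and is
    updated once, from this TU's own first non-zero level, so the
    result never depends on any previously decoded TU or CTU."""
    hist = 1 << statcoeff_init_c
    updated = False
    out = []
    for lvl in levels:
        out.append(hist)  # this coefficient's Rice parameter uses the current HistValue
        if not updated and lvl != 0:
            # per-TU update, mirroring the abs_remainder rule in claim 6
            hist = 1 << ((statcoeff_init_c + floor(log2(lvl)) + 2) >> 1)
            updated = True
    return out
```

Because the function consults only its own arguments, decoding one TU after another yields exactly the same HistValue sequence as decoding that TU in isolation, which is the property that distinguishes this scheme from carrying history across TUs or CTUs.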
Although TUs are shown in the figures (e.g., fig. 6) and described in the above description, the same techniques can be applied to transform blocks (TBs). In other words, in the embodiments presented above (including the figures), a TU may also represent a TB.
Computing system example for implementing history-based Rice parameter derivation
Any suitable computing system may be used to perform the operations described herein. For example, fig. 9 depicts an example of a computing device 900 that can implement the video encoder 100 of fig. 1 or the video decoder 200 of fig. 2. In some embodiments, the computing device 900 includes a processor 912 that is communicatively coupled to a memory 914 and that executes computer-executable program code and/or accesses information stored in the memory 914. The processor 912 may include a microprocessor, an application-specific integrated circuit ("ASIC"), a state machine, or another processing device. The processor 912 may include any number of processing devices, including a single processing device. Such a processor may include, or may be in communication with, a computer-readable medium storing instructions that, when executed by the processor 912, cause the processor to perform the operations described herein.
The memory 914 may include any suitable non-transitory computer-readable medium. The computer-readable medium may include any electronic, optical, magnetic, or other storage device capable of providing computer-readable instructions or other program code to a processor. Non-limiting examples of a computer-readable medium include a magnetic disk, a memory chip, ROM, RAM, an ASIC, a configured processor, optical storage, magnetic tape or other magnetic storage, or any other medium from which a computer processor can read instructions. The instructions may include processor-specific instructions generated by a compiler and/or an interpreter from code written in any suitable computer programming language, including, for example, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, and ActionScript.
Computing device 900 may also include a bus 916. The bus 916 communicatively couples one or more components of the computing device 900. Computing device 900 may also include a number of external or internal devices, such as input or output devices. For example, computing device 900 is shown with an input/output ("I/O") interface 918, which I/O interface 918 may receive input from one or more input devices 920 or provide output to one or more output devices 922. One or more input devices 920 and one or more output devices 922 are communicatively coupled to the I/O interface 918. The communication coupling may be implemented by any suitable means (e.g., via a printed circuit board connection, via a cable connection, via wireless transmission communication, etc.). Non-limiting examples of input devices 920 include a touch screen (e.g., one or more cameras for imaging a touch area, or pressure sensors for detecting pressure changes caused by a touch), a mouse, a keyboard, or any other device that may be used to generate input events in response to physical actions of a user of a computing device. Non-limiting examples of output devices 922 include an LCD screen, an external monitor, speakers, or any other device that may be used to display or otherwise present output generated by a computing device.
Computing device 900 may execute program code that configures processor 912 to perform one or more of the operations described above with respect to fig. 1-8. The program code may include the video encoder 100 or the video decoder 200. The program code may reside in the memory 914 or any suitable computer readable medium and may be executed by the processor 912 or any other suitable processor.
Computing device 900 may also include at least one network interface device 924. Network interface device 924 may include any device or group of devices suitable for establishing a wired or wireless data connection to one or more data networks 928. Non-limiting examples of network interface device 924 include an ethernet network adapter, modem, and the like. Computing device 900 may transmit messages as electronic or optical signals through network interface device 924.
General considerations
Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, it will be understood by those skilled in the art that the claimed subject matter may be practiced without these specific details. In other instances, methods, devices, or systems known by those of ordinary skill have not been described in detail so as not to obscure the claimed subject matter.
Unless specifically stated otherwise, it is appreciated that throughout the discussion of the present specification, terms such as "processing," "computing," "calculating," "determining," "identifying," or the like, are used to refer to the action or processes of a computing device (e.g., one or more computers or similar electronic computing device) that manipulates and transforms data represented as physical, electronic or magnetic quantities within the computing platform's memories, registers or other information storage devices, transmission devices or display devices.
The one or more systems discussed herein are not limited to any particular hardware architecture or configuration. The computing device may include any suitable arrangement of components that provide results conditioned on one or more inputs. Suitable computing devices include a multi-purpose microprocessor-based computer system that accesses stored software that programs or configures the computing system from a general-purpose computing device to a special-purpose computing device, thereby implementing one or more embodiments of the subject matter herein. Any suitable programming, scripting, or other type of language or combination of languages may be used to implement the teachings contained herein in software for programming or configuring a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such a computing device. The order of the blocks appearing in the above examples may be changed-e.g., the blocks may be reordered, combined, and/or broken into sub-blocks. Some blocks or processes may be performed in parallel.
"Adapted" or "configured" as used herein is meant as open and inclusive language that does not exclude devices adapted or configured to perform additional tasks or steps. Additionally, the use of "based on" is meant to be open and inclusive, in that a process, step, calculation, or other action "based on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
While the subject matter herein has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example and not limitation, and that such modifications, variations and/or additions to the subject matter herein are not excluded as would be readily apparent to one of ordinary skill in the art.

Claims (40)

1. A method for decoding video, the method comprising:
Accessing a binary string representing a partition of the video, the partition comprising a plurality of coding tree units (CTUs);
decoding each CTU of the plurality of CTUs in the partition, wherein decoding the CTU includes decoding a transform unit (TU) of the CTU by:
updating a replacement variable HistValue of the TU for calculating a Rice parameter, wherein updating the replacement variable HistValue is performed independently of another TU that precedes the TU in the CTU and another CTU that precedes the CTU in the plurality of CTUs;
calculating a Rice parameter of the TU in the CTU based on the updated replacement variable HistValue;
decoding a binary string corresponding to the TU in the CTU into coefficient values of the TU based on the calculated Rice parameter; and
determining pixel values of the TU in the CTU according to the coefficient values; and
outputting a decoded partition of the video, the decoded partition comprising the plurality of decoded CTUs in the partition.
2. The method of claim 1, wherein updating the replacement variable HistValue for the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the transform coefficient at the location is the first non-zero Golomb-Rice coded transform coefficient in the TU coded as abs_remainder or dec_abs_level.
3. The method of claim 1, wherein updating the replacement variable HistValue for the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the quantization level at the location is the first non-zero quantization level in the TU.
4. The method of claim 3, wherein updating the replacement variable HistValue is performed in response to at least one of:
determining that a flag for updating the replacement variable HistValue is true; or
determining that a flag for enabling Rice parameter derivation is true and that the current position has the first non-zero quantization level.
5. The method of claim 1, wherein updating the replacement variable HistValue comprises:
updating a history counter StatCoeff of a color component; and
updating, based on the updated history counter StatCoeff of the color component and before a next Rice parameter is calculated, the replacement variable HistValue as:
HistValue[cIdx] = 1 << StatCoeff[cIdx].
6. The method of claim 5, wherein updating the history counter StatCoeff comprises:
in response to determining that the first non-zero Golomb-Rice coded transform coefficient in the TU is coded as abs_remainder, updating the history counter StatCoeff of the color component cIdx as:
StatCoeff[cIdx] = (StatCoeff_init[cIdx] + Floor(Log2(abs_remainder[cIdx])) + 2) >> 1;
in response to determining that the first non-zero Golomb-Rice coded transform coefficient in the TU is coded as dec_abs_level, updating the history counter StatCoeff of the color component cIdx as:
StatCoeff[cIdx] = (StatCoeff_init[cIdx] + Floor(Log2(dec_abs_level[cIdx]))) >> 1,
wherein Floor(x) represents the largest integer less than or equal to x, Log2(x) is the base-2 logarithm of x, and StatCoeff_init represents an initial value of the history counter StatCoeff.
7. The method of claim 6, wherein StatCoeff_init of the color component cIdx is determined for the partition based on at least one of:
StatCoeff_init[idx] = 2 × Floor(Log2(BitDepth − 10)), or
StatCoeff_init[idx] = Clip(MIN_Stat, MAX_Stat, (int)((19 − QP) / 6)) − 1,
wherein BitDepth specifies the bit depth of the samples of the luma or chroma array, Floor(x) represents the largest integer less than or equal to x, MIN_Stat and MAX_Stat are two predefined integers, QP is the initial quantization parameter of each partition, and Clip(min, max, x) returns min if x < min, max if x > max, and x otherwise.
8. The method of claim 1, wherein the partition is a frame, slice, or tile.
9. The method of claim 1, further comprising: setting the history counter StatCoeff to an initial value based on a bit depth of samples of luma and chroma arrays of the video or a quantization parameter of a current partition.
10. A non-transitory computer-readable medium having stored thereon program code executable by one or more processing devices to perform operations comprising:
accessing a binary string representing a partition of a video, the partition comprising a plurality of coding tree units (CTUs);
decoding each CTU of the plurality of CTUs in the partition, wherein decoding the CTU includes decoding a transform unit (TU) of the CTU by:
updating a replacement variable HistValue of the TU for calculating a Rice parameter, wherein updating the replacement variable HistValue is performed independently of another TU that precedes the TU in the CTU and another CTU that precedes the CTU in the plurality of CTUs;
calculating a Rice parameter of the TU in the CTU based on the updated replacement variable HistValue;
decoding a binary string corresponding to the TU in the CTU into coefficient values of the TU based on the calculated Rice parameter; and
determining pixel values of the TU in the CTU according to the coefficient values; and
outputting a decoded partition of the video, the decoded partition comprising the plurality of decoded CTUs in the partition.
11. The non-transitory computer-readable medium of claim 10, wherein updating the replacement variable HistValue of the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the transform coefficient at the location is the first non-zero Golomb-Rice coded transform coefficient in the TU coded as abs_remainder or dec_abs_level.
12. The non-transitory computer-readable medium of claim 10, wherein updating the replacement variable HistValue of the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the quantization level at the location is the first non-zero quantization level in the TU.
13. The non-transitory computer-readable medium of claim 12, wherein updating the replacement variable HistValue is performed in response to at least one of:
determining that a flag for updating the replacement variable HistValue is true; or
determining that a flag for enabling Rice parameter derivation is true and that the current position has the first non-zero quantization level.
14. The non-transitory computer-readable medium of claim 10, wherein updating the replacement variable HistValue comprises:
updating the history counter StatCoeff based on an initial value of the history counter StatCoeff of a color component; and
updating, based on the updated history counter StatCoeff of the color component and before a next Rice parameter is calculated, the replacement variable HistValue as:
HistValue[cIdx] = 1 << StatCoeff[cIdx].
15. A system, comprising:
A processing device; and
A non-transitory computer readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer readable medium to perform a plurality of operations comprising:
accessing a binary string representing a partition of a video, the partition comprising a plurality of coding tree units (CTUs);
decoding each CTU of the plurality of CTUs in the partition, wherein decoding the CTU includes decoding a transform unit (TU) of the CTU by:
updating a replacement variable HistValue of the TU for calculating a Rice parameter, wherein updating the replacement variable HistValue is performed independently of another TU that precedes the TU in the CTU and another CTU that precedes the CTU in the plurality of CTUs;
calculating a Rice parameter of the TU in the CTU based on the updated replacement variable HistValue;
decoding a binary string corresponding to the TU in the CTU into coefficient values of the TU based on the calculated Rice parameter; and
determining pixel values of the TU in the CTU according to the coefficient values; and
outputting a decoded partition of the video, the decoded partition comprising the plurality of decoded CTUs in the partition.
16. The system of claim 15, wherein updating the replacement variable HistValue for the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the transform coefficient at the location is the first non-zero Golomb-Rice coded transform coefficient in the TU coded as abs_remainder or dec_abs_level.
17. The system of claim 15, wherein updating the replacement variable HistValue for the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the quantization level at the location is the first non-zero quantization level in the TU.
18. The system of claim 17, wherein updating the replacement variable HistValue is performed in response to at least one of:
determining that a flag for updating the replacement variable HistValue is true; or
determining that a flag for enabling Rice parameter derivation is true and that the current position has the first non-zero quantization level.
19. The system of claim 15, wherein updating the replacement variable HistValue comprises:
updating the history counter StatCoeff based on an initial value of the history counter StatCoeff of a color component; and
updating, based on the updated history counter StatCoeff of the color component and before a next Rice parameter is calculated, the replacement variable HistValue as:
HistValue[cIdx] = 1 << StatCoeff[cIdx].
20. the system of claim 15, wherein the partition is a frame, slice, or tile.
21. A method for encoding video, the method comprising:
Accessing a partition of the video, the partition comprising a plurality of coding tree units (CTUs);
processing the partition of the video to generate a binary representation of the partition, the processing comprising:
encoding each CTU of the plurality of CTUs in the partition, wherein encoding the CTU includes encoding a transform unit (TU) of the CTU by:
updating a replacement variable HistValue of the TU for calculating a Rice parameter, wherein updating the replacement variable HistValue is performed independently of (a) another TU that precedes the TU in the CTU and (b) another CTU that precedes the CTU in the plurality of CTUs;
calculating a Rice parameter of the TU in the CTU based on the updated replacement variable HistValue; and
encoding coefficient values of the TU into a binary representation corresponding to the TU in the CTU based on the calculated Rice parameter; and
encoding the binary representation of the partition into a bitstream of the video.
22. The method of claim 21, wherein updating the replacement variable HistValue for the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the transform coefficient at the location is the first non-zero Golomb-Rice coded transform coefficient in the TU coded as abs_remainder or dec_abs_level.
23. The method of claim 21, wherein updating the replacement variable HistValue for the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the quantization level at the location is the first non-zero quantization level in the TU.
24. The method of claim 23, wherein updating the replacement variable HistValue is performed in response to at least one of:
determining that a flag for updating the replacement variable HistValue is true; or
determining that a flag for enabling Rice parameter derivation is true and that the current position has the first non-zero quantization level.
25. The method of claim 21, wherein updating the replacement variable HistValue comprises:
updating a history counter StatCoeff of a color component; and
updating, based on the updated history counter StatCoeff of the color component and before a next Rice parameter is calculated, the replacement variable HistValue as:
HistValue[cIdx] = 1 << StatCoeff[cIdx].
26. The method of claim 25, wherein updating the history counter StatCoeff comprises:
in response to determining that the first non-zero Golomb-Rice coded transform coefficient in the TU is coded as abs_remainder, updating the history counter StatCoeff of the color component cIdx as:
StatCoeff[cIdx] = (StatCoeff_init[cIdx] + Floor(Log2(abs_remainder[cIdx])) + 2) >> 1;
in response to determining that the first non-zero Golomb-Rice coded transform coefficient in the TU is coded as dec_abs_level, updating the history counter StatCoeff of the color component cIdx as:
StatCoeff[cIdx] = (StatCoeff_init[cIdx] + Floor(Log2(dec_abs_level[cIdx]))) >> 1,
wherein Floor(x) represents the largest integer less than or equal to x, Log2(x) is the base-2 logarithm of x, and StatCoeff_init represents an initial value of the history counter StatCoeff.
27. The method of claim 26, wherein StatCoeff_init of the color component cIdx is determined for the partition based on at least one of:
StatCoeff_init[idx] = 2 × Floor(Log2(BitDepth − 10)), or
StatCoeff_init[idx] = Clip(MIN_Stat, MAX_Stat, (int)((19 − QP) / 6)) − 1,
wherein BitDepth specifies the bit depth of the samples of the luma or chroma array, Floor(x) represents the largest integer less than or equal to x, MIN_Stat and MAX_Stat are two predefined integers, QP is the initial quantization parameter of each partition, and Clip(min, max, x) returns min if x < min, max if x > max, and x otherwise.
28. The method of claim 21, wherein the partition is a frame, slice, or tile.
29. The method of claim 21, further comprising: setting the history counter StatCoeff to an initial value based on a bit depth of samples of luma and chroma arrays of the video or a quantization parameter of a current partition.
30. A non-transitory computer-readable medium having stored thereon program code executable by one or more processing devices to perform operations comprising:
Accessing a partition of a video, the partition comprising a plurality of Coding Tree Units (CTUs);
processing the partition of the video to generate a binary representation of the partition, the processing comprising:
encoding each CTU of the plurality of CTUs in the partition, wherein encoding the CTU includes encoding a transform unit (TU) of the CTU by:
updating a replacement variable HistValue of the TU for calculating a Rice parameter, wherein updating the replacement variable HistValue is performed independently of another TU that precedes the TU in the CTU and another CTU that precedes the CTU in the plurality of CTUs;
calculating a Rice parameter of the TU in the CTU based on the updated replacement variable HistValue; and
encoding coefficient values of the TU into a binary representation corresponding to the TU in the CTU based on the calculated Rice parameter; and
encoding the binary representation of the partition into a bitstream of the video.
31. The non-transitory computer-readable medium of claim 30, wherein updating the replacement variable HistValue of the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the transform coefficient at the location is the first non-zero Golomb-Rice coded transform coefficient in the TU coded as abs_remainder or dec_abs_level.
32. The non-transitory computer-readable medium of claim 30, wherein updating the replacement variable HistValue of the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the quantization level at the location is the first non-zero quantization level in the TU.
33. The non-transitory computer-readable medium of claim 32, wherein updating the replacement variable HistValue is performed in response to at least one of:
determining that a flag for updating the replacement variable HistValue is true; or
determining that a flag for enabling Rice parameter derivation is true and that the current position has the first non-zero quantization level.
34. The non-transitory computer-readable medium of claim 30, wherein updating the replacement variable HistValue comprises:
updating the history counter StatCoeff based on an initial value of the history counter StatCoeff of a color component; and
updating, based on the updated history counter StatCoeff of the color component and before a next Rice parameter is calculated, the replacement variable HistValue as:
HistValue[cIdx] = 1 << StatCoeff[cIdx].
35. A system, comprising:
A processing device; and
A non-transitory computer readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer readable medium to perform a plurality of operations comprising:
accessing a partition of a video, the partition comprising a plurality of coding tree units (CTUs);
processing the partition of the video to generate a binary representation of the partition, the processing comprising:
encoding each CTU of the plurality of CTUs in the partition, wherein encoding the CTU includes encoding a transform unit (TU) of the CTU by:
updating a replacement variable HistValue of the TU for calculating a Rice parameter, wherein updating the replacement variable HistValue is performed independently of another TU that precedes the TU in the CTU and another CTU that precedes the CTU in the plurality of CTUs;
calculating a Rice parameter of the TU in the CTU based on the updated replacement variable HistValue; and
encoding coefficient values of the TU into a binary representation corresponding to the TU in the CTU based on the calculated Rice parameter; and
encoding the binary representation of the partition into a bitstream of the video.
36. The system of claim 35, wherein updating the replacement variable HistValue for the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the transform coefficient at the location is the first non-zero Golomb-Rice coded transform coefficient in the TU coded as abs_remainder or dec_abs_level.
37. The system of claim 35, wherein updating the replacement variable HistValue for the TU comprises: calculating the replacement variable HistValue for the TU based on the absolute value of the quantization level at a location within the TU, wherein the quantization level at the location is the first non-zero quantization level in the TU.
38. The system of claim 37, wherein updating the replacement variable HistValue is performed in response to at least one of:
determining that a flag for updating the replacement variable HistValue is true; or
determining that a flag for enabling Rice parameter derivation is true and that the current position has the first non-zero quantization level.
39. The system of claim 35, wherein updating the replacement variable HistValue comprises:
updating the history counter StatCoeff based on an initial value of the history counter StatCoeff of a color component; and
updating, based on the updated history counter StatCoeff of the color component and before a next Rice parameter is calculated, the replacement variable HistValue as:
HistValue[cIdx] = 1 << StatCoeff[cIdx].
40. The system of claim 35, wherein the partition is a frame, slice, or tile.
Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US63/260,604 2021-08-26
US63/248,289 2021-09-24
US63/248,819 2021-09-27
US202163250964P 2021-09-30 2021-09-30
US63/250,964 2021-09-30
PCT/US2022/075453 WO2023028555A1 (en) 2021-08-26 2022-08-25 Independent history-based rice parameter derivations for video coding

Publications (1)

Publication Number Publication Date
CN117981306A (en) 2024-05-03


