CN116965028A - Residual level binarization for video coding

Residual level binarization for video coding

Info

Publication number
CN116965028A
CN116965028A
Authority
CN
China
Prior art keywords
block
quantization
levels
level
binary representation
Prior art date
Legal status
Pending
Application number
CN202280019616.6A
Other languages
Chinese (zh)
Inventor
余越 (Yue Yu)
于浩平 (Haoping Yu)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Innopeak Technology Inc
Priority date
Filing date
Publication date
Application filed by Innopeak Technology Inc filed Critical Innopeak Technology Inc
Publication of CN116965028A

Classifications

    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television)
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/124: Quantisation
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/593: Predictive coding involving spatial prediction techniques
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

In some embodiments, a video decoder decodes video blocks from a bitstream of video. The video decoder obtains a binary string representing a video block decoded from a bitstream of the video. The video block is associated with a plurality of quantization levels. The video decoder processes the binary string to recover a plurality of quantization levels for the block. The processing includes obtaining a portion of the binary string corresponding to a quantization level of the plurality of quantization levels, and converting the portion of the binary string to a quantization level according to a k-th order exponential golomb binarization, where k is an integer greater than zero. The video decoder reconstructs the block by determining pixel values for the block from a plurality of quantization levels.

Description

Residual level binarization for video coding
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application No. 63/159,913, entitled "Remaining Level Binarization Method for AVS Video Coding," filed on March 11, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to computer-implemented methods and systems for video processing. In particular, the disclosure relates to residual level binarization for video coding.
Background
Ubiquitous camera-enabled devices, such as smartphones, tablets, and computers, make capturing video or images easier than ever before. However, the amount of data in even a short video can be quite large. Video coding technology (including video encoding and decoding) allows video data to be compressed into smaller sizes, enabling various videos to be stored and transmitted. Video coding has been used in a wide range of applications, such as digital television broadcasting, video transmission over the internet and mobile networks, real-time applications (e.g., video chat and video conferencing), Digital Versatile Discs (DVDs), Blu-ray discs, and so on. To reduce the storage space for storing video and/or the network bandwidth consumed in transmitting video, it is desirable to improve the efficiency of video coding schemes.
Disclosure of Invention
Some embodiments relate to residual level binarization for video coding. In one example, a method for decoding video includes: obtaining a binary representation of a block of the video, the block being associated with a plurality of quantization levels; processing the binary representation to recover the plurality of quantization levels for the block; and reconstructing the block by determining pixel values of the block from the plurality of quantization levels. The processing includes: obtaining a portion of the binary representation corresponding to a quantization level of the plurality of quantization levels; and converting the portion of the binary representation into the quantization level according to a k-th order exponential Golomb (Exp-Golomb) binarization, where k indicates the order of the exponential Golomb binarization and is an integer greater than zero.
In another example, a non-transitory computer-readable medium has program code stored thereon, the program code executable by one or more processing devices to perform the following operations. The operations include: obtaining a binary representation of a block of video, the block being associated with a plurality of quantization levels; processing the binary representation to recover the plurality of quantization levels for the block; and reconstructing the block by determining pixel values of the block from the plurality of quantization levels. The processing includes: obtaining a portion of the binary representation corresponding to a quantization level of the plurality of quantization levels; and converting the portion of the binary representation into the quantization level according to a k-th order exponential Golomb binarization, where k indicates the order of the exponential Golomb binarization and is an integer greater than zero.
In another example, a system includes a processing device and a non-transitory computer-readable medium communicatively coupled to the processing device. The processing device is configured to execute program code stored in the non-transitory computer-readable medium and thereby perform the following operations. The operations include: obtaining a binary representation of a block of video, the block being associated with a plurality of quantization levels; processing the binary representation to recover the plurality of quantization levels for the block; and reconstructing the block by determining pixel values of the block from the plurality of quantization levels. The processing includes: obtaining a portion of the binary representation corresponding to a quantization level of the plurality of quantization levels; and converting the portion of the binary representation into the quantization level according to a k-th order exponential Golomb binarization, where k indicates the order of the exponential Golomb binarization and is an integer greater than zero.
In a further example, a method for encoding video includes: obtaining a plurality of quantization levels of a block of the video; processing each of the plurality of quantization levels of the block to generate a binary representation of the plurality of quantization levels, the processing comprising: determining a remaining level of the quantization level, and converting the remaining level of the quantization level into a binary representation of the quantization level according to a k-th order exponential Golomb binarization, where k indicates the order of the exponential Golomb binarization and is an integer greater than zero; and encoding at least the binary representation of the plurality of quantization levels of the block into a bitstream of the video.
In another example, a non-transitory computer-readable medium has program code stored thereon. The program code is executable by one or more processing devices to perform the following operations. The operations include: obtaining a plurality of quantization levels of a block of video; processing each of the plurality of quantization levels of the block to generate a binary representation of the plurality of quantization levels, the processing comprising: determining a remaining level of the quantization level, and converting the remaining level of the quantization level into a binary representation of the quantization level according to a k-th order exponential Golomb binarization, where k indicates the order of the exponential Golomb binarization and is an integer greater than zero; and encoding at least the binary representation of the plurality of quantization levels of the block into a bitstream of the video.
In yet another example, a system includes a processing device and a non-transitory computer-readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer-readable medium and thereby perform the following operations. The operations include: obtaining a plurality of quantization levels of a block of video; processing each of the plurality of quantization levels of the block to generate a binary representation of the plurality of quantization levels, the processing comprising: determining a remaining level of the quantization level, and converting the remaining level of the quantization level into a binary representation of the quantization level according to a k-th order exponential Golomb binarization, where k indicates the order of the exponential Golomb binarization and is an integer greater than zero; and encoding at least the binary representation of the plurality of quantization levels of the block into a bitstream of the video.
These illustrative embodiments are mentioned not to limit or define the disclosure, but to provide examples that aid understanding of it. Additional embodiments are discussed in the detailed description, where further description is provided.
Drawings
The features, embodiments, and advantages of the present disclosure will be better understood when the following detailed description is read with reference to the accompanying drawings.
Fig. 1 is a block diagram illustrating an example of a video encoder configured to implement embodiments presented herein.
Fig. 2 is a block diagram illustrating an example of a video decoder configured to implement embodiments presented herein.
Fig. 3 depicts an example of coding tree unit partitioning of pictures in video according to some embodiments of the present disclosure.
Fig. 4 depicts an example of coding unit partitioning of coding tree units according to some embodiments of the present disclosure.
Fig. 5 depicts an example of scan region based coefficient encoding in accordance with some embodiments of the present disclosure.
Fig. 6A depicts a table listing examples of k-th order exponential Golomb binarization.
Fig. 6B depicts an example of a special position template used in the adaptive binarization method according to some embodiments of the present disclosure.
Fig. 7 depicts an example of a process for encoding video blocks according to some embodiments of the present disclosure.
Fig. 8 depicts an example of a process for decoding a video block according to some embodiments of the present disclosure.
FIG. 9 depicts an example of a computing system that may be used to implement some embodiments of the present disclosure.
Detailed Description
Various embodiments provide a residual level binarization scheme for video coding. As discussed above, more and more video data is being generated, stored, and transmitted. It would be beneficial to increase the efficiency of video coding techniques so that video can be represented using less data without compromising the visual quality of the decoded video. One way to increase coding efficiency is to compress the processed video coefficients into a binary bitstream through entropy coding using as few bits as possible. Prior to entropy encoding, the video coefficient levels (or the remaining levels of the coefficient levels) are binarized into binary bins, and a coding algorithm such as context-adaptive binary arithmetic coding (CABAC) may further compress the bins into bits. However, the binarization method currently used in the Audio Video coding Standard (AVS) uses an exponential Golomb (Exp-Golomb) codeword of order 0. Such a binarization method may not be optimal, especially when the bit depth of the video samples increases and the values to be binarized become larger. Various embodiments described herein address these issues by introducing higher-order binarization methods into the residual level binarization, thereby improving coding efficiency.
In one embodiment, the 1st order exponential Golomb binarization method is used instead of the 0th order exponential Golomb binarization. This allows fewer bits to be used to represent the coefficient levels (or residual coefficient levels), particularly those having larger values, thereby improving coding efficiency. In another embodiment, a k-th order exponential Golomb binarization method with k > 1 is used to further increase the coding efficiency for high bit depth video. In a further embodiment, an adaptive k-th order exponential Golomb binarization method is used to binarize the video coefficient levels. For example, the level information preceding the current position is used to determine the k-th order exponential Golomb binarization method for binarizing the remaining level of the current position. The adaptive binarization method allows the order of the binarization method to change with the content of the video, resulting in a more efficient encoding result (i.e., using fewer bits to represent the video). These techniques can serve as efficient coding tools in future video coding standards.
Referring now to the drawings, FIG. 1 is a block diagram illustrating an example of a video encoder 100 configured to implement embodiments presented herein. In the example shown in fig. 1, video encoder 100 includes a partitioning module 112, a transform module 114, a quantization module 115, an inverse quantization module 118, an inverse transform module 119, an in-loop filter module 120, an intra prediction module 126, an inter prediction module 124, a motion estimation module 122, a decoded picture buffer 130, and an entropy encoding module 116.
The input to the video encoder 100 is an input video 102 that contains a sequence of pictures (also referred to as frames or images). In a block-based video encoder, for each picture, video encoder 100 uses a partitioning module 112 to partition the picture into blocks 104, each block containing a plurality of pixels. The block may be a macroblock, a coding tree unit, a coding unit, a prediction unit, and/or a prediction block. One picture may include blocks of different sizes, and the block partitions of different pictures of the video may also be different. Each block may be encoded using different predictions (e.g., intra-prediction, or inter-prediction, or a hybrid of intra-and inter-prediction).
Typically, the first picture of a video signal is an intra-coded picture, which is encoded using only intra prediction. In intra prediction mode, a block of a picture is predicted using only data that has already been encoded from the same picture. Intra-coded pictures can be decoded without information from other pictures. To perform intra prediction, the video encoder 100 shown in fig. 1 may use an intra prediction module 126. The intra prediction module 126 is configured to generate an intra prediction block (prediction block 134) using reconstructed samples in reconstructed blocks 136 of neighboring blocks in the same picture. Intra prediction is performed according to an intra prediction mode selected for the block. The video encoder 100 then calculates the difference between the block 104 and the intra prediction block 134. This difference is referred to as the residual block 106.
To further remove redundancy from the block, the transform module 114 transforms the residual block 106 into the transform domain by applying a transform to the samples in the block. Examples of transforms may include, but are not limited to, discrete Cosine Transforms (DCTs) or Discrete Sine Transforms (DSTs). The transformed values may be referred to as transform coefficients, representing a residual block in the transform domain. In some examples, the residual block may be quantized directly without transformation by transformation module 114. This is referred to as a transform skip mode.
The video encoder 100 may also quantize the transform coefficients using the quantization module 115 to obtain quantized coefficients. Quantization involves dividing the samples by a quantization step followed by rounding, while inverse quantization involves multiplying the quantization value by the quantization step. This quantization process is known as scalar quantization. Quantization is used to reduce the dynamic range of video samples (transformed or untransformed) so that fewer bits are used to represent the video samples.
Quantization of the coefficients/samples within a block can be performed independently, and such independent quantization methods are used in some existing video compression standards (e.g., H.264 and HEVC). For an N×M block, a particular scan order may be used to convert the 2-D coefficients of the block into a 1-D array for coefficient quantization and encoding. Quantization of the coefficients within a block may also utilize scan order information. For example, the quantization of a given coefficient in the block may depend on the status of quantized values that precede it along the scan order. To further increase coding efficiency, more than one quantizer may be used. Which quantizer is used for quantizing the current coefficient depends on the information preceding the current coefficient in the encoding/decoding scan order. This quantization method is called dependent quantization.
The quantization step size may be used to adjust the level of quantization. For example, for scalar quantization, different quantization step sizes may be applied to achieve finer or coarser quantization. A smaller quantization step size corresponds to finer quantization, while a larger quantization step size corresponds to coarser quantization. The quantization step size may be indicated by a quantization parameter (QP). The quantization parameter is provided in the encoded bitstream of the video so that the video decoder can apply the same quantization parameter when decoding.
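To make the scalar quantization described above concrete, the following is a minimal sketch; the step size, rounding rule, and example values are hypothetical and are not taken from AVS or any other standard.

```python
# Illustrative scalar quantization/dequantization; the step size and example
# values below are hypothetical, not taken from AVS or any other standard.
def quantize(coefficient: float, step_size: float) -> int:
    # Forward quantization: divide by the quantization step size, then round.
    return round(coefficient / step_size)

def dequantize(level: int, step_size: float) -> float:
    # Inverse quantization: multiply the quantization level by the step size.
    return level * step_size

# Example: a coefficient of 37.4 with step size 8 maps to quantization level 5,
# which reconstructs to 40.0 (a quantization error of 2.6).
assert quantize(37.4, 8.0) == 5
assert dequantize(5, 8.0) == 40.0
```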
The quantized samples are then encoded by entropy encoding module 116 to further reduce the size of the video signal. The entropy encoding module 116 is configured to apply an entropy encoding algorithm to the quantized samples. In some examples, the quantized samples are binarized into bins, and the encoding algorithm further compresses the bins into bits. Examples of binarization methods include, but are not limited to, combined Truncated Rice (TR) and limited k-th order exponential golomb (EGk) binarization, and k-th order exponential golomb binarization. Examples of entropy coding algorithms include, but are not limited to, variable Length Coding (VLC) schemes, context adaptive VLC schemes (CAVLC), arithmetic coding schemes, binarization, context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability Interval Partitioning Entropy (PIPE) coding, or other entropy coding techniques. The entropy encoded data is added to the bitstream that outputs the encoded video 132.
As discussed above, reconstructed block 136 from neighboring blocks is used in intra prediction of a block of a picture. The generation of the reconstructed block 136 of a block involves calculating the reconstructed residual of the block. The reconstructed residual may be determined by applying inverse quantization and inverse transform to the quantized residual of the block. The inverse quantization module 118 is configured to apply inverse quantization to the quantized samples to obtain dequantized coefficients. The inverse quantization module 118 applies an inverse of the quantization scheme applied by the quantization module 115 by using the same quantization step size as the quantization module 115. The inverse transform module 119 is configured to apply an inverse transform, such as an inverse DCT or an inverse DST, of the transform applied by the transform module 114 to the dequantized samples. The output of inverse transform module 119 is the reconstructed residual of the block in the pixel domain. The reconstructed residual may be added to a prediction block 134 of the block to obtain a reconstructed block 136 in the pixel domain. For blocks that have skipped the transform, the inverse transform module 119 is not applied to those blocks. The dequantized samples are the reconstructed residuals of the block.
The block in a subsequent picture after the first intra-predicted picture may be encoded using inter-prediction or intra-prediction. In inter prediction, the prediction of a block in a picture is from one or more previously encoded video pictures. To perform inter prediction, the video encoder 100 uses an inter prediction module 124. The inter prediction module 124 is configured to perform motion compensation on the block based on the motion estimation provided by the motion estimation module 122.
Motion estimation module 122 compares current block 104 of the current picture with decoded reference picture 108 for motion estimation. The decoded reference picture 108 is stored in a decoded picture buffer 130. The motion estimation module 122 selects a reference block from the decoded reference pictures 108 that best matches the current block. Motion estimation module 122 further identifies an offset between the location (e.g., x, y coordinates) of the reference block and the location of the current block. This offset is referred to as a Motion Vector (MV) and is provided to the inter prediction module 124 along with the selected reference block. In some cases, a plurality of reference blocks are identified for a current block in a plurality of decoded reference pictures 108. Accordingly, a plurality of motion vectors are generated and provided to the inter prediction module 124 along with corresponding reference blocks.
The inter prediction module 124 performs motion compensation using the motion vector and other inter prediction parameters to generate a prediction of the current block (i.e., the inter prediction block 134). For example, based on the motion vector, the inter prediction module 124 may locate a prediction block pointed to by the motion vector in a corresponding reference picture. If there is more than one prediction block, these prediction blocks are combined with some weights to generate the prediction block 134 of the current block.
For inter-prediction blocks, video encoder 100 may subtract inter-prediction block 134 from block 104 to generate residual block 106. The residual block 106 may be transformed, quantized, and entropy encoded in the same manner as the residual of the intra-prediction block discussed above. Likewise, a reconstructed block 136 of the inter prediction block may be obtained by inverse quantizing, inverse transforming, and then combining the residual with the corresponding prediction block 134.
To obtain decoded pictures 108 for motion estimation, reconstructed block 136 is processed by in-loop filter module 120. In-loop filter module 120 is configured to smooth transitions of pixels to improve video quality. In-loop filter module 120 may be configured to implement one or more in-loop filters, such as deblocking filters, or Sample Adaptive Offset (SAO) filters, or Adaptive Loop Filters (ALF), or the like.
Fig. 2 depicts an example of a video decoder 200 configured to implement embodiments presented herein. The video decoder 200 processes the encoded video 202 in the bitstream and generates decoded pictures 208. In the example shown in fig. 2, video decoder 200 includes entropy decoding module 216, inverse quantization module 218, inverse transform module 219, in-loop filter module 220, intra prediction module 226, inter prediction module 224, and decoded picture buffer 230.
The entropy decoding module 216 is configured to perform entropy decoding on the encoded video 202. The entropy decoding module 216 decodes the quantized coefficients, the encoding parameters (including intra prediction parameters and inter prediction parameters), and other information. In some examples, the entropy decoding module 216 decodes the bitstream of the encoded video 202 into a binary representation and then converts the binary representation into the quantization levels of the coefficients. The entropy-decoded coefficient levels are then inverse quantized by the inverse quantization module 218 and inverse transformed to the pixel domain by the inverse transform module 219. The inverse quantization module 218 and the inverse transform module 219 function similarly to the inverse quantization module 118 and the inverse transform module 119, respectively, described above with respect to fig. 1. The inverse-transformed residual block may be added to the corresponding prediction block 234 to generate a reconstructed block 236. For blocks for which the transform was skipped, the inverse transform module 219 is not applied. The dequantized samples generated by the inverse quantization module 218 are used to generate the reconstructed block 236.
The prediction block 234 of a particular block is generated based on the prediction mode of that block. If the encoding parameters of a block indicate that the block is intra-predicted, a reconstructed block 236 of a reference block in the same picture may be input into the intra-prediction module 226 to generate a predicted block 234 of the block. If the encoding parameters of the block indicate that the block is inter predicted, a prediction block 234 is generated by the inter prediction module 224. The intra-prediction module 226 and the inter-prediction module 224 function similarly to the intra-prediction module 126 and the inter-prediction module 124, respectively, of fig. 1.
As discussed above with respect to fig. 1, inter prediction involves one or more reference pictures. The video decoder 200 generates a decoded picture 208 of the reference picture by applying the in-loop filter module 220 to the reconstructed block of the reference picture. Decoded picture 208 is stored in decoded picture buffer 230 for use by inter prediction module 224 and also for output.
Referring now to fig. 3, fig. 3 depicts an example of coding tree unit partitioning of a picture in a video according to some embodiments of the present disclosure. As discussed above with respect to fig. 1 and 2, to encode a picture of a video, the picture is divided into blocks, such as the coding tree units (CTUs) 302 in AVS as shown in fig. 3. For example, a CTU 302 may be a 128 × 128 pixel block. The CTUs are processed according to a certain order, such as the order shown in fig. 3. In some examples, each CTU 302 in a picture may be divided into one or more coding units (CUs) 402 as shown in fig. 4, which may be further divided into prediction units or transform units (TUs) for prediction and transform. Depending on the coding scheme, a CTU 302 may be partitioned into CUs 402 in different ways. For example, in AVS, a CU 402 may be rectangular or square, and may be encoded without being further divided into prediction units or transform units. Each CU 402 may be as large as its root CTU 302, or may be a subdivision of the root CTU 302 as small as a 4 × 4 block. As shown in fig. 4, the partitioning from a CTU 302 to CUs 402 in AVS may be quadtree partitioning, binary tree partitioning, or ternary tree partitioning. In fig. 4, the solid lines represent quadtree partitioning, and the dashed lines represent binary tree or ternary tree partitioning.
As discussed above with respect to fig. 1 and 2, quantization is used to reduce the dynamic range of the elements of a block in the video signal so that fewer bits are used to represent the video signal. In some examples, before quantization, the transformed or untransformed video signal at a particular location is referred to as a coefficient. After quantization, the quantized value of the coefficient is called a quantization level or level. Quantization typically involves division by a quantization step size and subsequent rounding, while inverse quantization involves multiplication by the quantization step size. This quantization process is also known as scalar quantization. The quantization of the coefficients within a block can be performed independently, and such independent quantization methods are used in some existing video compression standards (e.g., H.264, HEVC, AVS, etc.). In other examples, dependent quantization is employed, such as in Versatile Video Coding (VVC).
Residual coding
In video coding, residual coding is used to convert the quantization levels into a bitstream. After quantization, there are N×M quantization levels for an N×M block. These N×M levels may be zero or non-zero values. If a level is not binary, the non-zero level is further binarized into binary bins. The bins may be further compressed into bits using context-modeled binary arithmetic coding, such as CABAC. In AVS, scan-region-based coefficient coding (SRCC) may be used for both regular residual coding (RRC) of transformed blocks and transform skip residual coding (TSRC).
Fig. 5 shows an example of scan-region-based coefficient coding. For a block 500 of size block width × block height, two-dimensional (2-D) coordinates (scan_region_x, scan_region_y) are encoded in the bitstream to indicate the smallest rectangular region 504 that contains the non-zero levels; the levels at all locations outside the smallest rectangular region 504 are zero. The smallest rectangular region 504 is referred to as the SRCC region or SRCC block. scan_region_x and scan_region_y are less than or equal to the block width and the block height, respectively. Within the SRCC block, the level at each location may be zero or non-zero, and there is always at least one non-zero level whose coordinate is equal to scan_region_x or scan_region_y, or to both scan_region_x and scan_region_y.
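The following is an illustrative sketch of how an encoder could derive the SRCC region coordinates from the 2-D quantization levels of a block; it is not the normative AVS procedure, and the function name and example block are assumptions for illustration.

```python
# Hypothetical sketch of deriving the SRCC region coordinates from the 2-D
# quantization levels of a block; not the normative AVS procedure.
def srcc_region(levels):
    """levels[y][x] holds the quantization level at location (x, y).
    Returns (scan_region_x, scan_region_y): the largest x and y coordinates
    at which a non-zero level occurs."""
    scan_region_x, scan_region_y = 0, 0
    for y, row in enumerate(levels):
        for x, level in enumerate(row):
            if level != 0:
                scan_region_x = max(scan_region_x, x)
                scan_region_y = max(scan_region_y, y)
    return scan_region_x, scan_region_y

# Example: the non-zero levels are confined to the top-left 3 x 2 region,
# so the SRCC region is indicated by (scan_region_x, scan_region_y) = (2, 1).
block = [
    [3, 0, 2, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
assert srcc_region(block) == (2, 1)
```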
The SRCC block may include a number of predetermined sub-blocks (e.g., 4 × 4 sub-blocks). Since the size of the SRCC block may not fit an integer number of regular sub-blocks, the size of the sub-blocks in the last row or column may be smaller than the regular sub-blocks. For an SRCC block of size scan_region_x × scan_region_y, a particular coding scan order may be used to convert the 2-D coefficients of the block into a one-dimensional (1-D) order for coefficient quantization and coding. Typically, the coding scan starts from the upper left corner and proceeds toward the lower right, stopping at the last sub-block located at the lower right corner of the SRCC block. The last sub-block is derived from (scan_region_x, scan_region_y) according to the predetermined coding scan order. RRC starts from the last sub-block and encodes each sub-block in reverse coding scan order. Within a sub-block, residual coding encodes the level of each position in reverse coding scan order. Fig. 5 shows an example of a block 500 having an SRCC region 504 and sub-blocks 506A-506D. Each sub-block 506 has a predetermined reverse scan order for encoding the quantization levels in the sub-block 506. In this example, sub-block 506A has a size of 3 × 3; encoding starts at position L0 at the lower right corner and ends at position L8 at the upper left corner.
For each level, a flag named sig_flag is first encoded into the bitstream to indicate whether the level is zero or non-zero. When certain conditions are met, these sig_flags for all positions within the sub-block except for the two special positions will be encoded sequentially into the bit stream. At position (0, scan_region_y), if the quantization level for all positions (x, scan_region_y) is zero, where x=1, …, scan_region_x, then sig_flag is not encoded because the level for that position must be non-zero. Similarly, at a location (scan_region_x, 0), if the quantization level for all locations (scan_region_x, y) is zero, where y=1, …, scan_region_y, then the sig_flag is not encoded because the level for that location must be non-zero.
After encoding all sig_flags within a sub-block, coeff_abs_level_greater1_flag will be encoded for any non-zero level within the sub-block to indicate whether the absolute level is 1 or greater than 1. In AVS, if the absolute level is greater than 1, coeff_abs_level_greater2_flag will be encoded to indicate whether the absolute level is 2 or greater than 2.
After encoding coeff_abs_level_greater1_flag and coeff_abs_level_greater2_flag within a sub-block, another syntax element, named coeff_abs_level_remaining, will be encoded for any position within the sub-block whose absolute level is greater than 2. coeff_abs_level_remaining represents the absolute level minus 3 in the current AVS. After encoding the syntax element coeff_abs_level_remaining within the sub-block, a flag coeff_sign indicating whether the level is negative or positive will be encoded for each non-zero level position. Once sig_flag, coeff_abs_level_greater1_flag, coeff_abs_level_greater2_flag, coeff_abs_level_remaining, and coeff_sign within a sub-block are encoded, the residual encoding process proceeds to the next sub-block in the reverse coding scan order until all syntax elements of all sub-blocks within the residual block are encoded.
In some examples, video coding schemes such as AVS may employ more flexible syntax elements (e.g., abs_level_gtxX_flag) to allow conditional parsing of the syntax elements used for level coding of residual blocks. Table 1 shows an example of the binarization of the absolute value of the quantization level. Here, abs_level_gtxX_flag describes whether the absolute value of the quantization level is greater than X, where X is an integer such as 0, 1, 2, …, or N. If abs_level_gtxY_flag is 0, where Y is an integer between 0 and N-1, then abs_level_gtx(Y+1)_flag will not be present. If abs_level_gtxY_flag is 1, then abs_level_gtx(Y+1)_flag will be present. For example, for abs(level)=2, abs_level_gtx0_flag is 1, so abs_level_gtx1_flag is present. Since abs_level_gtx1_flag is 1, abs_level_gtx2_flag is present. In this example, abs_level_gtx2_flag is 0, so abs_level_gtx3_flag will not be present.
Furthermore, if abs_level_gtxN_flag is 0, there will be no remaining level. When abs_level_gtxN_flag is 1, there will be a remaining level, which represents the level minus the value N+1. In the example shown in Table 1, N=3. For abs(level)=3, abs_level_gtx3_flag is 0, so there is no remaining level (denoted as "remaining" in Table 1). For abs(level)=5, abs_level_gtx3_flag is 1, so there is a remaining level, and the remaining level is 5-(3+1)=1. The manner in which these syntax elements are encoded into the bitstream is not limited.
Table 1: Residual coding based on abs_level_gtxX_flag and remaining levels

abs(level)           |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12
abs_level_gtx0_flag  |  0 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1
abs_level_gtx1_flag  |    |  0 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1
abs_level_gtx2_flag  |    |    |  0 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1
abs_level_gtx3_flag  |    |    |    |  0 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1 |  1
remaining            |    |    |    |    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
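As an illustration of the mapping shown in Table 1 (with N = 3), the following sketch derives the abs_level_gtxX_flag values and the remaining level from an absolute level; it is a simplified model for illustration, not the normative encoding or parsing process.

```python
# Simplified model of the mapping in Table 1 (N = 3): derive the
# abs_level_gtxX_flag values and the remaining level from an absolute level.
def gtx_flags_and_remaining(abs_level: int, n: int = 3):
    flags = []
    for x in range(n + 1):
        if x > 0 and flags[-1] == 0:
            break  # abs_level_gtx(X)_flag is absent once the previous flag is 0
        flags.append(1 if abs_level > x else 0)
    has_remaining = len(flags) == n + 1 and flags[-1] == 1
    remaining = abs_level - (n + 1) if has_remaining else None
    return flags, remaining

# Examples matching Table 1: abs(level) = 3 has no remaining level, while
# abs(level) = 5 has remaining level 5 - (3 + 1) = 1.
assert gtx_flags_and_remaining(0) == ([0], None)
assert gtx_flags_and_remaining(3) == ([1, 1, 1, 0], None)
assert gtx_flags_and_remaining(5) == ([1, 1, 1, 1], 1)
```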
In AVS, the 0th order exponential Golomb binarization method is used to binarize the remaining level. However, the 0th order exponential Golomb binarization may not be optimal for the remaining level, especially when the bit depth of the video samples is high, resulting in a higher bit rate of the encoded video. In one embodiment, it is proposed to use the 1st order exponential Golomb binarization method in residual level binarization to improve video coding performance.
Fig. 6A shows the k-th order exponential Golomb binarization codewords, where k is an integer, e.g., 0, 1, 2. As can be seen from Fig. 6A, lower-order exponential Golomb binarization works better for remaining levels with small values, while higher-order exponential Golomb binarization works better for remaining levels with large values. For example, if the remaining levels are distributed over a small range (e.g., 0 to 2), then among the binarization schemes of Fig. 6A, the 0th order exponential Golomb binarization produces the smallest total number of bins to represent these remaining levels and thus requires the fewest bits. However, if the remaining levels at many locations are in the range of 3 to 5, the 1st order exponential Golomb binarization may produce fewer bins than the 0th order exponential Golomb binarization. Likewise, as the values of the remaining levels become larger, higher-order exponential Golomb binarization provides better coding efficiency than lower-order binarization schemes.
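The following sketch constructs k-th order Exp-Golomb codewords in the standard way (offset the value by 2^k, then emit a prefix of zeros followed by the binary representation); the table in Fig. 6A is assumed to follow this construction, which is consistent with the example given later in the text where the remaining level 5 binarizes to "0111" under the 1st order code.

```python
# Standard k-th order Exp-Golomb codeword construction: add 2^k to the value,
# then emit a prefix of zeros followed by the binary representation. The table
# in Fig. 6A is assumed to follow this construction.
def exp_golomb_encode(value: int, k: int) -> str:
    shifted = value + (1 << k)             # offset the non-negative value by 2^k
    num_bits = shifted.bit_length()        # length of the binary representation
    prefix = "0" * (num_bits - 1 - k)      # prefix of leading zeros
    return prefix + format(shifted, "b")   # prefix followed by the binary value

assert exp_golomb_encode(0, 0) == "1"      # 0th order: small values get short codes
assert exp_golomb_encode(3, 0) == "00100"
assert exp_golomb_encode(1, 1) == "11"     # 1st order
assert exp_golomb_encode(5, 1) == "0111"   # the example used later in the text
```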
In one embodiment, the 1st order exponential Golomb binarization (i.e., k=1 in Fig. 6A) is used to binarize the remaining level obtained after removing (N+1) from the absolute level, where N is the maximum value of X for which abs_level_gtxX_flag is present. In the latest AVS, N is 2. The proposed 1st order exponential Golomb binarization can be used for both regular residual coding (RRC) and transform skip residual coding (TSRC). Alternatively, the 1st order exponential Golomb binarization may be used only for RRC or only for TSRC. In another example, a k-th order exponential Golomb binarization is used to binarize the remaining level, where k may be greater than 1.
Alternatively or additionally, adaptive exponential Golomb binarization may be used to binarize the remaining level. In the adaptive binarization method, the level information preceding the current position is used to determine the order k of the exponential Golomb code used to binarize the remaining level of the current position. For example, a statistic (e.g., a sum, an average, or another statistic) of M absolute levels or remaining levels of previously encoded locations may be used to adaptively determine the exponential Golomb binarization order k for the current location. Several thresholds t1, t2, ..., tn (e.g., t1 < t2 < ... < tn) may be used to classify the statistic into several categories, which are mapped to several values representing different values of the order k. Furthermore, to be hardware friendly, a special location template may be used to calculate the statistic of the M absolute levels of previously encoded locations. The special location template ensures that certain locations are not used in calculating the statistic. For example, in some implementations, locations on the same scan line are processed in parallel. Therefore, to avoid breaking parallelism, previously encoded locations on the same scan line as the current position should not be used when calculating the statistic; a special location template can achieve this. Fig. 6B depicts an example of a special location template used in adaptive binarization. In this example, the current location 602 in block 600 is shown in solid. The scan line is shown using line 604. The template 606 includes five shaded locations. It can be seen that the locations in template 606 do not include locations along scan line 604. Thus, the template 606 may be used to determine the locations for calculating the statistic for binarization order selection without breaking parallelism. For example, if the statistic falls between the thresholds t1 and t2, k=0 may be selected; if the statistic falls between the thresholds t2 and t3, k=1 may be selected; and so on.
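The following is a minimal sketch of the adaptive order selection described above; the statistic (a sum), the thresholds, and the category-to-order mapping are hypothetical examples, not values specified by this disclosure.

```python
# Illustrative adaptive order selection: a statistic of previously coded levels
# from a template of positions is compared against thresholds t1 < t2 < ... to
# pick the Exp-Golomb order k. The thresholds below are hypothetical.
def select_order(template_levels, thresholds=(3, 8, 16)):
    """template_levels: absolute or remaining levels at the template positions
    (positions on the current scan line are excluded to preserve parallelism)."""
    statistic = sum(template_levels)       # a sum; an average would also work
    k = 0
    for t in thresholds:
        if statistic >= t:
            k += 1                         # larger statistics map to higher orders
    return k

assert select_order([0, 1, 0, 0, 1]) == 0  # small levels -> 0th order
assert select_order([2, 3, 1, 2, 4]) == 2  # sum 12 falls between t2 = 8 and t3 = 16
assert select_order([9, 8, 7, 6, 5]) == 3  # large levels -> highest order
```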
Fig. 7 depicts an example of a process 700 for encoding a block of video according to some embodiments of the present disclosure. One or more computing devices (e.g., a computing device implementing the video encoder 100) implement the operations depicted in fig. 7 by executing appropriate program code (e.g., program code implementing the entropy encoding module 116). For purposes of illustration, the process 700 is described with reference to some examples depicted in the accompanying drawings. However, other implementations are also possible.
At block 702, process 700 involves obtaining the quantization levels of the residual of a block of the video. The block may be a portion of a picture of the input video, such as the coding unit 402 discussed with respect to fig. 4, or any type of block that is processed as a unit by the video encoder when performing quantization and binarization.
At block 704, which includes blocks 706 and 708, process 700 involves processing each quantization level of the block to generate the binarized levels of the block. At block 706, process 700 involves determining the remaining level of the quantization level. As discussed above, the video encoder may use syntax elements such as abs_level_gtxX_flag to indicate a quantization level. If the value of the quantization level is greater than the value that can be represented by these syntax elements, the video encoder may determine the remaining level to be binarized as the quantization level minus the portion represented by the syntax elements. For example, for quantization level 6 in Table 1, after subtracting the portion (i.e., 4) represented by the syntax elements (i.e., abs_level_gtx0_flag, abs_level_gtx1_flag, …, abs_level_gtx3_flag) from quantization level 6, the remaining level is 2.
At block 708, the process 700 involves converting the remaining level into a binary representation using a k-th order exponential Golomb codeword. In some examples, k is 1, i.e., the remaining level is binarized using the 1st order exponential Golomb binarization. In other examples, k is greater than 1, and the remaining level is binarized using a higher-order exponential Golomb binarization. Binarization may be performed by converting the value of the remaining level indicated in the third column of the table shown in Fig. 6A into the binary string shown in the second column. For example, if the 1st order exponential Golomb binarization is used and the remaining level is 5, the binarization is "0111" according to Fig. 6A. Other values of the remaining level can be converted in a similar manner according to Fig. 6A and the order of the exponential Golomb binarization. As discussed in detail above, adaptive exponential Golomb binarization may be used to binarize the remaining level. In this adaptive binarization method, the order of the exponential Golomb binarization is determined based on the quantization levels or remaining levels preceding the current position.
At block 710, process 700 involves encoding the binary representations of the quantization levels of the block into a bitstream of the video. For example, the encoding may be performed using CABAC as discussed above.
Fig. 8 depicts an example of a process 800 for decoding video blocks according to some embodiments of the present disclosure. One or more computing devices implement the operations depicted in fig. 8 by executing appropriate program code. For example, a computing device implementing the video decoder 200 may implement the operations depicted in fig. 8 by executing program code of the entropy decoding module 216, the inverse quantization module 218, and the inverse transform module 219. For purposes of illustration, the process 800 is described with reference to some examples depicted in the drawings. However, other implementations are also possible.
At block 802, process 800 involves obtaining a binary string or binary representation representing a block of the video signal. The block may be a portion of a picture of the input video, such as the coding unit 402 discussed with respect to fig. 4, or any type of block that was processed as a unit by the video encoder when performing quantization and binarization.
At block 804, which includes blocks 806 through 810, process 800 involves processing the binary representation of the block to recover the quantization levels of the block. At block 806, process 800 involves obtaining a portion of the binary representation corresponding to a quantization level of the block. At block 808, the process 800 involves converting the portion of the binary representation into a remaining level using a k-th order exponential Golomb codeword. In some examples, k is 1, and the remaining level is recovered from the binary representation using the 1st order exponential Golomb binarization. In other examples, k is greater than 1, and a higher-order exponential Golomb binarization is used to recover the remaining level. According to the mapping between the second and third columns of the table shown in Fig. 6A, the conversion may be performed by mapping the binary representation to the value of the remaining level. For example, if the 1st order exponential Golomb binarization is used and the binary string is "0111", the remaining level is 5 according to Fig. 6A. Other values of the remaining level can be recovered in a similar manner according to Fig. 6A and the order of the exponential Golomb binarization. In some examples, adaptive exponential Golomb binarization may be used to binarize the remaining level. In these examples, the decoder may first determine the order of the exponential Golomb binarization based on the quantization levels or remaining levels of the block, or of other blocks, that have been decoded prior to the current remaining level. The decoder then selects the appropriate exponential Golomb binarization based on the determined order to convert the binary representation into the value of the remaining level.
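The following sketch shows the corresponding decoder-side parsing of a k-th order Exp-Golomb codeword from the front of a bin string, assuming the same codeword construction as the encoding sketch above.

```python
# Decoder-side counterpart of the encoding sketch above: parse one k-th order
# Exp-Golomb codeword from the front of a bin string and return the decoded
# value together with the number of bins consumed.
def exp_golomb_decode(bins: str, k: int):
    leading_zeros = 0
    while bins[leading_zeros] == "0":      # count the prefix of zeros
        leading_zeros += 1
    num_bits = leading_zeros + 1 + k       # length of the binary part
    codeword = bins[leading_zeros:leading_zeros + num_bits]
    value = int(codeword, 2) - (1 << k)    # undo the 2^k offset of the encoder
    return value, leading_zeros + num_bits

# Example from the text: with k = 1, the bins "0111" decode to remaining level 5.
assert exp_golomb_decode("0111", 1) == (5, 4)
assert exp_golomb_decode("10", 1) == (0, 2)
```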
At block 810, process 800 involves reconstructing the quantization level from the remaining level and other syntax elements, such as the abs_level_gtxX_flag discussed above. The decoder may parse the syntax elements from the portion of the binary representation and determine the value of the quantization level corresponding to those syntax elements. The decoder may also determine the quantization level as the sum of the value determined from the syntax elements and the remaining level.
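A minimal sketch of the reconstruction at block 810, using the same simplified flag model as the encoder-side sketch above (N = 3 assumed for illustration):

```python
# Sketch of the reconstruction at block 810, using the simplified flag model
# from the earlier encoder-side sketch (N = 3 assumed for illustration).
def reconstruct_abs_level(flags, remaining, n: int = 3) -> int:
    if flags[-1] == 0:
        return len(flags) - 1              # first 0 flag at index X means abs(level) = X
    return (n + 1) + remaining             # all flags are 1: add the remaining level

assert reconstruct_abs_level([0], None) == 0
assert reconstruct_abs_level([1, 1, 1, 0], None) == 3
assert reconstruct_abs_level([1, 1, 1, 1], 1) == 5   # matches Table 1
```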
At block 812, process 800 involves reconstructing the block by determining pixel values of the block from the quantization levels, e.g., by inverse quantization and inverse transform as discussed above with respect to fig. 2. Decoded blocks of video may be output for display.
Computing system examples for implementing residual level binarization for video coding
Any suitable computing system may be used to perform the operations described herein. For example, fig. 9 depicts an example of a computing device 900 that may implement video encoder 100 of fig. 1 or video decoder 200 of fig. 2. In some embodiments, computing device 900 may include a processor 912, with processor 912 communicatively coupled to memory 914 and executing computer-executable program code and/or accessing information stored in memory 914. The processor 912 may include a microprocessor, application specific integrated circuit ("ASIC"), state machine, or other processing device. The processor 912 may include any of a number of processing devices, including a single processing device. Such a processor may include, or may be in communication with, a computer-readable medium storing instructions that, when executed by the processor 912, cause the processor to perform the operations described herein.
Memory 914 may include any suitable non-transitory computer-readable medium. The computer-readable medium may include any electronic, optical, magnetic, or other storage device that can provide computer-readable instructions or other program code to a processor. Non-limiting examples of computer-readable media include magnetic disks, memory chips, ROM, RAM, an ASIC, a configured processor, optical memory, magnetic tape or other magnetic memory, or any other medium from which a computer processor may read instructions. The instructions may include processor-specific instructions generated by a compiler and/or interpreter in accordance with code written in any suitable computer programming language, including, for example, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, and ActionScript.
Computing device 900 may also include a bus 916. The bus 916 communicatively couples one or more components of the computing device 900. Computing device 900 may also include a number of external or internal devices, such as input or output devices. For example, computing device 900 is shown with an input/output ("I/O") interface 918, which I/O interface 918 may receive input from one or more input devices 920 or provide output to one or more output devices 922. One or more input devices 920 and one or more output devices 922 are communicatively coupled to the I/O interface 918. The communication coupling may be achieved by any suitable means (e.g., connection through a printed circuit board, connection through a cable, communication through wireless transmission, etc.). Non-limiting examples of input devices 920 include a touch screen (e.g., one or more cameras for imaging a touch area, or pressure sensors for detecting pressure changes caused by a touch), a mouse, a keyboard, or any other device that may be used to generate input events in response to physical actions of a user on a computing device. Non-limiting examples of output devices 922 include an LCD screen, an external monitor, speakers, or any other device that may be used to display or otherwise present output generated by a computing device.
Computing device 900 may execute program code that configures the processor 912 to perform one or more of the operations described above with respect to fig. 1-8. The program code may include the video encoder 100 or the video decoder 200. The program code may reside in the memory 914 or any suitable computer-readable medium and may be executed by the processor 912 or any other suitable processor.
Computing device 900 may also include at least one network interface device 924. Network interface device 924 may include any device or group of devices suitable for establishing a wired or wireless data connection to one or more data networks 928. Non-limiting examples of network interface device 924 include an ethernet network adapter, modem, and the like. The computing device 900 may send messages as electronic or optical signals through the network interface device 924.
Overall consideration
Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, it will be understood by those skilled in the art that the claimed subject matter may be practiced without these specific details. In other instances, methods, devices, or systems that may be known by one of ordinary skill have not been described in detail so as not to obscure the claimed subject matter.
Unless specifically stated otherwise, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," or "identifying" or the like, refer to the action and processes of a computing device (e.g., one or more computers or similar electronic computing devices or devices) that manipulate or transform data represented as physical, electronic, or magnetic quantities within the computing device's memories, registers or other information storage, transmission or display devices.
The one or more systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device may include any suitable arrangement of components that provides a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus into a special-purpose computing apparatus implementing one or more embodiments of the subject matter herein. Any suitable programming language, scripting language, or other type of language or combination of languages may be used to implement the teachings contained herein in software used to program or configure a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such a computing device. The order of the blocks presented in the above examples may be varied, e.g., the blocks may be reordered, combined, and/or broken into sub-blocks. Some blocks or processes may be performed in parallel.
The use of "adapted to" or "configured to" herein is meant as open and inclusive language that does not exclude devices adapted to or configured to perform additional tasks or steps. Additionally, the use of "based on" is meant to be open and inclusive, in that a process, step, calculation, or other action "based on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
While the subject matter herein has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it is to be understood that the present disclosure has been presented for purposes of example, and not limitation, and does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

Claims (38)

1. A method for decoding video, the method comprising:
obtaining a binary representation of a block of the video, the block of the video being associated with a plurality of quantization levels;
processing the binary representation to recover the plurality of quantization levels for the block, the processing comprising:
obtaining a portion of the binary representation corresponding to a quantization level of the plurality of quantization levels; and
converting the portion of the binary representation into the quantization level according to an exponential Golomb binarization of order k, wherein k indicates the order of the exponential Golomb binarization and is an integer greater than zero; and
reconstructing the block by determining pixel values of the block from the plurality of quantization levels.
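As a non-normative illustration of the conversion recited in claim 1, the following minimal sketch decodes one k-th order exponential Golomb codeword from a bin string back into a level. The classic bit ordering (a run of leading zeros, a one, then a suffix) and the function and variable names are assumptions made for this sketch only; the actual bin ordering and entropy-coding context of a given codec are not shown.

def decode_exp_golomb_k(bins, pos, k):
    """Decode one k-th order Exp-Golomb codeword starting at index `pos`
    of `bins` (a list of 0/1 integers). Returns (value, next_pos).

    Classic ordering is assumed: q leading zeros, a 1, then q + k suffix
    bits; the decoded value is that (q + k + 1)-bit number minus 2**k.
    """
    q = 0
    while bins[pos + q] == 0:      # count the leading zeros (prefix)
        q += 1
    pos += q
    w = 0
    for _ in range(q + k + 1):     # read the leading 1 plus q + k suffix bits
        w = (w << 1) | bins[pos]
        pos += 1
    return w - (1 << k), pos


# Example: with k = 1, the codeword 0 1 0 0 decodes to the value 2.
value, _ = decode_exp_golomb_k([0, 1, 0, 0], 0, 1)
assert value == 2

Some codecs write the prefix as ones terminated by a zero rather than zeros terminated by a one; the code structure and the code lengths are equivalent, only the bit polarity of the prefix differs.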
2. The method of claim 1, wherein the converting the portion of the binary representation into the quantization level according to k-th order exponential Golomb binarization comprises:
converting a first part of the portion of the binary representation to generate a first value of the quantization level;
converting a second part of the portion of the binary representation according to the k-th order exponential Golomb binarization to generate a remaining level of the quantization level; and
obtaining the quantization level by adding the first value to the remaining level of the quantization level.
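Claim 2 splits each coded level into a first value and an exponential-Golomb-coded remaining level. A minimal sketch of that recombination, reusing the hypothetical decode_exp_golomb_k above (the helper name and signature are assumptions of this sketch, not part of the claims):

def reconstruct_level(first_value, bins, pos, k):
    """Recover one quantization level from its two parts: a first value
    already decoded from the first part of the portion, plus a k-th order
    Exp-Golomb remaining level read from `bins` starting at `pos`."""
    remaining, pos = decode_exp_golomb_k(bins, pos, k)
    return first_value + remaining, pos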
3. The method of claim 1, wherein k is 1.
4. The method of claim 1, wherein k is greater than 1.
5. The method of claim 1, wherein the converting the portion of the binary representation into the quantization level according to k-th order exponential Golomb binarization comprises:
determining a value of the order k of the exponential Golomb binarization based at least in part on one or more quantization levels of the plurality of quantization levels associated with the block, the one or more quantization levels preceding the quantization level; and
converting the portion of the binary representation into the quantization level according to the exponential Golomb binarization with the determined order k.
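Claim 5 leaves the derivation of the order k open. One plausible, purely illustrative approach, in the spirit of Rice/Exp-Golomb parameter adaptation, is to grow k with the magnitude of levels already decoded in the block; the window size and threshold values below are invented for this sketch and are not taken from the claims or from any standard.

def derive_order_k(previous_levels, window=5, thresholds=(3, 9, 21)):
    """Return an Exp-Golomb order k >= 1 derived from quantization levels
    decoded earlier in the block.

    The heuristic sums the magnitudes of the most recent `window` levels
    and increments k once per threshold crossed, so larger local levels
    yield a larger k and hence shorter codes for large remainders.
    """
    activity = sum(abs(v) for v in previous_levels[-window:])
    k = 1                          # the claims require k to be greater than zero
    for t in thresholds:
        if activity > t:
            k += 1
    return k


# Example: a quiet neighborhood keeps k = 1; a busier one raises it.
assert derive_order_k([0, 1, 0, 0, 1]) == 1
assert derive_order_k([7, 12, 3, 9, 15]) == 4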
6. The method of claim 1, wherein the block comprises a coding unit.
7. The method of claim 1, wherein the plurality of quantization levels associated with the block comprise a quantized transformed signal of the block or a quantized non-transformed signal of the block.
8. A non-transitory computer-readable medium having stored thereon program code executable by one or more processing devices to perform operations comprising:
obtaining a binary representation of a block of video, the block of video being associated with a plurality of quantization levels;
processing the binary representation to recover the plurality of quantization levels for the block, the processing comprising:
obtaining a portion of the binary representation corresponding to a quantization level of the plurality of quantization levels; and
converting the portion of the binary representation into the quantization level according to an exponential Golomb binarization of order k, where k is an integer greater than zero; and
reconstructing the block by determining pixel values of the block from the plurality of quantization levels.
9. The non-transitory computer-readable medium of claim 8, wherein the converting the portion of the binary representation into the quantization level according to k-th order exponential Golomb binarization comprises:
converting a first part of the portion of the binary representation to generate a first value of the quantization level;
converting a second part of the portion of the binary representation according to the k-th order exponential Golomb binarization to generate a remaining level of the quantization level; and
obtaining the quantization level by adding the first value to the remaining level of the quantization level.
10. The non-transitory computer-readable medium of claim 8, wherein k is 1.
11. The non-transitory computer-readable medium of claim 8, wherein k is greater than 1.
12. The non-transitory computer-readable medium of claim 8, wherein the converting the portion of the binary representation into the quantization level according to k-th order exponential Golomb binarization comprises:
determining a value of the order k of the exponential Golomb binarization based at least in part on one or more quantization levels of the plurality of quantization levels associated with the block, the one or more quantization levels preceding the quantization level; and
converting the portion of the binary representation into the quantization level according to the exponential Golomb binarization with the determined order k.
13. The non-transitory computer-readable medium of claim 8, wherein the block comprises a coding unit.
14. The non-transitory computer-readable medium of claim 8, wherein the plurality of quantization levels associated with the block comprise a quantized transformed signal of the block or a quantized non-transformed signal of the block.
15. A system, comprising:
a processing device; and
a non-transitory computer readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer readable medium and thereby perform operations comprising:
obtaining a binary representation of a block of video, the block of video being associated with a plurality of quantization levels;
processing the binary representation to recover the plurality of quantization levels for the block, the processing comprising:
obtaining a portion of the binary representation corresponding to a quantization level of the plurality of quantization levels; and
converting the portion of the binary representation into the quantization level according to an exponential Golomb binarization of order k, where k is an integer greater than zero; and
reconstructing the block by determining pixel values of the block from the plurality of quantization levels.
16. The system of claim 15, wherein the converting the portion of the binary representation into the quantization level according to k-th order exponential Golomb binarization comprises:
converting a first part of the portion of the binary representation to generate a first value of the quantization level;
converting a second part of the portion of the binary representation according to the k-th order exponential Golomb binarization to generate a remaining level of the quantization level; and
obtaining the quantization level by adding the first value to the remaining level of the quantization level.
17. The system of claim 15, wherein k is 1.
18. The system of claim 15, wherein k is greater than 1.
19. The system of claim 15, wherein the converting the portion of the binary representation into the quantization level according to k-th order exponential Golomb binarization comprises:
determining a value of the order k of the exponential Golomb binarization based at least in part on one or more quantization levels of the plurality of quantization levels associated with the block, the one or more quantization levels preceding the quantization level; and
converting the portion of the binary representation into the quantization level according to the exponential Golomb binarization with the determined order k.
20. The system of claim 15, wherein the block comprises a coding unit.
21. A method for encoding video, the method comprising:
acquiring a plurality of quantization levels of a block of the video;
processing each of the plurality of quantization levels of the block to generate a binary representation of the plurality of quantization levels, the processing comprising:
determining a remaining level of the quantization level; and
converting the remaining levels of the quantization levels into a binary representation of the quantization levels according to an exponential Golomb binarization of order k, wherein k indicates the order of the exponential Golomb binarization and is an integer greater than zero; and
encoding at least the binary representations of the plurality of quantization levels of the block into a bitstream of the video.
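For the encoder side recited in claim 21, the sketch below produces the classic k-th order exponential Golomb bin string for a non-negative remaining level; it is the inverse of the decoding sketch given after claim 1. As before, the bit ordering and the function name are illustrative assumptions rather than the normative binarization of any particular codec.

def encode_exp_golomb_k(value, k):
    """Return the classic k-th order Exp-Golomb bin string (a list of 0/1
    integers) for a non-negative remaining level `value`.

    Construction: write value + 2**k in binary and prepend one zero for
    every bit of that binary string beyond the first k + 1 bits.
    """
    if value < 0:
        raise ValueError("remaining levels are non-negative")
    w = value + (1 << k)
    n = w.bit_length()
    prefix = [0] * (n - k - 1)
    suffix = [(w >> i) & 1 for i in range(n - 1, -1, -1)]
    return prefix + suffix


# Round trip with the decoding sketch: remaining level 2 at order k = 1.
assert encode_exp_golomb_k(2, 1) == [0, 1, 0, 0]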
22. The method of claim 21, wherein k is 1.
23. The method of claim 21, wherein k is greater than 1.
24. The method of claim 21, wherein the converting the remaining levels of the quantization levels into binary representations according to k-th order exponential Golomb binarization comprises:
determining a value of the order k of the exponential Golomb binarization based at least in part on one or more quantization levels or remaining levels of the plurality of quantization levels of the block, the one or more quantization levels or remaining levels preceding the quantization level; and
converting the remaining levels of the quantization levels into the binary representation according to the exponential Golomb binarization with the determined order k.
25. The method of claim 21, wherein the block comprises a coding unit.
26. The method of claim 21, wherein the plurality of quantization levels of the block comprise a quantized transformed signal of the block or a quantized non-transformed signal of the block.
27. A non-transitory computer-readable medium having stored thereon program code executable by one or more processing devices to perform operations comprising:
acquiring a plurality of quantization levels of a block of video;
processing each of the plurality of quantization levels of the block to generate a binary representation of the plurality of quantization levels, the processing comprising:
determining a remaining level of the quantization level; and
converting the remaining levels of the quantization levels into a binary representation of the quantization levels according to an exponential Golomb binarization of order k, wherein k indicates the order of the exponential Golomb binarization and is an integer greater than zero; and
encoding at least the binary representations of the plurality of quantization levels of the block into a bitstream of the video.
28. The non-transitory computer-readable medium of claim 27, wherein k is 1.
29. The non-transitory computer-readable medium of claim 27, wherein k is greater than 1.
30. The non-transitory computer-readable medium of claim 27, wherein the converting the remaining level of the quantization level into a binary representation according to the k-th order exponential Golomb binarization comprises:
determining a value of the order k of the exponential Golomb binarization based at least in part on one or more quantization levels or remaining levels of the plurality of quantization levels of the block, the one or more quantization levels or remaining levels preceding the quantization level; and
converting the remaining levels of the quantization levels into the binary representation according to the exponential Golomb binarization with the determined order k.
31. The non-transitory computer-readable medium of claim 27, wherein the block comprises a coding unit.
32. The non-transitory computer-readable medium of claim 27, wherein the plurality of quantization levels of the block comprise a quantized transformed signal of the block or a quantized non-transformed signal of the block.
33. A system, comprising:
a processing device; and
a non-transitory computer readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer readable medium and thereby perform operations comprising:
acquiring a plurality of quantization levels of a block of video;
processing each of the plurality of quantization levels of the block to generate a binary representation of the plurality of quantization levels, the processing comprising:
determining a remaining level of the quantization level; and
converting the remaining levels of the quantization levels into a binary representation of the quantization levels according to an exponential Golomb binarization of order k, wherein k indicates the order of the exponential Golomb binarization and is an integer greater than zero; and
encoding at least the binary representations of the plurality of quantization levels of the block into a bitstream of the video.
34. The system of claim 33, wherein k is 1.
35. The system of claim 33, wherein k is greater than 1.
36. The system of claim 33, wherein the converting the remaining levels of the quantization levels into binary representations according to k-th order exponential Golomb binarization comprises:
determining a value of the order k of the exponential Golomb binarization based at least in part on one or more quantization levels or remaining levels of the plurality of quantization levels of the block, the one or more quantization levels or remaining levels preceding the quantization level; and
converting the remaining levels of the quantization levels into the binary representation according to the exponential Golomb binarization with the determined order k.
37. The system of claim 33, wherein the block comprises a coding unit.
38. The system of claim 33, wherein the plurality of quantization levels of the block comprise a quantized transformed signal of the block or a quantized non-transformed signal of the block.
CN202280019616.6A 2021-03-11 2022-03-11 Residual level binarization for video coding Pending CN116965028A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202163159913P 2021-03-11 2021-03-11
US63/159,913 2021-03-11
PCT/US2022/071091 WO2022192902A1 (en) 2021-03-11 2022-03-11 Remaining level binarization for video coding

Publications (1)

Publication Number Publication Date
CN116965028A true CN116965028A (en) 2023-10-27

Family

ID=83227134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280019616.6A Pending CN116965028A (en) 2021-03-11 2022-03-11 Residual level binarization for video coding

Country Status (2)

Country Link
CN (1) CN116965028A (en)
WO (1) WO2022192902A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
HUE027907T2 (en) * 2011-01-14 2016-11-28 Ge Video Compression Llc Entropy encoding and decoding scheme
US20180316938A1 (en) * 2017-04-26 2018-11-01 Canon Kabushiki Kaisha Method and apparatus for k-th order exp-golomb binarization
WO2019185769A1 (en) * 2018-03-29 2019-10-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Dependent quantization

Also Published As

Publication number Publication date
WO2022192902A1 (en) 2022-09-15

Similar Documents

Publication Publication Date Title
US9888249B2 (en) Devices and methods for sample adaptive offset coding and/or selection of edge offset parameters
US11695962B2 (en) Encoding and decoding methods and corresponding devices
CN108141621B (en) Method and device for coding and decoding video data
CN114930817A (en) Signaling technique for quantizing related parameters
CN112543337B (en) Video decoding method, device, computer readable medium and electronic equipment
CN112995671B (en) Video encoding and decoding method and device, computer readable medium and electronic equipment
WO2023028555A1 (en) Independent history-based rice parameter derivations for video coding
CN116982262A (en) State transition for dependent quantization in video coding
WO2023028576A2 (en) History-based rice parameter derivations for wavefront parallel processing in video coding
CN116965028A (en) Residual level binarization for video coding
CN114615504A (en) Video decoding method, video encoding method, device and equipment
WO2022217245A1 (en) Remaining level binarization for video coding
CN114079772B (en) Video decoding method and device, computer readable medium and electronic equipment
CN116998150A (en) State transition for grid quantization in video coding and decoding
CN117837148A (en) History-based rice coding parameter derivation for video coding
CN114079773B (en) Video decoding method and device, computer readable medium and electronic equipment
WO2023212684A1 (en) Subblock coding inference in video coding
CN117529914A (en) History-based rice parameter derivation for wavefront parallel processing in video coding
CN114979641A (en) Video encoding and decoding method and device, computer readable medium and electronic equipment
WO2023023608A2 (en) History-based rice parameter derivations for video coding
WO2023060140A1 (en) History-based rice parameter derivations for video coding
CN115209141A (en) Video encoding and decoding method and device, computer readable medium and electronic equipment
CN114979655A (en) Video encoding and decoding method and device, computer readable medium and electronic equipment
CN114979656A (en) Video encoding and decoding method and device, computer readable medium and electronic equipment
KR20240049294A (en) Extended operating range for versatile video coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240410

Address after: No. 18 Wusha Haibin Road, Chang'an Town, Dongguan, Guangdong Province 523860

Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

Country or region after: China

Address before: 2479 East Bayshore Road, Suite 110, Palo Alto, California, USA

Applicant before: Innopeak Technology, Inc.

Country or region before: U.S.A.