CN117223285A - Bypass alignment in video coding

Info

Publication number: CN117223285A
Application number: CN202280029562.1A
Authority: CN
Inventors: 余越; 于浩平
Original assignee: Innopeak Technology Inc
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2021-04-26
Filing date: 2022-04-25
Publication date: 2023-12-12
Also published as: CN117203960A

Abstract

In certain aspects, a method for encoding an image of a video including a transform unit is disclosed. The processor quantizes the coefficients for each position in the transform unit to generate a quantization level for the corresponding position. The high throughput mode is enabled. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. In the high throughput mode, the processor encodes the quantization levels of the transform unit into a bitstream.

Description

Bypass alignment in video coding

Cross Reference to Related Applications

U.S. Provisional Application No. 63/180,007, entitled "BYPASS ALIGNMENT METHOD FOR VIDEO CODING" and filed on April 26, 2021, U.S. Provisional Application No. 63/215,862, entitled "BYPASS ALIGNMENT METHOD FOR VIDEO CODING" and filed on month 6 of 2021, and U.S. Provisional Application No. 63/216,447, entitled "BYPASS ALIGNMENT METHOD FOR VIDEO CODING" and filed on month 29 of 2021, are all incorporated herein by reference in their entirety.

Background

Embodiments of the present disclosure relate to video coding.

Digital video has become mainstream and is widely used in various applications including digital television, video telephony, and teleconferencing. These digital video applications are viable due to advances in computing and communication technology as well as efficient video coding techniques. Video data may be compressed using various video coding techniques, and the coding may be performed according to one or more video coding standards. Exemplary video coding standards may include, but are not limited to, Versatile Video Coding (H.266/VVC), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), Moving Picture Experts Group (MPEG) coding, and the like.

Disclosure of Invention

According to one aspect of the present disclosure, a method for encoding an image of a video including a transform unit is disclosed. The processor quantizes the coefficients for each position in the transform unit to generate a quantization level for the corresponding position. The high throughput mode is enabled. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment (bypass bit-alignment) is applied. In the high throughput mode, the processor encodes the quantization levels of the transform unit into a bitstream.

According to another aspect of the disclosure, a system for encoding an image of a video including a transform unit includes a memory configured to store instructions and a processor coupled to the memory. The processor is configured to, upon execution of the instructions, quantize the coefficients of each position in the transform unit to generate a quantization level for the respective position. The processor is further configured to enable a high throughput mode when executing the instructions. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. The processor is further configured to encode the quantization level of the transform unit into the bitstream in a high throughput mode when executing the instructions.

According to yet another aspect of the disclosure, a non-transitory computer-readable medium storing instructions that, when executed by a processor, perform a process for encoding an image of a video that includes a transform unit is disclosed. The process includes quantizing the coefficients for each position in the encoded block to generate a quantization level for the corresponding position. The process also includes enabling a high throughput mode. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. The process also includes encoding the quantization levels of the transform unit into the bitstream in a high throughput mode.

According to yet another aspect of the present disclosure, a method for decoding an image of a video including a transform unit is disclosed. A high throughput mode is enabled. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. In the high throughput mode, the processor decodes the bitstream to obtain a quantization level for each position in the transform unit. The quantization levels of the transform unit are dequantized to generate coefficients for each position in the transform unit.

According to yet another aspect of the disclosure, a system for decoding an image of a video including a transform unit includes a memory configured to store instructions and a processor coupled to the memory. The processor is configured to enable a high throughput mode when executing instructions. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. The processor is further configured to, upon execution of the instructions, decode the bitstream to obtain a quantization level for each position in the transform unit in the high throughput mode. The processor is further configured to dequantize the quantization level of the transform unit when executing the instruction to generate coefficients for each position in the transform unit.

According to yet another aspect of the disclosure, a non-transitory computer-readable medium storing instructions that, when executed by a processor, perform a process for decoding an image of a video that includes a transform unit is disclosed. The process includes enabling a high throughput mode. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. The process also includes decoding the bitstream to obtain a quantization level for each position in the transform unit in the high throughput mode. The process also includes dequantizing the quantization level of the transform unit to generate coefficients for each position in the transform unit.

These illustrative embodiments are mentioned not to limit or define the disclosure, but to provide examples to aid understanding thereof. Additional embodiments are discussed in the Detailed Description, where further description is provided.

Drawings

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the pertinent art to make and use the disclosure.

Fig. 1 illustrates a block diagram of an exemplary encoding system according to some embodiments of the present disclosure.

Fig. 2 illustrates a block diagram of an exemplary decoding system, according to some embodiments of the present disclosure.

Fig. 3 illustrates a detailed block diagram of an exemplary encoder in the encoding system of fig. 1, according to some embodiments of the present disclosure.

Fig. 4 illustrates a detailed block diagram of an exemplary decoder in the decoding system in fig. 2, according to some embodiments of the present disclosure.

Fig. 5 illustrates an exemplary image divided into Coding Tree Units (CTUs) according to some embodiments of the present disclosure.

Fig. 6 illustrates an exemplary CTU partitioned into Coding Units (CUs) according to some embodiments of the present disclosure.

Fig. 7A illustrates an exemplary transform block encoded using regular residual coding (RRC) according to some embodiments of the present disclosure.

Fig. 7B illustrates an exemplary transform skip block encoded using transform skip residual coding (TSRC) according to some embodiments of the present disclosure.

Fig. 8A and 8B show the coding procedure in RRC and TSRC, respectively.

Fig. 9A illustrates an exemplary bypass alignment scheme in RRC according to some embodiments of the present disclosure.

Fig. 9B illustrates another exemplary bypass alignment scheme in RRC according to some embodiments of the present disclosure.

Fig. 9C illustrates yet another exemplary bypass alignment scheme in RRC according to some embodiments of the present disclosure.

Fig. 9D illustrates an exemplary bypass alignment scheme in Transform Unit (TU) coding and RRC according to some embodiments of the present disclosure.

Fig. 10A illustrates an exemplary bypass alignment scheme in TSRC according to some embodiments of the present disclosure.

Fig. 10B illustrates an exemplary bypass alignment scheme in TU coding and TSRC according to some embodiments of the present disclosure.

Fig. 11 illustrates a flowchart of an exemplary method of video encoding according to some embodiments of the present disclosure.

Fig. 12 illustrates a flowchart of an exemplary method of video decoding according to some embodiments of the present disclosure.

Fig. 13 illustrates a flowchart of another exemplary method of video encoding according to some embodiments of the present disclosure.

Fig. 14 illustrates a flowchart of another exemplary method of video decoding according to some embodiments of the present disclosure.

Embodiments of the present disclosure will be described below with reference to the accompanying drawings.

Detailed Description

While some configurations and arrangements are discussed, it should be understood that this is done for illustrative purposes only. Those skilled in the art will recognize that other configurations and arrangements may be used without departing from the spirit and scope of the present disclosure. It will be apparent to those skilled in the relevant art that the present disclosure may also be used in a variety of other applications.

It should be noted that references in the specification to "one embodiment," "an example embodiment," "some embodiments," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Furthermore, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the relevant art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

Generally, terms are to be understood, at least in part, from usage in the context. For example, the term "one or more" as used in this disclosure may be used to describe any feature, structure, or characteristic in the singular or may be used to describe a combination of features, structures, or characteristics in the plural, depending at least in part on the context. Similarly, terms such as "a," "an," or "the" may also be understood to convey a singular usage or a plural usage, depending at least in part on the context. Furthermore, the term "based on" may be understood as not necessarily conveying an exclusive set of factors, but may allow for other factors to be present that are not necessarily explicitly described, also depending at least in part on the context.

Various aspects of a video encoding system will now be described with reference to various apparatus and methods. These apparatus and methods will be described in the following detailed description and illustrated in the figures by various modules, components, circuits, steps, operations, processes, algorithms, etc. (collectively referred to as "elements"). These elements may be implemented using electronic hardware, firmware, computer software, or any combination thereof. Whether such elements are implemented as hardware, firmware, or software depends upon the particular application and design constraints imposed on the overall system.

The techniques described in this disclosure may be used for various video coding applications. As described herein, video coding includes both encoding and decoding of video. Encoding and decoding of video may be performed in units of blocks. For example, encoding/decoding processes such as transform, quantization, prediction, in-loop filtering, and reconstruction may be performed on a coding block, a transform block, or a prediction block. As described in this disclosure, the block to be encoded/decoded is referred to as the "current block". For example, the current block may represent a coding block, a transform block, or a prediction block, depending on the current encoding/decoding process. Furthermore, it should be understood that the term "unit" as used in this disclosure indicates a basic unit for performing a specific encoding/decoding process, and the term "block" indicates a sample array of a predetermined size. Unless otherwise indicated, "block" and "unit" may be used interchangeably.

In video coding, quantization is used to reduce the dynamic range of a video signal, whether transformed or not, so that fewer bits are used to represent the video signal. Prior to quantization, the transformed or untransformed video signal at a particular position is referred to as a "coefficient". After quantization, the quantized value of the coefficient is referred to as a "quantization level" or "level". In this disclosure, the quantization level of a position refers to the quantization level of the coefficient at that position. In video coding, residual coding is used to convert the quantization levels of the positions into a bitstream. After quantization, an N×M coding block has N×M quantization levels, each of which may be zero or non-zero. A non-zero level that is not already binary is further binarized into binary bits (bins).
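The relationship between coefficients and quantization levels can be illustrated with a minimal sketch. It assumes a plain uniform quantizer with a hypothetical step size `step` and round-half-away-from-zero rounding; the function names and the rounding rule are illustrative assumptions, and practical codecs derive the step from a quantization parameter and use more elaborate rules.

```python
def quantize(coeffs, step):
    """Map each coefficient to an integer quantization level.

    Assumption: uniform quantizer, rounding half away from zero.
    """
    levels = []
    for row in coeffs:
        out = []
        for c in row:
            sign = -1 if c < 0 else 1
            out.append(sign * ((abs(c) + step // 2) // step))
        levels.append(out)
    return levels


def dequantize(levels, step):
    """Reconstruct approximate coefficients from quantization levels."""
    return [[lv * step for lv in row] for row in levels]


# A 2x4 block of coefficients yields 2x4 quantization levels.
coeffs = [[103, -47, 6, 0], [-22, 9, 0, 0]]
levels = quantize(coeffs, step=16)
recon = dequantize(levels, step=16)
```

In this sketch the small coefficients collapse to level zero, which is the reduction in dynamic range that residual coding then exploits.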

For example, context-adaptive modeling based binary arithmetic coding (CABAC), used in H.266/VVC, H.265/HEVC, and H.264/AVC, uses bins to encode the quantization levels of the positions into bits. CABAC uses two coding methods. The context-based method adaptively updates a context model based on neighboring coding information; bins coded in this way are referred to as context-coded bits (CCBs). In contrast, the bypass method assumes that the probability of a 1 or a 0 is always 50%, and therefore always uses a fixed model without adaptation; bins coded by this method are referred to as bypass-coded bits (BCBs).
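The difference between the two methods can be sketched by comparing ideal code lengths. The exponential probability update below is an illustrative stand-in for a context model, not the actual CABAC state machine, and the function names and the adaptation rate of 1/16 are assumptions made for this example.

```python
import math


def bypass_cost(bins):
    """Bypass coding assumes P(0) = P(1) = 0.5, so each bin costs exactly one bit."""
    return float(len(bins))


def context_cost(bins, rate=1 / 16):
    """Ideal cost in bits when an adaptive estimate of P(1) is used.

    Assumption: a simple exponential update stands in for the context model.
    """
    p_one = 0.5
    total = 0.0
    for b in bins:
        p = p_one if b else 1.0 - p_one
        total += -math.log2(p)
        # Move the estimate toward the bin value that was just coded.
        p_one += rate * ((1.0 if b else 0.0) - p_one)
    return total


skewed = [0] * 60 + [1] * 4   # strongly biased bins: adaptation pays off
alternating = [0, 1] * 32     # no exploitable bias: adaptation does not help
```

Under these assumptions the adaptive model spends far fewer than 64 bits on `skewed`, while on `alternating` it spends slightly more than the 64 bits that bypass coding needs. The adaptation is also the part that makes context coding sequential and costly in hardware.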

Throughput becomes a more serious problem for high bit depth and high bit rate video coding. Compared with bypass coding, context coding of bins requires a relatively complex hardware implementation and generally reduces coding throughput. Context coding has therefore become a bottleneck to improving the throughput of high bit depth and high bit rate video coding.

In order to improve the throughput of video coding, particularly high bit depth and high bit rate video coding, the present disclosure provides various schemes for bypass coding and bit alignment in video coding. For example, for applications requiring high bit depth and high bit rate video coding to obtain better throughput, the high throughput mode may be enabled during residual coding as needed.

In some embodiments, in high throughput mode, some or all of the context-coded bits used for residual coding may be changed to bypass-coded bits. In some embodiments, in high throughput mode, some or all of the context coded binary bits for residual coding may be skipped. Thus, in high throughput mode, only bypass encoded binary bits may be used for residual encoding.

Furthermore, bypass bit alignment may be applied in the high throughput mode to further improve the throughput of bypass coding. After bit alignment is applied, bypass coding can be implemented by a shift operation instead of the regular CABAC operation, which allows a plurality of bypass-coded bits to be coded simultaneously. In the high throughput mode, bypass bit alignment may be invoked at different stages of residual coding as desired, e.g., at the start of the coding process of the current coding block or at the start of the coding process of the transform unit.
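Why alignment turns bypass coding into a shift can be sketched with a simplified model of the arithmetic-coder state. The sketch assumes, for illustration only, that alignment sets the coder range to the power of two 256; it omits renormalization and bit output, and the names `low`, `rng`, and both functions are hypothetical rather than taken from this disclosure.

```python
ALIGNED_RANGE = 256  # assumption: alignment forces the range to a power of two


def encode_bypass_serial(low, rng, bins):
    """Per-bin bypass update of a simplified coder state (no renormalization)."""
    for b in bins:
        low <<= 1
        if b:
            low += rng
    return low


def encode_bypass_aligned(low, bins):
    """With the range aligned to 256, n bypass bins collapse into one shift-and-add."""
    value = 0
    for b in bins:
        value = (value << 1) | b  # pack the bins MSB-first into one integer
    return (low << len(bins)) + (value << 8)  # value * 256 becomes a shift by 8


bins = [1, 0, 1, 1]
serial = encode_bypass_serial(37, ALIGNED_RANGE, bins)
batched = encode_bypass_aligned(37, bins)
```

Both paths reach the same state, but the aligned path handles all four bins in one step, which is the kind of simultaneous coding the paragraph above describes.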

The high throughput mode may be enabled at various levels during residual coding, such as the coding block level or the transform unit level, as desired. The high throughput mode may further extend from residual coding to some or all of the other context-coded bits used in video coding, such as motion vector dependent bits.

Fig. 1 illustrates a block diagram of an exemplary encoding system 100, according to some embodiments of the present disclosure. Fig. 2 illustrates a block diagram of an exemplary decoding system 200, according to some embodiments of the present disclosure. Each system 100 or 200 may be applied or integrated into a variety of systems and devices capable of data processing, such as computers and wireless communication devices. For example, the system 100 or 200 may be all or part of a mobile phone, a desktop computer, a laptop computer, a tablet computer, a vehicle computer, a game console, a printer, a pointing device, a wearable electronic device, a smart sensor, a Virtual Reality (VR) device, an Augmented Reality (AR) device, or any other suitable electronic device having data processing capabilities. As shown in fig. 1 and 2, the system 100 or 200 may include a processor 102, a memory 104, and an interface 106. These components are shown as being interconnected by a bus, but other connection types are also permissible. It should be appreciated that the system 100 or 200 may include any other suitable components for performing the functions described in this disclosure.

The processor 102 may include a microprocessor, such as a graphics processing unit (GPU), an image signal processor (ISP), a central processing unit (CPU), a digital signal processor (DSP), a tensor processing unit (TPU), a vision processing unit (VPU), a neural processing unit (NPU), a synergistic processing unit (SPU), or a physics processing unit (PPU), a microcontroller unit (MCU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuitry, or other suitable hardware configured to perform the various functions described in this disclosure. Although only one processor is shown in fig. 1 and 2, it will be appreciated that multiple processors may be included. Processor 102 may be a hardware device having one or more processing cores. The processor 102 may execute software. Software should be construed broadly to mean any software, firmware, middleware, microcode, hardware description language, or other instructions, instruction sets, code segments, program code, programs, subprograms, software modules, applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like. The software may include computer instructions written in an interpreted language, a compiled language, or machine code. Other techniques for instructing hardware are also permitted under the broad category of software.

The memory 104 may broadly include both memory (also known as main/system memory) and storage (also known as secondary memory). For example, the memory 104 may include random-access memory (RAM), read-only memory (ROM), static RAM (SRAM), dynamic RAM (DRAM), ferroelectric RAM (FRAM), electrically erasable programmable ROM (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, a hard disk drive (HDD) such as magnetic disk storage or another magnetic storage device, a flash drive, a solid-state drive (SSD), or any other medium that may be used to carry or store the desired program code in the form of instructions that may be accessed and executed by the processor 102. In general, the memory 104 may be implemented by any computer-readable medium, such as a non-transitory computer-readable medium. Although only one memory is shown in fig. 1 and 2, it is understood that a plurality of memories may be included.

Interface 106 may broadly comprise a data interface and a communication interface configured to receive and transmit signals in the course of exchanging information with other external network elements. For example, interface 106 may include input/output (I/O) devices and wired or wireless transceivers. Although only one interface is shown in fig. 1 and 2, it will be appreciated that multiple interfaces may be included.

The processor 102, memory 104, and interface 106 may be implemented in various forms in the system 100 or 200 for performing video coding functions. In some embodiments, the processor 102, memory 104, and interface 106 of the system 100 or 200 are implemented (e.g., integrated) on one or more systems-on-chip (SoCs). In one example, the processor 102, memory 104, and interface 106 may be integrated on an application processor (AP) SoC that processes applications (including running video encoding and decoding applications) in an operating system (OS) environment. In another example, the processor 102, memory 104, and interface 106 may be integrated on a dedicated processor chip for video coding, such as a GPU or ISP chip dedicated to image and video processing in a real-time operating system (RTOS).

As shown in fig. 1, in the encoding system 100, the processor 102 may include one or more modules, such as an encoder 101. Although fig. 1 shows encoder 101 within one processor 102, it is understood that encoder 101 may include one or more sub-modules that may be implemented on different processors that are close to or remote from each other. Encoder 101 (and any corresponding sub-modules or sub-units) may be a hardware unit (e.g., part of an integrated circuit) of processor 102 designed for use with other components, or a software unit implemented by processor 102 by executing at least a portion of a program, i.e., instructions. The instructions of the program may be stored on a computer-readable medium, such as the memory 104, and when executed by the processor 102, may perform processes having one or more functions related to video encoding, such as image partitioning, inter prediction, intra prediction, transform, quantization, filtering, and entropy coding, as described in detail below.

Similarly, as shown in fig. 2, in the decoding system 200, the processor 102 may include one or more modules, such as a decoder 201. Although fig. 2 shows decoder 201 within one processor 102, it is understood that decoder 201 may include one or more sub-modules that may be implemented on different processors that are close to or remote from each other. Decoder 201 (and any corresponding sub-modules or sub-units) may be a hardware unit (e.g., part of an integrated circuit) of processor 102 designed for use with other components, or a software unit implemented by processor 102 by executing at least a portion of a program, i.e., instructions. The instructions of the program may be stored on a computer-readable medium, such as the memory 104, and when executed by the processor 102, may perform processes having one or more functions related to video decoding, such as entropy decoding, inverse quantization, inverse transform, inter prediction, intra prediction, and filtering, as described in detail below.

Fig. 3 illustrates a detailed block diagram of an exemplary encoder 101 in the encoding system 100 of fig. 1, according to some embodiments of the present disclosure. As shown in fig. 3, encoder 101 may include a partitioning module 302, an inter-prediction module 304, an intra-prediction module 306, a transform module 308, a quantization module 310, a dequantization module 312, an inverse transform module 314, a filter module 316, a buffer module 318, and an encoding module 320. It should be understood that each element shown in fig. 3 is shown separately to represent different characteristic functions from each other in the video encoder, and this does not mean that each component is formed by a configuration unit of separate hardware or a single software. That is, for convenience of explanation, each element is included and listed as one element, and at least two elements may be combined to form a single element, or one element may be divided into a plurality of elements to perform a function. It should also be understood that some elements are not necessary elements to perform the functions described in this disclosure, but may be optional elements for improved performance. It should also be understood that these elements may be implemented using electronic hardware, firmware, computer software, or any combination thereof. Whether such elements are implemented as hardware, firmware, or software depends upon the particular application and design constraints imposed on encoder 101.

The dividing module 302 may be configured to divide an input image of a video into at least one processing unit. The image may be a frame of video or a field of video. In some embodiments, the image includes an array of luma samples in a monochrome format, or an array of luma samples and two corresponding arrays of chroma samples. Here, the processing unit may be a Prediction Unit (PU), a Transform Unit (TU), or a Coding Unit (CU). The dividing module 302 may divide an image into a plurality of combinations of coding units, prediction units, and transform units, and encode the image by selecting one combination of coding units, prediction units, and transform units based on a predetermined criterion (e.g., a cost function).

Similar to H.265/HEVC, H.266/VVC is a block-based hybrid spatial-temporal predictive coding scheme. As shown in fig. 5, during encoding, an input image 500 is first divided by the dividing module 302 into square blocks, namely CTUs 502. For example, a CTU 502 may be a 128×128 pixel block. As shown in fig. 6, each CTU 502 in the image 500 may be partitioned by the dividing module 302 into one or more CUs 602, which may be used for prediction and transform. Unlike H.265/HEVC, in H.266/VVC a CU 602 may be rectangular or square, and may be encoded without further division into prediction units or transform units. For example, as shown in fig. 6, partitioning CTU 502 into CUs 602 may include quadtree partitioning (shown in solid lines), binary tree partitioning (shown in dashed lines), and ternary tree partitioning (shown in dashed lines). According to some embodiments, each CU 602 may be as large as its root CTU 502, or may be a subdivision of the root CTU 502 as small as a 4×4 block.
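The partitioning described above can be sketched as follows. The sketch covers only the CTU grid and recursive quadtree splitting; binary and ternary splits are omitted for brevity. The split decision is passed in as a hypothetical callback `should_split`, whereas a real encoder would typically decide by evaluating a cost function.

```python
import math


def ctu_grid(width, height, ctu_size=128):
    """Number of CTU columns and rows needed to cover the image."""
    return math.ceil(width / ctu_size), math.ceil(height / ctu_size)


def quadtree_split(x, y, size, should_split, min_size=4):
    """Recursively split a square block into four quadrants.

    Returns the leaf CUs as (x, y, size) tuples.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_split(x + dx, y + dy, half, should_split, min_size)
    return leaves


# Hypothetical rule: keep splitting only the block at the CTU origin, down to 32x32.
cus = quadtree_split(0, 0, 128, lambda x, y, s: x == 0 and y == 0 and s > 32)
grid = ctu_grid(1920, 1080)
```

With this rule the 128×128 CTU ends up as three 64×64 CUs and four 32×32 CUs, which together still cover the whole CTU.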

Referring to fig. 3, the inter prediction module 304 may be configured to perform inter prediction on a prediction unit, and the intra prediction module 306 may be configured to perform intra prediction on a prediction unit. Whether inter prediction or intra prediction is performed for the prediction unit may be determined, and specific information (e.g., intra prediction mode, motion vector, reference picture, etc.) may be determined according to each prediction method. In this case, the processing unit for performing prediction may be different from the processing unit for determining the prediction method and its specific content. For example, a prediction method and a prediction mode may be determined in a prediction unit, and prediction may be performed in a transform unit. Residual coefficients in the residual block between the generated prediction block and the original block may be input into the transform module 308. In addition, prediction mode information, motion vector information, etc. used for prediction may be encoded into the bitstream by the encoding module 320 along with residual coefficients or quantization levels. It should be appreciated that in some coding modes, the original block may be encoded as is, without the prediction module 304 or 306 generating a prediction block. It should also be appreciated that in some coding modes, prediction, transform, and/or quantization may also be skipped.

In some embodiments, the inter prediction module 304 may predict the prediction unit based on information of at least one of the images before or after the current image, and in some cases, may predict the prediction unit based on information of a partial region that has already been encoded in the current image. The inter prediction module 304 may include sub-modules such as a reference image interpolation module, a motion prediction module, and a motion compensation module (not shown). For example, the reference image interpolation module may receive reference image information from the buffer module 318 and generate pixel information at integer-pixel or finer (fractional-pixel) precision from the reference image. For luma pixels, a DCT-based 8-tap interpolation filter with varying filter coefficients may be used to generate fractional-pixel information in units of 1/4 pixel. For chroma signals, a DCT-based 4-tap interpolation filter with varying filter coefficients may be used to generate fractional-pixel information in units of 1/8 pixel. The motion prediction module may perform motion prediction based on the reference image interpolated by the reference image interpolation module. Various methods, such as a full search-based block matching algorithm (FBMA), a three-step search (TSS), and a new three-step search (NTS) algorithm, may be used to calculate a motion vector. The motion vector may have a value in units of 1/2, 1/4, or 1/16 pixel, or an integer number of pixels, based on the interpolated pixels. The motion prediction module may predict the current prediction unit by changing the motion prediction method. As the motion prediction method, various methods such as a skip method, a merge method, an advanced motion vector prediction (AMVP) method, an intra-block copy method, and the like may be used.
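As a rough illustration of full search-based block matching, the sketch below tests every integer displacement within a search window and keeps the one with the lowest sum of absolute differences (SAD). It is limited to integer-pixel motion, the function names are hypothetical, and the choice of SAD as the matching cost is an assumption of this example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return ((dx, dy), cost) of the best integer motion vector for the n x n block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best = (None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block would fall outside the image.
            if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Synthetic 8x8 reference image; the current image is the reference shifted by (2, 1).
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
best = full_search(cur, ref, bx=2, by=2, n=2, search_range=2)
```

Faster schemes such as the three-step search visit only a subset of these candidate displacements.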

In some embodiments, the intra prediction module 306 may generate the prediction unit based on information of reference pixels surrounding the current block (i.e., pixel information in the current image). When a block neighboring the current prediction unit is a block on which inter prediction has been performed, so that its reference pixels are inter-predicted pixels, the reference pixels included in that inter-predicted block may be replaced with reference pixel information of a neighboring block on which intra prediction has been performed. That is, when reference pixels are not available, at least one of the available reference pixels may be used in place of the unavailable reference pixel information. In intra prediction, the prediction modes may include angular prediction modes, which use reference pixel information according to a prediction direction, and non-angular prediction modes, which do not use direction information when performing prediction. The mode for predicting luma information may differ from the mode for predicting chroma information, and the intra prediction mode information used to predict the luma information, or the predicted luma signal information, may be used to predict the chroma information. If the size of the prediction unit is the same as the size of the transform unit when intra prediction is performed, intra prediction may be performed on the prediction unit based on the pixels to the left, the upper left, and the top of the prediction unit. However, if the size of the prediction unit differs from the size of the transform unit, intra prediction may be performed using reference pixels based on the transform unit.

The intra prediction method may generate a prediction block after applying an adaptive intra smoothing (adaptive intra smoothing, AIS) filter to the reference pixels according to a prediction mode. The type of AIS filter applied to the reference pixels may vary. In order to perform the intra prediction method, the intra prediction mode of the current prediction unit may be predicted according to the intra prediction modes of the prediction units existing in the neighborhood of the current prediction unit. When the prediction mode of the current prediction unit is predicted using the mode information predicted from the neighboring prediction unit, if the intra prediction mode of the current prediction unit is identical to the intra prediction mode of the prediction unit in the neighborhood, information indicating that the prediction mode of the current prediction unit is identical to the prediction mode of the prediction unit in the neighborhood may be transmitted using predetermined flag information, and if the prediction mode of the current prediction unit and the prediction mode of the prediction unit in the neighborhood are different from each other, the prediction mode information of the current block may be encoded by additional flag information.

As shown in fig. 3, a residual block may be generated that includes residual coefficient information, i.e., the difference between the prediction unit generated by the prediction module 304 or 306 and the original block. The generated residual block may be input into the transform module 308.

The transform module 308 may be configured to transform the residual block, which includes the residual coefficient information between the original block and the prediction units generated by the prediction modules 304 and 306, using a transform method such as DCT, discrete sine transform (DST), Karhunen-Loève transform (KLT), or transform skip. Whether to apply DCT, DST, or KLT to transform the residual block may be determined based on intra prediction mode information of the prediction unit used to generate the residual block. The transform module 308 may transform the video signal in the residual block from the pixel domain to a transform domain (e.g., a frequency domain, depending on the transform method). It should be appreciated that in some examples, the transform module 308 may be skipped and the video signal may not be transformed to the transform domain.

The quantization module 310 may be configured to quantize the coefficient at each position in the encoded block to generate a quantization level for that position. The encoded block may be a residual block. That is, the quantization module 310 may perform quantization processing on each residual block. The residual block may include N×M positions (samples), each position being associated with a video signal/data (e.g., luminance and/or chrominance information) that is transformed or untransformed, where N and M are positive integers. In the present disclosure, the video signal at a particular position before quantization, whether transformed or not, is referred to as a "coefficient". After quantization, the quantized value of the coefficient is referred to as a "quantization level" or "level".

Quantization may be used to reduce the dynamic range of a transformed or untransformed video signal, thereby using fewer bits to represent the video signal. Quantization generally involves dividing by a quantization step and then rounding, while dequantization (also known as inverse quantization) involves multiplying by the quantization step. This quantization process is known as scalar quantization. Quantization of all coefficients within a coded block can be done independently, which is used in some existing video compression standards (e.g., h.264/AVC and h.265/HEVC).
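The divide-and-round relationship described above can be sketched as follows. This is a minimal illustration only: the function names and the round-half-away-from-zero rule are assumptions made for the example, and practical codecs use integer arithmetic with scaling tables and rounding offsets rather than floating-point division.

```python
import math

def quantize(coefficient, step):
    """Scalar quantization: divide by the step, then round.

    Rounding half away from zero is an assumption for this sketch.
    """
    sign = -1 if coefficient < 0 else 1
    return sign * math.floor(abs(coefficient) / step + 0.5)

def dequantize(level, step):
    """Dequantization (inverse quantization): multiply by the step."""
    return level * step

# Each coefficient is quantized independently of the others.
coefficients = [37, -37, 3, 0]
levels = [quantize(c, 8) for c in coefficients]
reconstructed = [dequantize(l, 8) for l in levels]
```

The reconstructed values differ from the original coefficients by at most half a quantization step, which is the information discarded to reduce the number of bits.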

For an N×M encoded block, the two-dimensional (2D) coefficients of the block may be converted into a one-dimensional (1D) sequence using a specific encoding scan order for coefficient quantization and encoding. Typically, the encoding scan starts at the top left corner and stops at the last non-zero coefficient/level toward the bottom right corner of the encoded block. It should be appreciated that the encoding scan order may include any suitable order, such as a zig-zag scan order, a vertical (column) scan order, a horizontal (row) scan order, a diagonal scan order, or any combination thereof. Quantization of a coefficient within an encoded block may make use of the encoding scan order information; for example, it may depend on the state of the previous quantization levels along the encoding scan order. To further increase coding efficiency, the quantization module 310 may use more than one quantizer, such as two scalar quantizers. Which quantizer is used to quantize the current coefficient may depend on the information preceding the current coefficient in the encoding scan order. This quantization process is called dependent quantization.
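As an illustration of the 2D-to-1D conversion, the sketch below walks a block along its anti-diagonals starting from the top left corner and truncates the sequence at the last non-zero value. The function names, and the choice to traverse each diagonal from bottom-left to top-right, are assumptions for this example; the exact order within a diagonal is defined by the particular codec.

```python
def diagonal_scan(height, width):
    """Return (row, col) positions ordered along anti-diagonals."""
    order = []
    for d in range(height + width - 1):
        # Walk one anti-diagonal from its bottom-left end to its top-right end.
        for r in range(min(d, height - 1), -1, -1):
            c = d - r
            if c < width:
                order.append((r, c))
    return order

def scan_to_1d(block):
    """Flatten a 2D block along the scan and stop at the last non-zero value."""
    order = diagonal_scan(len(block), len(block[0]))
    sequence = [block[r][c] for r, c in order]
    last = max((i for i, v in enumerate(sequence) if v != 0), default=-1)
    return sequence[: last + 1]

block = [
    [9, 0, 1],
    [2, 0, 0],
    [0, 0, 0],
]
sequence = scan_to_1d(block)
```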

Referring to fig. 3, the encoding module 320 may be configured to encode the quantization level at each position in the encoded block into the bitstream. In some embodiments, the encoding module 320 may perform entropy encoding on the encoded blocks. Entropy encoding may use various binarization methods, such as exponential-Golomb encoding, and the resulting binary bits may be further encoded by context-adaptive variable length coding (CAVLC), CABAC, or the like. In addition to the quantization level, the encoding module 320 may encode various other information such as block type information of an encoding unit, prediction mode information, partition unit information, prediction unit information, transmission unit information, motion vector information, reference frame information, block interpolation information, and filtering information input from, for example, the prediction modules 304 and 306. In some embodiments, the encoding module 320 may perform residual encoding on the encoded blocks to convert the quantization levels into a bitstream. For example, after quantization, there may be N×M quantization levels for one N×M block. These N×M levels may be zero or non-zero values. If a non-zero level is not binary, it may be further binarized into binary bits and encoded, for example, using CABAC.

Non-binary syntax elements may be mapped to binary codewords. The bijective mapping between symbols and codewords typically uses a simple structured code and is referred to as binarization. Binary syntax elements and binary symbols (also referred to as binary bits) of codewords of non-binary data may be encoded using binary arithmetic coding. The core encoding engine of CABAC may support two modes of operation: a context coding mode (in which binary bits are coded using an adaptive probability model), and a low complexity bypass mode using a fixed probability of 1/2. The assignment of the probability model to individual binary bits is referred to as context modeling.
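To make the notion of binarization concrete, the following sketch maps non-negative integers to order-0 exponential-Golomb codewords and back. It is only an illustration of a bijective symbol-to-codeword mapping: the function names are hypothetical, and the zero-prefix convention shown here is one common form, whereas a given codec may specify a different prefix convention or a higher order.

```python
def exp_golomb_encode(value):
    """Order-0 exponential-Golomb codeword for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]              # binary representation of value + 1
    return "0" * (len(code) - 1) + code    # prefix of zeros, one fewer than len(code)

def exp_golomb_decode(bits):
    """Inverse mapping: recover the integer from a single codeword."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    return int(bits[zeros : 2 * zeros + 1], 2) - 1

codewords = [exp_golomb_encode(v) for v in range(5)]
```

Each character of a codeword corresponds to one binary bit that could then be passed to the arithmetic coding engine in either the context coding mode or the bypass mode.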

In h.266/VVC, the coding block is a transform block coded using RRC, according to some aspects of the present disclosure. Transform blocks greater than 4 x 4 may be divided into separate 4 x 4 sub-blocks that are processed using an inverse diagonal scan pattern. It is understood that h.266/VVC supports non-4 x 4 sub-blocks due to the support of transform blocks of non-square rectangular shape. For ease of description and without loss of generality, fig. 7A depicts an example of a 16 x 16 transform block, wherein the transform block is further divided into 4 x 4 sub-blocks. The inverse diagonal scan pattern is used to process sub-blocks of the transform block and to process frequency locations within each sub-block.

In RRC, the last non-zero level position (also referred to as the last valid scan position) may be defined as the position of the last non-zero level along the encoding scan order. The 2D coordinates of the last non-zero level (last_sig_coeff_x and last_sig_coeff_y) may be encoded first with up to four syntax elements, i.e., up to four residual encoded binary bits: two context-encoded binary bits, namely the two last significant coefficient prefixes (last_sig_coeff_x_prefix and last_sig_coeff_y_prefix), and two bypass-encoded binary bits, namely the two last significant coefficient suffixes (last_sig_coeff_x_suffix and last_sig_coeff_y_suffix). Within a sub-block, the RRC may first encode a context-encoded binary bit, the coded sub-block flag (sb_coded_flag), to indicate whether all levels of the current sub-block are equal to zero. For example, if sb_coded_flag is equal to 1, there may be at least one non-zero coefficient in the current sub-block. If sb_coded_flag is equal to 0, all coefficients in the current sub-block are zero. It should be appreciated that the sb_coded_flag of the last non-zero sub-block, which contains the last non-zero level, may be derived from last_sig_coeff_x and last_sig_coeff_y according to the encoding scan order without being encoded into the bitstream. The other sb_coded_flags may be encoded as context-encoded binary bits. The RRC may encode sub-block by sub-block in reverse encoding scan order, starting from the last non-zero sub-block.

To guarantee worst-case throughput, the value of the remaining context-encoded binary bits (remBinsPass1) may be used to limit the maximum number of context-encoded binary bits. The initial value of remBinsPass1 may be calculated based at least in part on the length and width of the encoded block. Within a sub-block, the RRC may encode the level of each position in reverse encoding scan order. A predefined threshold may be compared with remBinsPass1 to determine whether the maximum number of context-encoded binary bits has been reached. For example, the threshold for remBinsPass1 in H.266/VVC may be predefined as 4.

As shown in fig. 8A, if remBinsPass1 is not less than 4 ("remaining CCB ≥ 4" in fig. 8A), when encoding the quantization level for each position of the sub-block ("SB" in fig. 8A), a valid flag (sig_coeff_flag, "sig" in fig. 8A) may be encoded into the bitstream first to indicate whether the level is zero or non-zero. If the level is not zero, a greater-than-1 flag (abs_level_gtx_flag[n][0], where n is the index along the scan order of the current position within the sub-block; "gt1" in fig. 8A) may be encoded into the bitstream to indicate whether the absolute level is 1 or greater than 1. If the absolute level is greater than 1, a parity flag (par_level_flag, "par" in fig. 8A) may be encoded into the bitstream to indicate whether the level is odd or even, and then a greater-than flag (abs_level_gtx_flag[n][1], "gt" in fig. 8A) may be present. The flags par_level_flag and abs_level_gtx_flag[n][1] may also be used together to indicate that the level is 2, 3, or greater than 3. After each of the above syntax elements is encoded using the context encoding method (i.e., as a context-encoded binary bit), the value of remBinsPass1 may be reduced by 1. In other words, in the first encoding process (process 1 (pass 1) in fig. 8A), for each position of each sub-block, the valid flag, the greater-than-1 flag, the parity flag, and the greater-than flag may be encoded as context-encoded binary bits.
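The budget-controlled first encoding process can be sketched as below. This is a simplified illustration under stated assumptions, not the normative procedure: the function name and the dictionary keys are hypothetical, the parity flag is modelled as the least significant bit of the absolute level, and the greater-than flag is modelled as "absolute level greater than 3", following the description that the two flags together distinguish 2, 3, and greater than 3.

```python
THRESHOLD = 4  # predefined threshold compared against the remaining budget

def first_pass_flags(levels, rem_bins_pass1):
    """Return per-position context-coded flags and the remaining budget.

    A position is given None when the budget is below the threshold, meaning
    that no context-coded flag is produced for it in the first pass.
    """
    result = []
    for level in levels:
        if rem_bins_pass1 < THRESHOLD:
            result.append(None)
            continue
        a = abs(level)
        flags = {"sig": int(a != 0)}
        rem_bins_pass1 -= 1                # one context-coded bit spent
        if a != 0:
            flags["gt1"] = int(a > 1)
            rem_bins_pass1 -= 1
            if a > 1:
                flags["par"] = a & 1       # assumed parity convention
                flags["gt"] = int(a > 3)   # assumed meaning of the second flag
                rem_bins_pass1 -= 2
        result.append(flags)
    return result, rem_bins_pass1

flags, remaining = first_pass_flags([0, 1, 2, 5], rem_bins_pass1=8)
```

With a budget of 8, the first three positions consume 1, 2, and 4 context-encoded bits respectively, so the fourth position finds the budget below the threshold and is left to the bypass-coded processes. Starting with a budget below the threshold leaves every position without context-coded flags.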

If the absolute level is greater than 5 or the value of remBinsPass1 is less than 4, two other syntax elements, i.e., a remainder (abs_remainder, "rem" in fig. 8A) and an absolute level (dec_abs_level, "decAbsLevel" in fig. 8A), may be encoded as bypass-encoded binary bits in the second encoding process ("process 2" in fig. 8A) and the third encoding process ("process 3" in fig. 8A), respectively, to represent the part of the level that remains after the aforementioned context-encoded binary bits have been encoded. In addition, a coefficient sign flag (coeff_sign_flag in fig. 8A) of each non-zero level may also be encoded as a bypass-encoded binary bit in a fourth encoding process ("process 4" in fig. 8A) to fully represent the quantization level.

In some embodiments, a more general residual coding method uses greater-than-level flags (abs_level_gtxX_flag) and remaining-level binary bits to allow conditional parsing of the syntax elements for level coding of a transform block; the corresponding binarization of the absolute value of a level is shown in Table I below. Here, abs_level_gtxX_flag indicates whether the absolute value of the level is greater than X, where X is an integer, e.g., 0, 1, 2, ..., or N. If abs_level_gtxX_flag is 0, where X is an integer between 0 and N-1, then abs_level_gtx(X+1)_flag is not present. If abs_level_gtxX_flag is 1, then abs_level_gtx(X+1)_flag is present. Further, if abs_level_gtxN_flag is 0, no remainder is present. If abs_level_gtxN_flag is 1, a remainder is present that represents the value obtained after (N+1) is subtracted from the level. Typically, abs_level_gtxX_flag may be encoded as context-encoded binary bits, while the remaining-level binary bits are encoded as bypass-encoded binary bits.

Table I: Residual coding based on abs_level_gtxX_flag binary bits and remainder binary bits ("-" marks a syntax element that is not present)

abs(lvl)             0  1  2  3  4  5  6  7  8  9  ...
abs_level_gtx0_flag  0  1  1  1  1  1  1  1  1  1  ...
abs_level_gtx1_flag  -  0  1  1  1  1  1  1  1  1  ...
abs_level_gtx2_flag  -  -  0  1  1  1  1  1  1  1  ...
abs_level_gtx3_flag  -  -  -  0  1  1  1  1  1  1  ...
abs_remainder        -  -  -  -  0  1  2  3  4  5  ...
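The conditional structure of Table I can be reproduced with a short sketch. The function name and return format are hypothetical; N is taken as 3 to match the four flags listed in the table.

```python
def binarize_abs_level(abs_level, n=3):
    """Return (flags, remainder) for an absolute level.

    flags[x] is abs_level_gtxX_flag; the list stops at the first 0 because
    later flags are then not present. remainder is None when not present.
    """
    flags = []
    for x in range(n + 1):
        flag = int(abs_level > x)
        flags.append(flag)
        if flag == 0:
            return flags, None             # no further flags, no remainder
    return flags, abs_level - (n + 1)      # value left after removing N + 1

rows = {lvl: binarize_abs_level(lvl) for lvl in range(10)}
```

For example, an absolute level of 3 yields the flags 1, 1, 1, 0 and no remainder, while an absolute level of 9 yields four flags equal to 1 and a remainder of 5, matching the corresponding columns of Table I.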

In H.266/VVC, the coding block is a transform skip block coded using TSRC, according to some aspects of the present disclosure. Transform skip blocks greater than 4 x 4 may be divided into separate 4 x 4 sub-blocks that are processed using a diagonal scan pattern. It is understood that H.266/VVC supports non-4 x 4 sub-blocks due to the support of transform blocks of non-square rectangular shape. For ease of description and without loss of generality, fig. 7B depicts an example of a 16 x 16 transform skip block, wherein the transform skip block is divided into 4 x 4 sub-blocks. One difference between TSRC and RRC is the scan direction. As shown in fig. 7B, the diagonal scan may be used in a forward manner in TSRC (rather than in reverse order as in RRC).

Furthermore, unlike RRC, which encodes the last valid scanning position into the bitstream, in TSRC, the last valid scanning position may not be encoded and all scanning positions of the transform skip block may be encoded. Similar to RRC, in TSRC, a coded sub-block flag (sb_coded_flag) may be used to indicate whether all quantization levels of the current sub-block are equal to zero. Furthermore, to guarantee worst-case throughput, the maximum number of context-coded bits is limited using the value of the remaining context-coded bits (RemCcbs). The predefined threshold may be compared to RemCcbs to determine if the maximum number of context-encoded binary bits has been reached. For example, the threshold for RemCcbs in H.266/VVC may be predefined as 4.

As shown in fig. 8B, if RemCcbs is not less than 4 ("remaining CCB ≥ 4" in fig. 8B), then for each level in each sub-block, a valid flag (sig_coeff_flag, "sig" in fig. 8B) may first be encoded into the bitstream to indicate whether the level is zero or non-zero. If the level is not zero, a coefficient sign flag (coeff_sign_flag, "sign" in fig. 8B) may be encoded to indicate whether the level is positive or negative. A greater-than-1 flag (abs_level_gtx_flag[n][0], where n is the index along the scan order of the current position within the sub-block; "gt1" in fig. 8B) may then be encoded to indicate whether the absolute level of the current position is greater than 1. If abs_level_gtx_flag[n][0] is not zero, a parity flag (par_level_flag, "par" in fig. 8B) may be encoded. After each of the above syntax elements is encoded using the context encoding method (i.e., as a context-encoded binary bit), the value of RemCcbs may be reduced by 1. In other words, in the first encoding process ("process 1" in fig. 8B), the valid flag, the coefficient sign flag, the greater-than-1 flag, and the parity flag may be encoded as context-encoded binary bits for each position of each sub-block.

After the above syntax elements have been encoded for all positions within the current sub-block, if RemCcbs is still not less than 4, up to four greater-than flags (abs_level_gtx_flag[n][j], where n is the index along the scan order of the current position within the sub-block and j ranges from 1 to 4; "gt3, gt5, gt7, and gt9" in fig. 8B) may be encoded as context-encoded binary bits in the second encoding process ("process 2" in fig. 8B). After each abs_level_gtx_flag[n][j] is encoded in the second encoding process, the value of RemCcbs may be reduced by 1. If RemCcbs is not less than 4, in the third encoding process ("process 3" in fig. 8B), the remainder (abs_remainder, "rem" in fig. 8B) may be encoded as bypass-encoded binary bits for the current position within the sub-block, if necessary. For those positions where the absolute level (dec_abs_level, "decAbsLevel" in fig. 8B) is encoded entirely as bypass-encoded binary bits, the coeff_sign_flags may also be encoded as bypass-encoded binary bits in the fourth encoding process ("process 4" in fig. 8B).

Referring back to fig. 3, the dequantization module 312 may be configured to dequantize the quantization levels produced by the quantization module 310, and the inverse transform module 314 may be configured to inverse transform the coefficients transformed by the transform module 308. The reconstructed residual block generated by the dequantization module 312 and the inverse transform module 314 may be combined with the prediction unit predicted by the prediction module 304 or 306 to generate a reconstructed block.

The filter module 316 may include at least one of a deblocking filter, an offset correction module, and an adaptive loop filter (adaptive loop filter, ALF). The deblocking filter may remove block distortion generated by boundaries between blocks in the reconstructed image. The offset correction module may correct an offset to the original video in units of pixels for the video on which deblocking has been performed. ALF may be performed based on values obtained by comparing the reconstructed and filtered video with the original video. The buffer module 318 may be configured to store the reconstructed block or image calculated by the filter module 316 and, when performing inter prediction, may provide the reconstructed and stored block or image to the inter prediction module 304.

Fig. 4 illustrates a detailed block diagram of an exemplary decoder 201 in the decoding system 200 in fig. 2, according to some embodiments of the present disclosure. As shown in fig. 4, decoder 201 may include a decoding module 402, a dequantizing module 404, an inverse transform module 406, an inter prediction module 408, an intra prediction module 410, a filter module 412, and a buffer module 414. It should be understood that each element shown in fig. 4 is shown separately to represent different feature functions from each other in the video decoder, but this does not mean that each component is formed by a separate hardware configuration unit or a single software. That is, for convenience of explanation, each element is included and listed as one element, and at least two elements may be combined to form a single element, or one element may be divided into a plurality of elements to perform a function. It should also be understood that some elements are not necessary elements to perform the functions described in this disclosure, but may be optional elements for improved performance. It should also be understood that these elements may be implemented using electronic hardware, firmware, computer software, or any combination thereof. Whether such elements are implemented as hardware, firmware, or software depends upon the particular application and design constraints imposed on decoder 201.

When a video bitstream is input from a video encoder (e.g., encoder 101), the input bitstream may be decoded by decoder 201 in a process that is the inverse of that of the video encoder. Accordingly, some details of decoding that were described above with respect to encoding may be omitted for ease of description. The decoding module 402 may be configured to decode the bitstream to obtain various information encoded into the bitstream, such as the quantization level for each position in the encoded block. In some embodiments, the decoding module 402 may perform entropy decoding corresponding to the entropy encoding performed by the encoder, such as exponential-Golomb encoding or context encoding, e.g., CAVLC, CABAC, etc. In addition to the quantization level of each position in the encoded block, the decoding module 402 may decode various other information such as block type information, prediction mode information, partition unit information, prediction unit information, transmission unit information, motion vector information, reference frame information, block interpolation information, and filtering information of the encoding unit. During the decoding process, the decoding module 402 may rearrange the bitstream, reconstructing the data from the 1D order into a 2D rearranged block through an inverse scan based on the encoding scan order used by the encoder.

The dequantization module 404 may be configured to dequantize the quantization level for each position of an encoded block (e.g., a 2D rearranged block) to obtain the coefficient for each position. In some embodiments, the dequantization module 404 may also perform dependent dequantization based on quantization parameters provided by the encoder, including information related to the quantizers used in dependent quantization, such as the quantization step size used by each quantizer.

The inverse transform module 406 may be configured to perform an inverse transform, such as inverse DCT, inverse DST, or inverse KLT, corresponding to the DCT, DST, or KLT performed by the encoder, to transform the data from the transform domain (e.g., coefficients) back to the pixel domain (e.g., luminance and/or chrominance information). In some embodiments, the inverse transform module 406 may selectively perform a transform operation (e.g., DCT, DST, KLT) according to various pieces of information such as the prediction method, the size of the current block, the prediction direction, and the like.

The inter prediction module 408 and the intra prediction module 410 may be configured to generate a prediction block based on information related to the generation of the prediction block provided by the decoding module 402 and information of previously decoded blocks or images provided by the buffer module 414. As described above, if the size of the prediction unit and the size of the transform unit are the same when intra prediction is performed in the same manner as the operation of the encoder, intra prediction may be performed on the prediction unit based on the pixel on the left side, the pixel on the upper left side, and the pixel on the top of the prediction unit. However, if the size of the prediction unit and the size of the transform unit are different when intra prediction is performed, intra prediction may be performed using reference pixels based on the transform unit.

The reconstructed block or reconstructed image combined from the outputs of the inverse transform module 406 and the prediction module 408 or 410 may be provided to a filter module 412. The filter module 412 may include a deblocking filter, an offset correction module, and an ALF. The buffer module 414 may store and use the reconstructed image or block as a reference image or reference block for the inter prediction module 408 and may output the reconstructed image.

However, the encoding/decoding operations performed by the encoding module 320 and the decoding module 402 as described above may not be suitable for some video encoding applications, such as high bit depth and high bit rate video encoding, due to their limited throughput. Although the counters remBinsPass1 and RemCcbs are used in RRC and TSRC, respectively, to limit the total number of context-encoded binary bits and thereby bound the worst-case throughput, the higher computational cost of handling context-encoded binary bits and the undesirable switching between context-encoded binary bits and bypass-encoded binary bits in CABAC still limit the throughput of video encoding.

Consistent with the scope of this disclosure, encoding module 320 and decoding module 402 may be configured to enable a high throughput mode in which at least one residual encoded binary bit of an encoded block changes from a context encoded binary bit to a bypass encoded binary bit. Accordingly, the encoding module 320 may be configured to encode the quantization level of the encoded block and/or any other suitable information related to the encoded block into the bitstream in a high throughput mode to improve throughput. Similarly, the decoding module 402 may be configured to decode the bitstream to obtain a quantization level of the encoded block in the high throughput mode and/or any other suitable information related to the encoded block to improve throughput.

Consistent with the scope of the present disclosure, the high throughput mode may be enabled by the encoding module 320 and the decoding module 402 at the encoding block level as well as at the transform unit level. In some embodiments, in the high throughput mode, a plurality of transform unit binary bits of the transform unit change from context-encoded binary bits to bypass-encoded binary bits. Accordingly, the encoding module 320 may be configured to encode the quantization level of the transform unit and/or any other suitable information related to the transform unit into the bitstream in the high throughput mode to improve throughput. Similarly, the decoding module 402 may be configured to decode the bitstream to obtain the quantization levels of the transform unit and/or any other suitable information related to the transform unit in the high throughput mode to improve throughput. When the high throughput mode is enabled, any other suitable context-encoded binary bits, such as motion vector difference binary bits, may also be changed to bypass-encoded binary bits when the encoding module 320 performs an encoding operation and when the decoding module 402 performs a decoding operation.

CABAC in H.266/VVC is a sequential process in which the evaluation of each iteration depends on the result of the previous iteration. At higher bit depths and in higher bit rate operating ranges (especially with 16-bit input), the serial nature of the CABAC decoding process can limit codec throughput. Consistent with the scope of the present disclosure, a bypass alignment method for the VVC operating range extension may be applied before the encoding/decoding of bypass-encoded binary bits starts, for example, by setting the value of the current interval length R of the CABAC engine (e.g., a 9-bit variable called ivlCurrRange) to 256. After ivlCurrRange is aligned to 256, the decoding of bypass-encoded binary bits may be implemented using, for example, a shift operation (e.g., via a shift register) instead of the conventional CABAC operation. Thus, multiple aligned bypass-encoded binary bits may be encoded or decoded simultaneously to further improve throughput.
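The effect of aligning the interval length to 256 can be illustrated with a small model of bypass decoding. The sketch assumes the usual arithmetic-decoder formulation in which each bypass step shifts one bitstream bit into an offset register and compares the result with the interval length; the function names and the register model are assumptions for this example, not text from the disclosure. With the interval length fixed at 256, the comparison reduces to inspecting bit 8 of the register, so several binary bits can be extracted with a single shift.

```python
def decode_bypass_serial(offset, interval, bits):
    """Decode bypass-coded bits one at a time (compare and subtract)."""
    decoded = []
    for b in bits:
        offset = (offset << 1) | b
        if offset >= interval:
            decoded.append(1)
            offset -= interval
        else:
            decoded.append(0)
    return decoded, offset

def decode_bypass_aligned(offset, bits):
    """Decode all bits at once, valid only when the interval length is 256."""
    n = len(bits)
    value = offset
    for b in bits:                         # append the n bitstream bits
        value = (value << 1) | b
    word = value >> 8                      # top n bits are the decoded values
    decoded = [(word >> (n - 1 - i)) & 1 for i in range(n)]
    return decoded, value & 0xFF           # low 8 bits are the new offset

serial = decode_bypass_serial(0b10110010, 256, [1, 0, 1, 1])
aligned = decode_bypass_aligned(0b10110010, [1, 0, 1, 1])
```

Both functions return the same decoded values and the same final offset for any starting offset below 256, which is why the aligned case lends itself to shift-register implementations that process several bypass-encoded binary bits per step.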

To utilize both bypass alignment and full bypass coding schemes, bypass bit alignment is also applied to high throughput modes consistent with the scope of the present disclosure. For example, by setting the value of the current interval length to 256, the application of bypass bit alignment may be invoked at different stages of the encoding and decoding process as described in detail below.

Fig. 9A illustrates an exemplary bypass alignment scheme in RRC according to some embodiments of the present disclosure. As shown in fig. 9A, the bit stream may start with the transform unit binary bits of the transform unit. In CABAC, various transform unit bits may be reserved as context-encoded bits for context encoding. The transform unit binary bits may include an encoded Cb transform block flag (tu_cb_coded_flag), an encoded Cr transform block flag (tu_cr_coded_flag), an encoded luma transform block flag (tu_y_coded_flag), a quantization parameter increment value (cu_qp_delta_abs), a chroma quantization parameter offset flag (cu_chroma_qp_offset_flag), a chroma quantization parameter offset index (cu_chroma_qp_offset_idx), a joint chroma flag (tu_joint_cbcr_residual_flag), and a transform skip flag (transform_skip_flag). It should be appreciated that the transform unit bits may also include bypass encoded bits, such as a quantization parameter delta sign flag (cu_qp_delta_sign_flag) in some examples.

As shown in fig. 9A, the transform unit may correspond to one encoded block (e.g., a transform block of RRC) of luminance samples (Y in fig. 9A) and two corresponding encoded blocks of chrominance samples (Cb and Cr in fig. 9A). Thus, the transform unit binary bits may include three transform_skip_flags for the Y, Cb, and Cr encoded blocks, respectively, where each transform_skip_flag is a context-encoded binary bit. For each encoded block, the first residual encoded binary bits to be encoded/decoded in the bitstream after the transform_skip_flag may be last_sig_coeff_x_prefix and last_sig_coeff_y_prefix, which remain context-encoded binary bits. As shown in fig. 9A, all other residual encoded binary bits in each encoded block may be bypass-encoded binary bits. For example, the bypass-encoded residual encoded binary bits may include the last significant coefficient suffixes (last_sig_coeff_x_suffix and last_sig_coeff_y_suffix), the coded sub-block flag (sb_coded_flag), the remainder (abs_remainder), the absolute level (dec_abs_level), and the coefficient sign flag (coeff_sign_flag).

That is, for each encoded block, the high throughput mode may be enabled after last_sig_coeff_x_prefix and last_sig_coeff_y_prefix and before sb_coded_flag. In some embodiments in which last_sig_coeff_x_suffix and last_sig_coeff_y_suffix also need to be encoded, the high throughput mode may be enabled, for each encoded block, after last_sig_coeff_x_prefix and last_sig_coeff_y_prefix and before last_sig_coeff_x_suffix and last_sig_coeff_y_suffix. In other words, the high throughput mode is enabled for each encoded block immediately after last_sig_coeff_x_prefix and last_sig_coeff_y_prefix. In the high throughput mode, the residual encoded binary bit sb_coded_flag of each sub-block may change from a context-encoded binary bit to a bypass-encoded binary bit. In addition, by setting the value of the remaining context-encoded binary bits (remBinsPass1) to be less than the threshold 4, e.g., to 0, the encoding of all other context-encoded binary bits, such as the valid flag (sig_coeff_flag), the greater-than-1 flag (abs_level_gtx_flag[n][0]), the parity flag (par_level_flag), and the greater-than flag (abs_level_gtx_flag[n][1]), can be skipped. In other words, in the high throughput mode, the first encoding process for each position of each sub-block of the encoded block may be skipped so that no context-encoded binary bits are produced by the first encoding process. Therefore, in the high throughput mode, each encoded block may be encoded using only bypass-encoded binary bits, except for last_sig_coeff_x_prefix and last_sig_coeff_y_prefix.

As shown in fig. 9A, for each encoded block, bypass bit alignment may be invoked immediately after last_sig_coeff_x_prefix and last_sig_coeff_y_prefix (e.g., by setting the value of ivlCurrRange to 256) as part of the high throughput mode, so that all bypass-encoded binary bits are bit-aligned to allow shift operations and parallel processing. As shown in fig. 9A, the high throughput mode may be enabled at the encoding block level, and bypass bit alignment may be invoked three times for the three encoded blocks corresponding to the transform unit.

Fig. 9B illustrates another exemplary bypass alignment scheme in RRC according to some embodiments of the present disclosure. Unlike the bypass alignment scheme in fig. 9A, the bypass alignment scheme of fig. 9B further changes last_sig_coeff_x_prefix and last_sig_coeff_y_prefix from context-encoded binary bits to bypass-encoded binary bits in the high throughput mode, such that each encoded block may be encoded using only bypass-encoded binary bits in the high throughput mode of fig. 9B. Furthermore, as shown in fig. 9B, for each encoded block, bypass bit alignment may be invoked (e.g., by setting the value of ivlCurrRange to 256) before last_sig_coeff_x_prefix and last_sig_coeff_y_prefix as part of the high throughput mode, so that all bypass-encoded binary bits are bit-aligned to allow shift operations and parallel processing. For example, bypass bit alignment may be applied at the beginning of the bitstream of each encoded block. As shown in fig. 9B, the high throughput mode may be enabled at the encoding block level, and bypass bit alignment may be invoked three times for the three encoded blocks corresponding to the transform unit.

Compared to the scheme of fig. 9A, the scheme of fig. 9B may further improve the throughput of video coding by changing the last significant coefficient prefix from context-coded binary bits to bypass-coded binary bits. In very high bit rate and high bit depth operating ranges, the number of binary bits spent on the last significant coefficient position can also be considerable, since most blocks are coded with smaller block sizes. Because a context variable index is derived for each binary bit of last_sig_coeff_x_prefix and last_sig_coeff_y_prefix, the context index derivation for these two syntax elements limits throughput.

Fig. 9C illustrates yet another exemplary bypass alignment scheme in RRC according to some embodiments of the present disclosure. Unlike the bypass alignment scheme of fig. 9B, the bypass alignment scheme of fig. 9C further changes transform_skip_flag from a context-coded binary bit to a bypass-coded binary bit in the high throughput mode. As a result, for each transform unit, bypass bit alignment may be invoked before the first transform_skip_flag (e.g., by setting the value of ivlCurrRange to 256) as part of the high throughput mode, so that bypass bit alignment is invoked only once for all three coding blocks corresponding to the transform unit.

Compared to the scheme in fig. 9B, the scheme in fig. 9C can further improve the throughput of video coding by only invoking bypass bit alignment once per transform unit, instead of invoking bypass bit alignment 3 times for 3 coding blocks, respectively.

Fig. 9D illustrates yet another exemplary bypass alignment scheme in RRC according to some embodiments of the present disclosure. Unlike the bypass alignment scheme in fig. 9C, in the bypass alignment scheme of fig. 9D, the transform unit bits of the transform unit are further changed from context-encoded bits to bypass-encoded bits, such that in the high throughput mode, all of the transform unit bits of the transform unit are also encoded as bypass-encoded bits. For example, in the high throughput mode, in addition to transform_skip_flags, tu_cb_coded_flag, tu_cr_coded_flag, tu_y_coded_flag, cu_qp_delta_abs, cu_chroma_qp_offset_flag, cu_chroma_qp_offset_idx, and tu_joint_cbcr_residual_flag may change from context-coded binary bits to bypass-coded binary bits. Thus, in fig. 9D, in the high throughput mode, the transform unit and the three corresponding coding blocks may be encoded using only bypass-encoded binary bits.

As shown in fig. 9D, for each transform unit, the application of bypass bit alignment may be invoked before the first transform unit bit (e.g., tu_cb_coded_flag) of the transform unit bits, e.g., by setting the value of ivlCurrRange to 256, as part of a high throughput mode, such that bypass bit alignment may be invoked only once for each transform unit. For example, bypass bit alignment may be applied at the beginning of the bit stream of the transform unit.

Compared to the scheme in fig. 9C, the scheme in fig. 9D may further improve the throughput of video coding by coding the transform unit binary bits only as bypass-coded binary bits, which avoids any switching between context coding and bypass coding by the CABAC coding engine when coding the transform unit. The high throughput mode may be enabled at the transform unit level.
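The differences among the four RRC schemes of fig. 9A-9D can be summarized in a small lookup table. The following is only an illustrative sketch: the `RrcScheme` container, the `SCHEMES` table, and the `alignments_for` helper are hypothetical names introduced here, and the table encodes nothing beyond what the description of the figures states (where alignment is invoked, how many times per transform unit, and which syntax elements remain context-coded).

```python
from dataclasses import dataclass

# Transform unit binary bits named in the description (context-coded by default).
TU_BITS = (
    "tu_cb_coded_flag", "tu_cr_coded_flag", "tu_y_coded_flag",
    "cu_qp_delta_abs", "cu_chroma_qp_offset_flag",
    "cu_chroma_qp_offset_idx", "tu_joint_cbcr_residual_flag",
)
LAST_PREFIX = ("last_sig_coeff_x_prefix", "last_sig_coeff_y_prefix")
CODING_BLOCKS_PER_TU = 3  # one luma (Y) and two chroma (Cb, Cr) coding blocks


@dataclass(frozen=True)
class RrcScheme:
    """Hypothetical summary record; not part of any codec API."""
    alignment_point: str         # where bypass bit alignment is invoked
    alignments_per_tu: int       # alignment invocations per transform unit
    still_context_coded: tuple   # syntax elements that stay context-coded


SCHEMES = {
    "9A": RrcScheme("after last_sig_coeff_x/y_prefix", CODING_BLOCKS_PER_TU,
                    TU_BITS + ("transform_skip_flag",) + LAST_PREFIX),
    "9B": RrcScheme("before last_sig_coeff_x/y_prefix", CODING_BLOCKS_PER_TU,
                    TU_BITS + ("transform_skip_flag",)),
    "9C": RrcScheme("before the first transform_skip_flag", 1, TU_BITS),
    "9D": RrcScheme("before the first transform unit bit", 1, ()),
}


def alignments_for(scheme_name: str, num_transform_units: int) -> int:
    """Total number of bypass bit alignment invocations for a run of transform units."""
    return SCHEMES[scheme_name].alignments_per_tu * num_transform_units
```

Under this summary, ten transform units need 30 alignment invocations under the schemes of fig. 9A and 9B but only 10 under the schemes of fig. 9C and 9D, and only the scheme of fig. 9D leaves no context-coded syntax element in the transform unit.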

Fig. 10A illustrates an exemplary bypass alignment scheme in TSRC according to some embodiments of the present disclosure. As shown in fig. 10A, the bitstream may start with the transform unit binary bits of the transform unit. In CABAC, various transform unit binary bits may remain context-coded binary bits. The transform unit binary bits may include a coded Cb transform block flag (tu_cb_coded_flag), a coded Cr transform block flag (tu_cr_coded_flag), a coded luma transform block flag (tu_y_coded_flag), a quantization parameter delta absolute value (cu_qp_delta_abs), a chroma quantization parameter offset flag (cu_chroma_qp_offset_flag), a chroma quantization parameter offset index (cu_chroma_qp_offset_idx), a joint chroma flag (tu_joint_cbcr_residual_flag), and a transform skip flag (transform_skip_flag). It should be appreciated that, in some examples, the transform unit binary bits may also include bypass-coded binary bits, such as a quantization parameter delta sign flag (cu_qp_delta_sign_flag).

As shown in fig. 10A, the transform unit may correspond to one coding block of luma samples (e.g., a transform skip block of TSRC) and two corresponding coding blocks of chroma samples (Cb and Cr in fig. 10A). Thus, the transform unit binary bits may include three transform_skip_flags for the Y, Cb, and Cr coding blocks, respectively, where each transform_skip_flag is a context-coded binary bit. As shown in fig. 10A, for each coding block, all residual coding binary bits in the coding block may be bypass-coded binary bits. For example, the bypass-coded residual coding binary bits may include a coded sub-block flag (sb_coded_flag), an absolute remainder (abs_remainder), and a coefficient sign flag (coeff_sign_flag).

That is, the high throughput mode may be enabled before sb_coded_flag. In the high throughput mode, for each position of each sub-block, the residual coding binary bit sb_coded_flag may be changed from a context-coded binary bit to a bypass-coded binary bit. For example, by setting the value of the counter of remaining context-coded binary bits (RemCcbs) to be less than the threshold value 4, e.g., to 0, the coding of all other context-coded binary bits, such as the significance flag (sig_coeff_flag), the greater-than flags (abs_level_gtx_flag[n][j]), and the parity flag (par_level_flag), may be skipped. In other words, in the high throughput mode, the first and second encoding processes of each position of each sub-block of the coding block may be skipped, so that the context-coded binary bits that would otherwise be coded during the first and second encoding processes are not coded. Thus, in the high throughput mode, each coding block may be coded using only bypass-coded binary bits.

As shown in fig. 10A, for each coding block, bypass bit alignment may be invoked before sb_coded_flag, for example by setting the value of ivlCurrRange to 256, as part of the high throughput mode, so that all bypass-coded binary bits are bit-aligned, which allows shift operations and parallel processing. As also shown in fig. 10A, the high throughput mode may be enabled at the coding block level, and bypass bit alignment may be invoked three times, once for each of the three coding blocks corresponding to the transform unit.

Fig. 10B illustrates another exemplary bypass alignment scheme in TSRC according to some embodiments of the present disclosure. Unlike the bypass alignment scheme of fig. 10A, in the bypass alignment scheme of fig. 10B, the transform unit binary bits of the transform unit are further changed from context-coded binary bits to bypass-coded binary bits, so that in the high throughput mode all of the transform unit binary bits of the transform unit are also coded as bypass-coded binary bits. For example, in the high throughput mode, tu_cb_coded_flag, tu_cr_coded_flag, tu_y_coded_flag, cu_qp_delta_abs, cu_chroma_qp_offset_flag, cu_chroma_qp_offset_idx, tu_joint_cbcr_residual_flag, and transform_skip_flag may be changed from context-coded binary bits to bypass-coded binary bits. Thus, in fig. 10B, in the high throughput mode, the transform unit and the three corresponding coding blocks may be coded using only bypass-coded binary bits.

As shown in fig. 10B, for each transform unit, the application of bypass bit alignment may be invoked before the first transform unit bit (e.g., tu_cb_coded_flag) of the transform unit bits, e.g., by setting the value of ivlCurrRange to 256, as part of a high throughput mode, such that bypass bit alignment may be invoked only once for each transform unit. For example, bypass bit alignment may be applied at the beginning of the bit stream of the transform unit.

Compared to the scheme in fig. 10A, the scheme in fig. 10B may further improve the throughput of video coding by coding the transform unit binary bits only as bypass-coded binary bits, which avoids any switching between context coding and bypass coding by the CABAC coding engine when coding the transform unit. The high throughput mode may be enabled at the transform unit level.

Fig. 11 illustrates a flowchart of an exemplary method 1100 of video encoding according to some embodiments of the present disclosure. Method 1100 may be performed at the encoding block level by encoder 101 of encoding system 100 or any other suitable video encoding system. Method 1100 may include operations 1102, 1104, 1106, and 1108, as described below. It should be appreciated that some of these operations may be optional and some may be performed simultaneously or in a different order than shown in fig. 11.

In operation 1102, coefficients for each position in a coded block are quantized to generate quantization levels for the respective position. For example, as shown in fig. 3, quantization module 310 may be configured to quantize coefficients of a current position in a current encoded block to generate a quantization level of the current position. In some embodiments, the encoded block includes a plurality of sub-blocks. The coding block may be a transform block coded using RRC or a transform skip block coded using TSRC.

At operation 1104, a high throughput mode is enabled. In the high throughput mode, at least one residual coding binary bit of the coding block is changed from a context-coded binary bit to a bypass-coded binary bit, and bypass bit alignment is applied. For example, as shown in fig. 3, the encoding module 320 may be configured to enable the high throughput mode. In one example, a high throughput mode enabled flag (sps_high_throughput_mode_enabled_flag) may be added as a new sequence parameter set (SPS) range extension syntax element for indicating whether the high throughput mode is enabled. For example, sps_high_throughput_mode_enabled_flag equal to 1 may indicate that the high throughput mode is enabled, and sps_high_throughput_mode_enabled_flag equal to 0 may indicate that the high throughput mode is not enabled. When sps_high_throughput_mode_enabled_flag is not present, its value may be inferred to be equal to 0.
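The inference rule for the enabling flag (absent means 0) can be sketched as follows. This is a minimal illustration that assumes the parsed SPS range extension is available as a plain dictionary keyed by syntax element name; that representation and the helper name `high_throughput_mode_enabled` are assumptions made here, not part of any bitstream parser.

```python
FLAG_NAME = "sps_high_throughput_mode_enabled_flag"


def high_throughput_mode_enabled(sps_range_extension: dict) -> bool:
    """Return True only when the flag is present and equal to 1.

    A missing flag is inferred to be 0, i.e. high throughput mode is off.
    The dict-based SPS representation is a hypothetical stand-in for a parser.
    """
    return sps_range_extension.get(FLAG_NAME, 0) == 1
```

For example, `high_throughput_mode_enabled({})` evaluates to `False`, matching the inference rule for an absent flag.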

Various changes may be made in response to enabling the high throughput mode. In some embodiments, in the high throughput mode, at least one residual coding binary bit of the coding block is changed from a context-coded binary bit to a bypass-coded binary bit. In one example in which the coding block is a transform block coded using RRC, the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits may include the coded sub-block flag. For example, as shown in fig. 9A, when sps_high_throughput_mode_enabled_flag is equal to 1, sb_coded_flag may be changed from a context-coded binary bit to a bypass-coded binary bit. In another example in which the coding block is a transform block coded using RRC, the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits may further include the last significant coefficient prefix. For example, as shown in fig. 9B-9D, when sps_high_throughput_mode_enabled_flag is equal to 1, last_sig_coeff_x_prefix and last_sig_coeff_y_prefix may also be changed from context-coded binary bits to bypass-coded binary bits. In another example in which the coding block is a transform skip block coded using TSRC, the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits may include the coded sub-block flag and/or the coefficient sign flag. For example, as shown in fig. 10A and 10B, when sps_high_throughput_mode_enabled_flag is equal to 1, sb_coded_flag and coeff_sign_flag may be changed from context-coded binary bits to bypass-coded binary bits.

In some embodiments, in the high throughput mode, the value of a counter of the remaining context-coded binary bits is set to be less than a threshold. For example, the threshold may be equal to 4, and the counter may be set to 0. Thus, in the high throughput mode, the coding of any context-coded binary bits for each position of each sub-block may be skipped. In one example in which the coding block is a transform block coded using RRC, as shown in fig. 8A and 9A-9D, the variable remBinsPass1 may be set to 0 to skip the first encoding process, which involves coding all context-coded binary bits of each position of each sub-block, including the significance flag (sig_coeff_flag), the greater-than-1 flag (abs_level_gtx_flag[n][0]), the parity flag (par_level_flag), and the greater-than-3 flag (abs_level_gtx_flag[n][1]). In another example in which the coding block is a transform skip block coded using TSRC, as shown in fig. 8B, 10A, and 10B, the variable RemCcbs may be set to 0 to skip the first and second encoding processes, which involve coding all context-coded binary bits of each position of each sub-block, including the significance flag (sig_coeff_flag), the coefficient sign flag (coeff_sign_flag), the greater-than flags (abs_level_gtx_flag[n][j]), and the parity flag (par_level_flag).
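The effect of setting the counter to 0 can be illustrated with a deliberately simplified model of the first encoding process in RRC. The sketch below is an assumption-laden illustration, not the normative residual coding syntax: it scans positions in list order, ignores inferred flags and the last-position logic, and the function name `rrc_first_pass` is hypothetical. It only models the rule that a position is handled by the first encoding process while the counter is at least the threshold of 4.

```python
THRESHOLD = 4  # minimum remaining budget needed to context-code one position


def rrc_first_pass(abs_levels, rem_bins_pass1):
    """Simplified model of the first encoding process over one sub-block.

    abs_levels: absolute quantization levels, in scan order (simplified).
    rem_bins_pass1: budget of remaining context-coded binary bits.
    Returns (number of context-coded binary bits produced,
             index of the first position left to bypass coding).
    """
    ctx_bits = 0
    pos = 0
    while pos < len(abs_levels) and rem_bins_pass1 >= THRESHOLD:
        level = abs_levels[pos]
        ctx_bits += 1            # sig_coeff_flag
        rem_bins_pass1 -= 1
        if level > 0:
            ctx_bits += 1        # abs_level_gtx_flag[n][0] (greater than 1)
            rem_bins_pass1 -= 1
            if level > 1:
                ctx_bits += 2    # par_level_flag and abs_level_gtx_flag[n][1]
                rem_bins_pass1 -= 2
        pos += 1
    return ctx_bits, pos
```

With a budget of 0, as in the high throughput mode, the loop body never runs, so `rrc_first_pass([5, 0, 3, 1], 0)` yields no context-coded binary bits and leaves every position (starting at index 0) to bypass coding.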

In some embodiments, bypass bit alignment is also applied in the high throughput mode. At operation 1106, bypass bit alignment is invoked. For example, as shown in fig. 3, the encoding module 320 may be configured to invoke bypass bit alignment. The value of the current interval length (ivlCurrRange) may be set to 256 to apply bypass bit alignment in the high throughput mode. In some embodiments, bypass bit alignment is applied at the coding block level. In one example in which the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits include the coded sub-block flag in RRC, for example, as shown in fig. 9A, bypass bit alignment may be invoked after coding the last significant coefficient prefix and before coding the coded sub-block flag. In another example in which the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits further include the last significant coefficient prefix in RRC, for example, as shown in fig. 9B, bypass bit alignment may be invoked before coding the last significant coefficient prefix. In another example in which the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits include the coded sub-block flag and/or the coefficient sign flag in TSRC, for example, as shown in fig. 10A, bypass bit alignment may be invoked before coding the coded sub-block flag. For example, bypass bit alignment may be invoked when sps_high_throughput_mode_enabled_flag is equal to 1 and the request for the value of a syntax element is a request for the first bypass-decoded syntax element in the coding block, namely sb_coded_flag or abs_remainder in TSRC, or last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, or dec_abs_level in RRC.

In some embodiments, bypass bit alignment may be invoked by a process that takes the variable ivlCurrRange as input and outputs the updated variable ivlCurrRange. For coding block level alignment, this process may be applied before bypass coding of last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, dec_abs_level, sb_coded_flag, or abs_remainder. When ivlCurrRange is 256, the interval offset (ivlOffset) and the bitstream can be regarded as a shift register, and the decoded value (binVal) can be regarded as the second most significant bit of the register (the most significant bit is always 0 because ivlOffset is constrained to be less than ivlCurrRange).

In operation 1108, in the high throughput mode, the quantization levels of the coding block are encoded into the bitstream. As shown in fig. 3, the encoding module 320 may be configured to encode the quantization level of each position, in the high throughput mode described in detail above, using binary arithmetic coding such as CABAC. In some embodiments, in the high throughput mode, each residual coding binary bit of the coding block is coded as a bypass-coded binary bit.

Fig. 12 illustrates a flowchart of an exemplary method 1200 of video decoding according to some embodiments of the present disclosure. Method 1200 may be performed at the encoding block level by decoder 201 of decoding system 200 or any other suitable video decoding system. Method 1200 may include operations 1202, 1204, 1206, and 1208 as described below. It should be appreciated that some of these operations may be optional and some operations may be performed simultaneously or in a different order than shown in fig. 12.

In operation 1202, a high throughput mode is enabled. In the high throughput mode, at least one residual coding binary bit of the coding block is changed from a context-coded binary bit to a bypass-coded binary bit, and bypass bit alignment is applied. For example, as shown in fig. 4, the decoding module 402 may be configured to enable the high throughput mode. In one example, a high throughput mode enabled flag (sps_high_throughput_mode_enabled_flag) may be added as a new sequence parameter set (SPS) range extension syntax element for indicating whether the high throughput mode is enabled. For example, sps_high_throughput_mode_enabled_flag equal to 1 may indicate that the high throughput mode is enabled, and sps_high_throughput_mode_enabled_flag equal to 0 may indicate that the high throughput mode is not enabled. When sps_high_throughput_mode_enabled_flag is not present, its value may be inferred to be equal to 0. The high throughput mode for video decoding at the coding block level may be the same as the high throughput mode for video encoding described in detail above.

In some embodiments, bypass bit alignment is also applied in the high throughput mode. At operation 1204, bypass bit alignment is invoked. For example, as shown in fig. 4, the decoding module 402 may be configured to invoke bypass bit alignment. The value of the current interval length (ivlCurrRange) may be set to 256 to apply bypass bit alignment in the high throughput mode. In some embodiments, bypass bit alignment is applied at the coding block level. In one example in which the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits include the coded sub-block flag in RRC, for example, as shown in fig. 9A, bypass bit alignment may be invoked after the last significant coefficient prefix and before the coded sub-block flag. In another example in which the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits further include the last significant coefficient prefix in RRC, for example, as shown in fig. 9B, bypass bit alignment may be invoked before the last significant coefficient prefix. In another example in which the residual coding binary bits that are changed from context-coded binary bits to bypass-coded binary bits include the coded sub-block flag and/or the coefficient sign flag in TSRC, for example, as shown in fig. 10A, bypass bit alignment may be invoked before the coded sub-block flag. For example, bypass bit alignment may be invoked when sps_high_throughput_mode_enabled_flag is equal to 1 and the request for the value of a syntax element is a request for the first bypass-decoded syntax element in the coding block, namely sb_coded_flag or abs_remainder in TSRC, or last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, or dec_abs_level in RRC.

In some embodiments, bypass bit alignment may be invoked by a process that takes the variable ivlCurrRange as input and outputs the updated variable ivlCurrRange. For coding block level alignment, this process may be applied before bypass decoding of last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, dec_abs_level, sb_coded_flag, or abs_remainder. When ivlCurrRange is 256, the interval offset (ivlOffset) and the bitstream can be regarded as a shift register, and the decoded value (binVal) can be regarded as the second most significant bit of the register (the most significant bit is always 0 because ivlOffset is constrained to be less than ivlCurrRange). That is, after bypass bit alignment is applied, the bitstream may be decoded by shift operations.
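The shift-register property can be checked with a small model of CABAC bypass decoding. The sketch below assumes the usual bypass decoding step (double ivlOffset, append one bitstream bit, compare against ivlCurrRange) and assumes that ivlOffset is below 256 once the range has been set to 256; the function names are hypothetical and the model omits everything else in the arithmetic decoder.

```python
ALIGNED_RANGE = 256


def align_bypass(ivl_curr_range: int) -> int:
    """Alignment process: takes ivlCurrRange and returns the updated value."""
    return ALIGNED_RANGE


def decode_bypass(ivl_curr_range: int, ivl_offset: int, next_bit: int):
    """One bypass decoding step; returns (binVal, updated ivlOffset)."""
    ivl_offset = (ivl_offset << 1) | next_bit
    if ivl_offset >= ivl_curr_range:
        return 1, ivl_offset - ivl_curr_range
    return 0, ivl_offset


def decode_bypass_run(ivl_curr_range: int, ivl_offset: int, bits):
    """Decode one bypass-coded binary bit per input bit, step by step."""
    decoded = []
    for bit in bits:
        bin_val, ivl_offset = decode_bypass(ivl_curr_range, ivl_offset, bit)
        decoded.append(bin_val)
    return decoded


def shift_register_bins(ivl_offset: int, bits):
    """Same result read directly from the register when the range is 256.

    The 8 bits of ivlOffset (most significant first) followed by the
    bitstream bits form the register; decoded values are read off in order.
    """
    register = [(ivl_offset >> i) & 1 for i in range(7, -1, -1)] + list(bits)
    return register[:len(bits)]
```

After `align_bypass`, calling `decode_bypass_run(256, offset, bits)` and `shift_register_bins(offset, bits)` gives the same list for every `offset` below 256, which is why several bypass-coded binary bits can be extracted at once with shifts instead of one compare-and-subtract step per binary bit.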

In operation 1206, the bitstream is decoded to obtain quantization levels for each position in the encoded block in the high throughput mode. As shown in fig. 4, the decoding module 402 may be configured to decode the bitstream to obtain a quantization level for each position in the high throughput mode described in detail above using binary arithmetic coding (e.g., CABAC).

In operation 1208, the quantization level of the encoded block is dequantized to generate coefficients for each position in the encoded block. As shown in fig. 4, the dequantization module 404 may be configured to dequantize the quantization level for each position to generate coefficients for the corresponding position in the encoded block.
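The relationship between quantization (operation 1102 on the encoder side) and dequantization (operation 1208) can be illustrated with a uniform scalar quantizer. This is only a simplified stand-in: the step size parameter `step` and both function names are assumptions made here, and an actual codec derives its scaling from the quantization parameter rather than from a plain integer step.

```python
def quantize(coeff: int, step: int) -> int:
    """Map a coefficient to a quantization level (round half away from zero)."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)


def dequantize(level: int, step: int) -> int:
    """Reconstruct a coefficient from its quantization level."""
    return level * step
```

With `step = 8`, the coefficient 37 maps to level 5 and is reconstructed as 40; in this model the reconstruction error never exceeds half the step size.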

Fig. 13 illustrates a flowchart of another exemplary method 1300 of video encoding according to some embodiments of the present disclosure. Method 1300 may be performed at the transform unit level by encoder 101 of encoding system 100 or any other suitable video encoding system. Method 1300 may include operations 1302, 1304, 1306, and 1308 as described below. It should be appreciated that some of these operations may be optional and some may be performed simultaneously or in a different order than shown in fig. 13.

In operation 1302, coefficients for each position in a transform unit are quantized to generate quantization levels for the respective position. For example, as shown in fig. 3, quantization module 310 may be configured to quantize coefficients of a current position in a current transform unit to generate a quantization level of the current position. In some embodiments, the transform unit comprises an encoded block.

In operation 1304, a high throughput mode is enabled. In the high throughput mode, the transform unit binary bits of the transform unit are changed from context-coded binary bits to bypass-coded binary bits, and bypass bit alignment is applied. For example, as shown in fig. 3, the encoding module 320 may be configured to enable the high throughput mode. In one example, a high throughput mode enabled flag (sps_high_throughput_mode_enabled_flag) may be added as a new sequence parameter set (SPS) range extension syntax element for indicating whether the high throughput mode is enabled. For example, sps_high_throughput_mode_enabled_flag equal to 1 may indicate that the high throughput mode is enabled, and sps_high_throughput_mode_enabled_flag equal to 0 may indicate that the high throughput mode is not enabled. When sps_high_throughput_mode_enabled_flag is not present, its value may be inferred to be equal to 0.

In some embodiments, the transform unit binary bits that are changed from context-coded binary bits to bypass-coded binary bits may include a coded Cb transform block flag (tu_cb_coded_flag), a coded Cr transform block flag (tu_cr_coded_flag), a coded luma transform block flag (tu_y_coded_flag), a quantization parameter delta absolute value (cu_qp_delta_abs), a chroma quantization parameter offset flag (cu_chroma_qp_offset_flag), a chroma quantization parameter offset index (cu_chroma_qp_offset_idx), a joint chroma flag (tu_joint_cbcr_residual_flag), and a transform skip flag (transform_skip_flag). For example, as shown in fig. 9D and 10B, when sps_high_throughput_mode_enabled_flag is equal to 1, tu_cb_coded_flag, tu_cr_coded_flag, tu_y_coded_flag, cu_qp_delta_abs, cu_chroma_qp_offset_flag, cu_chroma_qp_offset_idx, tu_joint_cbcr_residual_flag, and transform_skip_flag may be changed from context-coded binary bits to bypass-coded binary bits.

In some embodiments, bypass bit alignment is also applied in the high throughput mode. In operation 1306, bypass bit alignment is invoked. For example, as shown in fig. 3, the encoding module 320 may be configured to invoke bypass bit alignment. The value of the current interval length (ivlCurrRange) may be set to 256 to apply bypass bit alignment in the high throughput mode. In some embodiments, bypass bit alignment is applied at the transform unit level. In one example, as shown in fig. 9D and 10B, bypass bit alignment may be invoked before coding the first one of the transform unit binary bits. For example, bypass bit alignment may be invoked when sps_high_throughput_mode_enabled_flag is equal to 1 and the request for the value of a syntax element is a request for the first bypass-decoded syntax element in the transform unit, namely tu_cb_coded_flag or tu_y_coded_flag.

In some embodiments, bypass bit alignment may be invoked by a process that takes the variable ivlCurrRange as input and outputs the updated variable ivlCurrRange. For transform unit level alignment, this process may be applied before bypass coding of tu_cb_coded_flag or tu_y_coded_flag. When ivlCurrRange is 256, the interval offset (ivlOffset) and the bitstream can be regarded as a shift register, and the decoded value (binVal) can be regarded as the second most significant bit of the register (the most significant bit is always 0 because ivlOffset is constrained to be less than ivlCurrRange).

In operation 1308, in a high-throughput mode, quantization levels of transform units are encoded into a bitstream. As shown in fig. 3, the encoding module 320 may be configured to encode the quantization levels for each location in the high throughput mode described in detail above using binary arithmetic coding, such as CABAC. In some embodiments, in the high throughput mode, each transform unit binary bit of a transform unit is encoded as a bypass encoded binary bit.

It should be appreciated that in some examples, methods 1100 and 1300 may be combined such that high throughput mode may be enabled at both the transform unit level and the corresponding coding block level, e.g., as described above with respect to fig. 9D and 10B.

Fig. 14 illustrates a flowchart of an exemplary method 1400 of video decoding according to some embodiments of the present disclosure. Method 1400 may be performed at the transform unit level by decoder 201 of decoding system 200 or any other suitable video decoding system. Method 1400 may include operations 1402, 1404, 1406, and 1408 as described below. It should be appreciated that some of these operations may be optional and some operations may be performed simultaneously or in a different order than shown in fig. 14.

In operation 1402, a high throughput mode is enabled. In the high throughput mode, the transform unit binary bits of the transform unit are changed from context-coded binary bits to bypass-coded binary bits, and bypass bit alignment is applied. For example, as shown in fig. 4, the decoding module 402 may be configured to enable the high throughput mode. In one example, a high throughput mode enabled flag (sps_high_throughput_mode_enabled_flag) may be added as a new sequence parameter set (SPS) range extension syntax element for indicating whether the high throughput mode is enabled. For example, sps_high_throughput_mode_enabled_flag equal to 1 may indicate that the high throughput mode is enabled, and sps_high_throughput_mode_enabled_flag equal to 0 may indicate that the high throughput mode is not enabled. When sps_high_throughput_mode_enabled_flag is not present, its value may be inferred to be equal to 0. The high throughput mode for video decoding at the transform unit level may be the same as the high throughput mode for video encoding described in detail above.

In some embodiments, bypass bit alignment is also applied in the high throughput mode. At operation 1404, bypass bit alignment is invoked. For example, as shown in fig. 4, the decoding module 402 may be configured to invoke bypass bit alignment. The value of the current interval length (ivlCurrRange) may be set to 256 to apply bypass bit alignment in the high throughput mode. In some embodiments, bypass bit alignment is applied at the transform unit level. In one example, as shown in fig. 9D and 10B, bypass bit alignment may be invoked before the first one of the transform unit binary bits. For example, bypass bit alignment may be invoked when sps_high_throughput_mode_enabled_flag is equal to 1 and the request for the value of a syntax element is a request for the first bypass-decoded syntax element in the transform unit, namely tu_cb_coded_flag or tu_y_coded_flag.

In operation 1406, the bitstream is decoded to obtain a quantization level for each position in the transform unit in the high throughput mode. As shown in fig. 4, the decoding module 402 may be configured to decode the bitstream to obtain a quantization level for each position in the high throughput mode described in detail above using binary arithmetic coding (e.g., CABAC).

In operation 1408, the quantization levels of the transform unit are dequantized to generate coefficients for each position in the transform unit. As shown in fig. 4, the dequantization module 404 may be configured to dequantize the quantization level of each position to generate the coefficient for the corresponding position in the transform unit.

It should be appreciated that in some examples, methods 1200 and 1400 may be combined such that high throughput mode may be enabled at both the transform unit level and the corresponding coding block level, e.g., as described above with respect to fig. 9D and 10B.

It should be appreciated that any suitable additional changes may be made in the high throughput mode as well. The Rice parameter may be used to control how the remainder is binarized in residual coding. For a given level, an appropriate Rice parameter binarizes the value with the minimum number of binary bits. For example, the value of the variable StatCoeff is related to the values of the levels and is used to derive the Rice parameter. Larger StatCoeff values may map to larger Rice parameters. In the high throughput mode, there may be many large levels that need to be coded. Thus, in the high throughput mode, StatCoeff should be set larger. For example, a fixed offset of 2 may be added in the following equation:
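Setting aside the StatCoeff equation referenced above, the general statement that a suitable Rice parameter minimizes the number of binary bits for a given level can be illustrated with a plain Golomb-Rice code. This is a simplification chosen for illustration: actual residual coding binarization also truncates the prefix and switches to an escape code for very large values, which this sketch ignores, and both function names are hypothetical.

```python
def rice_code_length(value: int, rice_param: int) -> int:
    """Length in binary bits of a plain Golomb-Rice code word.

    The prefix is (value >> rice_param) ones plus a terminating zero,
    followed by rice_param suffix bits.
    """
    return (value >> rice_param) + 1 + rice_param


def best_rice_param(value: int, max_param: int) -> int:
    """Smallest Rice parameter in [0, max_param] giving the shortest code word."""
    return min(range(max_param + 1), key=lambda k: rice_code_length(value, k))
```

In this model a level of 100 costs 101 binary bits with a Rice parameter of 0 but only 8 with a Rice parameter of 6, whereas a level of 0 is cheapest with a Rice parameter of 0. This is consistent with the observation that many large levels favor a larger Rice parameter.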

In some embodiments, any other suitable context-coded binary bits besides the context-coded binary bits of the transform unit and the coding block described above may also be changed to bypass-coded binary bits in the high throughput mode. These context-coded binary bits may include, for example, motion vector difference binary bits, such as abs_mvd_greater0_flag, abs_mvd_greater1_flag, abs_mvd_minus2, and mvd_sign_flag. Other possible context-coded binary bits that may be changed to bypass-coded binary bits in the high throughput mode may include, for example, alf_ctb_flag, alf_use_aps_flag, alf_ctb_filter_alt_idx, alf_ctb_cc_cr_idc, alf_ctb_cc_cb_idc, sao_merge_left_flag, sao_merge_up_flag, sao_type_idx_chroma, sao_type_idx_luma, split_cu_flag, split_qt_flag, mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag, non_inter_flag, cu_skip_flag, pred_mode_flag, pred_mode_ibc_flag, pred_mode_plt_flag, cu_act_enabled_flag, intra_bdpcm_luma_flag, intra_mid_idx, intra_luma_ref_idx, intra_subpartitions_mode_flag, intra_subpartitions_split_flag, intra_luma_mpm_flag, intra_luma_not_planar_flag, intra_bdpcm_chroma_flag, intra_bdpcm_chroma_dir_flag, cclm_mode_flag, cclm_mode_idx, intra_chroma_pred_mode, palette_transpose_flag, copy_above_palette_indices_flag, run_copy_flag, general_merge_flag, regular_merge_flag, mmvd_merge_flag, mmvd_distance_idx, merge_subblock_flag, merge_subblock_idx, cis_sub_flag, run_sub_sub_idx, clip_sub_flag, run_sub_flag, run_flag, merge_idx, merge_gpm_idx0, merge_gpm_idx1, inter_pred_idc, inter_affine_flag, cu_affine_type_flag, sym_mvd_flag, ref_idx_l0, ref_idx_l1, mvp_l0_flag, mvp_l1_flag, amvr_flag, amvr_precision_idx, bcw_idx, cu_coded_flag, cu_sbt_flag, cu_sbt_quad_flag, cu_sbt_horizontal_flag, cu_sbt_pos_flag, lfnst_idx, mts_idx, amvr_idx, abs_mvd_greater0_flag, and abs_mvd_greater1_flag.

In some embodiments, the application of bypass bit alignment is invoked after any context-encoded binary bit, before the first bypass-encoded binary bit, such that bypass alignment always occurs at the beginning of encoding of the first bypass-encoded binary bit.

In various aspects of the disclosure, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, these functions may be stored as instructions on a non-transitory computer-readable medium. Computer-readable media include computer storage media. Storage media may be any available media that can be accessed by a processor, such as processor 102 in FIGS. 1 and 2. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, HDD (e.g., magnetic disk storage or other magnetic storage devices), flash drives, SSD, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a processing system (e.g., a mobile device or computer). Disk and disc, as used herein, include CD, laser disc, optical disc, digital video disc (DVD), and floppy disk, where disks typically reproduce data magnetically and discs reproduce data optically with a laser. Combinations of the above should also be included within the scope of computer-readable media.

According to one aspect of the present disclosure, a method for encoding an image of a video including a transform unit is disclosed. The processor quantizes the coefficients for each position in the transform unit to generate a quantization level for the corresponding position. The high throughput mode is enabled. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. In the high throughput mode, the processor encodes the quantization levels of the transform unit into a bitstream.

In some embodiments, for encoding, each transform unit binary bit of a transform unit is encoded as a bypass encoded binary bit.

In some embodiments, the value of the current interval length is set to 256 to apply bypass bit alignment in high throughput mode.
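The effect of fixing the interval length at 256 can be sketched in Python. The sketch assumes the conventional arithmetic-decoder bypass step (shift one bit into the offset, compare against the interval length, subtract on a 1) with an offset that is always smaller than the interval length; the function names and the list-of-bits input are hypothetical and are not taken from this disclosure. When the interval length equals 256, the compare-and-subtract reduces to reading bit 8 of the shifted offset, so several bypass bits can be extracted with one shift.

```python
def decode_bypass_generic(offset, interval_length, bits):
    """Decode bypass bits one at a time with compare-and-subtract.

    offset: decoder offset, assumed to satisfy 0 <= offset < interval_length.
    bits:   iterable of 0/1 values read from the bitstream (hypothetical input).
    """
    decoded = []
    for b in bits:
        offset = (offset << 1) | b
        if offset >= interval_length:
            decoded.append(1)
            offset -= interval_length
        else:
            decoded.append(0)
    return decoded, offset


def decode_bypass_aligned(offset, bits):
    """Same decoding when the interval length is exactly 256.

    All incoming bits are shifted into the offset at once; the decoded
    bits are whatever lies above the low 8 bits, most significant first.
    """
    bits = list(bits)
    n = len(bits)
    word = offset
    for b in bits:
        word = (word << 1) | b
    value = word >> 8  # the n decoded bits packed into one integer
    decoded = [(value >> (n - 1 - i)) & 1 for i in range(n)]
    return decoded, word & 0xFF


# Both paths agree when the interval length is 256.
print(decode_bypass_generic(0b10110000, 256, [1, 0, 1]))
print(decode_bypass_aligned(0b10110000, [1, 0, 1]))
```

The aligned variant has no data-dependent branch per bit, which is what makes it attractive for a high throughput mode.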

In some embodiments, the application of bypass bit alignment is invoked prior to encoding a first one of the transform unit bits.

In some embodiments, the transform unit comprises an encoded block. In some embodiments, in the high throughput mode, the plurality of residual encoded bits of the encoded block change from context encoded bits to bypass encoded bits.

In some embodiments, for encoding, each residual encoded binary bit of the encoded block is encoded as a bypass encoded binary bit.

In some embodiments, the value of the remaining context-encoded binary bits is set to be less than a threshold value.

In some embodiments, the threshold is equal to 4 and the value of the remaining context-encoded binary bits is set to 0.
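A minimal Python sketch of this budget mechanism follows. It assumes that the number of remaining context-encoded binary bits acts as a per-block budget that is compared against the threshold before each context-coded pass, and that in the regular mode the budget starts at 1.75 bits per coefficient position, as in VVC-style residual coding. The function names and the `high_throughput` flag are hypothetical.

```python
CTX_BIN_THRESHOLD = 4  # threshold stated in the text


def initial_ctx_bin_budget(log2_width: int, log2_height: int,
                           high_throughput: bool) -> int:
    """Initial number of remaining context-encoded binary bits for a block.

    Assumption: the regular budget is (width * height * 7) >> 2.
    In the high throughput mode the budget is forced to 0.
    """
    if high_throughput:
        return 0
    return ((1 << (log2_width + log2_height)) * 7) >> 2


def context_pass_allowed(remaining_ctx_bins: int) -> bool:
    """A context-coded pass runs only while the budget reaches the threshold."""
    return remaining_ctx_bins >= CTX_BIN_THRESHOLD


# 4x4 block: the regular mode has a budget, the high throughput mode has none.
regular = initial_ctx_bin_budget(2, 2, high_throughput=False)
fast = initial_ctx_bin_budget(2, 2, high_throughput=True)
print(regular, context_pass_allowed(regular))
print(fast, context_pass_allowed(fast))
```

Because 0 is below the threshold of 4, the context-coded pass is never entered and every residual bit of the block falls through to the bypass-coded path.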

In some embodiments, the encoded block includes a plurality of sub-blocks. In some embodiments, for encoding, the context-encoded binary bits of each sub-block are skipped.

According to another aspect of the present disclosure, a system for encoding an image of a video including a transform unit includes a memory configured to store instructions and a processor coupled to the memory. The processor is configured to, upon execution of the instructions, quantize the coefficients of each position in the transform unit to generate a quantization level for the respective position. The processor is further configured to enable a high throughput mode when executing the instructions. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. The processor is further configured to encode the quantization level of the transform unit into the bitstream in a high throughput mode when executing the instructions.

In some embodiments, for encoding, the processor is further configured to encode each transform unit binary bit of the transform unit as a bypass encoded binary bit.

In some embodiments, the processor is further configured to set the value of the current interval length to 256 to apply bypass bit alignment in the high throughput mode.

In some embodiments, the processor is further configured to invoke application of bypass bit alignment prior to encoding a first one of the transform unit bits.

In some embodiments, for encoding, the processor is further configured to encode each residual encoded binary bit of the encoded block as a bypass encoded binary bit.

In some embodiments, the processor is further configured to set the value of the remaining context-encoded binary bits to be less than a threshold value.

In some embodiments, the encoded block includes a plurality of sub-blocks. In some embodiments, for encoding, the processor is further configured to skip encoding the context-encoded binary bits of each sub-block.

According to yet another aspect of the disclosure, a non-transitory computer-readable medium storing instructions that, when executed by a processor, perform a process for encoding an image of a video that includes a transform unit is disclosed. The process includes quantizing the coefficients for each position in the encoded block to generate a quantization level for the corresponding position. The process also includes enabling a high throughput mode. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. The process further includes encoding quantization levels of the transform unit into the bitstream in a high throughput mode.

In some embodiments, each transform unit binary bit of a transform unit is encoded as a bypass encoded binary bit.

In some embodiments, the application of bypass bit alignment is invoked before the first one of the transform unit bits.

In some embodiments, the bit stream is decoded by a shift operation after the bypass bit alignment is applied.

In some embodiments, each residual encoded binary bit of the encoded block is encoded as a bypass encoded binary bit.

In some embodiments, the encoded block includes a plurality of sub-blocks. In some embodiments, encoding of the context-encoded binary bits of each sub-block is skipped.

According to yet another aspect of the present disclosure, a system for decoding an image of a video including a transform unit includes a memory configured to store instructions and a processor coupled to the memory. The processor is configured to enable a high throughput mode when executing instructions. In the high throughput mode, a plurality of transform unit bits of the transform unit are changed from context-coded bits to bypass-coded bits and bypass bit alignment is applied. The processor is further configured to, upon execution of the instructions, decode the bitstream to obtain a quantization level for each position in the transform unit in the high throughput mode. The processor is further configured to dequantize the quantization level of the transform unit when executing the instruction to generate coefficients for each position in the transform unit.

In some embodiments, the processor is further configured to invoke application of bypass bit alignment prior to a first one of the transform unit bits.

The foregoing description of the embodiments will reveal the general nature of the disclosure such that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such embodiments without undue experimentation and without departing from the general concept of the present disclosure. Accordingly, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

Embodiments of the present disclosure have been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. For ease of description, the boundaries of these functional building blocks have been arbitrarily defined herein. Alternate boundaries may be defined so long as the specified functions and relationships thereof are appropriately performed.

The summary and abstract sections may set forth one or more, but not all exemplary embodiments of the disclosure as contemplated by the inventors, and thus are not intended to limit the disclosure and appended claims in any way.

Various functional blocks, modules, and steps have been described above. The arrangement provided is illustrative and not limiting. Accordingly, the functional blocks, modules, and steps may be reordered or combined in a manner different from the examples provided above. Similarly, some embodiments include only a subset of the functional blocks, modules, and steps, and any such subset is permitted.

The breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

1. A method for encoding an image of a video, the image comprising a transform unit, the method comprising:

quantizing, by a processor, the coefficients of each position in the transform unit to generate a quantization level for the corresponding position;

enabling a high throughput mode, wherein in the high throughput mode, a plurality of transform unit bits of the transform unit change from context-coded bits to bypass-coded bits and bypass bit alignment is applied; and

in the high throughput mode, the processor encodes the quantization levels of the transform unit into a bitstream.

2. The method of claim 1, wherein encoding comprises encoding each transform unit binary bit of the transform unit as a bypass encoded binary bit.

3. The method of claim 1, further comprising: setting the value of the current interval length to 256 to apply the bypass bit alignment in the high throughput mode.

4. The method of claim 1, further comprising: invoking the application of the bypass bit alignment before encoding a first one of the transform unit bits.

5. The method according to claim 1, wherein:

the transform unit includes a coding block; and

in the high throughput mode, a plurality of residual encoded bits of the coding block change from context encoded bits to bypass encoded bits.

6. The method of claim 5, wherein encoding comprises encoding each residual encoded binary bit of the coding block as a bypass encoded binary bit.

7. The method of claim 5, further comprising: setting the value of the remaining context-encoded binary bits to be less than a threshold value.

8. The method of claim 7, wherein the threshold is equal to 4 and the value of the remaining context-encoded binary bits is set to 0.

9. The method according to claim 5, wherein:

the coding block includes a plurality of sub-blocks; and

encoding includes skipping over the encoding of the context-encoded binary bits for each sub-block.

10. A system for encoding an image of a video, the image comprising a transform unit, the system comprising:

a memory configured to store instructions; and

a processor coupled to the memory and configured to, when executing the instructions, perform the following:

quantizing the coefficients of each position in the transform unit to generate a quantization level for the corresponding position;

in the high throughput mode, the quantization levels of the transform unit are encoded into a bitstream.

11. The system of claim 10, wherein for encoding, the processor is further configured to encode each transform unit binary bit of the transform unit as a bypass encoded binary bit.

12. The system of claim 10, wherein the processor is further configured to set a value of a current interval length to 256 to apply the bypass bit alignment in the high throughput mode.

13. The system of claim 10, wherein the processor is further configured to invoke the application of bypass bit alignment prior to encoding a first one of the transform unit bits.

14. The system of claim 10, wherein:

the transform unit includes a coding block; and

15. The system of claim 14, wherein for encoding, the processor is further configured to encode each residual encoded binary bit of the coding block as a bypass encoded binary bit.

16. The system of claim 14, wherein the processor is further configured to set a value of the remaining context-encoded binary bits to be less than a threshold.

17. The system of claim 16, wherein the threshold is equal to 4 and the value of the remaining context-encoded binary bits is set to 0.

18. The system of claim 14, wherein:

the coding block includes a plurality of sub-blocks; and

for encoding, the processor is further configured to skip encoding the context-encoded binary bits of each sub-block.

19. A non-transitory computer readable medium storing instructions that, when executed by a processor, perform a process for encoding an image of a video, the image comprising a transform unit, the process comprising:

20. A method for decoding an image of a video, the image comprising a transform unit, the method comprising:

enabling a high throughput mode, wherein in the high throughput mode, a plurality of transform unit bits of the transform unit change from context-coded bits to bypass-coded bits and bypass bit alignment is applied;

a processor decodes the bit stream to obtain a quantization level for each position in the transform unit in the high throughput mode; and

the processor dequantizes the quantization levels of the transform unit to generate coefficients for each position in the transform unit.

21. The method of claim 20, wherein each transform unit binary bit of the transform unit is encoded as a bypass encoded binary bit.

22. The method of claim 20, further comprising: setting the value of the current interval length to 256 to apply the bypass bit alignment in the high throughput mode.

23. The method of claim 20, further comprising: invoking the application of the bypass bit alignment before a first one of the transform unit bits.

24. The method of claim 20, wherein the bit stream is decoded by a shift operation after the bypass bit alignment is applied.

25. The method according to claim 20, wherein:

the transform unit includes a coding block; and

26. The method of claim 25, wherein each residual encoded binary bit of the coding block is encoded as a bypass encoded binary bit.

27. The method of claim 25, wherein the value of the remaining context-encoded binary bits is set to be less than a threshold value.

28. The method of claim 27, wherein the threshold is equal to 4 and the value of the remaining context-encoded binary bits is set to 0.

29. The method according to claim 25, wherein:

the coding block includes a plurality of sub-blocks; and

encoding of the context-encoded binary bits of each sub-block is skipped.

30. A system for decoding an image of a video, the image comprising a transform unit, the system comprising:

a memory configured to store instructions; and

decoding a bitstream to obtain a quantization level for each position in the transform unit in the high throughput mode; and

the quantization levels of the transform units are dequantized to generate coefficients for each position in the transform units.

31. The system of claim 30, wherein each transform unit binary bit of the transform unit is encoded as a bypass encoded binary bit.

32. The system of claim 30, wherein the processor is further configured to set a value of a current interval length to 256 to apply the bypass bit alignment in the high throughput mode.

33. The system of claim 30, wherein the processor is further configured to invoke the application of bypass bit alignment prior to a first one of the transform unit bits.

34. The system of claim 30, wherein the bit stream is decoded by a shift operation after the bypass bit alignment is applied.

35. The system of claim 30, wherein:

the transform unit includes a coding block; and

36. The system of claim 35, wherein each residual encoded binary bit of the coding block is encoded as a bypass encoded binary bit.

37. The system of claim 35, wherein the value of the remaining context-encoded binary bits is set to be less than a threshold value.

38. The system of claim 37, wherein the threshold is equal to 4 and the value of the remaining context-encoded binary bits is set to 0.

39. The system of claim 35, wherein:

the coding block includes a plurality of sub-blocks; and

encoding of the context-encoded binary bits of each sub-block is skipped.

40. A non-transitory computer readable medium storing instructions that, when executed by a processor, perform a process for decoding an image of a video, the image comprising a transform unit, the process comprising:

dequantizing the quantization level of the transform unit to generate coefficients for each position in the transform unit.