CN117203960A

CN117203960A - Bypass alignment in video coding

Info

Publication number: CN117203960A
Application number: CN202280029693.XA
Authority: CN
Inventors: 余越 (Yue Yu); 于浩平 (Haoping Yu)
Original assignee: Innopeak Technology Inc
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2021-04-26
Filing date: 2022-04-25
Publication date: 2023-12-08
Also published as: CN117223285A

Abstract

In some aspects, a method for encoding an image of a video comprising a coding block is disclosed. A processor quantizes the coefficient at each position in the coding block to generate a quantization level for the corresponding position. A high throughput mode is enabled. In the high throughput mode, at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. In the high throughput mode, the processor encodes the quantization levels of the coding block into a bitstream.

Description

Bypass alignment in video coding

Cross Reference to Related Applications

The present application claims the benefit of priority of U.S. Provisional Application No. 63/180,007, entitled "bypass alignment method for video coding," filed on April 26, 2021; U.S. Provisional Application No. 63/215,862, entitled "bypass alignment method for video coding," filed on June 28, 2021; and U.S. Provisional Application No. 63/216,447, entitled "bypass alignment method for video coding," filed on June 29, 2021. The entire contents of these prior applications are incorporated herein by reference.

Background

Embodiments of the present invention relate to video coding.

Digital video has become mainstream and is widely used in various applications, including digital television, video telephony, and video conferencing. These applications are made feasible by advances in computing and communication technology as well as by efficient video coding techniques. Video data may be compressed using various video coding techniques that conform to one or more video coding standards. Exemplary video coding standards include, but are not limited to, Versatile Video Coding (H.266/VVC), High-Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), and Moving Picture Experts Group (MPEG) coding.

Disclosure of Invention

According to one aspect of the present invention, a method for encoding an image of a video comprising a coding block is disclosed. A processor quantizes the coefficient at each position in the coding block to generate a quantization level for the corresponding position. A high throughput mode is enabled. In the high throughput mode, at least one residual coding bin (bin) of the coding block is changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. In the high throughput mode, the processor encodes the quantization levels of the coding block into a bitstream.

According to another aspect of the present invention, a system for encoding an image of a video comprising a coding block includes a memory configured to store instructions and a processor coupled to the memory. The processor is configured to, upon executing the instructions, quantize the coefficient at each position in the coding block to generate a quantization level for the corresponding position. The processor is further configured to, upon executing the instructions, enable a high throughput mode. In the high throughput mode, at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. The processor is further configured to, upon executing the instructions, encode the quantization levels of the coding block into a bitstream in the high throughput mode.

According to yet another aspect of the present invention, a non-transitory computer-readable medium storing instructions is disclosed. When executed by a processor, the instructions perform a process for encoding an image of a video comprising a coding block. The process includes quantizing the coefficient at each position in the coding block to generate a quantization level for the corresponding position. The process also includes enabling a high throughput mode. In the high throughput mode, at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. The process further includes encoding the quantization levels of the coding block into a bitstream in the high throughput mode.

According to yet another aspect of the present invention, a method for decoding an image of a video comprising a coding block is disclosed. A high throughput mode is enabled. In the high throughput mode, at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. In the high throughput mode, a processor decodes the bitstream to obtain a quantization level for each position in the coding block. The quantization levels of the coding block are dequantized to generate a coefficient for each position in the coding block.

According to yet another aspect of the present invention, a system for decoding an image of a video comprising a coding block includes a memory configured to store instructions and a processor coupled to the memory. The processor is configured to, upon executing the instructions, enable a high throughput mode. In the high throughput mode, at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. The processor is further configured to, upon executing the instructions, decode the bitstream in the high throughput mode to obtain a quantization level for each position in the coding block. The processor is further configured to, upon executing the instructions, dequantize the quantization levels of the coding block to generate a coefficient for each position in the coding block.

According to yet another aspect of the present invention, a non-transitory computer-readable medium storing instructions is disclosed. When executed by a processor, the instructions perform a process for decoding an image of a video comprising a coding block. The process includes enabling a high throughput mode. In the high throughput mode, at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. The process further includes decoding the bitstream in the high throughput mode to obtain a quantization level for each position in the coding block. The process further includes dequantizing the quantization levels of the coding block to generate a coefficient for each position in the coding block.

These illustrative embodiments are mentioned not to limit or define the invention, but to provide examples to aid understanding of the invention. Additional embodiments are described in the detailed description, and further description is provided hereinafter.

Drawings

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.

Fig. 1 illustrates a block diagram of an exemplary encoding system according to some embodiments of the invention.

Fig. 2 illustrates a block diagram of an exemplary decoding system according to some embodiments of the invention.

Fig. 3 illustrates a detailed block diagram of an exemplary encoder in the encoding system of fig. 1, according to some embodiments of the present invention.

Fig. 4 illustrates a detailed block diagram of an exemplary decoder in the decoding system of fig. 2, according to some embodiments of the invention.

Fig. 5 illustrates an exemplary image divided into Coding Tree Units (CTUs) according to some embodiments of the present invention.

Fig. 6 illustrates an exemplary CTU divided into Coding Units (CUs) according to some embodiments of the present invention.

Fig. 7A illustrates an exemplary transform block encoded using regular residual coding (RRC) according to some embodiments of the invention.

Fig. 7B illustrates an exemplary transform skip block encoded using transform skip residual coding (TSRC) according to some embodiments of the invention.

Figs. 8A and 8B show the coding passes in RRC and TSRC, respectively.

Fig. 9A illustrates an exemplary bypass alignment scheme in RRC according to some embodiments of the invention.

Fig. 9B illustrates another exemplary bypass alignment scheme in RRC according to some embodiments of the invention.

Fig. 9C illustrates yet another exemplary bypass alignment scheme in RRC according to some embodiments of the invention.

Fig. 9D illustrates an exemplary bypass alignment scheme in Transform Unit (TU) coding and RRC according to some embodiments of the invention.

Fig. 10A illustrates an exemplary bypass alignment scheme in a TSRC according to some embodiments of the invention.

Fig. 10B illustrates an exemplary bypass alignment scheme in TU coding and TSRC according to some embodiments of the invention.

Fig. 11 illustrates a flowchart of an exemplary video encoding method according to some embodiments of the invention.

Fig. 12 illustrates a flowchart of an exemplary video decoding method according to some embodiments of the invention.

Fig. 13 illustrates a flowchart of another exemplary video encoding method according to some embodiments of the invention.

Fig. 14 illustrates a flowchart of another exemplary video decoding method according to some embodiments of the invention.

Embodiments of the present invention will be described with reference to the accompanying drawings.

Detailed Description

While some configurations and arrangements are discussed, it should be understood that such configurations and arrangements are for illustration purposes only. One skilled in the relevant art will recognize that other configurations and arrangements may be used without departing from the spirit and scope of the invention. It will be apparent to those skilled in the relevant art that the present invention may be used in a variety of other applications as well.

Note that references in the specification to "one embodiment," "an example embodiment," "some embodiments," etc., indicate that the described embodiment may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes that particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described.

Generally, terms may be understood, at least in part, from usage in the context. For example, the term "one or more" as used herein may be used to describe any feature, structure, or characteristic in the singular or may be used to describe a combination of features, structures, or characteristics in the plural, depending at least in part on the context. Similarly, terms such as "a," "an," or "the" may also be construed to express singular usage or plural usage, depending at least in part on the context. Furthermore, the term "based on" may be understood as not necessarily intended to express a set of exclusive factors, but rather may allow for other factors to be present that are not necessarily explicitly described, depending at least in part on the context.

Various aspects of a video encoding system will now be described with reference to various apparatus and methods. These apparatus and methods will be described in the following detailed description and illustrated in the accompanying drawings by various modules, components, circuits, steps, operations, processes, algorithms, etc. (collectively referred to as "elements"). These elements may be implemented using electronic hardware, firmware, computer software, or any combination thereof. Whether such elements are implemented as hardware, firmware, or software will depend on the particular application and design constraints imposed on the overall system.

The techniques described herein may be used for various video coding applications. As used herein, video coding includes both video encoding and video decoding. Video encoding and decoding may be performed in units of blocks. For example, processes such as transform, quantization, prediction, in-loop filtering, and reconstruction may be performed on a coding block, a transform block, or a prediction block. As used herein, the block being encoded or decoded may be referred to as the "current block"; depending on the current encoding/decoding process, the current block may be a coding block, a transform block, or a prediction block. Furthermore, the term "unit" as used in the present invention denotes a basic unit for performing a specific encoding/decoding process, and the term "block" denotes a sample array of a predetermined size. Unless otherwise indicated, "block" and "unit" may be used interchangeably.

In video coding, quantization is used to reduce the dynamic range of a video signal, whether transformed or not, so that fewer bits are needed to represent it. Before quantization, the (transformed or untransformed) video signal at a particular position is referred to as a "coefficient"; after quantization, the quantized value of the coefficient is referred to as a "quantization level" or "level". In the present invention, the quantization level of a position refers to the quantization level of the coefficient at that position. Residual coding converts the quantization levels of positions into a bitstream. After quantization, an N×M coding block has N×M quantization levels, each of which may be zero or non-zero. A non-zero level that is not binary is further binarized into bins.
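The quantization and binarization steps above can be illustrated with a minimal Python sketch. It assumes a uniform scalar quantizer with a rounding offset of 0.5 and a plain unary-plus-sign binarization; both are illustrative choices, not the quantizer or binarization of H.266/VVC or of this disclosure, and all function names are hypothetical.

```python
import math
from typing import List


def quantize(coeff: float, step: float, rounding: float = 0.5) -> int:
    """Uniform scalar quantization of one coefficient to an integer level.

    `rounding` is an assumed rounding offset on the magnitude; 0.5 rounds
    half up, smaller values widen the dead zone around zero.
    """
    level = math.floor(abs(coeff) / step + rounding)
    return -level if coeff < 0 else level


def dequantize(level: int, step: float) -> float:
    """Reconstruct an approximate coefficient from its quantization level."""
    return level * step


def binarize_level(level: int) -> List[int]:
    """Illustrative binarization: unary magnitude, then one sign bin.

    A magnitude m becomes m ones and a terminating zero, so level 0 maps to
    the single bin [0]; non-zero levels get a sign bin (1 = negative).
    """
    m = abs(level)
    bins = [1] * m + [0]
    if m:
        bins.append(1 if level < 0 else 0)
    return bins


# A 2x2 coding block: four coefficients become four quantization levels.
block = [[37.0, -12.5], [4.0, 0.3]]
levels = [[quantize(c, step=8.0) for c in row] for row in block]
print(levels)               # [[5, -2], [1, 0]]
print(binarize_level(-2))   # [1, 1, 0, 1]
```

Dequantizing a level only recovers a multiple of the step size (for example, level 5 with step 8.0 reconstructs 40.0 rather than 37.0), which is where the coding loss of quantization comes from.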

For example, the context-adaptive binary arithmetic coding (CABAC) used in H.266/VVC, H.265/HEVC, and H.264/AVC codes the quantization level of a position into bits by way of bins. CABAC uses two coding methods. The context-based method adaptively updates the context model based on neighboring coding information; a bin coded in this way is called a context-coded bin (CCB). In contrast, the bypass method assumes that the probability of a 1 or a 0 is always 50% and therefore uses a fixed probability model without adaptation; a bin coded in this way is called a bypass-coded bin (BCB).
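The difference between the two CABAC methods can be sketched by comparing ideal code lengths. The estimator below, an exponential moving average with an assumed adaptation shift of 4, is a simplified stand-in for CABAC's integer probability update, not the state machine of any standard; the function names are hypothetical.

```python
import math
from typing import Sequence


def bypass_cost(bins: Sequence[int]) -> float:
    """Bypass bins assume P(0) = P(1) = 0.5, so each costs exactly 1 bit."""
    return float(len(bins))


def adaptive_cost(bins: Sequence[int], shift: int = 4) -> float:
    """Ideal cost in bits of coding `bins` with one adaptive context model.

    The probability of a 1 is tracked as p1 += (bin - p1) / 2**shift,
    a simplified stand-in for CABAC's probability update (assumption).
    """
    p1 = 0.5
    total = 0.0
    for b in bins:
        p = p1 if b == 1 else 1.0 - p1
        total += -math.log2(p)          # ideal arithmetic-coding cost
        p1 += (b - p1) / (1 << shift)   # adapt toward the observed bin
    return total


skewed = [0] * 90 + [1] * 10    # a heavily skewed flag, mostly zero
print(bypass_cost(skewed))       # 100.0
print(adaptive_cost(skewed) < bypass_cost(skewed))  # True
```

The adaptive model pays off on skewed statistics but needs a per-bin probability update that depends on the previous bin, which is the serial work that bypass coding avoids.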

Throughput becomes a more serious problem for high bit depth and high bit rate video coding. Compared with bypass-coded bins, context-coded bins require a relatively complex hardware implementation and generally reduce video coding throughput; they have therefore become a bottleneck in improving throughput at high bit depths and high bit rates.

To improve video coding throughput, especially for high bit depth and high bit rate video coding, the present invention provides various schemes for bypass coding and bit alignment in video coding. For example, for applications that require high bit depth and high bit rate video coding with better throughput, a high throughput mode may be enabled during residual coding as needed.

In some embodiments, in the high throughput mode, some or all of the context-coded bins for residual coding may be changed to bypass-coded bins. In some embodiments, in the high throughput mode, some or all of the context-coded bins for residual coding may be skipped. Thus, in the high throughput mode, residual coding may use only bypass-coded bins.
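A minimal sketch, with hypothetical names throughout, of how a high throughput mode might re-mark context-coded residual bins as bypass-coded. The syntax element names are borrowed from H.266/VVC residual coding for illustration only; which bins a given embodiment converts is an assumption supplied through the `convert` argument.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Bin:
    syntax_element: str   # name of the syntax element the bin belongs to
    value: int            # bin value, 0 or 1
    context_coded: bool   # True: context-coded bin; False: bypass-coded bin


def apply_high_throughput(bins: List[Bin], enabled: bool,
                          convert: Iterable[str]) -> List[Bin]:
    """Return the bins to code for one block (hypothetical helper).

    When `enabled`, each context-coded bin whose syntax element is listed in
    `convert` is re-marked as bypass-coded; bin values are unchanged. Listing
    every context-coded element yields a bypass-only block.
    """
    if not enabled:
        return list(bins)
    targets = set(convert)
    return [replace(b, context_coded=False)
            if b.context_coded and b.syntax_element in targets else b
            for b in bins]


def count_context_coded(bins: List[Bin]) -> int:
    """Number of bins that still need the context-modeling engine."""
    return sum(1 for b in bins if b.context_coded)


block_bins = [Bin("sig_coeff_flag", 1, True),
              Bin("abs_level_gtx_flag", 0, True),
              Bin("coeff_sign_flag", 1, False)]
ht_bins = apply_high_throughput(block_bins, True,
                                ["sig_coeff_flag", "abs_level_gtx_flag"])
print(count_context_coded(block_bins), count_context_coded(ht_bins))  # 2 0
```

Only the coding method of each bin changes here; skipping bins entirely, the other option described above, would also change what the decoder must infer and is not modeled.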

Furthermore, once bit alignment has been applied, bypass coding can be implemented with shift operations instead of regular CABAC arithmetic operations, which allows multiple bypass-coded bins to be coded simultaneously. Bypass bit alignment can therefore be applied in the high throughput mode to further improve the throughput of bypass coding. In the high throughput mode, bypass bit alignment may be invoked at different stages of residual coding as desired, such as at the start of the coding process of the current coding block or at the start of the coding process of a transform unit.
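Why alignment enables shift-based bypass coding can be checked on the decoder side. The sketch below follows the bin-by-bin bypass decoding of the H.265/HEVC arithmetic decoder (offset shifted left by one bit, then compared with the range) and the HEVC range-extension alignment that sets the range to 256; treating that as the alignment used by the present high throughput mode is an assumption, and the function names are hypothetical. Once the range is 256, the n bypass bins are simply the top n bits of the 8-bit offset followed by n new bitstream bits, so they can be extracted with one shift and mask.

```python
from typing import List, Tuple


def decode_bypass(offset: int, rng: int, bits: List[int],
                  n: int) -> Tuple[List[int], int, List[int]]:
    """Decode n bypass bins one at a time (HEVC-style bypass decoding).

    offset: arithmetic decoder offset (ivlOffset), with offset < rng
    rng:    current range (ivlCurrRange), between 256 and 510 in HEVC
    bits:   not-yet-read bitstream bits
    Returns (bins, updated offset, remaining bits).
    """
    bins = []
    for i in range(n):
        offset = (offset << 1) | bits[i]      # read_bits(1)
        if offset >= rng:
            bins.append(1)
            offset -= rng
        else:
            bins.append(0)
    return bins, offset, bits[n:]


def decode_bypass_aligned(offset: int, bits: List[int],
                          n: int) -> Tuple[List[int], int, List[int]]:
    """Decode n bypass bins after alignment has set the range to 256.

    With rng == 256 the offset is an 8-bit window sliding over the
    bitstream, so all n bins come out of a single shift and mask.
    """
    chunk = 0
    for b in bits[:n]:                       # hardware would read n bits at once
        chunk = (chunk << 1) | b
    window = (offset << n) | chunk           # 8 + n bits
    value = window >> 8                      # its top n bits are the bins
    bins = [(value >> (n - 1 - i)) & 1 for i in range(n)]
    return bins, window & 0xFF, bits[n:]


# Both paths agree once the range has been aligned to 256.
stream = [1, 0, 1, 1, 0, 0, 1]
print(decode_bypass(178, 256, stream, 5))     # ([1, 0, 1, 1, 0], 86, [0, 1])
print(decode_bypass_aligned(178, stream, 5))  # ([1, 0, 1, 1, 0], 86, [0, 1])
```

Without alignment the range can be any value from 256 to 510, so each bin needs its own compare-and-subtract that depends on the previous one; fixing the range at 256 removes that serial dependency.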

The high throughput mode may be enabled at various levels of residual coding as desired, e.g., at the coding block level or at the transform unit level. The high throughput mode may further be extended to some or all other context-coded bins used in video coding beyond residual coding, such as motion-vector-related bins.

Fig. 1 illustrates a block diagram of an exemplary encoding system 100 according to some embodiments of the invention. Fig. 2 illustrates a block diagram of an exemplary decoding system 200 according to some embodiments of the invention. Each system 100 or 200 may be applied or integrated into a variety of systems and devices capable of data processing, such as computers and wireless communication devices. For example, system 100 or 200 may be all or part of a mobile phone, desktop computer, laptop computer, tablet computer, vehicle computer, game console, printer, pointing device, wearable electronic device, smart sensor, virtual reality (VR) device, augmented reality (AR) device, or any other suitable electronic device with data processing capabilities. As shown in Figs. 1 and 2, system 100 or 200 may include a processor 102, a memory 104, and an interface 106. These components are shown as interconnected by a bus, but other connection types are also permitted. It should be appreciated that system 100 or 200 may include any other suitable components for performing the functions described herein.

The processor 102 may include microprocessors, such as a graphics processing unit (GPU), an image signal processor (ISP), a central processing unit (CPU), a digital signal processor (DSP), a tensor processing unit (TPU), a vision processing unit (VPU), a neural processing unit (NPU), a synergistic processing unit (SPU), or a physics processing unit (PPU), as well as a microcontroller unit (MCU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuitry, and other suitable hardware configured to perform the various functions described in this disclosure. Although only one processor is shown in Figs. 1 and 2, it will be appreciated that multiple processors may be included. Processor 102 may be a hardware device having one or more processing cores. Processor 102 may execute software. Software shall be construed broadly to mean instructions, instruction sets, code segments, program code, programs, subprograms, software modules, applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Software may include computer instructions written in an interpreted language, a compiled language, or machine code. Other techniques for instructing hardware are also permitted under this broad category of software.

Memory 104 may broadly include both memory (also known as primary/system memory) and storage (also known as secondary memory). For example, memory 104 may include random-access memory (RAM), read-only memory (ROM), static RAM (SRAM), dynamic RAM (DRAM), ferroelectric RAM (FRAM), electrically erasable programmable ROM (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, a hard disk drive (HDD) such as magnetic disk storage or other magnetic storage devices, a flash drive, a solid-state drive (SSD), or any other medium that can be used to carry or store desired program code in the form of instructions that can be accessed and executed by processor 102. Broadly, memory 104 may be embodied as any computer-readable medium, such as a non-transitory computer-readable medium. Although only one memory is shown in Figs. 1 and 2, it is understood that multiple memories may be included.

Interface 106 may broadly comprise a data interface and a communication interface configured to receive and transmit signals in the course of receiving and transmitting information with other external network elements. For example, interface 106 may include Input/Output (I/O) devices and wired or wireless transceivers. Although only one interface is shown in fig. 1 and 2, it will be appreciated that multiple interfaces may also be included.

The processor 102, memory 104, and interface 106 may be implemented in various forms in system 100 or 200 to perform video coding functions. In some embodiments, processor 102, memory 104, and interface 106 of system 100 or 200 are implemented (e.g., integrated) on one or more systems-on-chip (SoCs). In one example, processor 102, memory 104, and interface 106 may be integrated on an application processor (AP) SoC that handles application processing in an operating system (OS) environment, including running video encoding and video decoding applications. In another example, processor 102, memory 104, and interface 106 may be integrated on a dedicated processor chip for video coding, such as a GPU or ISP chip dedicated to image and video processing in a real-time operating system (RTOS).

As shown in Fig. 1, in encoding system 100, processor 102 may include one or more modules, such as encoder 101. Although Fig. 1 shows encoder 101 within one processor 102, it should be understood that encoder 101 may include one or more sub-modules, which may be implemented on different processors located close to or remote from each other. Encoder 101 (and any corresponding sub-modules or sub-units) may be a hardware unit (e.g., part of an integrated circuit) of processor 102 designed for use with other components, or a software unit implemented by processor 102 executing at least part of a program (i.e., instructions). The instructions of the program may be stored on a computer-readable medium, such as memory 104, and when they are executed by processor 102, processor 102 may perform processes having one or more functions related to video encoding, such as image partitioning, inter prediction, intra prediction, transformation, quantization, filtering, and entropy coding, as described in detail below.

Similarly, as shown in Fig. 2, in decoding system 200, processor 102 may include one or more modules, such as decoder 201. Although Fig. 2 shows decoder 201 within one processor 102, it should be understood that decoder 201 may include one or more sub-modules, which may be implemented on different processors located close to or remote from each other. Decoder 201 (and any corresponding sub-modules or sub-units) may be a hardware unit (e.g., part of an integrated circuit) of processor 102 designed for use with other components, or a software unit implemented by processor 102 executing at least part of a program (i.e., instructions). The instructions of the program may be stored on a computer-readable medium, such as memory 104, and when they are executed by processor 102, processor 102 may perform processes having one or more functions related to video decoding, such as entropy decoding, inverse quantization, inverse transformation, inter prediction, intra prediction, and filtering, as described in detail below.

Fig. 3 illustrates a detailed block diagram of an exemplary encoder 101 in the encoding system 100 of Fig. 1, according to some embodiments of the invention. As shown in Fig. 3, encoder 101 may include a partitioning module 302, an inter prediction module 304, an intra prediction module 306, a transform module 308, a quantization module 310, a dequantization module 312, an inverse transform module 314, a filter module 316, a buffer module 318, and an encoding module 320. Each element in Fig. 3 is shown separately to represent a distinct characteristic function of the video encoder; this does not mean that each element is implemented as separate hardware or as a single software unit. That is, the elements are listed separately for convenience of explanation: at least two elements may be combined into a single element, or one element may be divided into multiple elements that together perform its function. Some of the elements may not be essential to performing the functions described in the present invention and may instead be optional elements that improve performance. These elements may be implemented using electronic hardware, firmware, computer software, or any combination thereof; whether they are implemented as hardware, firmware, or software depends on the particular application and the design constraints imposed on encoder 101.

Partitioning module 302 may be configured to partition an input image of the video into at least one processing unit. The image may be a frame or a field of the video. In some embodiments, the image includes an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples. The processing unit may be a prediction unit (PU), a transform unit (TU), or a coding unit (CU). Partitioning module 302 may partition an image into multiple combinations of coding units, prediction units, and transform units, and may encode the image by selecting one combination of coding units, prediction units, and transform units based on a predetermined criterion (e.g., a cost function).

Like H.265/HEVC, H.266/VVC is a block-based hybrid spatial-temporal prediction coding scheme. As shown in Fig. 5, during encoding, input image 500 is first divided by partitioning module 302 into square blocks called CTUs 502. For example, CTU 502 may be a 128 × 128 pixel block. As shown in Fig. 6, each CTU 502 in image 500 may be partitioned by partitioning module 302 into one or more CUs 602, which may be used for prediction and transformation. Unlike in H.265/HEVC, in H.266/VVC a CU 602 may be rectangular or square and may be coded without further division into prediction units or transform units. For example, as shown in Fig. 6, partitioning CTU 502 into CUs 602 may include quadtree partitioning (shown in solid lines), binary tree partitioning (shown in dashed lines), and ternary tree partitioning (shown in dashed lines). According to some embodiments, each CU 602 may be as large as its root CTU 502, or may be a subdivision of root CTU 502 as small as a 4 × 4 block.
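The three split types can be sketched as a function that maps a block to its children. The ternary split uses the 1:2:1 ratio of H.266/VVC; the `(x, y, width, height)` tuple layout and the mode names are hypothetical, and the split decision itself, normally made with a cost function, is left out.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height) in luma samples


def split(block: Block, mode: str) -> List[Block]:
    """Return the children of `block` for one split mode.

    "QT": quadtree; "BT_H"/"BT_V": binary split by a horizontal/vertical
    line; "TT_H"/"TT_V": ternary split with a 1:2:1 ratio.
    """
    x, y, w, h = block
    if mode == "QT":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_H":   # top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_V":   # left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_H":   # quarter, half, quarter of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_V":   # quarter, half, quarter of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")


ctu = (0, 0, 128, 128)
children = split(ctu, "QT")
print(children[1])                 # (64, 0, 64, 64)
print(split(children[0], "TT_V"))  # [(0, 0, 16, 64), (16, 0, 32, 64), (48, 0, 16, 64)]
```

Applying `split` recursively to the children yields the multi-type tree of Fig. 6; a real encoder would stop at the minimum CU size (4 × 4 above) and choose among the modes by cost.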

Referring back to Fig. 3, inter prediction module 304 may be configured to perform inter prediction on a prediction unit, and intra prediction module 306 may be configured to perform intra prediction on a prediction unit. Whether inter prediction or intra prediction is performed for the prediction unit may be determined, and specific information for each prediction method (e.g., intra prediction mode, motion vector, reference picture) may be determined. The processing unit on which prediction is performed may differ from the processing unit for which the prediction method and its specific contents are determined. For example, the prediction method and prediction mode may be determined per prediction unit, while prediction is performed per transform unit. The residual coefficients of the residual block between the generated prediction block and the original block may be input into transform module 308. In addition, prediction mode information, motion vector information, and the like used for prediction may be encoded into the bitstream by encoding module 320 together with the residual coefficients or quantization levels. It should be appreciated that in some coding modes, the original block may be encoded as is, without generating a prediction block through prediction module 304 or 306. It should also be appreciated that in some coding modes, prediction, transform, and/or quantization may be skipped.

In some embodiments, inter prediction module 304 may predict the prediction unit based on information of at least one image preceding or following the current image and, in some cases, based on information of a partial region already encoded in the current image. Inter prediction module 304 may include sub-modules such as a reference image interpolation module, a motion prediction module, and a motion compensation module (not shown). For example, the reference image interpolation module may receive reference image information from buffer module 318 and generate integer-pixel and sub-pixel information from the reference image. For luma pixels, a DCT-based 8-tap interpolation filter with varying filter coefficients may be used to generate sub-pixel information in units of 1/4 pixel. For chroma signals, a DCT-based 4-tap interpolation filter with varying filter coefficients may be used to generate sub-pixel information in units of 1/8 pixel. The motion prediction module may perform motion prediction based on the reference image interpolated by the reference image interpolation module. Various methods, such as the full search-based block matching algorithm (FBMA), three-step search (TSS), and the new three-step search algorithm (NTS), may be used to calculate a motion vector. Based on the interpolated pixels, a motion vector may have a value in units of 1/2, 1/4, or 1/16 pixel, or in integer-pixel units. The motion prediction module may predict the current prediction unit using different motion prediction methods, such as a skip method, a merge method, an advanced motion vector prediction (AMVP) method, and an intra block copy method.
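The full search-based block matching algorithm mentioned above can be sketched at integer-pixel precision. This sketch assumes a sum of absolute differences (SAD) cost and plain Python lists as frames, with hypothetical function names; sub-pixel refinement on interpolated samples and faster searches such as TSS or NTS are not shown.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int,
        dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search_range: int) -> Tuple[int, int]:
    """Test every integer displacement in [-search_range, search_range]^2
    that keeps the block inside `ref`; return the (dx, dy) with least SAD.
    Ties keep the first candidate in raster order."""
    h, w = len(ref), len(ref[0])
    best: Optional[int] = None
    best_mv = (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w
                      and 0 <= by + dy and by + dy + n <= h)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv


ref = [[(7 * x + 13 * y) % 31 for x in range(8)] for y in range(8)]
# Current frame: reference content moved by (-2, -1), zero-filled at the edges.
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=3, search_range=2))  # (2, 1)
```

The cost grows with the square of the search range, which is why fast searches such as TSS and NTS trade some accuracy for far fewer SAD evaluations.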

In some embodiments, intra prediction module 306 may generate a prediction unit based on information of reference pixels around the current block, i.e., pixel information in the current image. When a block neighboring the current prediction unit has been inter predicted, so that its reference pixels are inter-predicted pixels, those reference pixels may be replaced with reference pixel information from a neighboring block that has been intra predicted. That is, when reference pixels are unavailable, at least one of the available reference pixels may be used in place of the unavailable reference pixel information. In intra prediction, the prediction modes include angular modes, which use reference pixel information according to a prediction direction, and non-angular modes, which do not use directional information. The mode used to predict luma information may differ from the mode used to predict chroma information, and the intra prediction mode information used to predict luma information, or the predicted luma signal information, may be used to predict chroma information. If the size of the prediction unit equals the size of the transform unit, intra prediction may be performed on the prediction unit based on the pixels to its left, its upper left, and its top. If the size of the prediction unit differs from the size of the transform unit, intra prediction may instead be performed using reference pixels based on the transform unit.
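Reference pixel substitution can be sketched along the lines of the H.265/HEVC substitution process: reference samples are scanned from the bottom-left sample up the left column and then along the top row; an unavailable sample copies its predecessor in the scan; and if none is available, the mid-range value `1 << (bit_depth - 1)` is used. The list-with-`None` representation and the function name are assumptions of this sketch.

```python
from typing import List, Optional


def substitute_reference_samples(samples: List[Optional[int]],
                                 bit_depth: int) -> List[int]:
    """Replace unavailable intra reference samples (None).

    `samples` is in scan order: bottom-left sample first, up the left
    column, through the top-left corner, then rightwards along the top row.
    """
    if all(s is None for s in samples):
        # No neighbour is available: use the mid-range value.
        return [1 << (bit_depth - 1)] * len(samples)
    out = list(samples)
    if out[0] is None:
        # The first sample takes the first available value in scan order.
        out[0] = next(s for s in out if s is not None)
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]   # copy the previous, already filled sample
    return out


# Unavailable samples at the start of the scan and one in the middle,
# e.g. neighbours outside the picture or inter-predicted neighbours.
print(substitute_reference_samples([None, None, 80, None, 90], bit_depth=8))
# [80, 80, 80, 80, 90]
```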

The intra prediction method may generate a prediction block after applying an adaptive intra smoothing (Adaptive Intra Smoothing, AIS) filter to the reference pixels according to the prediction mode. The type of AIS filter applied to the reference pixels may vary. To perform intra prediction, the intra prediction mode of the current prediction unit may be predicted from the intra prediction modes of the prediction units neighboring the current prediction unit. When the prediction mode of the current prediction unit is predicted using mode information from the neighboring prediction units and the intra prediction mode of the current prediction unit is identical to that of a neighboring prediction unit, predetermined flag information may be transmitted to indicate that the two prediction modes are identical. If the prediction mode of the current prediction unit differs from those of the neighboring prediction units, the prediction mode information of the current block may be encoded with additional flag information.
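The flag-based mode signalling described above can be illustrated with a deliberately simplified Python sketch. The function name, the dictionary keys, and the idea of indexing into a de-duplicated neighbour list are all hypothetical. Standardized codecs build a most probable mode (MPM) list that also contains default modes and remap the remaining modes, and none of that is modelled here.

```python
def signal_intra_mode(current_mode, neighbour_modes):
    """Hypothetical flag-based signalling of an intra prediction mode.

    If the current mode equals a neighbouring prediction unit's mode, send a
    flag plus an index into the de-duplicated neighbour list; otherwise send
    a flag plus the mode itself.
    """
    candidates = []
    for m in neighbour_modes:
        if m not in candidates:          # drop duplicate neighbour modes
            candidates.append(m)
    if current_mode in candidates:
        return {"same_mode_flag": 1, "candidate_idx": candidates.index(current_mode)}
    return {"same_mode_flag": 0, "mode": current_mode}


print(signal_intra_mode(50, [18, 50]))
print(signal_intra_mode(2, [18, 18]))
```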

As shown in FIG. 3, a residual block may be generated that contains residual coefficient information, i.e., the difference between the prediction unit generated by the prediction module 304 or 306 and the original block. The generated residual block may be input into the transform module 308.

The transform module 308 may be configured to transform the residual block, which contains the residual coefficient information between the original block and the prediction unit generated by the prediction module 304 or 306. The transform may use a method such as DCT, discrete sine transform (Discrete Sine Transform, DST), Karhunen-Loeve transform (KLT), or transform skip. Whether DCT, DST, or KLT is applied to the residual block may be determined based on the intra prediction mode information of the prediction unit used to generate the residual block. The transform module 308 may transform the video signal in the residual block from the pixel domain to a transform domain (e.g., a frequency domain, depending on the transform method). It should be appreciated that in some examples, the transform module 308 may be skipped, in which case the video signal is not transformed into the transform domain.
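To show why a transform helps, the following Python sketch applies a floating-point orthonormal 1D DCT-II to one row of residual samples and inverts it. It is illustrative only. Real codecs use scaled integer approximations, apply separable row and column transforms to 2D blocks, and also offer other kernels (such as DST), none of which this sketch reproduces. The function names are hypothetical.

```python
import math


def dct2(x):
    """Orthonormal 1-D DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (the orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(s)
    return out


flat_row = [4, 4, 4, 4]
coeffs = dct2(flat_row)      # all energy lands in coeffs[0]; the rest are ~0
print([round(c, 6) for c in coeffs])
```

A flat residual row collapses into a single DC coefficient, which is what makes the subsequent quantization and residual coding cheap for smooth content.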

The quantization module 310 may be configured to quantize the coefficient at each position in the coding block to generate a quantization level for that position. The current block may be a residual block; that is, the quantization module 310 may perform quantization on each residual block. The residual block may comprise N x M positions (samples), each associated with a transformed or non-transformed video signal/data, e.g., luminance information and/or chrominance information, where N and M are positive integers. In this disclosure, the video signal at a particular position before quantization, whether transformed or not, is referred to as a "coefficient". After quantization, the quantized value of the coefficient is referred to as a "quantization level" or "level".

Quantization may be used to reduce the dynamic range of a transformed or untransformed video signal, so that fewer bits are needed to represent the video signal. Quantization typically involves division by a quantization step followed by rounding, while dequantization (also referred to as inverse quantization) involves multiplication by the quantization step. This process is called scalar quantization. The coefficients within a coding block can be quantized independently of one another, and this kind of quantization is used in existing video compression standards such as H.264/AVC and H.265/HEVC.
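Scalar quantization and dequantization as described above reduce to a divide-and-round and a multiply. The Python sketch below assumes an abstract quantization step `step` (in a real codec it is derived from the quantization parameter and implemented with integer scaling and shifts) and a rounding offset `rounding`. Both function names are hypothetical.

```python
def quantize(coeff, step, rounding=0.5):
    """Scalar quantization: divide the magnitude by the step, round, restore the sign.

    rounding=0.5 rounds to nearest; a smaller value gives a dead-zone quantizer
    that maps more small coefficients to zero.
    """
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + rounding)


def dequantize(level, step):
    """Inverse quantization: multiply the level by the step."""
    return level * step


coeffs = [23, -27, 4, 0]
levels = [quantize(c, 10) for c in coeffs]
recon = [dequantize(lv, 10) for lv in levels]
print(levels, recon)
```

The gap between `coeffs` and `recon` is the quantization error. It is the only lossy step in the pipeline and is traded off against the number of bits the levels will cost.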

For an N x M coding block, the two-dimensional (2D) coefficients of the block may be converted into a one-dimensional (1D) sequence using a specific coding scan order for coefficient quantization and coding. Typically, the coding scan starts from the top-left corner of the coding block and stops at the last non-zero coefficient/level, which lies toward the bottom right of the block (at the bottom-right corner at the latest). It should be appreciated that the coding scan order may be any suitable order, such as a zig-zag scan order, a vertical (column) scan order, a horizontal (row) scan order, a diagonal scan order, or any combination thereof. Quantization of the coefficients within a coding block may use the coding scan order information; for example, it may depend on the state of the previous quantization levels along the coding scan order. To further increase coding efficiency, the quantization module 310 may use multiple quantizers, such as two scalar quantizers. Which quantizer is used to quantize the current coefficient may depend on the information preceding the current coefficient in the coding scan order. Such a quantization process is called dependent quantization.
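The 2D-to-1D conversion and the search for the last non-zero level can be illustrated with a Python sketch of an up-right diagonal scan. The loop follows the style of scan-array initialization found in the VVC specification, as we understand it. Each anti-diagonal is walked from bottom-left to top-right. The helper names, `block[y][x]` indexing, and `(x, y)` position tuples are assumptions made for this sketch.

```python
def diagonal_scan(width, height):
    """Up-right diagonal scan positions (x, y) for a width x height block."""
    order = []
    x = y = 0
    while len(order) < width * height:
        while y >= 0:                      # walk one anti-diagonal upwards
            if x < width and y < height:
                order.append((x, y))
            y -= 1
            x += 1
        y, x = x, 0                        # start the next anti-diagonal
    return order


def block_to_1d(block, order):
    """Read block[y][x] in scan order."""
    return [block[y][x] for (x, y) in order]


def last_nonzero_index(seq):
    """Index of the last non-zero level along the scan, or -1 if all are zero."""
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] != 0:
            return i
    return -1


block = [[9, 3, 0, 0],
         [2, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
order = diagonal_scan(4, 4)
seq = block_to_1d(block, order)
last = last_nonzero_index(seq)
print(seq[:6], last, order[last])
```

Everything after `last` in the scan is zero, which is why coding the last position first lets the coder skip the tail entirely.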

Referring to FIG. 3, the encoding module 320 may be configured to encode the quantization level at each position in the coding block into the bitstream. In some embodiments, the encoding module 320 may perform entropy encoding on the coding blocks. Entropy encoding may use various binarization methods, such as exponential Golomb coding; the binarized bins may then be further encoded using context-adaptive variable length coding (Context-adaptive variable length coding, CAVLC), CABAC, etc. In addition to the quantization levels, the encoding module 320 may encode various other information input from, for example, the prediction modules 304 and 306. This information includes block type information of a coding unit, prediction mode information, partition unit information, prediction unit information, transmission unit information, motion vector information, reference frame information, block interpolation information, and filtering information. In some embodiments, the encoding module 320 may perform residual coding on the coding blocks to convert the quantization levels into a code stream. For example, after quantization, an N x M block has N x M quantization levels, each of which may be zero or non-zero. A non-zero level that is not binary may be further binarized into bins, for example for coding with CABAC.
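As an example of a binarization method, the Python sketch below produces a k-th order exponential Golomb (EGk) bin string. It follows the EGk procedure of the HEVC/VVC style as we understand it: a unary prefix of 1s terminated by a 0, followed by a fixed-length suffix. Bit-polarity conventions differ between standards, and VVC applies a limited variant for some syntax elements, so treat this as an illustration rather than a normative binarizer. The function name is hypothetical.

```python
def exp_golomb_k(value, k):
    """k-th order Exp-Golomb bin string for a non-negative integer (EGk style)."""
    assert value >= 0
    bins = []
    while True:
        if value >= (1 << k):
            bins.append(1)          # prefix bin: value does not fit yet
            value -= 1 << k
            k += 1                  # each prefix bin doubles the suffix range
        else:
            bins.append(0)          # prefix terminator
            while k > 0:            # k-bit suffix, MSB first
                k -= 1
                bins.append((value >> k) & 1)
            return "".join(map(str, bins))


print([exp_golomb_k(v, 0) for v in range(4)])
```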

Non-binary syntax elements may be mapped to binary codewords. Bijective mapping between symbols and codewords typically uses a simple structured code, known as binarization. Binary syntax elements and binary symbols (also referred to as bins) of codewords of non-binary data may be encoded using binary arithmetic coding. The core encoding engine of CABAC may support two modes of operation: a context coding mode that encodes bins with an adaptive probability model, and a less complex bypass mode that uses a fixed probability of 1/2. The adaptive probability model is also called context, and assigning probability models to individual bins is called context modeling.

In H.266/VVC, the encoded blocks are transform blocks encoded using RRC, according to some aspects of the invention. Transform blocks greater than 4 x 4 may be divided into disjoint 4 x 4 sub-blocks, which are processed using an inverse diagonal scan pattern. It will be appreciated that H.266/VVC supports non-4 x 4 sub-blocks due to the non-square rectangular shape of the transform blocks. For ease of description and without loss of generality, fig. 7A depicts an example of a 16 x 16 transform block, wherein the transform block may be further divided into 4 x 4 sub-blocks. The inverse diagonal scan pattern is used to process sub-blocks of the transform block and to process frequency locations within each sub-block.
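The sub-block processing order described above can be sketched in Python: sub-blocks are visited in reverse diagonal order and, inside each 4 x 4 sub-block, positions are also visited in reverse diagonal order. The function names are hypothetical, and the sketch lists every position of the block. Real RRC starts at the sub-block and position holding the last significant coefficient instead of the bottom-right corner.

```python
def diagonal_scan(width, height):
    """Up-right diagonal scan positions (x, y)."""
    order, x, y = [], 0, 0
    while len(order) < width * height:
        while y >= 0:
            if x < width and y < height:
                order.append((x, y))
            y -= 1
            x += 1
        y, x = x, 0
    return order


def rrc_processing_order(tb_width, tb_height, sb_size=4):
    """Positions (x, y) in reverse diagonal order over sub-blocks and within them."""
    inner = diagonal_scan(sb_size, sb_size)
    grid = diagonal_scan(tb_width // sb_size, tb_height // sb_size)
    order = []
    for sbx, sby in reversed(grid):            # sub-blocks, last one first
        for x, y in reversed(inner):           # positions inside the sub-block
            order.append((sbx * sb_size + x, sby * sb_size + y))
    return order


order = rrc_processing_order(16, 16)
print(order[0], order[16], order[-1])
```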

In RRC, the last non-zero level position (also referred to as the last significant scan position) may be defined as the position of the last non-zero level along the coding scan order. The 2D coordinates of the last non-zero level (last_sig_coeff_x and last_sig_coeff_y) may be coded first with up to four syntax elements, i.e., up to four residual coding bins. Two of them are context-coded bins, the last significant coefficient prefixes (last_sig_coeff_x_prefix and last_sig_coeff_y_prefix). The other two are bypass-coded bins, the last significant coefficient suffixes (last_sig_coeff_x_suffix and last_sig_coeff_y_suffix). Within a sub-block, RRC may first code a context-coded sub-block flag (sb_coded_flag) to indicate whether all levels of the current sub-block are equal to zero. For example, if sb_coded_flag is equal to 1, there is at least one non-zero coefficient in the current sub-block; if sb_coded_flag is equal to 0, all coefficients in the current sub-block are zero. It should be appreciated that the sb_coded_flag of the last non-zero sub-block, which contains the last non-zero level, may be derived from last_sig_coeff_x and last_sig_coeff_y according to the coding scan order without being coded into the code stream. The other sb_coded_flags may be coded as context-coded bins. RRC may code the sub-blocks in reverse coding scan order, starting from the last non-zero sub-block.
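The split of a last-position coordinate into a context-coded prefix and a bypass-coded suffix can be sketched as follows. The mapping uses the relationship LastSignificantCoeffX = (1 << ((prefix >> 1) - 1)) * (2 + (prefix & 1)) + suffix for prefix values above 3, as we understand it from HEVC/VVC. Coordinates 0 to 3 are coded by the prefix alone. The function names are hypothetical, and the truncated-unary binarization of the prefix itself is not shown.

```python
def group_start(prefix):
    """Smallest coordinate represented by a prefix value greater than 3."""
    return (1 << ((prefix >> 1) - 1)) * (2 + (prefix & 1))


def split_last_pos(pos):
    """Return (prefix, suffix, suffix_len); suffix is None when prefix <= 3."""
    if pos < 4:
        return pos, None, 0
    prefix = 4
    while group_start(prefix + 1) <= pos:
        prefix += 1
    # The suffix is a fixed-length, bypass-coded offset inside the group.
    return prefix, pos - group_start(prefix), (prefix >> 1) - 1


def join_last_pos(prefix, suffix):
    """Decoder-side inverse mapping."""
    return prefix if prefix <= 3 else group_start(prefix) + suffix


print(split_last_pos(13))
```

Larger coordinates share a prefix value and are distinguished only by bypass-coded suffix bits. This keeps the number of context-coded bins for the last position roughly logarithmic in the block size.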

To guarantee worst-case throughput, the maximum number of context-coded bins may be limited using the value of the remaining context-coded bins (remBinsPass1). The initial value of remBinsPass1 may be calculated based at least in part on the width and height of the coding block. Within a sub-block, RRC may code the level at each position in reverse coding scan order. A predefined threshold may be compared with remBinsPass1 to determine whether the maximum number of context-coded bins has been reached. For example, the threshold for remBinsPass1 in H.266/VVC may be predefined as 4.

As shown in FIG. 8A, if remBinsPass1 is not less than 4 ("residual CCB ≥ 4" in FIG. 8A), then the quantization level at each position of the sub-block ("SB" in FIG. 8A) is coded as follows. A significance flag (sig_coeff_flag, "sig" in FIG. 8A) may first be coded into the code stream to indicate whether the level is zero or non-zero. If the level is non-zero, a greater-than-1 flag may be coded into the code stream to indicate whether the absolute level is 1 or greater than 1. This flag is abs_level_gtx_flag[n][0] ("gt1" in FIG. 8A), where n is the index of the current position along the scan order within the sub-block. If the absolute level is greater than 1, a parity flag (par_level_flag, "par" in FIG. 8A) may be coded into the code stream to indicate whether the level is odd or even, followed by a greater-than-3 flag (abs_level_gtx_flag[n][1]). Together, par_level_flag and abs_level_gtx_flag[n][1] indicate whether the level is 2, 3, or greater than 3. After each of these syntax elements is coded with the context coding method (i.e., as a context-coded bin), the value of remBinsPass1 may be decremented by 1. In other words, in the first coding pass ("pass 1" in FIG. 8A), the significance flag, greater-than-1 flag, parity flag, and greater-than-3 flag at each position of each sub-block may be coded as context-coded bins.

After the context-coded bins above have been coded, the rest of each level is coded with two further syntax elements, both as bypass-coded bins. For a position covered by the first pass whose absolute level is greater than 3, the remaining part of the level is coded as the remainder (abs_remainder, "rem" in FIG. 8A) in the second coding pass ("pass 2" in FIG. 8A). For positions reached after remBinsPass1 has fallen below 4, the whole absolute level (dec_abs_level, "decAbsLevel" in FIG. 8A) is coded in the third coding pass ("pass 3" in FIG. 8A). In addition, the coefficient sign flag (coeff_sign_flag) of each non-zero level may be coded as a bypass-coded bin in the fourth coding pass ("pass 4" in FIG. 8A) to fully represent the quantization level.
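The four RRC passes and the remBinsPass1 budget described above can be tied together in a simplified Python model of one sub-block. It relies on the relation AbsLevel = AbsLevelPass1 + 2 * abs_remainder with AbsLevelPass1 = sig + gt1 + par + 2 * gt3, as we understand VVC. The following simplifications are assumed: levels arrive already in reverse scan order, no sig_coeff_flag is inferred, the Rice parameter is ignored, and dec_abs_level is recorded as the plain absolute level (VVC additionally remaps it). The function name and tuple format are hypothetical, and the initial budget is simply passed in.

```python
def rrc_subblock_bins(levels, rem_bins_pass1):
    """Simplified RRC bin generation for one sub-block.

    Returns ([(syntax_element, value, mode)], remaining budget), where mode
    is "ctx" (context coded) or "byp" (bypass coded).
    """
    bins = []
    pass1 = {}                                    # position -> AbsLevelPass1
    # Pass 1: context-coded flags while the budget is at least 4.
    for n, lvl in enumerate(levels):
        if rem_bins_pass1 < 4:
            break
        a = abs(lvl)
        sig = int(a > 0)
        bins.append(("sig_coeff_flag", sig, "ctx"))
        rem_bins_pass1 -= 1
        abs_pass1 = sig
        if sig:
            gt1 = int(a > 1)
            bins.append(("abs_level_gtx_flag[0]", gt1, "ctx"))
            rem_bins_pass1 -= 1
            abs_pass1 += gt1
            if gt1:
                par, gt3 = (a - 2) & 1, int(a > 3)
                bins.append(("par_level_flag", par, "ctx"))
                bins.append(("abs_level_gtx_flag[1]", gt3, "ctx"))
                rem_bins_pass1 -= 2
                abs_pass1 += par + 2 * gt3
        pass1[n] = abs_pass1
    # Pass 2: bypass remainder where gt3 == 1 (AbsLevelPass1 is then 4 or 5).
    for n, abs_pass1 in pass1.items():
        if abs_pass1 >= 4:
            bins.append(("abs_remainder", (abs(levels[n]) - abs_pass1) >> 1, "byp"))
    # Pass 3: positions that pass 1 never reached are coded entirely in bypass.
    for n in range(len(pass1), len(levels)):
        bins.append(("dec_abs_level", abs(levels[n]), "byp"))
    # Pass 4: one bypass sign flag per non-zero level.
    for lvl in levels:
        if lvl != 0:
            bins.append(("coeff_sign_flag", int(lvl < 0), "byp"))
    return bins, rem_bins_pass1


bins, left = rrc_subblock_bins([5, 0, -1, 2], rem_bins_pass1=100)
print(sum(1 for _, _, mode in bins if mode == "ctx"), left)
```

Calling the function with `rem_bins_pass1=0` skips pass 1 entirely, so every level is coded in bypass mode. This mirrors the effect of setting remBinsPass1 to 0 in the high throughput mode discussed for FIG. 9A.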

In some embodiments, a more general residual coding method uses greater-than flags (abs_level_gtxX_flag) and remainder bins so that the syntax elements for level coding of a transform block can be parsed conditionally. The binarization of the corresponding absolute level values is shown in Table 1 below. Here, abs_level_gtxX_flag indicates whether the absolute value of a level is greater than X, where X is an integer such as 0, 1, 2, ..., N. If abs_level_gtxX_flag is 0, where X is an integer between 0 and N-1, abs_level_gtx(X+1)_flag is not present. If abs_level_gtxX_flag is 1, abs_level_gtx(X+1)_flag is present. In addition, if abs_level_gtxN_flag is 0, no remainder is present. When abs_level_gtxN_flag is 1, a remainder is present; it represents the value left after (N+1) is subtracted from the level. In general, abs_level_gtxX_flag may be coded as a context-coded bin, while the remainder bins may be coded as bypass-coded bins.

abs(lvl)              0  1  2  3  4  5  6  7  8  9  ...
abs_level_gtx0_flag   0  1  1  1  1  1  1  1  1  1  ...
abs_level_gtx1_flag      0  1  1  1  1  1  1  1  1  ...
abs_level_gtx2_flag         0  1  1  1  1  1  1  1  ...
abs_level_gtx3_flag            0  1  1  1  1  1  1  ...
abs_remainder                     0  1  2  3  4  5  ...

Table 1: Residual coding based on abs_level_gtxX_flag bins and remainder bins
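The binarization in Table 1 can be reproduced with a short Python sketch. It assumes N = 3 as in the table. The function name and return format (a list of flag values plus a remainder, or `None` when no remainder is present) are hypothetical.

```python
def gtx_binarize(abs_level, n=3):
    """Binarize an absolute level as in Table 1: flags gtx0..gtxN, then a remainder."""
    flags = []
    for x in range(n + 1):
        flag = int(abs_level > x)            # abs_level_gtxX_flag
        flags.append(flag)
        if flag == 0:
            return flags, None               # later flags and the remainder are absent
    return flags, abs_level - (n + 1)        # remainder = level minus (N + 1)


for lvl in range(10):
    print(lvl, *gtx_binarize(lvl))
```

Each row printed matches one column of Table 1: the flags stop at the first 0, and only levels of N + 1 or more carry a remainder.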

In H.266/VVC, according to some aspects of the invention, the coding block is a transform skip block coded using TSRC. Transform skip blocks larger than 4 x 4 may be divided into disjoint 4 x 4 sub-blocks, which are processed using a diagonal scan pattern. It will be appreciated that H.266/VVC also supports sub-blocks other than 4 x 4 because the blocks may have a non-square rectangular shape. For ease of description and without loss of generality, FIG. 7B depicts an example of a 16 x 16 transform skip block, which may be further divided into 4 x 4 sub-blocks. One difference between TSRC and RRC is that the scan order is reversed: as shown in FIG. 7B, TSRC applies the diagonal scan in the forward direction rather than in the reverse order used by RRC.

Furthermore, unlike RRC, which codes the last significant scan position into the code stream, TSRC may not code the last significant scan position; instead, all scan positions of the transform skip block may be coded. Similar to RRC, TSRC may use a coded sub-block flag (sb_coded_flag) to indicate whether all levels of the current sub-block are equal to zero. In addition, to guarantee worst-case throughput, the maximum number of context-coded bins is limited using the value of the remaining context-coded bins (RemCcbs). A predefined threshold may be compared with RemCcbs to determine whether the maximum number of context-coded bins has been reached. For example, the threshold for RemCcbs in H.266/VVC may be predefined as 4.

As shown in FIG. 8B, if RemCcbs is not less than 4 ("residual CCB ≥ 4" in FIG. 8B), then for each level in each sub-block, a significance flag ("sig" in FIG. 8B) may first be coded into the code stream to indicate whether the level is zero or non-zero. If the level is non-zero, a coefficient sign flag ("sign" in FIG. 8B) may be coded to indicate whether the level is positive or negative. Then a greater-than-1 flag may be coded to indicate whether the absolute level at the current position is greater than 1. This flag is abs_level_gtx_flag[n][0] ("gt1" in FIG. 8B), where n is the index of the current position along the scan order within the sub-block. If abs_level_gtx_flag[n][0] is non-zero, a parity flag (par_level_flag, "par" in FIG. 8B) may be coded. After each of these syntax elements is coded with the context coding method (i.e., as a context-coded bin), the value of RemCcbs may be decremented by 1. In other words, in the first coding pass ("pass 1" in FIG. 8B), the significance flag, coefficient sign flag, greater-than-1 flag, and parity flag at each position of each sub-block may be coded as context-coded bins.

After these syntax elements have been coded for all positions within the current sub-block, up to four greater-than flags may be coded as context-coded bins in the second coding pass ("pass 2" in FIG. 8B), provided RemCcbs is still not less than 4. These flags are abs_level_gtx_flag[n][j] ("gt3", "gt5", "gt7", and "gt9" in FIG. 8B), where n is the index of the current position along the scan order within the sub-block and j ranges from 1 to 4. After each abs_level_gtx_flag[n][j] is coded in the second coding pass, the value of RemCcbs may be decremented by 1. If RemCcbs is not less than 4, then in the third coding pass ("pass 3" in FIG. 8B), the remainder ("rem" in FIG. 8B) may be coded as bypass-coded bins, where needed, for the current position within the sub-block. For positions whose absolute level (dec_abs_level, "decAbsLevel" in FIG. 8B) is coded entirely as bypass-coded bins, coeff_sign_flag may also be coded as a bypass-coded bin in the fourth coding pass ("pass 4" in FIG. 8B).
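The TSRC level binarization across the first three passes can be sketched in Python. It assumes, as we understand VVC, that abs_level_gtx_flag[n][j] signals an absolute level greater than 2j + 1 and that the remainder carries what is left in steps of 2. The RemCcbs budget, the switch to fully bypass-coded levels, and the Rice parameter are deliberately not modelled. The function names and dictionary keys are hypothetical, and the reconstruction helper is included only to check the decomposition.

```python
def tsrc_level_syntax(level):
    """Decompose one level into TSRC-style syntax values (keys are illustrative)."""
    a = abs(level)
    s = {"sig": int(a > 0)}
    if not s["sig"]:
        return s
    s["sign"] = int(level < 0)              # pass 1 (context coded in TSRC)
    s["gt1"] = int(a > 1)
    if s["gt1"]:
        s["par"] = (a - 2) & 1
        total = 2 + s["par"]                # value implied by sig + gt1 + par
        for j in range(1, 5):               # pass 2: gt3, gt5, gt7, gt9
            flag = int(a > 2 * j + 1)
            s[f"gt{2 * j + 1}"] = flag
            if not flag:
                break
            total += 2
        else:
            s["rem"] = (a - total) >> 1     # pass 3 (bypass): all four flags were 1
    return s


def tsrc_reconstruct(s):
    """Rebuild the level from the syntax values produced above."""
    a = s["sig"] + s.get("gt1", 0) + s.get("par", 0)
    a += 2 * sum(s.get(f"gt{k}", 0) for k in (3, 5, 7, 9)) + 2 * s.get("rem", 0)
    return -a if s.get("sign") else a


print(tsrc_level_syntax(-12))
```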

Referring back to FIG. 3, the dequantization module 312 may be configured to dequantize the quantization levels, and the inverse transform module 314 may be configured to inverse transform the coefficients that were transformed by the transform module 308. The reconstructed residual block generated by the dequantization module 312 and the inverse transform module 314 may be combined with the prediction unit predicted by the prediction module 304 or 306 to generate a reconstructed block.
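Combining the reconstructed residual with the prediction can be sketched in a few lines of Python. The sketch assumes the usual practice of clipping each reconstructed sample to the range allowed by the bit depth. The function name and the list-of-rows block layout are hypothetical.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add the reconstructed residual to the prediction and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]


pred = [[250, 3], [128, 128]]
resid = [[10, -5], [-8, 7]]
print(reconstruct_block(pred, resid))
```

Because the encoder reconstructs blocks exactly as the decoder will, later predictions on both sides are made from identical reference samples.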

The filter module 316 may include at least one of a deblocking filter, an offset correction module, and an adaptive loop filter (Adaptive Loop Filter, ALF). The deblocking filter may remove block distortion generated at the boundaries between blocks in the reconstructed image. The offset correction module may correct, in units of pixels, the offset between the deblocked video and the original video. ALF may be performed based on values obtained by comparing the reconstructed and filtered video with the original video. The buffer module 318 may be configured to store the reconstructed block or reconstructed image produced by the filter module 316, and the stored block or image may be provided to the inter prediction module 304 when inter prediction is performed.

FIG. 4 illustrates a detailed block diagram of an exemplary decoder 201 in the decoding system 200 of FIG. 2 according to some embodiments of the invention. As shown in FIG. 4, the decoder 201 may include a decoding module 402, a dequantization module 404, an inverse transform module 406, an inter prediction module 408, an intra prediction module 410, a filter module 412, and a buffer module 414. It should be understood that the elements in FIG. 4 are shown separately to represent distinct functions of the video decoder; this does not mean that each element must be implemented as separate hardware or as a single software unit. That is, the elements are listed separately for convenience of explanation: at least two elements may be combined into a single element, or one element may be divided into a plurality of elements that together perform its function. It should also be understood that some of the elements are not necessary for performing the functions described in the present invention but may be optional elements for improving performance. It is also to be understood that these elements may be implemented using electronic hardware, firmware, computer software, or any combination thereof. Whether such elements are implemented as hardware, firmware, or software depends upon the particular application and the design constraints imposed on the decoder 201.

When a video bitstream is input from a video encoder (e.g., encoder 101), the input bitstream may be decoded by decoder 201 in a process inverse to that of the video encoder. Accordingly, some details of decoding described above with respect to encoding may be skipped for ease of description. The decoding module 402 may be configured to decode the code stream to obtain various information encoded into the code stream, such as a quantization level at each position in the encoded block. In some embodiments, the decoding module 402 may perform entropy decoding corresponding to entropy encoding performed by an encoder (e.g., exponential Golomb encoding or context encoding, such as CAVLC, CABAC, etc.). In addition to the quantization levels of the positions in the encoded blocks, the decoding module 402 may also decode various other information, such as block type information, prediction mode information, partition unit information, prediction unit information, transmission unit information, motion vector information, reference frame information, block interpolation information, and filtering information of the encoding units. During the decoding process, the decoding module 402 may perform reordering on the code stream to reconstruct and reorder the data from a 1D order into 2D reordered blocks by a reverse scanning method based on the encoding scanning order used by the encoder.

The dequantization module 404 may be configured to dequantize the quantization level at each position of a coding block (e.g., a 2D reconstructed block) to obtain the coefficient at each position. In some embodiments, the dequantization module 404 may also perform dependent dequantization based on quantization parameters provided by the encoder. These parameters include information related to the quantizers used in dependent quantization, such as the quantization step size used by each quantizer.
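Dependent dequantization can be illustrated with the four-state design used in VVC, as we understand it. States 0 and 1 select quantizer Q0, whose reconstruction values are even multiples of the step. States 2 and 3 select Q1, whose reconstruction values are odd multiples of the step plus zero. The parity of each decoded level drives the transition to the next state. This is a sketch under those assumptions. `dependent_dequantize` is a hypothetical name, and `step` is an abstract stand-in for the QP-derived scaling that a real decoder applies with integer arithmetic.

```python
# Next state = QSTATE_TRANS[state][level & 1] (assumed four-state machine).
QSTATE_TRANS = [[0, 2], [2, 0], [1, 3], [3, 1]]


def dependent_dequantize(levels, step):
    """Dequantize levels given in coding order with two switched scalar quantizers."""
    state = 0                               # the state is reset at the start of a block
    out = []
    for k in levels:
        sgn = (k > 0) - (k < 0)
        if state > 1:                       # states 2, 3 -> Q1: odd multiples and zero
            out.append((2 * k - sgn) * step)
        else:                               # states 0, 1 -> Q0: even multiples
            out.append(2 * k * step)
        state = QSTATE_TRANS[state][k & 1]  # parity of the level selects the next state
    return out


print(dependent_dequantize([1, 1, -2, 3], step=1))
```

Because the quantizer choice is derived from already-decoded levels, no extra side information is needed to tell the decoder which quantizer applies at each position.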

The inverse transform module 406 may be configured to perform an inverse transform, such as an inverse DCT, an inverse DST, and an inverse KLT for DCTs, DSTs, and KLTs, respectively, performed by the encoder, to transform data from a transform domain (e.g., coefficients) back to a pixel domain (e.g., luminance information and/or chrominance information). In some embodiments, the inverse transform module 406 may selectively perform transform operations (e.g., DCT, DST, KLT) according to pieces of information such as a prediction method, a size of a current block, a prediction direction, and the like.

The inter prediction module 408 and the intra prediction module 410 may be configured to generate a prediction block. The prediction block is based on information related to prediction block generation provided by the decoding module 402 and on information about previously decoded blocks or pictures provided by the buffer module 414. As described above, intra prediction is performed in the same manner as in the encoder. If the size of the prediction unit is the same as the size of the transform unit, intra prediction may be performed on the prediction unit based on the pixels located to the left of, to the upper left of, and above the prediction unit. However, if the size of the prediction unit differs from the size of the transform unit, intra prediction may be performed using reference pixels based on the transform unit.

The reconstructed block or reconstructed image, which is formed by the combination of the outputs of the inverse transform module 406 and the prediction module 408 or 410, may be provided to a filter module 412. The filter module 412 may include a deblocking filter, an offset correction module, and an ALF. The buffer module 414 may store and use the reconstructed image or the reconstructed block as a reference image or a reference block of the inter prediction module 408, and may output the reconstructed image.

However, as described above, the encoding/decoding operations performed by the encoding module 320 and the decoding module 402 may not be suitable for some video coding applications, such as high bit depth and high bit rate video coding, because of their limited throughput. The counters remBinsPass1 in RRC and RemCcbs in TSRC limit the total number of context-coded bins to help guarantee worst-case throughput. Even so, the higher computational cost of processing context-coded bins and the undesirable switching between context-coded and bypass-coded bins in CABAC still limit the throughput of video coding.

Encoding module 320 and decoding module 402 may be configured to enable a high throughput mode consistent with the scope of the invention. In the high throughput mode, at least one residual coding bin of the coding block is changed from a context coding bin to a bypass coding bin. Accordingly, the encoding module 320 may be configured to encode the quantization level of the encoded block and/or any other suitable information related to the encoded block into the code stream in the high throughput mode to improve throughput. Similarly, the decoding module 402 may be configured to decode the code stream in a high throughput mode to obtain the quantization level of the encoded block and/or any other suitable information related to the encoded block, thereby improving throughput.

Consistent with the scope of the invention, the high throughput mode may be enabled by the encoding module 320 and the decoding module 402 not only at the coding block level but also at the transform unit level. In some embodiments, in the high throughput mode, multiple transform unit bins of a transform unit may be changed from context-coded bins to bypass-coded bins. Thus, the encoding module 320 may be configured to encode the quantization levels of the transform unit and/or any other suitable information related to the transform unit into the code stream in the high throughput mode to improve throughput. Similarly, the decoding module 402 may be configured to decode the code stream in the high throughput mode to obtain the quantization levels of the transform unit and/or any other suitable information related to the transform unit, thereby improving throughput. When the high throughput mode is enabled, any other suitable context-coded bin (e.g., a motion vector difference bin) may likewise be changed to a bypass-coded bin when the encoding module 320 performs encoding and when the decoding module 402 performs decoding.

CABAC in H.266/VVC is a sequential process in which each iteration depends on the result of the previous one. At higher bit depths and higher bit rate operating ranges (especially with 16-bit input), the serial nature of the CABAC decoding process may limit codec throughput. Consistent with the scope of the invention, a bypass alignment method for the VVC operating range extension may be applied before encoding/decoding of the bypass-coded bins starts. For example, this may be done by setting the value of the current interval length R of the CABAC engine (e.g., a 9-bit variable called ivlCurrRange) to 256. Once ivlCurrRange is aligned to 256, the decoding of bypass-coded bins can be implemented with shift operations (e.g., via a shift register) instead of the regular CABAC operations. Thus, multiple aligned bypass-coded bins may be processed simultaneously to further improve throughput.
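The effect of the alignment can be checked with a small Python sketch. The serial function models the bypass decoding step in the form the VVC specification describes, as we understand it: ivlOffset is shifted left by one bit, the next bitstream bit is appended, and the bin is 1 when ivlOffset reaches ivlCurrRange. The function names and the bit list standing in for the bitstream reader are hypothetical. With ivlCurrRange fixed at 256 (and ivlOffset therefore below 256), the serial loop reduces to one shift and one mask, so n bins can be taken out of the stream at once.

```python
def decode_bypass_serial(ivl_offset, ivl_curr_range, bits):
    """Decode one bypass bin per input bit, as in a regular CABAC decoder."""
    bins = []
    for b in bits:
        ivl_offset = (ivl_offset << 1) | b
        if ivl_offset >= ivl_curr_range:
            bins.append(1)
            ivl_offset -= ivl_curr_range
        else:
            bins.append(0)
    return bins, ivl_offset


def decode_bypass_aligned(ivl_offset, bits):
    """With ivlCurrRange == 256, decode len(bits) bypass bins with one shift and mask."""
    assert 0 <= ivl_offset < 256
    n = len(bits)
    value = ivl_offset
    for b in bits:                        # gather the next n bits of the stream
        value = (value << 1) | b
    bins_word = value >> 8                # the n decoded bins, first bin in the MSB
    bins = [(bins_word >> (n - 1 - i)) & 1 for i in range(n)]
    return bins, value & 0xFF             # the low 8 bits are the new ivlOffset


print(decode_bypass_serial(200, 256, [0, 1, 1]), decode_bypass_aligned(200, [0, 1, 1]))
```

In hardware, the gather loop becomes a single multi-bit read from a shift register. This is the parallelism that bypass bit alignment is meant to unlock.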

To utilize bypass alignment and full bypass coding schemes, bypass bit alignment is also applied in high throughput mode consistent with the scope of the invention. For example, by setting the value of the current interval length to 256, the application of bypass bit alignment may be invoked at different stages of the encoding and decoding process, as described in more detail below.

FIG. 9A illustrates an exemplary bypass alignment scheme in RRC according to some embodiments of the invention. As shown in FIG. 9A, the code stream may start with the transform unit bins of the transform unit. In CABAC, various transform unit bins may still be context-coded bins. The transform unit bins may include: a coded Cb transform block flag (tu_cb_coded_flag), a coded Cr transform block flag (tu_cr_coded_flag), a coded luma transform block flag (tu_y_coded_flag), a quantization parameter delta value (cu_qp_delta_abs), a chroma quantization parameter offset flag (cu_chroma_qp_offset_flag), a chroma quantization parameter offset index (cu_chroma_qp_offset_idx), a joint chroma flag (tu_joint_cbcr_residual_flag), and a transform skip flag (transform_skip_flag). It should be appreciated that in some examples the transform unit bins may also include bypass-coded bins, such as the quantization parameter delta sign flag (cu_qp_delta_sign_flag).

As shown in FIG. 9A, the transform unit may correspond to one coding block (e.g., an RRC transform block) of luma samples (Y in FIG. 9A) and two corresponding coding blocks of chroma samples (Cb and Cr in FIG. 9A). Thus, the transform unit bins may include three transform_skip_flags, one each for the Y, Cb, and Cr coding blocks, each of which is a context-coded bin. For each coding block, the first residual coding bins coded/decoded in the code stream after the transform_skip_flag may be the last significant coefficient prefixes (last_sig_coeff_x_prefix and last_sig_coeff_y_prefix), which remain context-coded bins. All other residual coding bins in each coding block may be bypass-coded bins, as shown in FIG. 9A. For example, the bypass-coded residual coding bins may include: the last significant coefficient suffixes (last_sig_coeff_x_suffix and last_sig_coeff_y_suffix), the coded sub-block flag (sb_coded_flag), the remainder (abs_remainder), the absolute level (dec_abs_level), and the coefficient sign flag (coeff_sign_flag).

That is, the high throughput mode may be enabled for each coding block after last_sig_coeff_x_prefix and last_sig_coeff_y_prefix and before sb_coded_flag. In some embodiments where last_sig_coeff_x_suffix and last_sig_coeff_y_suffix also need to be coded, the high throughput mode may be enabled for each coding block after last_sig_coeff_x_prefix and last_sig_coeff_y_prefix and before last_sig_coeff_x_suffix and last_sig_coeff_y_suffix. In other words, the high throughput mode may be enabled for each coding block immediately after last_sig_coeff_x_prefix and last_sig_coeff_y_prefix. In the high throughput mode, the residual coding bin sb_coded_flag of each sub-block may be changed from a context-coded bin to a bypass-coded bin. Moreover, the value of the remaining context-coded bins (remBinsPass1) may be set below the threshold of 4, for example to 0. This skips the coding of all other context-coded bins, such as the significance flag (sig_coeff_flag), greater-than-1 flag (abs_level_gtx_flag[n][0]), parity flag (par_level_flag), and greater-than-3 flag (abs_level_gtx_flag[n][1]). In other words, in the high throughput mode, the first coding pass for each position of each sub-block of the coding block may be skipped, so that no context-coded bins occur in the first coding pass. Therefore, in the high throughput mode, each coding block may be coded using only bypass-coded bins, except for last_sig_coeff_x_prefix and last_sig_coeff_y_prefix.

As shown in FIG. 9A, for each coding block, bypass bit alignment may be invoked immediately after last_sig_coeff_x_prefix and last_sig_coeff_y_prefix as part of the high throughput mode, e.g., by setting the value of ivlCurrRange to 256. All bypass-coded bins are then bit aligned, which allows shift operations and parallel processing. As shown in FIG. 9A, the high throughput mode may be enabled at the coding block level, and bypass bit alignment may be invoked three times, once for each of the three coding blocks corresponding to the transform unit.

FIG. 9B illustrates another exemplary bypass alignment scheme in RRC according to some embodiments of the invention. Unlike the scheme in FIG. 9A, the scheme in FIG. 9B also changes last_sig_coeff_x_prefix and last_sig_coeff_y_prefix from context-coded bins to bypass-coded bins in the high throughput mode. As a result, each coding block in FIG. 9B can be coded using only bypass-coded bins in the high throughput mode. Further, as shown in FIG. 9B, for each coding block, bypass bit alignment may be invoked before last_sig_coeff_x_prefix and last_sig_coeff_y_prefix as part of the high throughput mode, e.g., by setting the value of ivlCurrRange to 256. All bypass-coded bins are then bit aligned, which allows shift operations and parallel processing. For example, bypass bit alignment may be applied at the beginning of the code stream of each coding block. As shown in FIG. 9B, the high throughput mode may be enabled at the coding block level, and bypass bit alignment may be invoked three times, once for each of the three coding blocks corresponding to the transform unit.

Compared with the scheme in fig. 9A, the scheme in fig. 9B can further improve video coding throughput by changing the last significant coefficient prefix from context-coded bins to bypass-coded bins. At very high bit rates and in high bit-depth operating ranges, the number of bits spent on the position of the last significant coefficient may also be quite high, since most blocks are coded with smaller block sizes. Because a context index must be derived for each bin of last_sig_coeff_x_prefix and last_sig_coeff_y_prefix, this context index derivation may limit throughput.
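The per-bin cost can be made concrete with a sketch of how one coordinate of the last significant position is binarized in HEVC and VVC, assuming no zero-out of high-frequency coefficients; the function names are hypothetical. The prefix is a truncated Rice code with cRiceParam = 0 (truncated unary) and cMax = (log2 block size << 1) - 1, and each of its bins is context coded with its own context index. The suffix, present only for prefix values above 3, is a fixed-length bypass-coded value.

```python
from typing import Tuple


def last_pos_prefix(pos: int) -> int:
    """Prefix value (group index) for one last-position coordinate."""
    if pos < 4:
        return pos
    msb = pos.bit_length() - 1  # floor(log2(pos))
    return 2 * msb + ((pos >> (msb - 1)) & 1)


def last_pos_bins(pos: int, log2_size: int) -> Tuple[int, int]:
    """Return (context-coded prefix bins, bypass-coded suffix bits) for one coordinate."""
    prefix = last_pos_prefix(pos)
    c_max = (log2_size << 1) - 1
    # Truncated unary: 'prefix' ones plus a terminating zero, unless prefix == cMax.
    prefix_bins = prefix + 1 if prefix < c_max else c_max
    # Fixed-length suffix selects the position inside the prefix group.
    suffix_bits = (prefix >> 1) - 1 if prefix > 3 else 0
    return prefix_bins, suffix_bits
```

For a 4x4 block, last_pos_bins(3, 2) returns (3, 0), so the x and y prefixes together can take up to 6 context-coded bins even for the smallest block; for a 32-wide block, last_pos_bins(31, 5) returns (9, 3).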

Fig. 9C illustrates yet another exemplary bypass alignment scheme in RRC according to some embodiments of the invention. Unlike the scheme in fig. 9B, in the scheme of fig. 9C the transform_skip_flag is also changed from a context-coded bin to a bypass-coded bin in the high throughput mode. As a result, for each transform unit, bypass bit alignment may be invoked before the first transform_skip_flag as part of the high throughput mode, e.g., by setting the value of ivlCurrRange to 256, so that bypass bit alignment is invoked only once for all three coding blocks corresponding to the transform unit.

Compared with the scheme in fig. 9B, the scheme in fig. 9C may further improve video coding throughput by invoking bypass bit alignment only once per transform unit instead of three times, once for each of the three coding blocks.

Fig. 9D illustrates yet another exemplary bypass alignment scheme in RRC according to some embodiments of the invention. Unlike the scheme in fig. 9C, in the scheme of fig. 9D the transform unit bins of the transform unit are also changed from context-coded bins to bypass-coded bins, so that in the high throughput mode all the transform unit bins are coded as bypass-coded bins. For example, in the high throughput mode, in addition to the transform_skip_flags, tu_cb_coded_flag, tu_y_coded_flag, cu_qp_delta_abs, cu_chroma_qp_offset_flag, cu_chroma_qp_offset_idx, and tu_joint_cbcr_residual_flag may all be changed from context-coded bins to bypass-coded bins. Therefore, in the high throughput mode in fig. 9D, the transform unit and the three corresponding coding blocks may be coded using only bypass-coded bins.

As shown in fig. 9D, for each transform unit, bypass bit alignment may be invoked before the first transform unit bin (such as tu_cb_coded_flag) as part of the high throughput mode, e.g., by setting the value of ivlCurrRange to 256, so that bypass bit alignment is invoked only once for each transform unit. For example, bypass bit alignment may be applied at the beginning of the code stream of the transform unit.

Compared with the scheme in fig. 9C, the scheme in fig. 9D may further improve video coding throughput by coding the transform unit bins only as bypass-coded bins, which avoids any switching of the CABAC coding engine between context coding and bypass coding when coding the transform unit. In this scheme, the high throughput mode may be enabled at the transform unit level.
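The effect of each scheme on the number of alignments and on engine mode switching can be illustrated with a deliberately coarse model of one RRC transform unit with Y, Cb, and Cr coding blocks. Assumptions: each group of syntax elements is collapsed into one labeled element (labels such as last_sig_coeff_prefix[Y] and residual[Y] are hypothetical), only a subset of the transform unit bins is modeled, conditional presence of syntax elements is ignored, and alignment is placed at the first bypass-coded element that follows a context-coded element or starts the transform unit.

```python
from typing import List, Tuple

Element = Tuple[str, str]  # (element label, "ctx" or "byp")


def tu_elements(scheme: str) -> List[Element]:
    """Coarse element order of one RRC transform unit under fig. 9A, 9B, 9C or 9D."""
    tu_mode = "byp" if scheme == "9D" else "ctx"
    ts_mode = "byp" if scheme in ("9C", "9D") else "ctx"
    prefix_mode = "ctx" if scheme == "9A" else "byp"
    seq: List[Element] = [(name, tu_mode) for name in (
        "tu_cb_coded_flag", "tu_cr_coded_flag", "tu_y_coded_flag",
        "cu_qp_delta_abs", "tu_joint_cbcr_residual_flag")]
    for comp in ("Y", "Cb", "Cr"):
        seq.append((f"transform_skip_flag[{comp}]", ts_mode))
        seq.append((f"last_sig_coeff_prefix[{comp}]", prefix_mode))
        # sb_coded_flag, suffixes, dec_abs_level, signs: bypass in every scheme here.
        seq.append((f"residual[{comp}]", "byp"))
    return seq


def alignment_points(seq: List[Element]) -> List[int]:
    """Indices of the first bypass element of every bypass run (alignment positions)."""
    points, prev = [], "ctx"  # treat the start of the TU like a preceding context bin
    for i, (_, mode) in enumerate(seq):
        if mode == "byp" and prev == "ctx":
            points.append(i)
        prev = mode
    return points


def mode_switches(seq: List[Element]) -> int:
    """Number of context/bypass transitions the CABAC engine makes."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a[1] != b[1])
```

Under this model, figs. 9A and 9B each give three alignment points and five context/bypass switches per transform unit (9B with fewer context-coded elements), fig. 9C gives one alignment point and one switch, and fig. 9D gives a single alignment point at the start and no switches.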

Fig. 10A illustrates an exemplary bypass alignment scheme in TSRC according to some embodiments of the invention. As shown in fig. 10A, the code stream may start with the transform unit bins of the transform unit. In CABAC, various transform unit bins may still be context-coded bins. The transform unit bins may include: a coded Cb transform block flag (tu_cb_coded_flag), a coded Cr transform block flag (tu_cr_coded_flag), a coded luma transform block flag (tu_y_coded_flag), a quantization parameter delta value (cu_qp_delta_abs), a chroma quantization parameter offset flag (cu_chroma_qp_offset_flag), a chroma quantization parameter offset index (cu_chroma_qp_offset_idx), a joint chroma residual flag (tu_joint_cbcr_residual_flag), and a transform skip flag (transform_skip_flag). It should be appreciated that the transform unit bins may also include bypass-coded bins, such as the quantization parameter delta sign flag (cu_qp_delta_sign_flag) in some examples.

As shown in fig. 10A, the transform unit may correspond to one coding block of luma samples (e.g., a transform skip block coded with TSRC) and two corresponding coding blocks of chroma samples (Cb and Cr in fig. 10A). Thus, the transform unit bins may include three transform_skip_flags for the Y, Cb, and Cr coding blocks, respectively, each of which is a context-coded bin. As shown in fig. 10A, all residual coding bins in each coding block may be bypass-coded bins. For example, the bypass-coded residual coding bins may include the coded sub-block flag (sb_coded_flag), the remainder (abs_remainder), and the coefficient sign flag (coeff_sign_flag).

That is, the high throughput mode may be enabled before sb_coded_flag. In the high throughput mode, the residual coding bin sb_coded_flag of each sub-block may be changed from a context-coded bin to a bypass-coded bin. In addition, by setting the number of remaining context-coded bins to a value less than the threshold of 4, e.g., to 0, the coding of all other context-coded bins at each position, such as the significance flag (sig_coeff_flag), the greater-than flags (abs_level_gtx_flag[n][j]), and the parity flag (par_level_flag), may be skipped. In other words, in the high throughput mode, the first coding pass and the second coding pass for each position of each sub-block of the coding block may be skipped, so that the context-coded bins normally coded in these two passes are not coded. Thus, in the high throughput mode, each coding block may be coded using only bypass-coded bins.

As shown in fig. 10A, for each coding block, bypass bit alignment may be invoked before sb_coded_flag as part of the high throughput mode, e.g., by setting the value of ivlCurrRange to 256, so that all bypass-coded bins are bit-aligned, which allows shift operations and parallel processing. As shown in fig. 10A, the high throughput mode may be enabled at the coding block level, and bypass bit alignment may be invoked three times, once for each of the three coding blocks corresponding to the transform unit.

Fig. 10B illustrates another exemplary bypass alignment scheme in TSRC according to some embodiments of the invention. Unlike the scheme in fig. 10A, in the scheme of fig. 10B the transform unit bins of the transform unit are also changed from context-coded bins to bypass-coded bins, so that in the high throughput mode all the transform unit bins are coded as bypass-coded bins. For example, in the high throughput mode, in addition to the transform_skip_flags, tu_cb_coded_flag, tu_y_coded_flag, cu_qp_delta_abs, cu_chroma_qp_offset_flag, cu_chroma_qp_offset_idx, and tu_joint_cbcr_residual_flag may all be changed from context-coded bins to bypass-coded bins. Thus, in the high throughput mode in fig. 10B, the transform unit and the three corresponding coding blocks may be coded using only bypass-coded bins.

As shown in fig. 10B, for each transform unit, bypass bit alignment may be invoked before the first transform unit bin (such as tu_cb_coded_flag) as part of the high throughput mode, e.g., by setting the value of ivlCurrRange to 256, so that bypass bit alignment is invoked only once for each transform unit. For example, bypass bit alignment may be applied at the beginning of the code stream of the transform unit.

Compared with the scheme in fig. 10A, the scheme in fig. 10B can further improve video coding throughput by coding the transform unit bins only as bypass-coded bins, which avoids any switching of the CABAC coding engine between context coding and bypass coding when coding the transform unit. In this scheme, the high throughput mode may be enabled at the transform unit level.

Fig. 11 illustrates a flowchart of an exemplary method 1100 of video encoding according to some embodiments of the invention. Method 1100 may be performed at the encoding block level by encoder 101 of encoding system 100 or any other suitable video encoding system. As described below, method 1100 may include operations 1102, 1104, 1106, and 1108. It should be appreciated that some operations may be optional and some operations may be performed simultaneously or in a different order than shown in fig. 11.

At operation 1102, coefficients for each position in the encoded block are quantized to generate a quantization level for the corresponding position. For example, as shown in fig. 3, quantization module 310 may be configured to quantize coefficients of a current position in a current encoded block to generate a quantization level of the current position. In some embodiments, the encoded block includes a plurality of sub-blocks. The coding block may be a transform block coded using RRC or a transform skip block coded using TSRC.

At operation 1104, the high throughput mode is enabled. In the high throughput mode, at least one residual coding bin of the coding block may be changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. For example, as shown in fig. 3, the encoding module 320 may be configured to enable the high throughput mode. In one example, a high throughput mode enable flag (sps_high_throughput_mode_enabled_flag) may be added as a new sequence parameter set (SPS) range extension syntax element indicating whether the high throughput mode is enabled. For example, sps_high_throughput_mode_enabled_flag equal to 1 may indicate that the high throughput mode is enabled, while sps_high_throughput_mode_enabled_flag equal to 0 may indicate that it is not enabled. When sps_high_throughput_mode_enabled_flag is not present, its value may be inferred to be equal to 0.

Various changes may be made in response to enabling the high throughput mode. In some embodiments, in the high throughput mode, at least one residual coding bin of a coding block may be changed from a context-coded bin to a bypass-coded bin. In one example where the coding block is a transform block coded using RRC, the residual coding bins changed from context-coded to bypass-coded bins may include the coded sub-block flag. For example, as shown in fig. 9A, when sps_high_throughput_mode_enabled_flag is equal to 1, sb_coded_flag may be changed from a context-coded bin to a bypass-coded bin. In another example where the coding block is a transform block coded using RRC, the changed residual coding bins may further include the last significant coefficient prefix. For example, as shown in figs. 9B-9D, when sps_high_throughput_mode_enabled_flag is equal to 1, last_sig_coeff_x_prefix and last_sig_coeff_y_prefix may also be changed from context-coded bins to bypass-coded bins. In yet another example where the coding block is a transform skip block coded using TSRC, the changed residual coding bins may include the coded sub-block flag and/or the coefficient sign flag. For example, as shown in figs. 10A and 10B, when sps_high_throughput_mode_enabled_flag is equal to 1, sb_coded_flag and coeff_sign_flag may be changed from context-coded bins to bypass-coded bins.
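The bin-type changes listed above can be summarized as a lookup, sketched below under stated assumptions. The table of regular-mode context-coded residual elements reflects VVC RRC and TSRC in simplified form (in TSRC, coeff_sign_flag is context coded only while the context-coded bin budget lasts), the helper is_bypass is hypothetical, and elements that the budget mechanism skips entirely (such as sig_coeff_flag) are simply reported with their regular mode. The flag argument defaults to 0, matching the inference rule when sps_high_throughput_mode_enabled_flag is absent.

```python
from typing import Dict, Set

# Residual syntax elements that are context coded in regular operation (simplified).
REGULAR_CTX: Dict[str, Set[str]] = {
    "RRC": {"sb_coded_flag", "last_sig_coeff_x_prefix", "last_sig_coeff_y_prefix",
            "sig_coeff_flag", "abs_level_gtx_flag", "par_level_flag"},
    "TSRC": {"sb_coded_flag", "sig_coeff_flag", "coeff_sign_flag",
             "abs_level_gtx_flag", "par_level_flag"},
}

# Elements moved to bypass by the high throughput mode, per the embodiments above.
HT_BYPASS: Dict[str, Set[str]] = {
    "RRC": {"sb_coded_flag"},
    "TSRC": {"sb_coded_flag", "coeff_sign_flag"},
}


def is_bypass(element: str, residual_coding: str,
              sps_high_throughput_mode_enabled_flag: int = 0,
              bypass_last_prefix: bool = False) -> bool:
    """Return True if 'element' is coded as a bypass bin (hypothetical helper).

    bypass_last_prefix selects the fig. 9B-9D variant for RRC.
    """
    if element not in REGULAR_CTX[residual_coding]:
        return True  # e.g. abs_remainder, dec_abs_level, last_sig_coeff_*_suffix
    if not sps_high_throughput_mode_enabled_flag:
        return False
    moved = set(HT_BYPASS[residual_coding])
    if residual_coding == "RRC" and bypass_last_prefix:
        moved |= {"last_sig_coeff_x_prefix", "last_sig_coeff_y_prefix"}
    return element in moved
```

For example, is_bypass("sb_coded_flag", "RRC") is False, while is_bypass("sb_coded_flag", "RRC", 1) is True.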

In some embodiments, in the high throughput mode, the number of remaining context-coded bins (e.g., a counter) may be set to a value less than a threshold. For example, the threshold may be equal to 4, and the number of remaining context-coded bins may be set to 0. Thus, in the high throughput mode, the coding of any context-coded bin for each position of each sub-block may be skipped. In one example where the coding block is a transform block coded using RRC, the variable RemBinsPass1 may be set to 0 to skip the first coding pass, which codes the following context-coded bins for each position of each sub-block: the significance flag (sig_coeff_flag), the greater-than-1 flag (abs_level_gtx_flag[n][0]), the parity flag (par_level_flag), and the greater-than-3 flag (abs_level_gtx_flag[n][1]), as shown in figs. 8A and 9A-9D. In another example where the coding block is a transform skip block coded using TSRC, the variable RemCcbs may be set to 0 to skip the first and second coding passes, which code the following context-coded bins for each position of each sub-block: the significance flag (sig_coeff_flag), the coefficient sign flag (coeff_sign_flag), the greater-than flags (abs_level_gtx_flag[n][j]), and the parity flag (par_level_flag), as shown in figs. 8A, 10A, and 10B.

In some embodiments, bypass bit alignment is also applied in the high throughput mode. At operation 1106, bypass bit alignment is invoked. For example, as shown in fig. 3, the encoding module 320 may be configured to invoke bypass bit alignment. The value of the current interval length (ivlCurrRange) may be set to 256 to apply bypass bit alignment in the high throughput mode. In some embodiments, bypass bit alignment may be applied at the coding block level. In one example in RRC where the residual coding bins changed from context-coded to bypass-coded bins include the coded sub-block flag, as shown in fig. 9A, bypass bit alignment may be invoked after the last significant coefficient prefix is coded and before the coded sub-block flag is coded. In another example in RRC where the changed residual coding bins also include the last significant coefficient prefix, as shown in fig. 9B, bypass bit alignment may be invoked before the last significant coefficient prefix is coded. In yet another example in TSRC where the changed residual coding bins include the coded sub-block flag and/or the coefficient sign flag, as shown in fig. 10A, bypass bit alignment may be invoked before the coded sub-block flag is coded. For example, when sps_high_throughput_mode_enabled_flag is equal to 1, bypass bit alignment may be invoked in the coding block when the value of the first bypass-decoded syntax element is requested, that is, sb_coded_flag or abs_remainder in TSRC, or last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, or dec_abs_level in RRC.

In some embodiments, bypass bit alignment may be invoked through a process that takes the variable ivlCurrRange as input and outputs the updated ivlCurrRange. For coding block level alignment, this process may be applied before the bypass coding of last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, dec_abs_level, sb_coded_flag, or abs_remainder. When ivlCurrRange is 256, the interval offset (ivlOffset) and the code stream can be regarded as a shift register, and the decoded bin value (binVal) can be regarded as the second most significant bit of the register (the most significant bit is always 0 because ivlOffset is constrained to be smaller than ivlCurrRange).
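The coding-block-level trigger described above can be sketched as a small piece of per-block state. This is a hypothetical illustration, not a real decoder API: the class BlockAligner, its fields, and the starting ivlCurrRange of 510 (the value set at CABAC initialization) are assumptions, and the trigger sets are taken from the example syntax elements named in the text. Alignment is performed at most once per coding block, at the first request for a trigger element while the high throughput mode is enabled.

```python
from dataclasses import dataclass

ALIGNED_RANGE = 256

# First bypass-decoded syntax elements that trigger coding-block-level alignment.
ALIGN_TRIGGERS = {
    "TSRC": {"sb_coded_flag", "abs_remainder"},
    "RRC": {"last_sig_coeff_x_suffix", "last_sig_coeff_y_suffix", "dec_abs_level"},
}


def bypass_alignment(ivl_curr_range: int) -> int:
    """Alignment process: input ivlCurrRange, output the updated ivlCurrRange.

    The incoming range is not used; the range is simply forced to 256.
    """
    return ALIGNED_RANGE


@dataclass
class BlockAligner:
    residual_coding: str       # "RRC" or "TSRC"
    high_throughput: bool      # sps_high_throughput_mode_enabled_flag == 1
    ivl_curr_range: int = 510  # assumed starting value (CABAC initialization)
    aligned: bool = False      # whether alignment already happened in this block

    def on_bypass_request(self, element: str) -> None:
        # Align once, at the first requested trigger element of the block.
        if (self.high_throughput and not self.aligned
                and element in ALIGN_TRIGGERS[self.residual_coding]):
            self.ivl_curr_range = bypass_alignment(self.ivl_curr_range)
            self.aligned = True
```

A new BlockAligner would be created for each coding block, so a transform unit with three coding blocks sees three alignments, as in figs. 9A, 9B, and 10A.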

At operation 1108, the quantization levels of the coding block are encoded into a code stream in the high throughput mode. As shown in fig. 3, the encoding module 320 may be configured to encode the quantization level of each position in the high throughput mode described above using binary arithmetic coding, such as CABAC. In some embodiments, in the high throughput mode, each residual coding bin of the coding block is coded as a bypass-coded bin.

Fig. 12 illustrates a flowchart of an exemplary method 1200 of video decoding according to some embodiments of the invention. The method 1200 may be performed at the encoded block level by the decoder 201 of the decoding system 200 or any other suitable video decoding system. As described below, method 1200 may include operations 1202, 1204, 1206, and 1208. It should be appreciated that some operations may be optional and some operations may be performed simultaneously or may be performed in a different order than shown in fig. 12.

At operation 1202, the high throughput mode is enabled. In the high throughput mode, at least one residual coding bin of the coding block may be changed from a context-coded bin to a bypass-coded bin, and bypass bit alignment is applied. For example, as shown in fig. 4, the decoding module 402 may be configured to enable the high throughput mode. In one example, a high throughput mode enable flag (sps_high_throughput_mode_enabled_flag) may be added as a new sequence parameter set (SPS) range extension syntax element indicating whether the high throughput mode is enabled. For example, sps_high_throughput_mode_enabled_flag equal to 1 may indicate that the high throughput mode is enabled, while sps_high_throughput_mode_enabled_flag equal to 0 may indicate that it is not enabled. When sps_high_throughput_mode_enabled_flag is not present, its value may be inferred to be equal to 0. As described above, the high throughput mode for video decoding at the coding block level may be the same as that for video encoding.

In some embodiments, bypass bit alignment is also applied in the high throughput mode. At operation 1204, bypass bit alignment is invoked. For example, as shown in fig. 4, the decoding module 402 may be configured to invoke bypass bit alignment. The value of the current interval length (ivlCurrRange) may be set to 256 to apply bypass bit alignment in the high throughput mode. In some embodiments, bypass bit alignment may be applied at the coding block level. In one example in RRC where the residual coding bins changed from context-coded to bypass-coded bins include the coded sub-block flag, as shown in fig. 9A, bypass bit alignment may be invoked after the last significant coefficient prefix and before the coded sub-block flag. In another example in RRC where the changed residual coding bins also include the last significant coefficient prefix, as shown in fig. 9B, bypass bit alignment may be invoked before the last significant coefficient prefix. In yet another example in TSRC where the changed residual coding bins include the coded sub-block flag and/or the coefficient sign flag, as shown in fig. 10A, bypass bit alignment may be invoked before the coded sub-block flag. For example, when sps_high_throughput_mode_enabled_flag is equal to 1, bypass bit alignment may be invoked in the coding block when the value of the first bypass-decoded syntax element is requested, that is, sb_coded_flag or abs_remainder in TSRC, or last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, or dec_abs_level in RRC.

In some embodiments, bypass bit alignment may be invoked through a process that takes the variable ivlCurrRange as input and outputs the updated ivlCurrRange. For coding block level alignment, this process may be applied before the bypass decoding of last_sig_coeff_x_suffix, last_sig_coeff_y_suffix, dec_abs_level, sb_coded_flag, or abs_remainder. When ivlCurrRange is 256, the interval offset (ivlOffset) and the code stream can be regarded as a shift register, and the decoded bin value (binVal) can be regarded as the second most significant bit of the register (the most significant bit is always 0 because ivlOffset is constrained to be smaller than ivlCurrRange). That is, after bypass bit alignment is applied, the code stream may be decoded by shift operations.
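The shift-register view can be checked with a short simulation. The sketch below assumes ivlOffset < 256 at the moment of alignment (consistent with ivlOffset being smaller than ivlCurrRange) and uses a hypothetical BitReader over a list of bits. It decodes n bypass bins in two ways: one bin at a time with the usual bypass recursion (shift ivlOffset left by one, append one bit, subtract ivlCurrRange if the result is not smaller than it), and with a single n-bit shift. With ivlCurrRange fixed at 256 the two agree, which is what allows several bypass bins to be decoded with plain shifts.

```python
from typing import List, Tuple

ALIGNED_RANGE = 256  # ivlCurrRange after bypass bit alignment


class BitReader:
    """Reads bits MSB-first from a list of 0/1 values (hypothetical helper)."""

    def __init__(self, bits: List[int]) -> None:
        self.bits = list(bits)
        self.pos = 0

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def decode_bypass(ivl_offset: int, reader: BitReader) -> Tuple[int, int]:
    """Decode one bypass bin with ivlCurrRange fixed at 256; returns (binVal, ivlOffset)."""
    ivl_offset = (ivl_offset << 1) | reader.read_bits(1)
    if ivl_offset >= ALIGNED_RANGE:
        return 1, ivl_offset - ALIGNED_RANGE
    return 0, ivl_offset


def decode_bins_serial(ivl_offset: int, n: int, reader: BitReader) -> Tuple[int, int]:
    """Decode n bypass bins one at a time; returns (bins as an integer, new ivlOffset)."""
    value = 0
    for _ in range(n):
        bin_val, ivl_offset = decode_bypass(ivl_offset, reader)
        value = (value << 1) | bin_val
    return value, ivl_offset


def decode_bins_shift(ivl_offset: int, n: int, reader: BitReader) -> Tuple[int, int]:
    """Same n bins with one shift: ivlOffset and the stream act as one shift register."""
    window = (ivl_offset << n) | reader.read_bits(n)
    return window >> 8, window & 0xFF
```

For example, with ivlOffset = 178 (binary 010110010 as a 9-bit register) and next stream bits 1, 0, 1, both decode_bins_serial(178, 3, BitReader([1, 0, 1])) and decode_bins_shift(178, 3, BitReader([1, 0, 1])) return (5, 149): the three bins 1, 0, 1 are bits 7 to 5 of ivlOffset, directly below its always-zero most significant bit.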

At operation 1206, in a high throughput mode, the code stream is decoded to obtain a quantization level for each position in the encoded block. As shown in fig. 4, the decoding module 402 may be configured to decode the code stream using binary arithmetic coding (e.g., CABAC) to obtain the quantization levels for each position in the high throughput mode as described above.

At operation 1208, the quantization level of the encoded block is dequantized to generate coefficients for each position in the encoded block. As shown in fig. 4, the dequantization module 404 may be configured to dequantize the quantization level for each position to generate coefficients for the corresponding position in the encoded block.

Fig. 13 illustrates a flowchart of another exemplary method 1300 of video encoding according to some embodiments of the invention. Method 1300 may be performed at the transform unit level by encoder 101 of encoding system 100 or any other suitable video encoding system. As described below, method 1300 may include operations 1302, 1304, 1306, and 1308. It should be appreciated that some operations may be optional and some operations may be performed simultaneously or may be performed in a different order than shown in fig. 13.

At operation 1302, coefficients for each position in the transform unit are quantized to generate a quantization level for each position. For example, as shown in fig. 3, quantization module 310 may be configured to quantize the coefficient of a current position in a current transform unit to generate the quantization level of the current position. In some embodiments, the transform unit includes one or more coding blocks.

At operation 1304, the high throughput mode is enabled. In the high throughput mode, the transform unit bins of the transform unit may be changed from context-coded bins to bypass-coded bins, and bypass bit alignment is applied. For example, as shown in fig. 3, the encoding module 320 may be configured to enable the high throughput mode. In one example, a high throughput mode enable flag (sps_high_throughput_mode_enabled_flag) may be added as a new sequence parameter set (SPS) range extension syntax element indicating whether the high throughput mode is enabled. For example, sps_high_throughput_mode_enabled_flag equal to 1 may indicate that the high throughput mode is enabled, while sps_high_throughput_mode_enabled_flag equal to 0 may indicate that it is not enabled. When sps_high_throughput_mode_enabled_flag is not present, its value may be inferred to be equal to 0.

In some embodiments, the transform unit bins changed from context-coded bins to bypass-coded bins may include: a coded Cb transform block flag (tu_cb_coded_flag), a coded Cr transform block flag (tu_cr_coded_flag), a coded luma transform block flag (tu_y_coded_flag), a quantization parameter delta value (cu_qp_delta_abs), a chroma quantization parameter offset flag (cu_chroma_qp_offset_flag), a chroma quantization parameter offset index (cu_chroma_qp_offset_idx), a joint chroma residual flag (tu_joint_cbcr_residual_flag), and a transform skip flag (transform_skip_flag). For example, as shown in figs. 9D and 10B, when sps_high_throughput_mode_enabled_flag is equal to 1, tu_cb_coded_flag, tu_y_coded_flag, cu_qp_delta_abs, cu_chroma_qp_offset_flag, cu_chroma_qp_offset_idx, tu_joint_cbcr_residual_flag, and transform_skip_flag may be changed from context-coded bins to bypass-coded bins.

In some embodiments, bypass bit alignment is also applied in the high throughput mode. At operation 1306, bypass bit alignment is invoked. For example, as shown in fig. 3, the encoding module 320 may be configured to invoke bypass bit alignment. The value of the current interval length (ivlCurrRange) may be set to 256 to apply bypass bit alignment in the high throughput mode. In some embodiments, bypass bit alignment is applied at the transform unit level. In one example, bypass bit alignment may be invoked before the first of the transform unit bins is coded, e.g., as shown in figs. 9D and 10B. For example, when sps_high_throughput_mode_enabled_flag is equal to 1, bypass bit alignment may be invoked when the value of the first bypass-decoded syntax element in the transform unit, tu_cb_coded_flag or tu_y_coded_flag, is requested.

In some embodiments, bypass bit alignment may be invoked through a process that takes the variable ivlCurrRange as input and outputs the updated ivlCurrRange. For transform unit level alignment, this process may be applied before the bypass coding of tu_cb_coded_flag or tu_y_coded_flag. When ivlCurrRange is 256, the interval offset (ivlOffset) and the code stream can be regarded as a shift register, and the decoded bin value (binVal) can be regarded as the second most significant bit of the register (the most significant bit is always 0 because ivlOffset is constrained to be smaller than ivlCurrRange).

At operation 1308, the quantization levels of the transform unit are encoded into a bitstream in a high-throughput mode. As shown in fig. 3, the encoding module 320 may be configured to encode the quantization levels for each location in the high throughput mode as described above using binary arithmetic coding, such as CABAC. In some embodiments, in high throughput mode, each transform unit bin of a transform unit is encoded as a bypass encoded bin.

It should be appreciated that in some examples, methods 1100 and 1300 may be combined such that high throughput mode may be enabled at both the transform unit level and the corresponding coding block level, e.g., as described above with reference to fig. 9D and 10B.

Fig. 14 illustrates a flowchart of an exemplary method 1400 of video decoding according to some embodiments of the invention. The method 1400 may be performed at the transform unit level by the decoder 201 of the decoding system 200 or any other suitable video decoding system. As described below, method 1400 may include operations 1402, 1404, 1406, and 1408. It should be appreciated that some operations may be optional and some operations may be performed simultaneously or in a different order than shown in fig. 14.

At operation 1402, the high throughput mode is enabled. In the high throughput mode, the transform unit bins of the transform unit may be changed from context-coded bins to bypass-coded bins, and bypass bit alignment is applied. For example, as shown in fig. 4, the decoding module 402 may be configured to enable the high throughput mode. In one example, a high throughput mode enable flag (sps_high_throughput_mode_enabled_flag) may be added as a new sequence parameter set (SPS) range extension syntax element indicating whether the high throughput mode is enabled. For example, sps_high_throughput_mode_enabled_flag equal to 1 may indicate that the high throughput mode is enabled, while sps_high_throughput_mode_enabled_flag equal to 0 may indicate that it is not enabled. When sps_high_throughput_mode_enabled_flag is not present, its value may be inferred to be equal to 0. As described in detail above, the high throughput mode for video decoding at the transform unit level may be the same as that for video encoding.

In some embodiments, bypass bit alignment is also applied in the high throughput mode. At operation 1404, bypass bit alignment is invoked. For example, as shown in fig. 4, the decoding module 402 may be configured to invoke bypass bit alignment. The value of the current interval length (ivlCurrRange) may be set to 256 to apply bypass bit alignment in the high throughput mode. In some embodiments, bypass bit alignment may be applied at the transform unit level. In one example, bypass bit alignment may be invoked before the first of the transform unit bins, e.g., as shown in figs. 9D and 10B. For example, when sps_high_throughput_mode_enabled_flag is equal to 1, bypass bit alignment may be invoked when the value of the first bypass-decoded syntax element in the transform unit, tu_cb_coded_flag or tu_y_coded_flag, is requested.

In some embodiments, bypass bit alignment may be invoked through a process that takes the variable ivlCurrRange as input and outputs the updated ivlCurrRange. For transform unit level alignment, this process may be applied before the bypass decoding of tu_cb_coded_flag or tu_y_coded_flag. When ivlCurrRange is 256, the interval offset (ivlOffset) and the code stream can be regarded as a shift register, and the decoded bin value (binVal) can be regarded as the second most significant bit of the register (the most significant bit is always 0 because ivlOffset is constrained to be smaller than ivlCurrRange).

At operation 1406, in a high throughput mode, the code stream is decoded to obtain a quantization level for each position in the transform unit. As shown in fig. 4, the decoding module 402 may be configured to decode the code stream using binary arithmetic coding (e.g., CABAC) to obtain the quantization level for each position in the high throughput mode as described above.

At operation 1408, the quantization levels of the transform unit are dequantized to generate coefficients for each position in the transform unit. As shown in fig. 4, the dequantization module 404 may be configured to dequantize the quantization level of each position to generate the coefficient for the corresponding position in the transform unit.

It should be appreciated that in some examples, methods 1200 and 1400 may be combined such that high throughput mode may be enabled at both the transform unit level and the corresponding coding block level, e.g., as described above with reference to fig. 9D and 10B.

It should be appreciated that any suitable additional changes may also be made in the high throughput mode. The Rice parameter may be used to control how the remainder is binarized in residual coding. For a given level, the appropriate Rice parameter is the one that binarizes the value with the minimum number of bins. For example, the value of the variable StatCoeff may depend on the level values and is used to derive the Rice parameter. Larger StatCoeff values may map to larger Rice parameters. In the high throughput mode, there may be many large levels to be coded. Thus, in the high throughput mode, StatCoeff should be set larger. For example, a fixed offset of 2 may be added in the following equation:

In some embodiments, in addition to the context-coded bins of the transform unit and the coding blocks described above, any other suitable context-coded bins may be changed to bypass-coded bins in the high throughput mode. These context-coded bins may include, for example, motion vector difference bins such as abs_mvd_greater0_flag, abs_mvd_greater1_flag, abs_mvd_minus2, and mvd_sign_flag. Other possible context-coded bins that may be changed to bypass-coded bins in the high throughput mode include, for example: alf_ctb_flag, alf_use_aps_flag, alf_ctb_filter_alt_idx, alf_ctb_cc_cr_idc, alf_ctb_cc_cb_idc, sao_merge_left_flag, sao_merge_up_flag, sao_type_idx_chroma, sao_type_idx_luma, split_cu_flag, split_qt_flag, mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag, non_inter_flag, cu_skip_flag, pred_mode_flag, pred_mode_ibc_flag, pred_mode_plt_flag, cu_act_enabled_flag, intra_bdpcm_luma_flag, intra_bdpcm_luma_dir_flag, intra_mip_flag, intra_luma_ref_idx, intra_subpartitions_mode_flag, intra_subpartitions_split_flag, intra_luma_mpm_flag, intra_luma_not_planar_flag, intra_bdpcm_chroma_flag, intra_bdpcm_chroma_dir_flag, cclm_mode_flag, cclm_mode_idx, intra_chroma_pred_mode, palette_transpose_flag, copy_above_palette_indices_flag, run_copy_flag, general_merge_flag, regular_merge_flag, mmvd_merge_flag, mmvd_cand_flag, mmvd_distance_idx, merge_subblock_flag, merge_subblock_idx, ciip_flag, merge_gpm_idx0, merge_gpm_idx1, inter_pred_idc, inter_affine_flag, cu_affine_type_flag, sym_mvd_flag, ref_idx_l0, ref_idx_l1, mvp_l0_flag, mvp_l1_flag, amvr_flag, amvr_precision_idx, bcw_idx, cu_coded_flag, cu_sbt_flag, cu_sbt_quad_flag, cu_sbt_pos_flag, lfnst_idx, and mts_idx.

In some embodiments, bypass bit alignment is invoked after all context-coded bins and before the first bypass-coded bin, so that alignment always occurs at the start of coding the first bypass-coded bin.
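The placement rule above can be sketched as a small scheduling helper. This is a minimal illustration under stated assumptions: the function name `schedule_with_alignment` and the labels "ctx", "bypass", and "align" are hypothetical, and the sketch assumes, as this embodiment implies, that every context-coded bin of the block precedes its bypass-coded bins.

```python
from typing import List


def schedule_with_alignment(bin_modes: List[str]) -> List[str]:
    """Insert a single "align" step before the first bypass-coded bin.

    bin_modes lists "ctx" or "bypass" for each bin of a block in coding order.
    Assumes all context-coded bins come first; raises ValueError otherwise.
    """
    ops: List[str] = []
    aligned = False
    for mode in bin_modes:
        if mode not in ("ctx", "bypass"):
            raise ValueError(f"unknown bin mode: {mode}")
        if mode == "bypass" and not aligned:
            ops.append("align")  # e.g. set the current interval length to 256
            aligned = True
        elif mode == "ctx" and aligned:
            raise ValueError("context-coded bin after the alignment point")
        ops.append(mode)
    return ops
```

Emitting the alignment exactly once, at the first bypass-coded bin, means every later bypass-coded bin of the block is coded with the aligned interval.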

Various aspects of the invention may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described herein may be stored as instructions on a non-transitory computer-readable medium. Computer-readable media include computer storage media. A storage medium may be any available medium that can be accessed by a processor, such as processor 102 in fig. 1 and 2. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disc storage, HDDs (e.g., magnetic disk storage or other magnetic storage devices), flash drives, SSDs, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a processing system (e.g., a mobile device or computer). Disk and disc, as used herein, include compact discs (CDs), laser discs, optical discs, digital video discs (DVDs), and floppy disks, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

According to one aspect of the present invention, a method for encoding an image of a video, the image comprising a coding block, is disclosed. A processor quantizes the coefficient at each position in the coding block to generate a quantization level for that position. A high throughput mode is enabled, in which at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied. In the high throughput mode, the processor encodes the quantization levels of the coding block into a bitstream.
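The quantize/dequantize pair that brackets this method (and the decoding aspect described later) can be illustrated with a uniform scalar quantizer. This is a simplification for illustration only: the functions, step size, and rounding offset are hypothetical, and a real codec such as VVC derives the scaling from the quantization parameter and may use dependent quantization instead.

```python
from typing import List


def quantize(coeffs: List[float], step: float, rounding: float = 0.5) -> List[int]:
    """Uniform scalar quantization: one quantization level per position."""
    levels = []
    for c in coeffs:
        mag = int(abs(c) / step + rounding)  # floor(|c| / step + rounding)
        levels.append(-mag if c < 0 else mag)
    return levels


def dequantize(levels: List[int], step: float) -> List[float]:
    """Reconstruct a coefficient at each position by scaling its level."""
    return [lvl * step for lvl in levels]
```

Only the integer levels are entropy coded; the reconstruction differs from the original coefficients by the quantization error.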

In some embodiments, the number of remaining context-coded bins is set to a value less than a threshold.

In some embodiments, the threshold is equal to 4 and the number of remaining context-coded bins is set to 0.
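One way to read the threshold rule is through a per-block budget of context-coded bins, as in VVC residual coding, where the budget variable is named RemCcbs and coefficient-level bins are context-coded only while it is at least 4. The sketch below assumes that VVC-style budget of 7/4 context-coded bins per sample. The function names are hypothetical, and `count_context_coded_coeffs` assumes for simplicity that every coefficient spends exactly four context-coded bins (the worst case).

```python
CCB_THRESHOLD = 4  # budget needed before one more coefficient may use context-coded bins


def initial_rem_ccbs(log2_tb_width: int, log2_tb_height: int, high_throughput: bool) -> int:
    """Initial context-coded-bin budget of a block (VVC-style: 7/4 per sample).

    In the high throughput mode the budget is forced to 0, below the threshold.
    """
    if high_throughput:
        return 0
    num_samples = 1 << (log2_tb_width + log2_tb_height)
    return (num_samples * 7) >> 2


def count_context_coded_coeffs(num_coeffs: int, rem_ccbs: int, bins_per_coeff: int = 4) -> int:
    """Count coefficients whose level bins fit in the context-coded pass.

    Simplification: every coefficient is assumed to spend bins_per_coeff bins.
    """
    count = 0
    while count < num_coeffs and rem_ccbs >= CCB_THRESHOLD:
        rem_ccbs -= bins_per_coeff
        count += 1
    return count
```

For a 4x4 block, the normal budget of 28 bins covers at most 7 coefficients under this worst-case assumption, whereas a budget of 0 covers none, so the levels must be conveyed by bypass-coded bins.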

In some embodiments, the coding block includes a plurality of sub-blocks, and, for the encoding, coding of the context-coded bins is skipped for each sub-block.

In some embodiments, bypass bit alignment is applied in the high throughput mode by setting the current interval length to 256.
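Why an interval length of 256 helps on the encoder side can be sketched with the low-register update of a CABAC-style bypass step, low = 2·low + bin·range. This sketch assumes a VVC-style arithmetic encoder (the register is called ivlLow there); the function names are hypothetical, and renormalization, carry handling, and bit output are omitted to keep it short.

```python
from typing import List


def encode_bypass_generic(low: int, rng: int, bins: List[int]) -> int:
    """Bin-by-bin bypass update of the low register: low = 2*low + bin*range.

    Renormalization and bit output are omitted to keep the sketch short.
    """
    for b in bins:
        low = (low << 1) + (rng if b else 0)
    return low


def encode_bypass_aligned(low: int, bins: List[int]) -> int:
    """Same update after alignment (range == 256) for all bins at once.

    The bins are packed into one integer and added at bit position 8, so the
    multiplication by the range becomes a shift.
    """
    packed = 0
    for b in bins:
        packed = (packed << 1) | b
    return (low << len(bins)) + (packed << 8)
```

For any interval length the k-bin update equals low·2^k + range·packed; fixing the range at 256 is what reduces the multiplication to a shift.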

In some embodiments, the coding block is a transform block coded using regular residual coding (RRC).

In some embodiments, the at least one residual coding bin includes a coded sub-block flag.

In some embodiments, bypass bit alignment is invoked after encoding the last significant coefficient prefix and before encoding the coded sub-block flag.

In some embodiments, the at least one residual coding bin further includes the last significant coefficient prefix.

In some embodiments, bypass bit alignment is invoked before encoding the last significant coefficient prefix.

In some embodiments, the coding block is a transform skip block coded using transform skip residual coding (TSRC).

In some embodiments, the at least one residual coding bin includes at least one of a coded sub-block flag or a coefficient sign flag.

In some embodiments, bypass bit alignment is invoked before encoding the coded sub-block flag.

According to another aspect of the present invention, a system for encoding an image of a video, the image comprising a coding block, includes a memory configured to store instructions and a processor coupled to the memory. The processor may be configured to, upon executing the instructions, quantize the coefficient at each position in the coding block to generate a quantization level for the corresponding position. The processor may also be configured to, upon executing the instructions, enable a high throughput mode, in which at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied. The processor may further be configured to, upon executing the instructions, encode the quantization levels of the coding block into a bitstream in the high throughput mode.

In some embodiments, the processor is further configured to set the number of remaining context-coded bins to a value less than a threshold.

In some embodiments, the threshold is equal to 4 and the number of remaining context-coded bins is set to 0.

In some embodiments, the coding block includes a plurality of sub-blocks, and, for the encoding, the processor is further configured to skip coding of the context-coded bins for each sub-block.

In some embodiments, the processor is further configured to set the current interval length to 256 to apply bypass bit alignment in the high throughput mode.

In some embodiments, the coding block is a transform block coded using RRC.

In some embodiments, the at least one residual coding bin includes a coded sub-block flag.

In some embodiments, the processor is further configured to invoke bypass bit alignment after encoding the last significant coefficient prefix and before encoding the coded sub-block flag.

In some embodiments, the processor is further configured to invoke bypass bit alignment before encoding the last significant coefficient prefix.

In some embodiments, the processor is further configured to invoke bypass bit alignment before encoding the coded sub-block flag.

According to yet another aspect of the present invention, a non-transitory computer-readable medium storing instructions is disclosed. When executed by a processor, the instructions perform a process for encoding an image of a video, the image comprising a coding block. The process includes quantizing the coefficient at each position in the coding block to generate a quantization level for the corresponding position. The process also includes enabling a high throughput mode, in which at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied. The process further includes encoding the quantization levels of the coding block into a bitstream in the high throughput mode.

According to yet another aspect of the present invention, a method for decoding an image of a video, the image comprising a coding block, is disclosed. A high throughput mode is enabled, in which at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied. In the high throughput mode, a processor decodes a bitstream to obtain a quantization level for each position in the coding block. The quantization levels of the coding block are then dequantized to generate the coefficient at each position in the coding block.

In some embodiments, the number of remaining context-coded bins is set to a value less than a threshold.

In some embodiments, after bypass bit alignment is applied, the bitstream is decoded by a shift operation.
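The shift-based decoding can be sketched as follows, assuming a VVC-style CABAC decoder in which each bypass step shifts one bitstream bit into an offset register (ivlOffset in VVC) and compares it with the current interval length (ivlCurrRange). `BitReader`, `decode_bypass_generic`, and `decode_bypass_aligned` are hypothetical names, and the dict-based state is a simplification. Once the interval length is 256, the offset is always below 256, so the decoded bins are simply the leading bits of the 8-bit offset followed by the bitstream, and several bins can be read with one shift and one mask.

```python
from typing import Dict, List


class BitReader:
    """Hypothetical MSB-first bit reader over a byte string (pads with zeros)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # index of the next bit to read

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte_idx = self.pos >> 3
            byte = self.data[byte_idx] if byte_idx < len(self.data) else 0
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def decode_bypass_generic(state: Dict[str, int], reader: BitReader, num_bins: int) -> List[int]:
    """Bin-by-bin bypass decoding: shift in one bit, compare with the range."""
    bins = []
    for _ in range(num_bins):
        state["offset"] = (state["offset"] << 1) | reader.read_bits(1)
        if state["offset"] >= state["range"]:
            state["offset"] -= state["range"]
            bins.append(1)
        else:
            bins.append(0)
    return bins


def decode_bypass_aligned(state: Dict[str, int], reader: BitReader, num_bins: int) -> List[int]:
    """Bypass decoding after alignment (range == 256): one shift and one mask."""
    assert state["range"] == 256, "bypass bit alignment must be applied first"
    value = (state["offset"] << num_bins) | reader.read_bits(num_bins)
    state["offset"] = value & 0xFF  # low 8 bits remain in the offset register
    top = value >> 8                # the decoded bins, first bin in the MSB
    return [(top >> (num_bins - 1 - i)) & 1 for i in range(num_bins)]
```

Starting from the same state and bitstream, both functions return identical bins and leave the same offset; the aligned version simply replaces the per-bin comparison with a single shift.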

In some embodiments, the coding block is a transform block coded using RRC.

In some embodiments, the at least one residual coding bin includes a coded sub-block flag.

In some embodiments, bypass bit alignment is invoked after the last significant coefficient prefix and before the coded sub-block flag.

In some embodiments, bypass bit alignment is invoked before the last significant coefficient prefix.

In some embodiments, bypass bit alignment is invoked before the coded sub-block flag.

According to yet another aspect of the present invention, a system for decoding an image of a video, the image comprising a coding block, includes a memory configured to store instructions and a processor coupled to the memory. The processor may be configured to, upon executing the instructions, enable a high throughput mode, in which at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied. The processor may also be configured to, upon executing the instructions, decode a bitstream in the high throughput mode to obtain a quantization level for each position in the coding block. The processor may further be configured to, upon executing the instructions, dequantize the quantization levels of the coding block to generate the coefficient at each position in the coding block.

In some embodiments, the coding block includes a plurality of sub-blocks, and coding of the context-coded bins is skipped for each sub-block.

In some embodiments, the coding block is a transform block coded using RRC.

In some embodiments, the at least one residual coding bin includes a coded sub-block flag.

In some embodiments, the processor is further configured to invoke bypass bit alignment after the last significant coefficient prefix and before the coded sub-block flag.

In some embodiments, the processor is further configured to invoke bypass bit alignment before the last significant coefficient prefix.

In some embodiments, the coding block is a transform skip block coded using TSRC.

In some embodiments, the processor is further configured to invoke bypass bit alignment before the coded sub-block flag.

According to yet another aspect of the present invention, a non-transitory computer-readable medium storing instructions is disclosed. When executed by a processor, the instructions perform a process for decoding an image of a video, the image comprising a coding block. The process includes enabling a high throughput mode, in which at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied. The process also includes decoding a bitstream in the high throughput mode to obtain a quantization level for each position in the coding block. The process further includes dequantizing the quantization levels of the coding block to generate the coefficient at each position in the coding block.

The foregoing description of the embodiments will so reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments without undue experimentation, without departing from the generic concept of the present invention. Accordingly, these adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance provided herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

Embodiments of the present invention have been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. For ease of description, the boundaries of these functional building blocks have been arbitrarily defined herein. Alternate boundaries may be defined so long as the specified functions and relationships thereof are appropriately performed.

The summary and abstract sections may set forth one or more, but not all exemplary embodiments of the invention as contemplated by the inventors, and are therefore not intended to limit the invention and the appended claims in any way.

Various functional blocks, modules, and steps have been described above. The arrangement provided is illustrative and not limiting. Accordingly, the functional blocks, modules, and steps may be reordered or combined in a different manner than the examples provided above. Similarly, some embodiments include only a subset of the functional blocks, modules, and steps, and allow for any such subset.

The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

1. A method for encoding an image of a video, the image comprising a coding block, the method comprising:

quantizing, by a processor, a coefficient at each position in the coding block to generate a quantization level for the corresponding position;

enabling a high throughput mode, wherein in the high throughput mode at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied; and

encoding, by the processor, the quantization levels of the coding block into a bitstream in the high throughput mode.

2. The method of claim 1, further comprising: setting a number of remaining context-coded bins to a value less than a threshold.

3. The method of claim 2, wherein the threshold is equal to 4 and the number of remaining context-coded bins is set to 0.

4. The method of claim 1, wherein

the coding block comprises a plurality of sub-blocks; and

the encoding comprises skipping coding of context-coded bins for each of the sub-blocks.

5. The method of claim 1, further comprising: setting a current interval length to 256 to apply the bypass bit alignment in the high throughput mode.

6. The method of claim 1, wherein the coding block is a transform block coded using regular residual coding (RRC).

7. The method of claim 6, wherein the at least one residual coding bin comprises a coded sub-block flag.

8. The method of claim 7, further comprising: invoking the bypass bit alignment after encoding a last significant coefficient prefix and before encoding the coded sub-block flag.

9. The method of claim 6, wherein the at least one residual coding bin further comprises a last significant coefficient prefix.

10. The method of claim 9, further comprising: invoking the bypass bit alignment before encoding the last significant coefficient prefix.

11. The method of claim 1, wherein the coding block is a transform skip block coded using transform skip residual coding (TSRC).

12. The method of claim 11, wherein the at least one residual coding bin comprises at least one of a coded sub-block flag or a coefficient sign flag.

13. The method of claim 12, further comprising: invoking the bypass bit alignment before encoding the coded sub-block flag.

14. A system for encoding an image of a video, the image comprising a coding block, the system comprising:

a memory configured to store instructions; and

a processor coupled to the memory and configured to, upon execution of the instructions:

quantize a coefficient at each position in the coding block to generate a quantization level for the corresponding position;

enable a high throughput mode, wherein in the high throughput mode at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied; and

encode the quantization levels of the coding block into a bitstream in the high throughput mode.

15. The system of claim 14, wherein the processor is further configured to set a number of remaining context-coded bins to a value less than a threshold.

16. The system of claim 15, wherein the threshold is equal to 4 and the number of remaining context-coded bins is set to 0.

17. The system of claim 14, wherein

the coding block comprises a plurality of sub-blocks; and

for the encoding, the processor is further configured to skip coding of context-coded bins for each of the sub-blocks.

18. The system of claim 14, wherein the processor is further configured to set a current interval length to 256 to apply the bypass bit alignment in the high throughput mode.

19. The system of claim 14, wherein the coding block is a transform block coded using regular residual coding (RRC).

20. The system of claim 19, wherein the at least one residual coding bin comprises a coded sub-block flag.

21. The system of claim 20, wherein the processor is further configured to invoke the bypass bit alignment after encoding a last significant coefficient prefix and before encoding the coded sub-block flag.

22. The system of claim 19, wherein the at least one residual coding bin further comprises a last significant coefficient prefix.

23. The system of claim 22, wherein the processor is further configured to invoke the bypass bit alignment before encoding the last significant coefficient prefix.

24. The system of claim 14, wherein the coding block is a transform skip block coded using transform skip residual coding (TSRC).

25. The system of claim 24, wherein the at least one residual coding bin comprises at least one of a coded sub-block flag or a coefficient sign flag.

26. The system of claim 25, wherein the processor is further configured to invoke the bypass bit alignment before encoding the coded sub-block flag.

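27. A non-transitory computer-readable medium storing instructions that, when executed by a processor, perform a process for encoding an image of a video, the image comprising a coding block, the process comprising:

quantizing a coefficient at each position in the coding block to generate a quantization level for the corresponding position;

enabling a high throughput mode, wherein in the high throughput mode at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied; and

encoding the quantization levels of the coding block into a bitstream in the high throughput mode.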

28. A method for decoding an image of a video, the image comprising a coding block, the method comprising:

enabling a high throughput mode, wherein in the high throughput mode at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied;

decoding, by a processor, a bitstream in the high throughput mode to obtain a quantization level for each position in the coding block; and

dequantizing, by the processor, the quantization levels of the coding block to generate a coefficient for each position in the coding block.

29. The method of claim 28, wherein a number of remaining context-coded bins is set to a value less than a threshold.

30. The method of claim 29, wherein the threshold is equal to 4 and the number of remaining context-coded bins is set to 0.

31. The method of claim 28, wherein

the coding block comprises a plurality of sub-blocks; and

coding of context-coded bins is skipped for each of the sub-blocks.

32. The method of claim 28, further comprising: setting a current interval length to 256 to apply the bypass bit alignment in the high throughput mode.

33. The method of claim 28, wherein the bitstream is decoded by a shift operation after the bypass bit alignment is applied.

34. The method of claim 28, wherein the coding block is a transform block coded using regular residual coding (RRC).

35. The method of claim 34, wherein the at least one residual coding bin comprises a coded sub-block flag.

36. The method of claim 35, further comprising: invoking the bypass bit alignment after a last significant coefficient prefix and before the coded sub-block flag.

37. The method of claim 34, wherein the at least one residual coding bin further comprises a last significant coefficient prefix.

38. The method of claim 37, further comprising: invoking the bypass bit alignment before the last significant coefficient prefix.

39. The method of claim 28, wherein the coding block is a transform skip block coded using transform skip residual coding (TSRC).

40. The method of claim 39, wherein the at least one residual coding bin comprises at least one of a coded sub-block flag or a coefficient sign flag.

41. The method of claim 40, further comprising: invoking the bypass bit alignment before the coded sub-block flag.

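42. A system for decoding an image of a video, the image comprising a coding block, the system comprising:

a memory configured to store instructions; and

a processor coupled to the memory and configured to, upon execution of the instructions:

enable a high throughput mode, wherein in the high throughput mode at least one residual coding bin of the coding block is changed from a context-coded bin to a bypass-coded bin and bypass bit alignment is applied;

decode a bitstream in the high throughput mode to obtain a quantization level for each position in the coding block; and

dequantize the quantization levels of the coding block to generate a coefficient for each position in the coding block.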

43. The system of claim 42, wherein a number of remaining context-coded bins is set to a value less than a threshold.

44. The system of claim 43, wherein the threshold is equal to 4 and the number of remaining context-coded bins is set to 0.

45. The system of claim 42, wherein

the coding block comprises a plurality of sub-blocks; and

coding of context-coded bins is skipped for each of the sub-blocks.

46. The system of claim 42, wherein the processor is further configured to set a current interval length to 256 to apply the bypass bit alignment in the high throughput mode.

47. The system of claim 42, wherein the bitstream is decoded by a shift operation after the bypass bit alignment is applied.

48. The system of claim 42, wherein the coding block is a transform block coded using regular residual coding (RRC).

49. The system of claim 48, wherein the at least one residual coding bin comprises a coded sub-block flag.

50. The system of claim 49, wherein the processor is further configured to invoke the bypass bit alignment after a last significant coefficient prefix and before the coded sub-block flag.

51. The system of claim 48, wherein the at least one residual coding bin further comprises a last significant coefficient prefix.

52. The system of claim 51, wherein the processor is further configured to invoke the bypass bit alignment before the last significant coefficient prefix.

53. The system of claim 42, wherein the coding block is a transform skip block coded using transform skip residual coding (TSRC).

54. The system of claim 53, wherein the at least one residual coding bin comprises at least one of a coded sub-block flag or a coefficient sign flag.

55. The system of claim 54, wherein the processor is further configured to invoke the bypass bit alignment before the coded sub-block flag.

56. A non-transitory computer-readable medium storing instructions that, when executed by a processor, perform a process for decoding an image of a video, the image comprising a coding block, the process comprising: