CN114556927A - Entropy coding for video encoding and decoding - Google Patents

Entropy coding for video encoding and decoding

Info

Publication number
CN114556927A
Authority
CN
China
Prior art keywords
decoding
binary
entropy
binary data
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080055455.7A
Other languages
Chinese (zh)
Inventor
沈东圭
朴时奈
李钟石
朴胜煜
林和平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hyundai Motor Co
Industry Academic Collaboration Foundation of Kwangwoon University
Kia Corp
Original Assignee
Hyundai Motor Co
Industry Academic Collaboration Foundation of Kwangwoon University
Kia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hyundai Motor Co, Industry Academic Collaboration Foundation of Kwangwoon University, and Kia Corp
Priority claimed from PCT/KR2020/010401 external-priority patent/WO2021025485A1/en
Publication of CN114556927A publication Critical patent/CN114556927A/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: using adaptive coding
    • H04N19/102: adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/124: Quantisation
    • H04N19/169: adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: the unit being an image region, e.g. an object
    • H04N19/176: the region being a block, e.g. a macroblock
    • H04N19/50: using predictive coding
    • H04N19/60: using transform coding
    • H04N19/61: transform coding in combination with predictive coding
    • H04N19/70: characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/90: using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present embodiments provide methods for efficiently operating a binary data (bin) buffer by limiting the ratio of binary data to bits in entropy encoding and decoding associated with the generation and parsing of a bitstream. Further, a method is provided for constructing a list including various entropy encoding/decoding methods and adaptively selecting an entropy encoding/decoding method for each basic unit of entropy encoding/decoding.

Description

Entropy coding for video encoding and decoding
Technical Field
The present invention relates to encoding and decoding of video, and more particularly, to a method of efficiently operating a binary data buffer (bin buffer) and adaptively utilizing various encoding/decoding methods in order to efficiently perform entropy encoding and decoding.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Since the amount of video data is larger than the amount of voice data or the amount of still image data, a large amount of hardware resources (including memory) are required to store or transmit the video data without compression processing.
Accordingly, when storing or transmitting video data, the video data is typically compressed by an encoder for storage or transmission. Then, a decoder receives the compressed video data, decompresses the video data, and reproduces it. Compression techniques for such video include H.264/AVC and High Efficiency Video Coding (HEVC), which improves coding efficiency by approximately 40% over H.264/AVC.
However, video size, resolution, and frame rate are gradually increasing, and accordingly, the amount of data to be encoded is also increasing. Therefore, a new compression technique having better coding efficiency and higher picture quality than the existing compression technique is required.
In video coding, entropy coding is used to form a bitstream in which quantized transform coefficients, information on quantization parameters, intra prediction or inter prediction information according to a prediction type, information on block division, and the like are compressed. In addition, in video decoding, entropy decoding is used to parse the above information from a bitstream.
It is desirable to provide an efficient entropy encoding/decoding method.
Disclosure of Invention
Technical problem
The present invention provides a method of efficiently operating a binary data buffer by limiting a binary-to-bit ratio (bin-to-bit ratio) in entropy encoding and decoding related to generation and parsing of a bitstream. Further, the present invention provides a method of configuring a list including various entropy encoding/decoding methods and adaptively utilizing the entropy encoding/decoding method for each basic unit of entropy encoding/decoding.
Technical scheme
According to one aspect of the present invention, a method for entropy decoding is performed by a video decoding apparatus, the method comprising: receiving a bitstream formed by encoding an image; performing an arithmetic decoding process to generate at least one binary string (bin string) by decoding the bitstream, each binary string including at least one binary data (bin); and generating a syntax element by inversely binarizing the binary string, wherein the number of binary data generated by decoding the bitstream satisfies a constraint that the number does not exceed a threshold, wherein the threshold is variably set according to a tier or a level of the picture.
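As a concrete illustration of the binarization and inverse binarization steps above, the sketch below implements 0th-order Exp-Golomb coding, one common binarization scheme in CABAC-style entropy coders. The function names are illustrative and not taken from any standard text.

```python
def exp_golomb_binarize(value: int) -> str:
    """Map a non-negative syntax element value to a bin string.

    The bin string is a unary prefix of zeros followed by the binary
    representation of (value + 1)."""
    code = value + 1
    prefix_len = code.bit_length() - 1
    return "0" * prefix_len + format(code, "b")


def exp_golomb_inverse(bins: str) -> int:
    """Recover the syntax element value from an Exp-Golomb bin string."""
    prefix_len = bins.index("1")          # count leading zeros
    code = int(bins[prefix_len:], 2)      # read prefix_len + 1 more bins
    return code - 1
```

For example, the value 3 binarizes to the five bins "00100", and inverse binarization recovers 3 from that string.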
According to another aspect of the present invention, there is provided a method for entropy-encoding syntax elements generated according to the predictive coding of each block constituting an image, the method including: binarizing each syntax element to generate at least one binary string, each binary string comprising at least one binary data; performing an arithmetic encoding process to generate encoded data from the binary string; and generating a bitstream configured with one or more Network Abstraction Layer (NAL) units from the encoded data, wherein the number of binary data is limited so as not to exceed a threshold set with respect to the length of the one or more NAL units, wherein the threshold is variably set according to a tier or a level of the picture.
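The bin-count constraint described above can be sketched as a simple conformance check on the encoder output. The ratio and offset below are purely illustrative placeholders; an actual codec would fix such constants per tier/level in its specification.

```python
def max_allowed_bins(nal_bytes: int, ratio_num: int = 4, ratio_den: int = 3,
                     offset: int = 32) -> int:
    """Hypothetical cap on the number of decoded bins as a function of the
    NAL unit length in bytes.  ratio_num/ratio_den bounds the bin-to-bit
    ratio; the small offset tolerates very short payloads."""
    return (nal_bytes * 8 * ratio_num) // ratio_den + offset


def conforms(total_bins: int, nal_bytes: int) -> bool:
    """True when the bitstream satisfies the (illustrative) bin limit."""
    return total_bins <= max_allowed_bins(nal_bytes)
```

An encoder obeying such a limit would, for instance, switch remaining bins to bypass coding or pad the NAL unit once the count approaches the cap, so the decoder's bin buffer never overflows the guaranteed ratio.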
According to another aspect of the present invention, there is provided an apparatus for entropy decoding, the apparatus including: an arithmetic decoder configured to receive a bitstream formed by encoding an image and perform an arithmetic decoding process to generate at least one binary string by decoding the bitstream, each binary string including at least one binary data; and an inverse binarizer configured to generate a syntax element by inversely binarizing the binary string, wherein the number of binary data generated by decoding the bitstream satisfies a constraint that the number does not exceed a threshold, wherein the threshold is variably set according to a tier or a level of the picture.
Advantageous effects
As described above, according to the present invention, in entropy encoding and decoding related to the generation and parsing of a bitstream, a method of efficiently operating a binary data buffer by limiting the binary data-to-bit ratio can be provided.
Further, according to the present invention, in entropy encoding and decoding related to generation and parsing of a bitstream, there is provided a method of configuring a list including various entropy encoding/decoding methods and adaptively utilizing the entropy encoding/decoding method for each basic unit of entropy encoding/decoding. Accordingly, entropy encoding/decoding may be performed according to applications and signal characteristics.
Drawings
FIG. 1 is an exemplary block diagram of a video encoding device capable of implementing the techniques of this disclosure.
FIG. 2 illustrates an example block partition structure that utilizes a QTBTTT structure.
Fig. 3 exemplarily illustrates a plurality of intra prediction modes including a wide-angle intra prediction mode.
Fig. 4 exemplarily shows neighboring blocks around the current block.
Fig. 5 is an exemplary block diagram of a video decoding device capable of implementing the techniques of this disclosure.
Fig. 6 is a flowchart of a method for entropy decoding according to an embodiment of the present invention.
Fig. 7 is a flow diagram of an adaptive binary arithmetic decoding process according to an embodiment of the present invention.
Fig. 8 is an exemplary diagram of a basic unit of entropy coding including a coding block.
Fig. 9 is a flowchart of a method for entropy decoding according to another embodiment of the present invention.
Fig. 10 is a flowchart of a binary arithmetic decoding process according to another embodiment of the present invention.
Detailed Description
Hereinafter, some embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be noted that, when a reference numeral is added to a constituent element in each drawing, the same reference numeral also denotes the same element although the element is shown in different drawings. Further, in the following description of the embodiments of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted to avoid obscuring the subject matter of the embodiments.
FIG. 1 is an exemplary block diagram of a video encoding device capable of implementing the techniques of this disclosure. Hereinafter, a video encoding apparatus and elements of the apparatus will be described with reference to fig. 1.
The video encoding device includes: the image divider 110, the predictor 120, the subtractor 130, the transformer 140, the quantizer 145, the rearrangement unit 150, the entropy encoder 155, the inverse quantizer 160, the inverse transformer 165, the adder 170, the loop filtering unit 180, and the memory 190.
Each element of the video encoding apparatus may be implemented in hardware or software, or a combination of hardware and software. The functions of the respective elements may be implemented as software, and a microprocessor may be implemented to execute the software functions corresponding to the respective elements.
A video is composed of one or more sequences including a plurality of images. Each image is divided into a plurality of regions, and encoding is performed on each region. For example, an image is split into one or more tiles and/or slices. The one or more tiles may be defined as a tile group. Each tile or slice is split into one or more Coding Tree Units (CTUs), and each CTU is split into one or more Coding Units (CUs) by a tree structure. Information applied to each CU is encoded as syntax of the CU, and information commonly applied to the CUs included in one CTU is encoded as syntax of the CTU. In addition, information commonly applied to all blocks in one slice is encoded as syntax of a slice header, and information applied to all blocks constituting one or more pictures is encoded in a Picture Parameter Set (PPS) or a picture header. Further, information commonly referred to by a plurality of pictures is encoded in a Sequence Parameter Set (SPS), and information commonly referenced by one or more SPSs is encoded in a Video Parameter Set (VPS). In addition, information commonly applied to one tile or tile group may be encoded as syntax of a tile header or tile group header. The syntax included in the SPS, PPS, slice header, and tile or tile group header may be referred to as high-level syntax.
The image divider 110 determines the size of the Coding Tree Unit (CTU). Information on the size of the CTU (CTU size) is encoded as syntax of the SPS or PPS and transmitted to the video decoding apparatus.
The image divider 110 divides each image constituting the video into a plurality of CTUs having a predetermined size, and then recursively divides the CTUs using a tree structure. In the tree structure, leaf nodes serve as Coding Units (CUs), which are basic units of coding.
The tree structure may be a QuadTree (QT), in which a node (or parent node) is split into four slave nodes (or child nodes) of the same size; a Binary Tree (BT), in which a node is split into two slave nodes; a Ternary Tree (TT), in which a node is split into three slave nodes at a ratio of 1:2:1; or a structure formed by a combination of two or more of the QT, BT, and TT structures. For example, a QuadTree plus BinaryTree (QTBT) structure may be utilized, or a QuadTree plus BinaryTree TernaryTree (QTBTTT) structure may be utilized. BT and TT may be collectively referred to as a multiple-type tree (MTT).
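A minimal sketch of the quadtree part of such a partitioning, assuming a caller-supplied split decision (in a real encoder this decision would come from rate-distortion search, and in a decoder from parsed split flags):

```python
def quadtree_partition(x, y, w, h, split_decision, min_size):
    """Recursively split the block at (x, y) of size w x h into four equal
    quadrants while split_decision(x, y, w, h) returns True; yield the
    resulting leaf CUs as (x, y, w, h) tuples."""
    if w > min_size and h > min_size and split_decision(x, y, w, h):
        hw, hh = w // 2, h // 2
        for dx in (0, hw):
            for dy in (0, hh):
                yield from quadtree_partition(x + dx, y + dy, hw, hh,
                                              split_decision, min_size)
    else:
        yield (x, y, w, h)
```

Splitting a 64x64 CTU once (i.e., the decision function returns True only at the 64-pixel level) yields four 32x32 leaf CUs. BT and TT splitting would extend the recursion with two- and three-way partitions of the leaf blocks.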
FIG. 2 is a schematic diagram illustrating a method for partitioning blocks using a QTBTTT structure.
As shown in fig. 2, the CTU may first be split in the QT structure. The QT splitting may be repeated until the size of a split block reaches the minimum block size (MinQTSize) of a leaf node allowed in QT. A first flag (QT_split_flag) indicating whether each node of the QT structure is split into four nodes of a lower layer is encoded by the entropy encoder 155 and signaled to the video decoding apparatus. When the leaf node of QT is not larger than the maximum block size (MaxBTSize) of a root node allowed in BT, it may be further split into one or more BT structures or TT structures. The BT structure and/or the TT structure may have a plurality of splitting directions. For example, there may be two directions, i.e., a direction in which the block of a node is split horizontally and a direction in which it is split vertically. As shown in fig. 2, when MTT splitting starts, a second flag (MTT_split_flag) indicating whether a node is split, a flag indicating the splitting direction (vertical or horizontal) in the case of splitting, and/or a flag indicating the splitting type (binary or ternary) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
Alternatively, a CU split flag (split_cu_flag) indicating whether a node is split may be encoded before the first flag (QT_split_flag) indicating whether each node is split into four nodes of a lower layer is encoded. When the value of the CU split flag (split_cu_flag) indicates that no splitting is performed, the block of the node becomes a leaf node in the splitting tree structure and serves as a Coding Unit (CU), which is a basic unit of encoding. When the value of the CU split flag (split_cu_flag) indicates that splitting is performed, the video encoding apparatus encodes the flags, starting from the first flag, in the manner described above.
When QTBT is used as another example of the tree structure, there may be two splitting types, i.e., a type in which a block is split horizontally into two blocks of the same size (i.e., symmetric horizontal splitting) and a type in which a block is split vertically into two blocks of the same size (i.e., symmetric vertical splitting). A split flag (split_flag) indicating whether each node of the BT structure is split into blocks of a lower layer, and split type information indicating the splitting type, are encoded by the entropy encoder 155 and transmitted to the video decoding apparatus. There may be additional types in which the block of a node is split into two asymmetric blocks. The asymmetric splitting types may include a type in which a block is split into two rectangular blocks at a size ratio of 1:3, or a type in which the block of a node is split diagonally.
CUs may have various sizes according to QTBT or QTBTTT partitioning of CTUs. Hereinafter, a block corresponding to a CU to be encoded or decoded (i.e., a leaf node of the QTBTTT) is referred to as a "current block". When QTBTTT partitioning is employed, the shape of the current block may be square or rectangular.
The predictor 120 predicts the current block to generate a prediction block. The predictor 120 includes an intra predictor 122 and an inter predictor 124.
In general, each current block in a picture may be predictively encoded. Prediction of a current block may be performed using an intra prediction technique, which uses data from the image containing the current block, or an inter prediction technique, which uses data from images encoded before the image containing the current block. Inter prediction includes unidirectional prediction and bidirectional prediction.
The intra predictor 122 predicts pixels in the current block using pixels (reference pixels) located around the current block in a current picture including the current block. Depending on the prediction direction, there are multiple intra prediction modes. For example, as shown in fig. 3a, the plurality of intra prediction modes may include 2 non-directional modes and 65 directional modes, and the 2 non-directional modes include a plane (planar) mode and a DC mode. The adjacent pixels and equations to be used are defined differently for each prediction mode.
For efficient directional prediction of the rectangular-shaped current block, directional modes (intra prediction modes 67 to 80 and-1 to-14) indicated by dotted arrows in fig. 3b may be additionally used. These modes may be referred to as "wide angle intra-prediction modes". In fig. 3b, the arrows indicate the respective reference samples used for prediction, rather than the prediction direction. The prediction direction is opposite to the direction indicated by the arrow. The wide-angle intra prediction mode is a mode in which prediction is performed in a direction opposite to a specific direction mode without additional bit transmission when the current block has a rectangular shape. In this case, in the wide-angle intra prediction mode, some wide-angle intra prediction modes available for the current block may be determined based on a ratio of a width to a height of the rectangular current block. For example, when the height of the rectangular shape of the current block is smaller than the width, wide-angle intra prediction modes (intra prediction modes 67 to 80) having angles smaller than 45 degrees may be utilized. When the width of the rectangular shape of the current block is greater than the height, a wide-angle intra prediction mode (intra prediction modes-1 to-14) having an angle greater than-135 degrees may be utilized.
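The aspect-ratio-dependent substitution described above can be sketched as follows. The mode ranges and the fixed count of eight replaced modes here are simplified illustrations, not the exact remapping rule of any standard (in practice the number of substituted modes depends on the width-to-height ratio):

```python
def remap_to_wide_angle(mode: int, width: int, height: int) -> int:
    """Remap a signaled directional intra mode (2..66) to a wide-angle mode
    for non-square blocks; square blocks are left unchanged."""
    if width > height and 2 <= mode < 2 + 8:
        return mode + 65          # lowest modes map into the 67..80 range
    if height > width and 66 - 8 < mode <= 66:
        return mode - 67          # highest modes map into the -1..-14 range
    return mode
```

Because the remapping is a deterministic function of the block shape, no additional bits are needed: the decoder applies the same substitution after parsing the conventional mode index.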
The intra predictor 122 may determine an intra prediction mode to be used when encoding the current block. In some examples, the intra predictor 122 may encode the current block using several intra prediction modes and select an appropriate intra prediction mode to use from the tested modes. For example, the intra predictor 122 may calculate a rate-distortion value using a rate-distortion (rate-distortion) analysis of several tested intra prediction modes, and may select an intra prediction mode having the best rate-distortion characteristic among the tested modes.
The intra predictor 122 selects one intra prediction mode from among a plurality of intra prediction modes, and predicts the current block using neighboring pixels (reference pixels) determined according to the selected intra prediction mode and an equation. The information on the selected intra prediction mode is encoded by the entropy encoder 155 and transmitted to the video decoding apparatus.
The inter predictor 124 generates a prediction block of the current block through motion compensation. The inter predictor 124 searches for a block most similar to the current block in a reference picture that has been encoded and decoded earlier than the current picture, and generates a prediction block of the current block using the searched block. Then, the inter predictor generates a Motion Vector (MV) corresponding to a displacement (displacement) between the current block in the current picture and the prediction block in the reference picture. In general, motion estimation is performed on a luminance (luma) component, and a motion vector calculated based on the luminance component is used for both the luminance component and the chrominance component. The motion information including information on the reference picture and information on the motion vector for predicting the current block is encoded by the entropy encoder 155 and transmitted to the video decoding apparatus.
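A brute-force version of the motion search described above, using the sum of absolute differences (SAD) as the matching cost (real encoders use faster search patterns, but the principle is the same):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def full_search(cur, ref, bx, by, bs, sr):
    """Exhaustive motion search: find the displacement (dx, dy) within
    +/-sr that minimizes SAD for the bs x bs block at (bx, by).
    Returns (best_cost, dx, dy)."""
    cur_blk = [row[bx:bx + bs] for row in cur[by:by + bs]]
    best = None
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or ry + bs > len(ref) or rx + bs > len(ref[0]):
                continue  # candidate block falls outside the reference picture
            cand = [row[rx:rx + bs] for row in ref[ry:ry + bs]]
            cost = sad(cur_blk, cand)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The returned (dx, dy) is the motion vector for the block; in an actual codec it would then be refined to fractional-sample precision using the interpolation described next.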
The inter predictor 124 may perform interpolation on a reference picture or a reference block to increase prediction accuracy. That is, the sub-samples are interpolated between two consecutive integer samples by applying filter coefficients to the plurality of consecutive integer samples including the two integer samples. When an operation of searching for a block most similar to the current block is performed on the interpolated reference image, the motion vector may be expressed at a precision level in fractional sample units, not at a precision level in integer sample units. The precision or resolution of the motion vector may be set differently for each target region to be encoded, e.g., each unit such as a slice, a tile, a CTU, or a CU. When applying such an Adaptive Motion Vector Resolution (AMVR), information on the motion vector resolution to be applied to each target area should be signaled for each target area. For example, when the target area is a CU, information on the motion vector resolution applied to each CU is signaled. The information on the resolution of the motion vector may be information indicating the accuracy of the motion vector difference, which will be described later.
The inter predictor 124 may perform inter prediction using bi-directional prediction. In bi-directional prediction, the inter predictor 124 uses two reference pictures and two motion vectors indicating the positions of the blocks most similar to the current block in the respective reference pictures. The inter predictor 124 selects a first reference picture and a second reference picture from reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1), respectively, searches the respective reference pictures for blocks similar to the current block, and generates a first reference block and a second reference block. Then, the inter predictor 124 generates a prediction block for the current block by averaging or weighting the first reference block and the second reference block, and transmits motion information including information on the two reference pictures and the two motion vectors used to predict the current block to the entropy encoder 155. RefPicList0 may be composed of previously reconstructed pictures that precede the current picture in display order, and RefPicList1 may be composed of previously reconstructed pictures that follow the current picture in display order. However, the embodiments are not limited thereto: previously reconstructed pictures that follow the current picture in display order may be further included in RefPicList0, and conversely, previously reconstructed pictures that precede the current picture may be further included in RefPicList1.
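The averaging or weighting step above can be sketched as follows, using integer weights with rounding as an illustration:

```python
def bi_predict(ref_block0, ref_block1, w0=1, w1=1):
    """Combine two reference blocks into one prediction block by
    (weighted) averaging with round-to-nearest; w0 = w1 = 1 gives the
    plain average used in ordinary bi-prediction."""
    total = w0 + w1
    return [[(w0 * a + w1 * b + total // 2) // total
             for a, b in zip(r0, r1)]
            for r0, r1 in zip(ref_block0, ref_block1)]
```

Unequal weights model the weighted-prediction case, where one reference picture is considered more reliable than the other.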
Various methods may be utilized to minimize the number of bits required to encode the motion information.
For example, when the reference picture and the motion vector of the current block are identical to those of the neighboring blocks, the motion information on the current block may be transmitted to the video decoding apparatus through encoding information for identifying the neighboring blocks. This method is called "merge mode".
In the merge mode, the inter predictor 124 selects a predetermined number of merge candidate blocks (hereinafter, referred to as "merge candidates") from neighboring blocks of the current block.
As shown in fig. 4, all or part of the left block L, the upper block A, the upper right block AR, the lower left block BL, and the upper left block AL adjacent to the current block in the current picture may be used as the neighboring blocks for deriving merge candidates. Also, a block located within a reference picture (which may be the same as or different from the reference picture used to predict the current block), rather than within the current picture in which the current block is located, may be used as a merge candidate. For example, a co-located block at the same position as the current block in the reference picture, or a block adjacent to the co-located block, may additionally be used as a merge candidate.
The inter predictor 124 configures a merge list including a predetermined number of merge candidates using such neighboring blocks. The inter predictor 124 selects a merge candidate to be used as the motion information on the current block from among the merge candidates included in the merge list, and generates merge index information for identifying the selected candidate. The generated merge index information is encoded by the entropy encoder 155 and transmitted to the video decoding apparatus.
Another method of encoding motion information is the Advanced Motion Vector Prediction (AMVP) mode.
In the AMVP mode, the inter predictor 124 derives motion vector candidates for predicting the motion vector of the current block using neighboring blocks of the current block. As shown in fig. 4, all or part of the left block L, the upper block A, the upper right block AR, the lower left block BL, and the upper left block AL adjacent to the current block in the current picture may be used as the neighboring blocks for deriving the predicted motion vector candidates. Also, in addition to the current picture including the current block, a block located within a reference picture (which may be the same as or different from the reference picture used to predict the current block) may be used as a neighboring block for deriving predicted motion vector candidates. For example, a co-located block at the same position as the current block in the reference picture, or a block adjacent to the co-located block, may be utilized.
The inter predictor 124 derives a predicted motion vector candidate using motion vectors of neighboring blocks, and determines a predicted motion vector for the motion vector of the current block using the predicted motion vector candidate. Then, a motion vector difference is calculated by subtracting the predicted motion vector from the motion vector of the current block.
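A minimal sketch of the AMVP idea above, with hypothetical helper names: the encoder picks the candidate predictor closest to the actual motion vector and transmits only the candidate index and the motion vector difference (MVD), and the decoder reverses the process:

```python
def amvp_encode(mv, candidates):
    """Select the candidate minimizing the MVD magnitude (L1 distance)
    and return (candidate index, mvd)."""
    idx = min(range(len(candidates)),
              key=lambda i: abs(mv[0] - candidates[i][0]) +
                            abs(mv[1] - candidates[i][1]))
    pred = candidates[idx]
    return idx, (mv[0] - pred[0], mv[1] - pred[1])


def amvp_decode(idx, mvd, candidates):
    """Reconstruct the motion vector from the signaled index and MVD."""
    pred = candidates[idx]
    return (pred[0] + mvd[0], pred[1] + mvd[1])
```

Because both sides build the same candidate list from already-decoded neighboring blocks, only the small index and MVD need to be entropy-coded.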
The predicted motion vector may be obtained by applying a predefined function (e.g., a function for calculating a median, an average, etc.) to the predicted motion vector candidate. In this case, the video decoding apparatus also knows the predefined function. Since the neighboring blocks used to derive the predicted motion vector candidates have already been encoded and decoded, the video decoding apparatus also already knows the motion vectors of the neighboring blocks. Accordingly, the video encoding apparatus does not need to encode information for identifying a predicted motion vector candidate. Therefore, in this case, information on a motion vector difference and information on a reference picture used to predict the current block are encoded.
The predicted motion vector may be determined by selecting any one of the predicted motion vector candidates. In this case, the information for identifying the selected predicted motion vector candidate is further encoded together with information on a motion vector difference to be used for predicting the current block and information on a reference picture.
The subtractor 130 subtracts the prediction block generated by the intra predictor 122 or the inter predictor 124 from the current block to generate a residual block.
The transformer 140 transforms a residual signal in a residual block having pixel values in the spatial domain into transform coefficients in the frequency domain. The transformer 140 may transform the residual signal in the residual block using the entire size of the residual block as a transform unit. Alternatively, the residual block may be divided into a plurality of sub-blocks, and the transform may be performed using each sub-block as a transform unit. Alternatively, the residual signal may be transformed by dividing the block into two sub-blocks, i.e., a transform region and a non-transform region, and using only the transform-region sub-block as a transform unit. The transform-region sub-block may be one of two rectangular blocks having a size ratio of 1:1 with respect to the horizontal axis (or the vertical axis). In this case, a flag (cu_sbt_flag) indicating that only the sub-block is transformed, direction (vertical/horizontal) information (cu_sbt_horizontal_flag), and/or position information (cu_sbt_pos_flag) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus. In addition, the transform-region sub-block may have a size ratio of 1:3 with respect to the horizontal axis (or the vertical axis), in which case a flag (cu_sbt_quad_flag) distinguishing the two types of partitioning is additionally encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
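The sub-block transform signaling above can be illustrated with a small sketch that maps the flags to the transformed sub-region. This is a hypothetical helper; the flag semantics follow the description above, but the exact geometry and flag names in a real codec may differ.

```python
def sbt_transform_region(cu_w, cu_h, sbt_horizontal, sbt_pos, sbt_quad):
    """Return (x, y, w, h) of the sub-block that is actually transformed.

    sbt_horizontal: True -> split into top/bottom parts, False -> left/right.
    sbt_pos:        0 -> first (top/left) part is transformed, 1 -> second part.
    sbt_quad:       True -> 1:3 split (quarter-size part), False -> 1:1 split.
    """
    frac = 4 if sbt_quad else 2  # transformed part is 1/4 or 1/2 of the CU
    if sbt_horizontal:
        h = cu_h // frac
        y = 0 if sbt_pos == 0 else cu_h - h
        return (0, y, cu_w, h)
    else:
        w = cu_w // frac
        x = 0 if sbt_pos == 0 else cu_w - w
        return (x, 0, w, cu_h)
```

The region outside the returned rectangle carries no transform coefficients, which is why the decoder later fills it with zeros when reconstructing the residual block.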
The transformer 140 may transform the residual block separately in the horizontal direction and the vertical direction. For the transform, various types of transform functions or transform matrices may be used. For example, pairs of transform functions for the horizontal and vertical transforms may be defined as a Multiple Transform Set (MTS). The transformer 140 may select the pair of transform functions having the best transform efficiency in the MTS and transform the residual block in the horizontal and vertical directions, respectively. Information (MTS_idx) on the pair of transform functions selected from the MTS is encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
The quantizer 145 quantizes the transform coefficients output from the transformer 140 using a quantization parameter and outputs the quantized transform coefficients to the entropy encoder 155. For some blocks or frames, the quantizer 145 may quantize the associated residual block directly, without transformation. The quantizer 145 may apply different quantization coefficients (scaling values) according to the positions of the transform coefficients in the transform block. A matrix of quantization coefficients applied to the two-dimensionally arranged quantized transform coefficients may be encoded and signaled to the video decoding apparatus.
The rearranging unit 150 may reclassify the coefficient values of the quantized residual values.
The rearranging unit 150 may change the 2-dimensional coefficient array into a 1-dimensional coefficient sequence by coefficient scanning. For example, the rearranging unit 150 may scan the coefficients from the DC coefficient toward coefficients in the high-frequency region using a zigzag scan or a diagonal scan to output a 1-dimensional coefficient sequence. Depending on the size of the transform unit and the intra prediction mode, the zigzag scan may be replaced by a vertical scan, i.e., scanning the two-dimensional coefficient array in the column direction, or a horizontal scan, i.e., scanning the coefficients of the two-dimensional block in the row direction. That is, the scan mode to be used may be determined among the zigzag scan, the diagonal scan, the vertical scan, and the horizontal scan according to the size of the transform unit and the intra prediction mode.
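The coefficient scanning step can be sketched as follows. This is illustrative only: it generates one possible diagonal order (anti-diagonals from the DC position toward high frequencies); the exact traversal direction within each diagonal varies between codecs.

```python
def diagonal_scan_order(w, h):
    """Scan positions from the DC coefficient (0, 0) toward the high-frequency
    corner, one anti-diagonal (x + y = k) at a time."""
    order = []
    for k in range(w + h - 1):
        for y in range(h):
            x = k - y
            if 0 <= x < w:
                order.append((x, y))
    return order

def scan_2d_to_1d(block, order):
    """Flatten a 2-D coefficient array into a 1-D sequence along a scan order.
    The decoder-side rearranging unit applies the same order in reverse."""
    return [block[y][x] for (x, y) in order]
```

A vertical or horizontal scan would simply replace the order with column-major or row-major traversal of the same positions.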
The entropy encoder 155 encodes the one-dimensional quantized transform coefficient sequence output from the rearranging unit 150 using various encoding techniques such as Context-based Adaptive Binary Arithmetic Coding (CABAC) and Exponential Golomb coding to generate a bitstream.
The entropy encoder 155 encodes information related to block division (e.g., CTU size, CU division flag, QT division flag, MTT division type, and MTT division direction) so that the video decoding apparatus can divide blocks in the same manner as the video encoding apparatus. In addition, the entropy encoder 155 encodes information on a prediction type indicating whether the current block is encoded by intra prediction or inter prediction, and encodes intra prediction information (i.e., information on an intra prediction mode) or inter prediction information (a merge index for a merge mode, information on a reference picture index for an AMVP mode and a motion vector difference) according to the prediction type. In addition, the entropy encoder 155 encodes quantization-related information (i.e., information on a quantization parameter and information on a quantization matrix).
The inverse quantizer 160 inversely quantizes the quantized transform coefficient output from the quantizer 145 to generate a transform coefficient. The inverse transformer 165 transforms the transform coefficients output from the inverse quantizer 160 from the frequency domain to the spatial domain and reconstructs a residual block.
The adder 170 adds the reconstructed residual block and the prediction block generated by the predictor 120 to reconstruct the current block. The pixels in the reconstructed current block are used as reference pixels when performing intra prediction of a subsequent block.
The loop filtering unit 180 filters the reconstructed pixels to reduce block artifacts (blocking artifacts), ringing artifacts (ringing artifacts), and blurring artifacts (blurring artifacts) generated due to block-based prediction and transform/quantization. The loop filtering unit 180 may include at least one of a deblocking filter 182, a Sample Adaptive Offset (SAO) filter 184, and an Adaptive Loop Filter (ALF) 186.
The deblocking filter 182 filters the boundaries between reconstructed blocks to remove block artifacts caused by block-wise encoding/decoding, and the SAO filter 184 performs additional filtering on the deblocking-filtered video. The SAO filter 184 compensates for the difference between reconstructed samples and original samples caused by lossy coding, and performs filtering by adding a corresponding offset to each reconstructed sample. The ALF 186 filters a target sample by applying filter coefficients to the target sample and its neighboring samples. The ALF 186 may divide the samples included in an image into predetermined groups and then determine one filter to be applied to each group, thereby performing filtering differentially for each group. Information about the filter coefficients to be used for the ALF may be encoded and signaled to the video decoding apparatus.
The reconstructed block filtered by the loop filtering unit 180 is stored in the memory 190. Once all blocks in a picture are reconstructed, the reconstructed picture can be used as a reference picture for inter-predicting blocks in subsequent pictures to be encoded.
Fig. 5 is an exemplary functional block diagram of a video decoding device capable of implementing the techniques of this disclosure. Hereinafter, a video decoding apparatus and elements of the apparatus will be described with reference to fig. 5.
The video decoding apparatus may include: an entropy decoder 510, a reordering unit 515, an inverse quantizer 520, an inverse transformer 530, a predictor 540, an adder 550, a loop filtering unit 560, and a memory 570.
Similar to the video encoding apparatus of fig. 1, each element of the video decoding apparatus may be implemented in hardware, software, or a combination of hardware and software. Further, the function of each element may be implemented in software, and the microprocessor may be implemented to perform the software function corresponding to each element.
The entropy decoder 510 determines a current block to be decoded by decoding a bitstream generated by a video encoding apparatus and extracting information related to block division, and extracts prediction information required to reconstruct the current block, information regarding a residual signal, and the like.
The entropy decoder 510 extracts information on the CTU size from a Sequence Parameter Set (SPS) or a Picture Parameter Set (PPS), determines the size of the CTU, and partitions the picture into CTUs of the determined size. Then, the decoder determines the CTU as the highest layer (i.e., root node) of the tree structure and extracts partitioning information about the CTU to partition the CTU using the tree structure.
For example, when a CTU is divided using the QTBTTT structure, a first flag (QT_split_flag) related to QT splitting is extracted to split each node into four nodes of a lower layer. For a node corresponding to a leaf node of the QT, a second flag (MTT_split_flag) related to MTT splitting and information on the splitting direction (vertical/horizontal) and/or the splitting type (binary/ternary) are extracted, thereby splitting the corresponding leaf node in the MTT structure. Thus, each node below a leaf node of the QT is recursively split in a BT or TT structure.
As another example, when a CTU is divided using the QTBTTT structure, a CU split flag (split_cu_flag) indicating whether or not the CU is split may be extracted first. When the corresponding block is split, the first flag (QT_split_flag) may be extracted. In the splitting process, zero or more recursive MTT splits may occur for each node after zero or more recursive QT splits. For example, a CTU may directly undergo MTT splitting without any QT splitting, or may undergo only QT splitting multiple times.
As another example, when a CTU is divided using a QTBT structure, the first flag (QT_split_flag) related to QT splitting is extracted, and each node is split into four nodes of a lower layer. Then, a split flag (split_flag) indicating whether a node corresponding to a leaf node of the QT is further split with BT, and the splitting direction information, are extracted.
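The recursive extraction of split flags can be illustrated with a QT-only sketch. The read_flag callback supplying flag bits is hypothetical, and MTT/BT splitting, minimum-size checks, and block geometry are omitted.

```python
def parse_qt_splits(read_flag, depth=0, max_depth=3):
    """Recursively parse QT split flags: each node either splits into four
    children of the lower layer (flag == 1) or becomes a leaf CU (flag == 0).
    read_flag() returns the next flag bit from the (hypothetical) bitstream."""
    if depth < max_depth and read_flag():
        return [parse_qt_splits(read_flag, depth + 1, max_depth)
                for _ in range(4)]
    return 'leaf'
```

Because the encoder signals the same flags it used to split, the decoder reproduces the identical tree, which is the property the text relies on.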
Once the current block to be decoded is determined through tree structure division, the entropy decoder 510 extracts information on a prediction type indicating whether the current block is intra-predicted or inter-predicted. When the prediction type information indicates intra prediction, the entropy decoder 510 extracts a syntax element of intra prediction information (intra prediction mode) of the current block. When the prediction type information indicates inter prediction, the entropy decoder 510 extracts syntax elements for the inter prediction information, that is, information indicating a motion vector and a reference picture referred to by the motion vector.
The entropy decoder 510 also extracts information regarding transform coefficients of the quantized current block as quantization-related information and information regarding a residual signal.
The rearranging unit 515 may change the sequence of one-dimensional quantized transform coefficients entropy-decoded by the entropy decoder 510 into a 2-dimensional coefficient array (i.e., a block) in the reverse order of the coefficient scanning performed by the video encoding apparatus.
The inverse quantizer 520 inversely quantizes the quantized transform coefficients using the quantization parameter. The inverse quantizer 520 may apply different quantization coefficients (scaling values) to the quantized transform coefficients arranged in two dimensions. The inverse quantizer 520 may perform inverse quantization by applying a matrix of quantization coefficients (scaling values) from the video encoding apparatus to the 2-dimensional array of quantized transform coefficients.
The inverse transformer 530 inverse-transforms the inverse-quantized transform coefficients from the frequency domain to the spatial domain to reconstruct a residual signal, thereby generating a residual block of the current block.
In addition, when the inverse transformer 530 inversely transforms only a partial region (sub-block) of the transform block, it extracts the flag (cu_sbt_flag) indicating that only a sub-block of the transform block has been transformed, the direction (vertical/horizontal) information (cu_sbt_horizontal_flag) on the sub-block, and/or the position information (cu_sbt_pos_flag) on the sub-block, and inversely transforms the transform coefficients of the sub-block from the frequency domain to the spatial domain. The residual signal is thereby reconstructed, and the region that was not inverse-transformed is filled with '0' values to generate the final residual block of the current block.
In addition, when the MTS is applied, the inverse transformer 530 determines the transform function or transform matrix to be applied in the horizontal and vertical directions, respectively, using the MTS information (MTS_idx) signaled from the video encoding apparatus, and inversely transforms the transform coefficients in the transform block in the horizontal and vertical directions using the determined transform functions.
The predictor 540 may include an intra predictor 542 and an inter predictor 544. The intra predictor 542 is activated when the prediction type of the current block is intra prediction, and the inter predictor 544 is activated when the prediction type of the current block is inter prediction.
The intra predictor 542 determines an intra prediction mode of the current block among a plurality of intra prediction modes based on syntax elements of the intra prediction mode extracted from the entropy decoder 510, and predicts the current block using reference pixels surrounding the current block according to the intra prediction mode.
The inter predictor 544 determines the motion vector of the current block and the reference picture referred to by the motion vector using the syntax elements of the inter prediction information extracted by the entropy decoder 510, and predicts the current block based on the motion vector and the reference picture.
The adder 550 reconstructs the current block by adding the residual block output from the inverse transformer 530 to the prediction block output from the inter predictor 544 or the intra predictor 542. When intra-predicting a block to be subsequently decoded, pixels in the reconstructed current block are used as reference pixels.
Loop filtering unit 560 may include a deblocking filter 562, SAO filter 564, and ALF 566. The deblocking filter 562 deblock filters boundaries between reconstructed blocks to remove block artifacts caused by block-by-block decoding. The SAO filter 564 performs additional filtering on the reconstructed block after the deblocking filtering to compensate for a difference between the reconstructed pixel and the original pixel caused by the lossy coding. ALF 566 performs filtering on a target sample to be filtered by applying filter coefficients to the target sample and its neighboring samples. ALF 566 may divide samples in an image into predetermined groups and then determine one filter to apply to the respective group to differentially perform filtering for each group. The filter coefficients of the ALF are determined based on information about filter coefficients decoded from the bitstream.
The reconstructed block filtered by the loop filtering unit 560 is stored in the memory 570. When all blocks in one picture are reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in pictures to be subsequently encoded.
The present invention is directed to encoding and decoding of video as described above. More particularly, the present invention provides a method of efficiently operating a binary data buffer by limiting a ratio of binary data to bits in entropy encoding and decoding related to bitstream generation and parsing. Further, the present invention provides a method of configuring a list including various entropy decoding methods and adaptively utilizing the entropy encoding/decoding method for each basic unit of entropy encoding/decoding.
The video coding device utilizes the above coding operations to generate a bitstream comprised of one or more consecutive Network Abstraction Layer (NAL) units. The NAL unit includes a NAL unit header and a Raw Byte Sequence Payload (RBSP), and the RBSP includes a String Of Data Bits (SODB). The SODB corresponds to encoded video data or video parameters.
NAL units can be classified into a Video Coding Layer (VCL) NAL type or a non-VCL type. Parameters for video decoding may be included in non-VCL type NAL units in the coded state. In addition, data related to a video signal may be included in a NAL unit of a VCL type in a coded state.
The NAL units are generated by the entropy encoder 155 in the video encoding apparatus. Specifically, a context-based adaptive binary arithmetic coding (CABAC) algorithm is used as the arithmetic coding process to generate the coded video signal included in a VCL type NAL unit. The CABAC algorithm binary-arithmetic-codes symbols using statistical information on previously encoded/decoded data. A symbol is a binary value of 0 or 1; each piece of binary data in the binary string produced by binarizing a non-negative syntax element value may be a symbol, and a symbol itself may represent a syntax element required for video decoding. The context of the CABAC algorithm refers to statistical information on previously encoded/decoded symbols and is used to improve prediction performance for subsequent symbols. The compression ratio, i.e., the coding performance of the CABAC algorithm, depends on the context modeling method: as the probability of the Most Probable Symbol (MPS) in the context model increases, the compression ratio improves.
Other entropy coding schemes are used to generate the encoded parameters included in the non-VCL type NAL units or the VCL type NAL units, but further detailed description thereof will be omitted because other entropy coding schemes are not relevant to the scope of the present invention.
In the following, various methods are proposed to maintain constraints on the ratio of binary data to bits in order to efficiently manage the binary data buffer.
Fig. 6 is a flowchart of a method for entropy decoding according to an embodiment of the present invention. Although the various methods S602, S604, and S606 for maintaining the constraint on the ratio of binary data to bits are sequentially illustrated in fig. 6, the present invention is not necessarily limited thereto, and one or more methods may be used to maintain the constraint on the ratio of binary data to bits.
The entropy decoder 510 of the video decoding apparatus according to the present invention can decode the VCL type NAL unit based on an arithmetic decoding process.
The entropy decoder 510 checks the type of NAL unit from the bitstream (S600). The type of NAL unit can be identified by parsing the NAL unit header and the following process continues for NAL units of VCL type.
The entropy decoder 510 checks the number of zero words (zero_word) (S602).
The entropy decoder 510 checks the number of zero words included in the VCL type NAL unit in order to efficiently manage the binary data buffer. The binary data buffer may include a buffer for storing the binary string and a buffer used by the entropy decoder 510 in the decoding process.
In order for the entropy decoder 510 to efficiently manage the binary data buffer and cope with buffer overflow/underrun, the entropy encoder 155 of the video encoding apparatus may determine the number of zero_words and add one or more zero_words of a predefined length when generating a VCL type NAL unit. The number of zero_words may be adaptively determined according to at least one of the length of the NAL unit, the length of the binary string, horizontal and vertical size information on the picture, bit depth, color space sampling information, Temporal Identifier (TID), Quantization Parameter (QP), and the like. That is, when the ratio of the number (or length) of binary data to the number (or length) of generated bits exceeds a threshold, the entropy encoder 155 generates the NAL unit by inserting one or more zero_words of a predefined length to maintain the constraint on the ratio of binary data to bits. The threshold may be determined according to at least one of the length (number of bits) of the NAL unit, horizontal and vertical size information of the picture, bit depth, color space sampling information (sampling ratio between luminance and chrominance components), TID, QP, and the like. Alternatively, as described below, the threshold may be variably determined according to the level or tier of the picture.
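The encoder-side zero_word padding rule can be sketched as follows. The function and parameter names are hypothetical; a real encoder would derive the threshold and zero_word length from the factors listed above (NAL unit length, picture size, bit depth, TID, QP, and so on).

```python
def pad_with_zero_words(payload_bits, num_bins, threshold_ratio,
                        zero_word_bits=16):
    """Return the number of zero_words to append so that the bins-per-bit
    ratio no longer exceeds the threshold.

    payload_bits:    bits currently in the NAL unit payload
    num_bins:        binary symbols (bins) coded into that payload
    threshold_ratio: maximum allowed ratio of bins to bits
    zero_word_bits:  length of one zero_word (assumed value)
    """
    n = 0
    while num_bins > threshold_ratio * (payload_bits + n * zero_word_bits):
        n += 1  # each appended zero_word lengthens the payload, lowering the ratio
    return n
```

Padding only ever lowers the bins-to-bits ratio, so the loop terminates; the decoder can then size its binary data buffer from the same count, as the text describes.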
In order to improve the efficiency of the encoding/decoding process, the entropy encoder 155 may perform video encoding using at least one of a method of omitting the zero_word and a method of adaptively determining the length of the zero_word. The entropy decoder 510 checks whether the zero_word is omitted and the number of zero_words in the NAL unit, and generates the bitstream on which entropy decoding is performed by removing the zero_words from the NAL unit. Further, the size of the binary data buffer for applying the arithmetic decoding process to the bitstream may be set with reference to the number of zero_words. The entropy decoder 510 may adaptively maintain and manage the buffer according to the set size of the binary data buffer. The entropy decoder 510 generates a binary string by applying the arithmetic decoding process to the bitstream and then stores it in the binary data buffer. The arithmetic decoding process performed by the video decoding apparatus may include a context-based adaptive binary arithmetic decoding (hereinafter, "adaptive binary arithmetic decoding") process and/or a uniform-probability-based binary arithmetic decoding process as a bypass mode.
In some embodiments, the constraint on the ratio of binary data to bits may be implemented using a method that selectively skips the arithmetic decoding process.
The entropy decoder 510 determines whether to skip the arithmetic decoding process based on the skip_flag (S604).
The entropy decoder 510 may determine whether to perform adaptive binary arithmetic decoding or binary arithmetic decoding on the bitstream according to a preset order. To make this determination, a skip_flag indicating whether to skip the arithmetic decoding process may be signaled at a high level. The high level may be an SPS, PPS, CTU, VPDU, shared unit, or the like, preceding the decoding time of the current unit. Alternatively, the skip_flag may be determined based on the skip_flags of one or more units spatially, temporally, or hierarchically adjacent to the current unit. Alternatively, it may be determined based on the skip_flag of a unit of a different color space at the same position as the current unit. Alternatively, the syntax elements for which the arithmetic decoding process is skipped may be determined based on previously acquired statistical information. Alternatively, the arithmetic decoding process may be skipped for some syntax elements according to an agreement between the video encoding apparatus and the video decoding apparatus. For reference, when the skip_flag is signaled, the video decoding apparatus may generate the binary string directly from the bitstream without applying the arithmetic decoding process.
In some other embodiments, the constraint on the ratio of binary data to bits may be achieved by a method that adaptively switches between adaptive binary arithmetic decoding and uniform probability based binary arithmetic decoding.
The entropy decoder 510 determines whether to skip the adaptive binary arithmetic decoding through a comparison between the length of the previously decoded binary string and a threshold value (S606).
When the length of the decoded binary string exceeds a preset threshold, the entropy decoder 510 skips the adaptive binary arithmetic decoding and performs the binary arithmetic decoding.
Alternatively, the adaptive switching between adaptive binary arithmetic decoding and uniform-probability-based binary arithmetic decoding may be implemented using a counter. For example, when the ratio of binary data to bits is kept at N/M, reading one bit increases the counter value by N (where N is a natural number), and decoding one piece of binary data decreases the counter value by M (where M is a natural number). When the value of the counter is greater than N, adaptive binary arithmetic decoding may be performed; in the opposite case, binary arithmetic decoding as the bypass mode may be performed. The counter may be calculated and initialized based on at least one of picture/slice/tile/brick/CTU size, bit depth, color space sampling, and the like.
The threshold (or N and M) may be transmitted from the video encoding apparatus in one of SPS/PPS/slice/CTU/VPDU/CU units. Alternatively, the threshold (or N and M) may be set for each layer unit according to an agreement between the video encoding apparatus and the video decoding apparatus. Alternatively, the threshold (or N and M) may be determined by the video decoding apparatus based on picture/slice/tile/CTU size, bit depth, color space sampling, prediction mode, transform mode, intra prediction mode, inter prediction mode, and so on. Further, the threshold (or N and M) may be determined in consideration of the tier and/or level of the picture; that is, it may be variably set according to the tier and/or level of the picture. Alternatively, the ratio between N and M for the current unit may be calculated based on the N and M of one or more units around the current unit, or their ratio. Alternatively, the N and M of one or more previous units may be stored and managed in a first-in first-out (FIFO) lookup table, and the N and M corresponding to an index transmitted from the video encoding apparatus may be used by the video decoding apparatus.
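The counter-based switching described above can be sketched as a small class. The names are hypothetical, and the initialization from picture/CTU size and the signaling of N and M are omitted; the update and comparison rules follow the text (add N per bit read, subtract M per bin decoded, use adaptive decoding while the counter exceeds N).

```python
class BinToBitCounter:
    """Counter enforcing a bins-to-bits ratio of at most N/M."""

    def __init__(self, n, m, initial=0):
        self.n, self.m, self.value = n, m, initial

    def on_bit_read(self):
        self.value += self.n      # one bit consumed from the bitstream

    def on_bin_decoded(self):
        self.value -= self.m      # one bin produced by the decoder

    def use_adaptive_decoding(self):
        # Adaptive (context-based) decoding only while the counter exceeds N;
        # otherwise fall back to bypass (uniform-probability) decoding.
        return self.value > self.n
```

Because each decoded bin drains the counter and each consumed bit refills it, the decoder is steered toward bypass mode whenever bins outpace bits, which is exactly the ratio constraint being maintained.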
In another embodiment of the present invention, when the threshold or the counter is not available, the entropy decoder 510 may determine whether to skip the adaptive binary arithmetic decoding according to a protocol between the video encoding apparatus and the video decoding apparatus for each layer unit, syntax element, or binary data constituting the syntax element. In this case, the protocol between the video encoding apparatus and the video decoding apparatus may be expressed in the form of a lookup table.
The entropy decoder 510 performs adaptive binary arithmetic decoding (S608).
The entropy decoder 510 may decode binary data to which adaptive binary arithmetic decoding is applied. Details of the adaptive binary arithmetic decoding process will be described later with reference to fig. 7.
The entropy decoder 510 performs inverse binarization (S610).
By performing inverse binarization, the entropy decoder 510 may generate a value representing a syntax element of the video signal from the binary string.
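As an illustration of inverse binarization, the sketch below inverts a simple unary binarization, in which a value v is coded as v one-bins followed by a terminating zero-bin. This is only one of several schemes; real codecs also use truncated Rice and Exp-Golomb binarizations, and the function name here is hypothetical.

```python
def inverse_unary_binarization(bins):
    """Recover a syntax-element value from a unary-binarized bin string:
    the value is the number of leading 1-bins before the terminating 0-bin.
    Returns (value, bins_consumed)."""
    value = 0
    while value < len(bins) and bins[value] == 1:
        value += 1
    return value, value + 1
```

The returned consumption count lets the caller advance through the binary data buffer to the bins of the next syntax element.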
Fig. 7 is a flow diagram of an adaptive binary arithmetic decoding process according to an embodiment of the present invention.
Fig. 7 is a detailed flowchart of the adaptive binary arithmetic decoding process (operation S608) performed by the entropy decoder 510 of the video decoding apparatus.
The entropy decoder 510 determines whether to perform initialization for adaptive binary arithmetic decoding and performs initialization if necessary (S700).
First, when a syntax element to be decoded is a first syntax element of a slice/tile, an initialization operation for adaptive binary arithmetic decoding may be performed. In this case, an operation of initializing variables pStateIdx0 and pStateIdx1 representing the context model and an operation of initializing variables ivlCurrRange and ivlOffset representing the decoding state may be performed.
Further, when the current CTU is the first CTU of a row included in a tile and there is a spatially neighboring block whose context model variables can be referred to, only the initialization operation for the variables indicating the decoding state is performed. On the other hand, when no such spatially neighboring block exists, both the context model and the variables indicating the decoding state may be initialized.
Here, the context models pStateIdx0 and pStateIdx1 for each syntax element may be probability models for the Most Probable Symbol (MPS), and may have different accuracies and adaptation rates. The context model for each syntax element may be initialized based on the initial probability value for the MPS and the QP of the current unit. The initial probability value for the MPS may be obtained from a ctxTable (a lookup table for each syntax element) using ctxIdx according to the prediction mode of the current unit. The ctxTable for each syntax element is a lookup table set according to an agreement between the video encoding apparatus and the video decoding apparatus, and provides the initial probability value for the MPS according to ctxIdx, as well as the shiftIdx used in the context model update operation. ctxIdx is given as the sum of ctxInc and ctxIdxOffset, where ctxIdxOffset is a value depending on the prediction mode, and ctxInc can be calculated based on various information such as the syntax of adjacent blocks, the depth of the CU, and the size of the transform.
A variable ivlCurrRange (which indicates a state of adaptive binary arithmetic decoding) represents a current portion for decoding, and ivlOffset is an offset of a preset length acquired from a compressed bitstream and represents a position within the ivlCurrRange. In the initialization operation of the variable indicating the decoded state, ivlCurrRange may be initialized to a preset value (e.g., 510). Alternatively, ivlCurrRange may be calculated based on the bit depth representing the dynamic range of the probability. ivlOffset can be initialized to a value of n bits (n is a natural number) obtained from the bitstream. As provided herein, n may be determined based on the length of ivlCurrRange, or may be signaled from a video encoding device.
In the binary arithmetic decoding, the decoding may be performed by repeating an operation of dividing a section corresponding to the length of ivlCurrRange into two sections. For two probability values p0 and p1 (where p1 is 1-p0), the part of ivlCurrRange can be divided into two parts of length p0 × ivlCurrRange and (1-p0) × ivlCurrRange. One of the two portions may be selected according to a binary data value produced by binary arithmetic decoding.
The entropy decoder 510 generates a binary string by performing binary arithmetic decoding (S702).
The entropy decoder 510 may perform a binary arithmetic decoding process as part of an adaptive binary arithmetic decoding process. The entropy decoder 510 may update ivlCurrRange and ivlOffset with the context models pStateIdx0 and pStateIdx1 for each syntax element and generate binary data. First, ivlCurrRange is updated as shown in equations 1 to 3.
[equation 1]
qRangeIdx = ivlCurrRange >> a
pState = pStateIdx1 + b * pStateIdx0
valMps = pState >> c
[equation 2]
ivlLpsRange = (qRangeIdx * ((valMps ? (c+1) - pState : pState) >> (c - a)) >> 1) + d
[equation 3]
ivlCurrRange = ivlCurrRange - ivlLpsRange
Here, a, b, c, and d are all natural numbers, and c is a natural number greater than a. qRangeIdx is the value obtained by dividing ivlCurrRange (the current interval) by (1 << a), and may represent an index for a specific sub-interval. As described above, the two context models pStateIdx0 and pStateIdx1 are probability models for the MPS, having accuracies of (c − b) bits and c bits, respectively. As shown in Equation 1, the probability value pState of the current state may be calculated using the probability values of the two probability models, and the MPS value may be determined from the probability value of the current state. Further, as shown in Equation 2, after calculating the interval length ivlLpsRange of the Least Probable Symbol (LPS), the length of the current interval may be updated using Equation 3.
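The following Python sketch illustrates Equations 1 to 3 under assumed example constants a = 5, c = 14, d = 4, and a pStateIdx0 scaling factor of 16; the text only requires natural numbers with c > a, so these particular values are an assumption for illustration. It also assumes the LPS probability term of Equation 2 is (2^(c+1) − 1 − pState) when valMps is 1, i.e., the complement of pState within its (c + 1)-bit range.

```python
A, C, D = 5, 14, 4   # assumed example constants (a, c, d in the text)
SCALE0 = 16          # assumed scaling of pStateIdx0 when forming pState

def update_range(ivl_curr_range: int, p_state_idx0: int, p_state_idx1: int):
    """Sketch of Equations 1-3: derive the MPS value and the LPS interval
    length, then shrink the current interval by the LPS interval."""
    q_range_idx = ivl_curr_range >> A                 # Equation 1
    p_state = p_state_idx1 + SCALE0 * p_state_idx0
    val_mps = p_state >> C
    # Equation 2 (assuming the LPS probability is the complement of pState)
    lps_prob = ((1 << (C + 1)) - 1 - p_state) if val_mps else p_state
    ivl_lps_range = ((q_range_idx * (lps_prob >> (C - A))) >> 1) + D
    # Equation 3
    return ivl_curr_range - ivl_lps_range, ivl_lps_range, val_mps
```

For instance, with ivlCurrRange = 510 and mid-range model states pStateIdx0 = 512, pStateIdx1 = 8192, the sketch yields an MPS value of 1 and splits the 510-wide interval into an MPS part of 274 and an LPS part of 236.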
The binary data value may be determined based on the updated ivlCurrRange and ivlOffset. When ivlOffset is greater than or equal to ivlCurrRange, the binary data value is determined to be the LPS value (i.e., !(MPS value)), and ivlOffset and ivlCurrRange are updated as shown in Equation 4.
[ equation 4]
ivlOffset=ivlOffset-ivlCurrRange
ivlCurrRange=ivlLpsRange
On the other hand, when ivlOffset is smaller than ivlCurrRange, the binary data value is determined to be the MPS value, and the update of ivlOffset and ivlCurrRange is skipped. The determined binary data values may form a binary string.
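A minimal sketch of the bin decision and the Equation 4 readjustment follows; variable names mirror the text, and the function is illustrative rather than normative.

```python
def decide_bin(ivl_offset: int, ivl_curr_range: int,
               ivl_lps_range: int, val_mps: int):
    """Decide one bin: LPS path when ivlOffset >= ivlCurrRange (Equation 4
    readjusts offset and range), MPS path otherwise (no readjustment)."""
    if ivl_offset >= ivl_curr_range:      # LPS path
        ivl_offset -= ivl_curr_range      # Equation 4
        ivl_curr_range = ivl_lps_range
        bin_val = 1 - val_mps             # LPS value = !(MPS value)
    else:                                 # MPS path: update is skipped
        bin_val = val_mps
    return bin_val, ivl_offset, ivl_curr_range
```

Continuing the earlier numbers (MPS interval 274, LPS interval 236, MPS value 1): an offset of 300 falls in the LPS interval and yields bin 0, while an offset of 100 falls in the MPS interval and yields bin 1 with the state untouched.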
The entropy decoder 510 stores the binary string in the binary data buffer (S704). As described above, the entropy decoder 510 may decode a value representing a syntax element of a video signal from a binary data value stored in the binary data buffer using the inverse binarization process S610.
The entropy decoder 510 updates the context variable (S706).
To update the context variables, the entropy decoder 510 may calculate the adaptation rates shift0 and shift1 for the two probability models using shiftIdx, as shown in Equation 5.
[ equation 5]
shift0=(shiftIdx>>e)+f
shift1=(shiftIdx&g)+h+shift0
Here, e, f, g, and h are all natural numbers.
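A sketch of Equation 5 with assumed example constants e = 2, f = 2, g = 3, h = 3; the text only requires e, f, g, and h to be natural numbers, so these values are an illustrative assumption.

```python
E, F, G, H = 2, 2, 3, 3  # assumed example constants (e, f, g, h in the text)

def adaptation_rates(shift_idx: int):
    """Equation 5: derive the two adaptation rates from shiftIdx.
    shift1 always includes shift0, so shift1 >= shift0 (a larger shift
    means a smaller update step, i.e., slower adaptation)."""
    shift0 = (shift_idx >> E) + F
    shift1 = (shift_idx & G) + H + shift0
    return shift0, shift1
```

With shiftIdx = 5 the sketch gives shift0 = 3 and shift1 = 7, so the second model adapts more slowly than the first.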
The entropy decoder 510 may update the two probability models pStateIdx0 and pStateIdx1 with an adaptation rate as shown in equations 6 and 7.
[ equation 6]
pStateIdx0=min(pStateIdx0-(pStateIdx0>>shift0)+
((1<<(c-b))*bin>>shift0),pMax0)
[ equation 7]
pStateIdx1=min(pStateIdx1-(pStateIdx1>>shift1)+((1<<c)*bin>>shift1),pMax1)
Here, the min(x, y) function outputs the smaller of x and y, and pMax0 and pMax1 are the maximum probability values that the respective probability models may have. By using the min(x, y) function to clip the probability value of each probability model to its maximum value, the adaptive binary arithmetic encoding/decoding process according to the present embodiment effectively limits the amount of binary data and operates the binary data buffer, as compared with the bypass mode.
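The following sketch illustrates Equations 6 and 7: each probability model moves toward the observed bin at its own adaptation rate and is clipped to a maximum. The precisions (10 and 14 bits, i.e., c = 14 and c − b = 10) and the choice of pMax values as the all-ones codes are assumptions made for illustration.

```python
PREC0, PREC1 = 10, 14                          # assumed model precisions (c-b, c)
P_MAX0 = (1 << PREC0) - 1                      # assumed pMax0
P_MAX1 = (1 << PREC1) - 1                      # assumed pMax1

def update_models(p0: int, p1: int, bin_val: int, shift0: int, shift1: int):
    """Equations 6 and 7: exponential-decay update of both models toward
    the observed bin, each clipped to its maximum probability value."""
    p0 = min(p0 - (p0 >> shift0) + (((1 << PREC0) * bin_val) >> shift0), P_MAX0)
    p1 = min(p1 - (p1 >> shift1) + (((1 << PREC1) * bin_val) >> shift1), P_MAX1)
    return p0, p1
```

Observing bin = 1 pulls both states upward (e.g., 512 → 576 and 8192 → 8256 with shift0 = 3, shift1 = 7), while bin = 0 pulls them downward; min() prevents either state from exceeding its maximum.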
pMax0 and pMax1 may be transmitted from the video encoding device. Alternatively, a lookup table based on shiftIdx may be used to determine, calculate, or derive pMax0 and pMax1. Alternatively, only one of pMax0 and pMax1 may be signaled, and pMax1 may be calculated based on pMax0, or pMax0 may be calculated based on pMax1. Alternatively, pMax0 and pMax1 may be values predetermined according to a protocol between the video encoding device and the video decoding device. Alternatively, pMax0 and pMax1 of the current unit may be calculated based on pMax0 and pMax1 of the previous unit in encoding/decoding order. Alternatively, pMax0 and pMax1 of the current unit may be calculated based on pMax0 and pMax1 of the unit at the same location as the current unit in one or more color spaces.
The entropy decoder 510 determines and performs renormalization (S708).
When the length of the updated ivlCurrRange is less than the preset threshold T, the entropy decoder 510 may perform renormalization. The threshold T may be transmitted from the video encoding apparatus. Alternatively, the threshold T may be derived using high level information. Alternatively, the threshold T may be derived from one or more neighboring cells. When performing the renormalization operation, the entropy decoder 510 doubles the length of ivlCurrRange and appends 1 bit acquired from the bitstream to the Least Significant Bit (LSB) of ivlOffset.
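A sketch of the renormalization step described above: the range is doubled and one bitstream bit is appended to the LSB of ivlOffset. The threshold value T = 256 and the repetition of the step until the range reaches T are assumptions for illustration.

```python
T = 256  # assumed renormalization threshold

def renormalize(ivl_curr_range: int, ivl_offset: int, bits):
    """While ivlCurrRange is below the threshold T, double it and shift
    one bit read from the bitstream into the LSB of ivlOffset.

    bits: an iterator yielding 0/1 values read from the bitstream.
    """
    while ivl_curr_range < T:
        ivl_curr_range <<= 1
        ivl_offset = (ivl_offset << 1) | next(bits)
    return ivl_curr_range, ivl_offset
```

For example, a range of 100 renormalizes twice (100 → 200 → 400), consuming two bits, while a range already at or above T is left untouched and consumes no bits.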
When the adaptive binary arithmetic decoding is skipped, the entropy decoder 510 may perform the binary arithmetic decoding as a bypass mode. In the binary arithmetic decoding process, the probability value for MPS is set to 0.5, and accordingly, the Most Significant Bit (MSB) of ivlOffset may be determined as a binary data value.
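The bypass mode above fixes the MPS probability at 0.5, so the interval splits exactly in half. One common realization, sketched below as an assumption rather than the normative procedure, shifts one bitstream bit into ivlOffset and compares it against ivlCurrRange, which plays the role of the MSB test mentioned in the text.

```python
def decode_bypass(ivl_curr_range: int, ivl_offset: int, bits):
    """Bypass decoding sketch: p(MPS) = 0.5, so no probability model is
    consulted and the range is never subdivided asymmetrically.

    bits: an iterator yielding 0/1 values read from the bitstream.
    """
    ivl_offset = (ivl_offset << 1) | next(bits)  # read one bit first
    if ivl_offset >= ivl_curr_range:             # upper half -> bin 1
        return 1, ivl_offset - ivl_curr_range, ivl_curr_range
    return 0, ivl_offset, ivl_curr_range         # lower half -> bin 0
```

Because no context model is read or updated, bypass bins cost one bitstream bit each, which is what makes the mode fast but less compressive than the adaptive path.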
Entropy decoding has mainly been described above. Since entropy encoding corresponds to the inverse process of entropy decoding, a detailed description of entropy encoding is omitted. Furthermore, the above-described method of maintaining the constraint on the binary-data-to-bit ratio in order to efficiently operate the binary data buffer may also be applied to entropy encoding.
As described above, according to the present embodiment, by providing a method of efficiently operating a binary data buffer in entropy encoding and decoding related to bitstream generation and parsing, a ratio of binary data to bits can be limited.
In one aspect of the present invention, the video encoding/decoding apparatus may configure a list including various entropy encoding/decoding methods, and adaptively use the entropy encoding/decoding method for each basic unit of entropy encoding/decoding.
In a method for entropy encoding (hereinafter referred to as an "entropy encoding method"), a syntax element for representing a video signal is binarized and converted into a binary string. Then, adaptive binary arithmetic coding (hereinafter referred to as "binary arithmetic coding") is performed on each binary data of the binary string to generate a bitstream. The syntax element may include information related to the segmentation and prediction of the current block, a signal generated by transforming/quantizing a residual signal of the current block, a differential signal, and information related to a quantized signal.
In order to encode each binary data of the input binary string, binary arithmetic encoding may be performed based on at least one of: the bit depth of the encoded signal, information on the partitioning of the probability range according to the bit depth, state information on the current binary arithmetic coding process within the partitioned probability range, the syntax element, the LPS (MPS) of each binary data of the string, or probability information on the LPS (MPS) of each binary data. In the binary arithmetic coding process after binarization, a probability range including the binary data to be coded can be determined within the current probability range using the LPS (MPS) and the probability information on the LPS (MPS) of the binary data coded based on the above information. When the probability range of the binary data to be encoded is determined, the state information on the binary arithmetic encoding process may be updated within the split probability range based on context information related to the corresponding binary data. That is, context information such as the probability information on the LPS (MPS) of the binary data to be encoded may be updated according to the binary arithmetic coding method. The update of the context information of the binary data to be encoded may be performed, without additional information, based on one or more syntax elements that are binary arithmetically encoded according to a protocol between the video encoding apparatus and the video decoding apparatus, and on index information on each binary data in the binary string obtained by binarizing the syntax elements.
In another embodiment of the present invention, fixed values may be applied as the LPS (MPS) and the probability of the LPS (MPS) for binary data whose context information is not updated. For example, the LPS of all binary data in a binary string for which context information is not updated may be fixed to 1, and the probability value of the LPS may be fixed to 1/2. The fixed probability value of the LPS may be set to 1/2, 1/4, 3/8, etc., and various probability values whose denominator can be expressed as a power of 2 may be used. Alternatively, the LPS (MPS) and the probability of the LPS (MPS) may be encoded using a table with an index, the table may be transmitted to the video decoding apparatus, and binary arithmetic decoding may be performed based on the transmitted information.
In performing the binarization and binary arithmetic coding processes, the entropy encoder 155 of the video encoding apparatus may perform entropy encoding using an entropy encoding method specified per basic unit of entropy encoding (see fig. 8), where a basic unit includes the coding block. The entropy encoder 155 transmits a list of entropy encoding methods using high-level information and transmits an index into the list for each basic unit of entropy encoding. In addition, the entropy encoder 155 performs entropy encoding on a coding block within a basic unit of entropy encoding using the entropy encoding method assigned to the corresponding index. The basic unit of entropy coding may include one or more coding blocks. There may be K entropy encoding methods (where K is a natural number). When K is 2, the index of the entropy coding method may be a 1-bit flag. Further, when K is 1, only one entropy encoding method is provided, so the information on the list index can be omitted, and a fixed method can be used as the entropy encoding method according to a protocol between the video encoding apparatus and the video decoding apparatus.
In one aspect of the present invention, binary arithmetic coding may be skipped in the entropy coding method. In performing the entropy encoding method, the entropy encoder 155 may output the corresponding information as a bitstream while skipping binary arithmetic encoding of the binarization result. In addition, the method of skipping binary arithmetic coding may be included in the list of entropy coding methods described above. For example, a list may be formed that includes skipping binary arithmetic coding as a first entropy coding method, a first binary arithmetic coding as a second method, and a second binary arithmetic coding as a third method. There may be J binary arithmetic coding methods (where J is a natural number), and there may be K entropy coding methods according to the J binary arithmetic coding methods, where K may be greater than or equal to J. Further, when K is 2, the binary arithmetic coding method may be turned on/off in the entropy coding method using a 1-bit flag. When K is 1, there is only one entropy encoding method, so the information on the list index can be omitted, and a fixed method according to a protocol between the video encoding apparatus and the video decoding apparatus can be used as the entropy encoding method.
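The K-entry method list and its index signaling can be sketched as follows; the method names, the list contents, and the use of a fixed-length index are hypothetical illustrations, not normative definitions.

```python
# Hypothetical K = 3 list: entry 0 skips binary arithmetic coding.
ENTROPY_METHODS = ["skip_arithmetic", "arithmetic_1", "arithmetic_2"]

def method_index_bits(k: int) -> int:
    """Bits needed to signal the per-unit method index: 0 when K == 1
    (the single method is fixed by agreement between encoder and decoder),
    1 when K == 2 (an on/off flag), and ceil(log2(K)) in general."""
    if k <= 1:
        return 0
    return (k - 1).bit_length()

print([method_index_bits(k) for k in (1, 2, 3, 4)])  # -> [0, 1, 2, 2]
```

This reflects the signaling trade-off described above: a richer list allows finer per-unit adaptation but costs more index bits per basic unit of entropy coding.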
In the present invention, an entropy encoding method may be configured as a combination of various binarization methods and binary arithmetic coding methods. In this case, the number K of encoding methods included in the list of entropy encoding methods may be the maximum number of combinations between the binarization method and the binary arithmetic encoding method. In addition, a specific binarization method may be combined with a specific binary arithmetic coding method. After determining the configuration of the list, the list may be transmitted from the video encoding apparatus to the video decoding apparatus, or the configuration of the list may be determined according to a protocol between the video encoding apparatus and the video decoding apparatus. Further, the binarization method may be different between a case of skipping binary arithmetic coding and a case of performing binary arithmetic coding.
Detailed information on the binarization method can be derived at the video decoding apparatus according to a protocol between the video encoding apparatus and the video decoding apparatus without transmitting additional information. Further, a part of the index information may be transmitted to the video decoding apparatus, and then the entire information on the binarization method may be derived.
In the entropy encoding method according to the present invention, the information on the binary arithmetic coding method transmitted at a high level may include at least one of: context information on the binary data after binarization of each syntax element, such as the MPS of the binary data and probability information on the MPS, the size of the probability update window used for updating the probability range, and the probability of the LPS (MPS) of the binary data; and an indication as to whether to update the context information. That is, a binary arithmetic coding method is characterized by the context information on the binary data after binarization of each syntax element, such as the MPS of the binary data and probability information on the MPS, the size of the probability update window, and the probability of the LPS (MPS) of the binary data, together with whether the context information is updated. A plurality of such binary arithmetic coding methods constitute a list.
Accordingly, according to the binary arithmetic coding method used for the binary data to be encoded, it can be determined whether to update the context, such as the probability update window size, the MPS and initial probability information on the MPS, and the probability information on the LPS (MPS) of the binary data. For example, suppose the MPS of a binary data indicated as "0" is 1, the probability of the MPS is 3/4, and no context update (such as updating the probability information on the LPS (MPS) of the binary data) is performed. Then, even though the corresponding binary data is 0 and is not the MPS, the probability of the MPS may remain 3/4 without a context update, and encoding may then be performed.
Further, whether to update the context of the binary data may be determined according to a protocol between the video encoding apparatus and the video decoding apparatus. Accordingly, when a protocol for skipping a specific syntax element or a context update of specific binary data of the specific syntax element is agreed between the video encoding device and the video decoding device, the context can be unconditionally updated for the remaining binary data.
For coding efficiency, the probability information on the MPS may be indicated and transmitted as probability information on the LPS, and may be derived by the video decoding apparatus as "1 − (probability of the LPS)".
In the entropy encoding method according to the present invention, a method of performing binary arithmetic encoding on binary data after binarization without context update may be set as a separate binary arithmetic encoding method. In this method, binary arithmetic coding is performed using only MPS of binary data after binarization and probability information on the MPS, and can be repeatedly performed without context update. Accordingly, such a binary arithmetic coding method may be defined by MPS and MPS probability information for each binary data of each syntax element. That is, each binary data for each syntax element may have independent MPS and MPS probability information, or may have the same MPS and MPS probability information. Alternatively, only one MPS and the probability information about the MPS may be the same.
When both the MPS and MPS probability information for each binary data of each syntax element are the same, the video encoding device may directly transmit to the video decoding device one combination of the MPS and MPS probability information, or may transmit derivable information to enable the video decoding device to derive the one combination of the MPS and MPS probability information. Further, when each binary data of each syntax element has an independent combination, the video encoding apparatus may directly transmit each combination to the video decoding apparatus, or may transmit derivable information to enable the video decoding apparatus to derive the combinations. Further, when MPS are all the same and probability information about the MPS is not the same, the video encoding device may directly transmit one MPS and probability information for each binary data of each syntax element to the video decoding device, or may transmit derivable information to enable the video decoding device to derive the one MPS and probability information for each binary data of each syntax element.
In the entropy encoding method according to the present invention, information on the list components of the binary arithmetic coding methods may be transmitted from the video encoding apparatus to the video decoding apparatus using a high-level syntax element. In the video encoding/decoding structure, the high-level syntax element may be transmitted at a level having a higher partitioning concept than the entropy encoding unit, or at one of the levels such as the VPS, SPS, PPS, APS, or an SEI message. The higher partitioning concept may mean a unit into which an image is partitioned, and may be a set of blocks such as a sub-picture, a tile group, a tile, a slice, or a brick, or a set of pixels such as a grid.
In the entropy encoding method according to the present invention, the probability value of the initial MPS, the size of the probability range update window, and the range division depth for dividing the normalized value between 0 and 0.5 into probability ranges may be expressed in exponential form with base k. Here, k can be a positive integer greater than or equal to 2, and the exponents p, q, and r can be non-zero integers. At least one of the exponents p, q, and r may be signaled from the video encoding apparatus to the video decoding apparatus. The video encoding/decoding apparatus may derive the start probability of the initial MPS and the value of the update window using the exponents p, q, r, and k. For example, when k is 2, the MPS probability exponent p is 3, the window size exponent q is 1, and the range depth exponent r is 7, the probability range between 0 and 0.5 is divided into 2^7 positions (0 to 127), the initial probability position of the MPS is 8 (i.e., 2^3), and the probability range is updated in units of 2^1. When expressed as a normalized value between 0 and 0.5, the initial probability of the MPS may be 1/16, and the probability update may be performed in units of 1/128.
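The derivation of these three quantities from the exponents can be sketched directly; the function below simply evaluates the k^p, k^q, and k^r values for the worked example above and is illustrative only.

```python
def derive_params(k: int, p: int, q: int, r: int):
    """Derive the range division depth, the initial MPS probability
    position, and the probability update step from base k and the
    exponents p, q, r signaled (or agreed) between encoder and decoder."""
    range_depth = k ** r   # number of probability positions (0 .. k^r - 1)
    init_mps_pos = k ** p  # initial MPS probability position
    update_step = k ** q   # probability update granularity
    return range_depth, init_mps_pos, update_step

print(derive_params(2, 3, 1, 7))  # -> (128, 8, 2)
```

With k = 2, p = 3, q = 1, r = 7, this reproduces the example in the text: 128 probability positions, an initial MPS position of 8, and updates in steps of 2 positions.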
Further, the depth of the segment, the size of the update window, and the MPS and probability information about the MPS may be transmitted in the form of a table having an index. Information on the binary arithmetic coding method according to the present invention can be indicated by transmitting an index corresponding to each binary data in the binary string of each syntax element.
In the entropy encoding method according to the present invention, the entropy encoder 155 may have a list memory to store state information on binary arithmetic coding. The entropy encoder 155 performs binary arithmetic coding after binarization of a coding block according to the binary arithmetic coding method (see fig. 8) applied in the basic unit of entropy coding that includes the coding block. When the binary arithmetic coding method of the entropy-coding basic unit of the block to be coded performs binary arithmetic coding after binarization, the probability information resulting from the execution may be stored. Since the N binary arithmetic coding methods constitute a list, storage and management of the memory are also performed according to the index into the binary arithmetic coding method list. The stored probability information can be used and updated when the same binary arithmetic coding method is applied later. Application of the same binary arithmetic coding method includes the case of the same entropy-coding basic unit, and may also include cases where the units are not the same entropy-coding basic unit. Accordingly, even after the encoding of one basic unit of entropy encoding is completed, the stored probability information can be retained and used.
In addition, initialization of binary arithmetic coding may be performed for each entropy coding unit. In this case, after the probability information is stored and managed within one entropy coding unit, the stored information may be initialized when coding of that unit is finished. In addition, when initialization of binary arithmetic coding is performed in each entropy coding unit, the probability information may be stored and managed using one memory per basic unit of entropy coding.
Hereinafter, a method for entropy decoding performed by a video decoding apparatus (hereinafter, referred to as an "entropy decoding method") for a bitstream generated by a video encoding apparatus according to the entropy encoding method as described above will be described.
Fig. 9 is a flowchart of a method for entropy decoding according to another embodiment of the present invention.
The entropy decoder 510 parses information about an entropy decoding method included in a high level of the bitstream (S900). The parsed information may be stored in an associated memory.
The entropy decoder 510 derives entropy decoding information on the entropy-decoded basic unit (S902).
Based on the parsed high level information, the entropy decoder 510 may derive information on a binary arithmetic coding method, a size of an entropy-decoded basic unit (see fig. 8), a binary arithmetic coding method of the entropy-decoded basic unit, a binary arithmetic decoding method of a current decoded block according to an index, and the like. In addition, after the above information is derived in the entropy-decoded basic unit, additional information may be derived in the decoded block unit.
The entropy decoder 510 acquires information on binary arithmetic decoding (S904). Information on the binary arithmetic decoding method of the corresponding index may be acquired from a storage memory associated with the binary arithmetic decoding.
The entropy decoder 510 performs binary arithmetic decoding according to whether or not the binary arithmetic decoding is performed (S906), and performs inverse binarization (S908).
When the current decoding method of the basic unit of entropy decoding is a method of skipping binary arithmetic decoding, the entropy decoder 510 may perform entropy decoding by inversely binarizing the value of the bitstream.
In case of entropy decoding including inverse binarization, the entropy decoder 510 may generate an entropy decoding result through inverse binarization of the binary string. In the case of entropy decoding that does not include inverse binarization, the entropy decoder 510 may output a binary string as an entropy decoding result.
Hereinafter, a detailed description will be given of binary arithmetic decoding (operation S906) of each binary data of the bitstream, which is performed when the current decoding method of the entropy-decoded basic unit is a method of performing binary arithmetic decoding, with reference to fig. 10.
Fig. 10 is a flowchart of a binary arithmetic decoding process according to another embodiment of the present invention.
The entropy decoder 510 determines binary data based on the probability range information and the offset information (S1000). Based on this information, the entropy decoder 510 may decide, between the probability ranges of the LPS and the MPS, which range includes the decoded binary data, and determine the binary data corresponding to that range.
The entropy decoder 510 determines whether the decoding of the binary data is to be binary arithmetic decoding with context update or binary arithmetic decoding with the context update skipped (S1002), and performs the context update (S1004).
For ease of calculation, the context update may be performed using one or more of: a method of changing a table index and mapping the changed index to a probability, a method of directly updating the probability using an equation, a method of performing the update using a shift operation, or a method of multiplying by a weight and performing a shift operation.
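Of the variants listed above, the shift-operation update can be sketched compactly; the 15-bit probability precision and the direction convention below are assumptions chosen for illustration.

```python
PROB_ONE = 1 << 15  # assumed fixed-point value representing probability 1.0

def update_prob_shift(prob: int, bin_val: int, shift: int) -> int:
    """Shift-based context update sketch: move prob a fraction 2^-shift of
    the way toward its target (PROB_ONE on bin 1, 0 on bin 0)."""
    target = PROB_ONE if bin_val else 0
    return prob + ((target - prob) >> shift)
```

Starting from the midpoint 16384, observing bin 1 with shift = 4 raises the probability to 17408, and observing bin 0 lowers it to 15360; larger shift values make the update window effectively wider (slower adaptation).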
Further, context updates may be performed using one or more of the following methods: a method of applying binary arithmetic decoding information transmitted according to a protocol between video encoding/decoding apparatuses or at a high level to all binary data, a method of selectively applying some binary data according to a protocol between video encoding/decoding apparatuses, or a method of selectively applying binary arithmetic decoding information transmitted at a high level to each binary data.
In the context update, probability information about LPS and MPS may be changed using one or more of the following methods: a method of changing an index of a table according to a protocol between video encoding/decoding apparatuses, a method of adding a preset probability value or multiplying by a preset value or shifting a preset value according to a protocol between video encoding/decoding apparatuses, or a method of adaptively adding a probability value or multiplying by a preset value or shifting a preset value to each binary data according to context information transmitted at a high level.
The entropy decoder 510 stores the binary arithmetic decoded state (S1006), and outputs the determined binary data (S1008).
The detailed execution process of the entropy decoding method not described above may be performed using a method corresponding to the above-described entropy encoding method.
In the present embodiment, when an encoding/decoding method that skips binary arithmetic encoding/decoding is selected, entropy encoding/decoding may be performed adaptively according to information on the current encoding/decoding block. Adaptive performance here means that the binary arithmetic encoding/decoding method can be derived based on the encoding/decoding information of the current block and its neighboring blocks, independently of the information specified for the basic unit of entropy encoding/decoding that includes the block. When the number K of methods in the entropy coding list is 1 and the entropy encoding/decoding method applied in the basic unit of entropy encoding/decoding is the method that skips binary arithmetic encoding/decoding, the entropy encoding/decoding methods of the entropy-coding basic units may all be the same.
Further, the entropy encoding/decoding method may be determined to be the method that skips binary arithmetic encoding/decoding when a specific condition is satisfied, depending on the size of the block currently being decoded, the prediction mode of the block, the MV accuracy of the block, the signal characteristics (luminance/chrominance) of the block, the partitioning state of the block, the type of the transform kernel of the block, the state of the secondary transform of the block, the quantization coefficient values of the block, the quantization matrix values of the block, the transform state of the block, and the like. In this case, the specifically skipped method may be a method not included in the transmitted list of entropy encoding/decoding methods, i.e., a method that can be recognized, according to a protocol between the video encoding/decoding apparatuses, as the entropy encoding/decoding method without any transmitted information.
For example, when the currently decoded block has the minimum size that can be encoded/decoded without further division, the entropy encoding/decoding method of the block may be the method that skips binary arithmetic encoding/decoding. Further, when the currently decoded block is divided into blocks of the minimum size, the entropy encoding/decoding method of the block may be the method that skips binary arithmetic encoding/decoding. The foregoing is an example of specific conditions, and various embodiments may be possible depending on the circumstances from which such specific conditions may be derived.
As described above, according to the present invention, in entropy encoding and decoding related to generation and parsing of a bitstream, a method of configuring a list including various entropy encoding/decoding methods and adaptively utilizing the entropy encoding/decoding method for each basic unit of entropy encoding/decoding is provided. Accordingly, entropy encoding/decoding may be performed according to the application and characteristics of the signal.
Although each flowchart of the present embodiment shows the respective operations being performed sequentially, the embodiment is not limited thereto. In other words, the order of the operations may be changed, or one or more operations may be performed in parallel, and thus the flowcharts are not limited to a time-series order.
Although the exemplary embodiments have been described for illustrative purposes, those skilled in the art will appreciate that various modifications and changes are possible without departing from the spirit and scope of the embodiments. For the sake of brevity and clarity, exemplary embodiments have been described. Accordingly, it will be understood by those of ordinary skill that the scope of the embodiments is not limited by the embodiments explicitly described above, but is included in the claims and their equivalents.
Reference numerals
155: entropy coder
510: an entropy decoder.
Cross Reference to Related Applications
The present application claims priority to Korean Patent Application No. 10-2019-.

Claims (12)

1. A method for entropy decoding performed by a video decoding device, the method comprising:
receiving a bitstream generated by encoding an image;
performing an arithmetic decoding process to generate at least one binary string by decoding a bitstream, each binary string including at least one binary data; and
the syntax element is generated by inverse binarization of the binary string,
wherein a number of binary data generated by decoding the bitstream satisfies a constraint that the number does not exceed a threshold,
wherein the threshold value is variably set according to a hierarchy or level of the video.
2. The method of claim 1, wherein the threshold value is a value calculated based on a size of an image, a sampling ratio between a luminance component and a chrominance component, and a bit depth.
3. The method of claim 1, wherein to satisfy a constraint that the amount of binary data does not exceed the threshold, performing an arithmetic decoding process comprises:
incrementing the counter by N every read bit and decrementing the counter by M according to the occurrence of one binary data, where M and N are natural numbers and are determined based on the threshold;
selectively performing context-based adaptive binary arithmetic decoding or uniform probability-based binary arithmetic decoding according to whether the counter is greater than N.
4. The method of claim 3, wherein the adaptive binary arithmetic decoding comprises:
generating a specific part from a current part for decoding;
for each syntax element, obtaining a current probability value for the most probable symbol MPS using the first context model and the second context model, and generating an MPS value;
calculating a length of a section of the minimum probability symbol (LPS) using the specific section and the MPS value, and updating the current section using the length of the section of the LPS;
generating one binary data using an offset of a preset length and an updated current part acquired from a bitstream, and readjusting the offset and the current part when a value of an LPS generated from a value of an MPS is determined as one binary data;
updating the first context model and the second context model; and
renormalizing the current portion and the offset when the readjusted current portion is less than a preset threshold.
5. The method of claim 4, wherein updating the first context model and the second context model comprises:
calculating a first adaptation rate and a second adaptation rate for each syntax element; and
updating the first context model with the first adaptation rate and the second context model with the second adaptation rate,
wherein the first context model is clipped to the maximum probability value that the first context model can have, and the second context model is clipped to the maximum probability value that the second context model can have.
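A dual-model update of the kind claim 5 describes keeps two probability estimates per context, one fast-adapting and one slow-adapting, and clips each to its own maximum. The sketch below uses an exponential update with per-model shift rates; the rates, bit widths, clip values, and the averaging of the two estimates are illustrative assumptions, not values from the patent:

```python
def update_context_models(p0, p1, bin_val, mps, rate0=4, rate1=7,
                          max_p0=(1 << 15) - 32, max_p1=(1 << 15) - 1):
    """Update two 15-bit fixed-point MPS-probability estimates with
    different adaptation rates (claim 5), clipping each to the maximum
    value its model can hold. Returns both models and the combined
    probability used for coding. All constants are illustrative."""
    # Move each estimate toward 1.0 if the bin matched the MPS,
    # toward 0.0 otherwise; the shift controls the adaptation speed.
    target = (1 << 15) if bin_val == mps else 0
    p0 = p0 + ((target - p0) >> rate0)   # fast-adapting first model
    p1 = p1 + ((target - p1) >> rate1)   # slow-adapting second model
    # Clip each model to its own maximum probability value.
    p0 = min(p0, max_p0)
    p1 = min(p1, max_p1)
    # One common design choice: code with the mean of the two models,
    # combining fast reaction with long-term stability.
    return p0, p1, (p0 + p1) >> 1
```

Running both models lets the coder react quickly to local statistics (via the fast model) without forgetting the long-term distribution (via the slow one).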
6. A method for entropy encoding syntax elements generated by predictively encoding each block constituting an image, the method comprising:
binarizing each syntax element to generate at least one binary string, each binary string comprising at least one binary datum;
performing an arithmetic encoding process to generate encoded data from the binary string; and
generating, from the encoded data, a bitstream consisting of one or more network abstraction layer (NAL) units,
wherein the number of binary data is constrained, with respect to the length of the one or more NAL units, not to exceed a threshold,
wherein the threshold is variably set according to a tier or level of the video.
7. The method of claim 6, wherein each of the one or more NAL units is configured by inserting one or more zero words of a predefined length when the number of binary data exceeds the threshold relative to the length of the one or more NAL units.
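Claim 7's padding rule can be sketched as follows: when the bin count is too large for the NAL unit's length, the encoder appends zero words until the length-based ceiling is satisfied again. The 3-byte word and the bins-per-byte ratio below are illustrative assumptions (HEVC-style cabac_zero_word padding uses a similar idea), not the patent's exact values:

```python
def pad_with_zero_words(nal_payload, bin_count,
                        bins_per_byte=4, word=b"\x00\x00\x03"):
    """Append predefined zero words to a NAL payload until the number
    of coded bins no longer exceeds the ceiling implied by the payload
    length. The word and the ratio are illustrative assumptions."""
    payload = bytearray(nal_payload)
    # Each appended word lengthens the NAL unit and so raises the
    # allowed bin count; loop until the constraint holds.
    while bin_count > bins_per_byte * len(payload):
        payload += word
    return bytes(payload)
```

Padding never changes the decoded result, because the appended words carry no syntax; it only restores the bins-per-byte bound that the decoder's conformance check relies on.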
8. The method of claim 6, wherein the threshold value is a value calculated based on a size of an image, a sampling ratio between a luminance component and a chrominance component, and a bit depth.
9. The method of claim 6, wherein, to satisfy the constraint that the number of binary data does not exceed the threshold, performing the arithmetic encoding process comprises:
incrementing a counter by N for every bit and decrementing the counter by M for each binary datum, where M and N are natural numbers determined based on the threshold; and
selectively performing context-based adaptive binary arithmetic encoding or uniform-probability-based binary arithmetic encoding according to whether the counter is greater than N.
10. An apparatus for entropy decoding, comprising:
an arithmetic decoder configured to:
receiving a bitstream generated by encoding an image; and
performing an arithmetic decoding process to generate at least one binary string by decoding the bitstream, each binary string including at least one binary datum; and
an inverse binarizer configured to generate syntax elements by inverse binarizing the binary string,
wherein the number of binary data generated by decoding the bitstream satisfies a constraint that the number does not exceed a threshold,
wherein the threshold is variably set according to a tier or level of the video.
11. The apparatus of claim 10, wherein the threshold value is a value calculated based on a size of an image, a sampling ratio between a luminance component and a chrominance component, and a bit depth.
12. The apparatus of claim 10, wherein, to satisfy the constraint that the number of binary data does not exceed the threshold, the arithmetic decoder is configured to:
increment a counter by N for every bit read and decrement the counter by M for each binary datum generated; and
selectively perform context-based adaptive binary arithmetic decoding or uniform-probability-based binary arithmetic decoding according to whether the counter is greater than N,
wherein M and N are natural numbers determined based on the threshold.
CN202080055455.7A 2019-08-06 2020-08-06 Entropy coding for video encoding and decoding Pending CN114556927A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
KR20190095448 2019-08-06
KR10-2019-0095448 2019-08-06
KR10-2019-0097803 2019-08-09
KR20190097803 2019-08-09
PCT/KR2020/010401 WO2021025485A1 (en) 2019-08-06 2020-08-06 Entropy coding for video encoding and decoding

Publications (1)

Publication Number Publication Date
CN114556927A true CN114556927A (en) 2022-05-27

Family

ID=74731989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080055455.7A Pending CN114556927A (en) 2019-08-06 2020-08-06 Entropy coding for video encoding and decoding

Country Status (2)

Country Link
KR (1) KR20210018140A (en)
CN (1) CN114556927A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113905233B (en) * 2021-09-30 2024-04-30 安谋科技(中国)有限公司 Entropy decoding method based on audio-video coding standard, readable medium and electronic device thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020021234A1 (en) * 1998-03-25 2002-02-21 Taichi Yanagiya Encoding, decoding, and probability estimation method
CN101031086A (en) * 2002-10-10 2007-09-05 索尼株式会社 Video-information encoding method and video-information decoding method
US20120201294A1 (en) * 2009-10-14 2012-08-09 Segall Christopher A Methods for parallel video encoding and decoding
WO2014002445A1 (en) * 2012-06-27 2014-01-03 Canon Kabushiki Kaisha Image coding apparatus, image coding method, and recording medium, and image decoding apparatus, image decoding method, and recording medium
US20140362904A1 (en) * 2012-01-18 2014-12-11 Lg Electronics Inc. Method and device for entropy coding/decoding
CN107517383A (en) * 2011-06-16 2017-12-26 Ge视频压缩有限责任公司 Support the entropy code of pattern switching
CN107787582A (en) * 2015-06-10 2018-03-09 三星电子株式会社 The method and apparatus for being encoded or being decoded to image using the grammer signaling for adaptive weighted prediction
US10127913B1 (en) * 2017-07-07 2018-11-13 Sif Codec Llc Method of encoding of data stream, method of decoding of data stream, and devices for implementation of said methods


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TZU-DER CHUANG et al., "CE7-1: TB-level constraints on context-coded bins for coefficient coding", JVET 15th Meeting, 12 July 2019 (2019-07-12), pages 2-2 *
XIE TING, "Research on and Optimization of HEVC Encoding Algorithms Based on a DSP Platform", China Master's Theses Full-text Database, 15 April 2018 (2018-04-15) *

Also Published As

Publication number Publication date
KR20210018140A (en) 2021-02-17

Similar Documents

Publication Publication Date Title
KR102515121B1 (en) Methods of coding block information using quadtree and appararuses for using the same
AU2016271137B2 (en) Coding data using an enhanced context-adaptive binary arithmetic coding (CABAC) design
US20200120338A1 (en) Method and device for processing video signal
KR102661759B1 (en) Efficient coding of transform coefficients suitable for use or combination with dependent scalar quantization
KR20190076918A (en) A method and an apparatus for processing a video signal
KR20190055113A (en) Variable number of intra modes for video coding
KR20190044629A (en) Tree-type coding for video coding
KR102539354B1 (en) Method for processing image based on intra prediction mode and apparatus therefor
KR20210122797A (en) Intra prediction-based video signal processing method and apparatus
WO2021032112A1 (en) Initialization for counter-based intra prediction mode
CN114009049A (en) Context modeling for low frequency non-separable transform signaling for video coding
KR20160009543A (en) Video signal processing method and apparatus
KR20210018137A (en) Method and apparatus for intra prediction coding of video data
CN113273204A (en) Inter-frame prediction method and picture decoding device using the same
CN114556927A (en) Entropy coding for video encoding and decoding
KR20200110164A (en) Video Encoding and Decoding Using Intra Block Copy
KR20230105646A (en) Method for Template-based Intra Mode Derivation for Chroma Component
KR20200095982A (en) A method and an apparatus for processing a video signal using merge with motion vector difference
KR20220071939A (en) Method and Apparatus For Video Encoding and Decoding
US12003786B2 (en) Entropy-coding for video encoding and decoding
US20220132173A1 (en) Entropy-coding for video encoding and decoding
KR20210033858A (en) Derivation and mapping method of secondary transform kernel according to intra prediction in video codec
KR20210091208A (en) Video signal processing method and apparatus using current picture reference
US20230308662A1 (en) Method and apparatus for video coding using block merging
US20240007636A1 (en) Method and apparatus for video coding using versatile information-based context model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination