CN114128281A - Method and apparatus for controlling coding tool

Method and apparatus for controlling coding tool

Info

Publication number
CN114128281A
Authority
CN
China
Prior art keywords
flag, section, encoding, bitstream, samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202080045151.2A
Other languages
Chinese (zh)
Other versions
CN114128281B (en)
Inventor
姜制远
朴相孝
朴胜煜
林和平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hyundai Motor Co
Industry Collaboration Foundation of Ewha University
Kia Corp
Original Assignee
Hyundai Motor Co
Industry Collaboration Foundation of Ewha University
Kia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hyundai Motor Co, Industry Collaboration Foundation of Ewha University, Kia Corp filed Critical Hyundai Motor Co
Priority claimed from PCT/KR2020/008045 (WO2020256510A1)
Publication of CN114128281A
Application granted
Publication of CN114128281B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method and apparatus for controlling an encoding tool are disclosed. According to an embodiment of the present invention, there is provided an image decoding method including the steps of: decoding, from a higher level of the bitstream, an enable flag indicating whether at least one encoding tool is enabled; according to the value of the enable flag, obtaining a value of an application flag indicating whether the at least one encoding tool is to be applied, either by setting the application flag to a predetermined value or by decoding the application flag from a lower level of the bitstream; and executing the at least one encoding tool when the value of the application flag indicates that the at least one encoding tool is to be applied.

Description

Method and apparatus for controlling coding tool
Technical Field
The present invention relates to encoding and decoding of video, and more particularly, to a method and apparatus for improving encoding and decoding efficiency through coordinated control of the various coding tools used for encoding and decoding video.
Background
Since the amount of video data is larger than that of voice or still image data, storing or transmitting video data without compression requires a large amount of hardware resources, including memory. Accordingly, when storing or transmitting video data, the video data is typically compressed by an encoder; the decoder then receives the compressed video data, decompresses it, and reproduces it. Compression techniques for such video include H.264/AVC and High Efficiency Video Coding (HEVC), which improves on the coding efficiency of H.264/AVC by approximately 40%.
However, video size, resolution, and frame rate are gradually increasing, and accordingly, the amount of data to be encoded is also increasing. Therefore, a new compression technique having better coding efficiency and higher picture quality than the existing compression technique is required.
Disclosure of Invention
To meet these needs, the present invention is directed to an improved video encoding and decoding technique. In particular, one aspect of the present invention relates to a technique for improving encoding and decoding efficiency by controlling the turning on or off of various encoding tools via syntax elements defined at a high level.
According to an aspect of the present invention, there is provided a video decoding method including: decoding, from a high level of the bitstream, an enable flag indicating whether one or more coding tools are enabled, the one or more coding tools including a first coding tool configured to code sample values of a luma component using mapping based on a piecewise linear model; obtaining, according to the value of the enable flag, a value of an application flag indicating whether to apply the one or more coding tools, either by setting the application flag to a predetermined value or by decoding the application flag from a low level of the bitstream, the application flag including a first application flag indicating whether to apply the first coding tool; and executing the one or more coding tools when the value of the application flag indicates that the one or more coding tools are applied.
When the first coding tool is executed according to the value of the first application flag, executing the one or more coding tools includes: generating mapped luma prediction samples from the luma prediction samples based on a piecewise linear model corresponding to the luma prediction samples; generating luma reconstructed samples by adding luma residual samples reconstructed from the bitstream to the mapped luma prediction samples; and inverse-mapping the luma reconstructed samples using an inverse piecewise linear model having an inverse relationship to the piecewise linear model.
According to another aspect of the present invention, there is provided a video decoding apparatus including: an entropy decoder configured to decode, from a high level of a bitstream, an enable flag indicating whether one or more coding tools are enabled, the one or more coding tools including a first coding tool configured to code sample values of a luma component using mapping based on a piecewise linear model; an acquisition unit configured to acquire, according to the value of the enable flag, a value of an application flag indicating whether to apply the one or more coding tools, either by setting the application flag to a predetermined value or by decoding the application flag from a low level of the bitstream, wherein the application flag includes a first application flag indicating whether to apply the first coding tool; and an execution unit configured to execute the one or more coding tools when the value of the application flag indicates that the one or more coding tools are applied.
When the first coding tool is executed according to the value of the first application flag, the execution unit is configured to: generate mapped luma prediction samples from the luma prediction samples based on a piecewise linear model corresponding to the luma prediction samples; generate luma reconstructed samples by adding luma residual samples reconstructed from the bitstream to the mapped luma prediction samples; and inverse-map the luma reconstructed samples using an inverse piecewise linear model having an inverse relationship to the piecewise linear model.
As apparent from the foregoing, according to exemplary embodiments of the present invention, whether various encoding tools are applied can be controlled at a high level, and thus compression performance of encoding and decoding can be improved.
Drawings
Fig. 1 is an exemplary block diagram of a video encoding apparatus capable of implementing the techniques of this disclosure.
Fig. 2 exemplarily shows a block partition structure using the QTBTTT structure.
Fig. 3a exemplarily shows a plurality of intra prediction modes.
Fig. 3b exemplarily shows a plurality of intra prediction modes including wide-angle intra prediction modes.
Fig. 4 is an exemplary block diagram of a video decoding apparatus capable of implementing the techniques of this disclosure.
Fig. 5 is an exemplary block diagram of a video decoding apparatus capable of controlling a coding tool.
Fig. 6 is a flowchart illustrating an example of a method of controlling a coding tool.
Figs. 7 and 8 are flowcharts illustrating various examples of methods of controlling a coding tool.
Fig. 9 is an exemplary block diagram illustrating a first coding tool.
Fig. 10 is an exemplary block diagram of a video decoding apparatus capable of executing the first coding tool.
Fig. 11 is a flowchart illustrating an example of a method of controlling execution of the first coding tool.
Fig. 12 is a flowchart illustrating an example of a method of deriving mapped luma prediction samples by the first coding tool.
Fig. 13 is a flowchart illustrating an example of a method of deriving a scaling factor by the first coding tool.
Fig. 14 is a flowchart illustrating an example of a method of deriving inverse-mapped luma reconstructed samples by the first coding tool.
Fig. 15 is a flowchart illustrating an example of a method of determining scaled chroma residual samples by the first coding tool.
Fig. 16 is a flowchart illustrating an example of a method of scaling chroma residual samples by the first coding tool.
Fig. 17 is a flowchart illustrating an example of controlling execution of a second coding tool.
Detailed Description
Hereinafter, some embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be noted that, when a reference numeral is added to a constituent element in each drawing, the same reference numeral also denotes the same element although the element is shown in different drawings. In addition, in the following description of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted to avoid obscuring the subject matter of the present invention.
Fig. 1 is an exemplary block diagram of a video encoding apparatus capable of implementing the techniques of this disclosure. Hereinafter, the video encoding apparatus and its elements will be described with reference to fig. 1.
The video encoding apparatus may include: the image divider 110, the predictor 120, the subtractor 130, the transformer 140, the quantizer 145, the rearrangement unit 150, the entropy encoder 155, the inverse quantizer 160, the inverse transformer 165, the adder 170, the filtering unit 180, and the memory 190. Each element of the video encoding apparatus may be implemented in hardware or software or a combination of hardware and software. The functions of the respective elements may be implemented as software, and the microprocessor may be implemented to perform the software functions corresponding to the respective elements.
One video includes a plurality of images. Each image is divided into a plurality of regions, and encoding is performed on each region. For example, an image is segmented into one or more tiles (tiles) or/and slices (slices). In particular, one or more tiles may be defined as a group of tiles. Each tile or slice is partitioned into one or more Coding Tree Units (CTUs). Each CTU is divided into one or more Coding Units (CUs) by a tree structure. Information applied to each CU is encoded as syntax of the CU, and information commonly applied to CUs included in one CTU is encoded as syntax of the CTU. In addition, information commonly applied to all blocks in one slice is encoded as syntax of a slice header, and information applied to all blocks constituting a Picture is encoded in a Picture Parameter Set (PPS) or a Picture header. In addition, information commonly referred to by a plurality of pictures is encoded in a Sequence Parameter Set (SPS). In addition, information commonly referenced by one or more SPS's is encoded in a Video Parameter Set (VPS). Information commonly applied to one tile or tile group may be encoded as syntax of a tile header or tile group header.
The image divider 110 is configured to determine the size of a Coding Tree Unit (CTU). Information on the CTU size is encoded into the syntax of the SPS or PPS and transmitted to the video decoding apparatus. The image divider 110 is configured to divide each image constituting a video into a plurality of CTUs having a predetermined size, and then recursively divide the CTUs using a tree structure. In the tree structure, leaf nodes serve as Coding Units (CUs), which are the basic units of coding.
The tree structure may be a QuadTree (QT), in which a node (or parent node) is split into four child nodes of the same size; a BinaryTree (BT), in which a node is split into two child nodes; a TernaryTree (TT), in which a node is split into three child nodes at a ratio of 1:2:1; or a structure formed by a combination of two or more of the QT, BT, and TT structures. For example, a QuadTree plus BinaryTree (QTBT) structure may be utilized, or a QuadTree plus BinaryTree TernaryTree (QTBTTT) structure may be utilized. In particular, BT and TT may be collectively referred to as a Multiple-Type Tree (MTT).
Fig. 2 exemplarily shows a QTBTTT split tree structure. As shown in fig. 2, the CTU may first be partitioned into the QT structure. The QT split may be repeated until the size of the split block reaches the minimum block size (MinQTSize) of the leaf nodes allowed in QT. A first flag (QT_split_flag) indicating whether each node of the QT structure is divided into four nodes of a lower layer is encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
When the leaf node of QT is equal to or smaller than the maximum block size (MaxBTSize) of the root node allowed in BT, it may be further partitioned into one or more BT structures or TT structures. The BT structure and/or the TT structure may have a plurality of splitting directions. For example, there may be two directions, i.e., a direction of dividing the block of a node horizontally and a direction of dividing it vertically. As shown in fig. 2, when MTT segmentation starts, a second flag (MTT_split_flag) indicating whether a node is segmented, a flag indicating the segmentation direction (vertical or horizontal) in the case of segmentation, and/or a flag indicating the segmentation type (binary or ternary) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
Alternatively, a CU split flag (split_cu_flag) indicating whether a node is split may be encoded before the first flag (QT_split_flag) indicating whether each node is split into four nodes of a lower layer is encoded. When the value of the CU split flag (split_cu_flag) indicates that no split is performed, the block of the node becomes a leaf node in the split tree structure and serves as a Coding Unit (CU), which is the basic unit of encoding. When the value of the CU split flag (split_cu_flag) indicates that the split is performed, the video encoding apparatus starts encoding the flags from the first flag in the above-described manner.
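As a concrete illustration of the parsing order just described, the sketch below (in Python) walks a coding tree by reading split_cu_flag first, then QT_split_flag, then an MTT direction flag. The read_flag callable, the minimum size guard, and the omission of ternary splits are simplifying assumptions, not the normative process.

```python
import random

def parse_coding_tree(x, y, w, h, read_flag, min_size=8):
    """Recursively parse split flags; returns leaf CUs as (x, y, w, h)."""
    if w <= min_size or h <= min_size or not read_flag("split_cu_flag"):
        return [(x, y, w, h)]              # leaf node: used as a coding unit
    if read_flag("QT_split_flag"):         # quad split into four equal nodes
        hw, hh = w // 2, h // 2
        children = [(x, y, hw, hh), (x + hw, y, hw, hh),
                    (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    elif read_flag("mtt_vertical_flag"):   # MTT direction flag (ternary case omitted)
        children = [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    else:
        children = [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    leaves = []
    for child in children:
        leaves.extend(parse_coding_tree(*child, read_flag, min_size))
    return leaves

# Example: drive the parser with random bits in place of real entropy decoding.
print(len(parse_coding_tree(0, 0, 64, 64, lambda name: random.random() < 0.5)))
```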
When QTBT is used as another example of the tree structure, there may be two partitioning types, i.e., a type of partitioning a block horizontally into two blocks of the same size (i.e., symmetric horizontal partitioning) and a type of partitioning a block vertically into two blocks of the same size (i.e., symmetric vertical partitioning). A partition flag (split_flag) indicating whether each node of the BT structure is partitioned into blocks of a lower layer and partition type information indicating the partition type are encoded by the entropy encoder 155 and transmitted to the video decoding apparatus. There may be additional types of partitioning the block of a node into two asymmetric blocks. The asymmetric partition types may include a type of partitioning a block into two rectangular blocks at a size ratio of 1:3, and a type of partitioning the block of a node diagonally.
CUs may have various sizes according to QTBT or QTBTTT partitioning of CTUs. Hereinafter, a block corresponding to a CU to be encoded or decoded (i.e., a leaf node of the QTBTTT) is referred to as a "current block". When QTBTTT partitioning is employed, the shape of the current block may be square or rectangular. The predictor 120 is configured to predict the current block to generate a prediction block. The predictor 120 includes an intra predictor 122 and an inter predictor 124.
In general, each current block in an image may be predictively encoded. In addition, the prediction of the current block is performed using an intra prediction technique, which uses data from an image including the current block, or an inter prediction technique, which uses data of an image encoded before the image including the current block. Inter prediction includes both unidirectional prediction and bidirectional prediction.
The intra predictor 122 is configured to predict pixels in the current block using pixels (reference pixels) located around the current block in the current picture including the current block. There are multiple intra prediction modes depending on the prediction direction. For example, as shown in fig. 3a, the plurality of intra prediction modes may include two non-directional modes, namely a planar mode and a Direct Current (DC) mode, and 65 directional modes. The neighboring pixels and the equation to be used are defined differently for each prediction mode. The following list gives the intra prediction mode numbers and their names: mode 0 is INTRA_PLANAR, mode 1 is INTRA_DC, and modes 2 to 66 are INTRA_ANGULAR2 to INTRA_ANGULAR66.
For efficient directional prediction of a rectangular current block, directional modes (intra prediction modes 67 to 80 and -1 to -14) indicated by dotted arrows in fig. 3b may additionally be utilized. These modes may be referred to as "wide-angle intra prediction modes". In fig. 3b, the arrows indicate the respective reference samples used for prediction, not the prediction directions; the prediction direction is opposite to the direction indicated by the arrow. A wide-angle intra prediction mode is a mode in which prediction is performed in the direction opposite to that of a specific directional mode, without additional bit transmission, when the current block has a rectangular shape.
In particular, among the wide-angle intra prediction modes, the modes available for the current block may be determined based on the ratio of the width to the height of the rectangular current block. For example, when the height of the rectangular current block is smaller than its width, wide-angle intra prediction modes with angles smaller than 45 degrees (intra prediction modes 67 to 80) may be utilized. When the height of the rectangular current block is greater than its width, wide-angle intra prediction modes with angles greater than -135 degrees (intra prediction modes -1 to -14) may be utilized.
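The aspect-ratio rule can be summarized with a small sketch. The mode ranges come from the text above; treating the full range as available regardless of the exact width/height ratio is a simplification, since the real selection depends on the ratio itself.

```python
def available_wide_angle_modes(width: int, height: int) -> list:
    """Illustrative only: extra wide-angle modes a width x height block may use."""
    if width > height:                   # flat block: angles smaller than 45 degrees
        return list(range(67, 81))       # intra prediction modes 67..80
    if height > width:                   # tall block: angles greater than -135 degrees
        return list(range(-1, -15, -1))  # intra prediction modes -1..-14
    return []                            # square block: only modes 0..66 apply
```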
The intra predictor 122 may be configured to determine an intra prediction mode to be used when encoding the current block. In some examples, the intra predictor 122 may be configured to encode the current block using several intra prediction modes and select an appropriate intra prediction mode to use from the tested modes. For example, the intra predictor 122 may be configured to calculate rate-distortion values using rate-distortion analysis of the several tested intra prediction modes, and select the intra prediction mode having the best rate-distortion characteristics among the tested modes.
The intra predictor 122 is configured to select one intra prediction mode from a plurality of intra prediction modes and predict the current block using neighboring pixels (reference pixels) determined according to the selected intra prediction mode and an equation. The information on the selected intra prediction mode is encoded by the entropy encoder 155 and transmitted to the video decoding apparatus.
The inter predictor 124 is configured to generate a prediction block of the current block through motion compensation. The inter predictor 124 is configured to search for the block most similar to the current block in a reference picture that has been encoded and decoded earlier than the current picture, and generate a prediction block of the current block using the found block. Then, the inter predictor is configured to generate a motion vector corresponding to the displacement between the current block in the current picture and the prediction block in the reference picture. In general, motion estimation is performed on the luma component, and the motion vector calculated based on the luma component is used for both the luma component and the chroma components. Motion information including information on the reference image and information on the motion vector used to predict the current block is encoded by the entropy encoder 155 and transmitted to the video decoding apparatus.
The subtractor 130 is configured to subtract the prediction block generated by the intra predictor 122 or the inter predictor 124 from the current block to generate a residual block. The transformer 140 may be configured to divide the residual block into one or more transform blocks and apply a transform to the one or more transform blocks, thereby transforming the residual values of the transform blocks from the pixel domain to the frequency domain. In the frequency domain, a transformed block is referred to as a coefficient block containing one or more transform coefficient values. A two-dimensional transform kernel may be used for the transform, or one-dimensional transform kernels may be used for the horizontal transform and the vertical transform, respectively. The transform kernels may be based on the Discrete Cosine Transform (DCT), the Discrete Sine Transform (DST), or the like.
The transformer 140 may be configured to transform the residual signal in the residual block using the entire size of the residual block as a transform unit. In addition, the transformer 140 may be configured to partition the residual block into two sub-blocks in a horizontal direction or a vertical direction and transform only one of the two sub-blocks. Accordingly, the size of the transform block may be different from the size of the residual block (and thus the size of the prediction block). Non-zero residual sample values may not be present or may be very rare in untransformed sub-blocks. The residual samples of the untransformed sub-blocks are not signaled and may be considered as "0" by the video decoding apparatus. There may be a plurality of partition types according to the partition direction and the partition ratio.
The transformer 140 may be configured to provide information on an encoding mode (or a transform mode) of the residual block (e.g., information indicating whether to transform the residual block or the residual sub-block, information indicating a partition type selected to partition the residual block into sub-blocks, and information identifying the sub-block on which the transform is performed) to the entropy encoder 155. The entropy encoder 155 may be configured to encode information on an encoding mode (or a transform mode) of the residual block. The quantizer 145 may be configured to quantize the transform coefficient output from the transformer 140 and output the quantized transform coefficient to the entropy encoder 155. For some blocks or frames, the quantizer 145 may be configured to quantize the associated residual block directly without transformation.
The rearranging unit 150 may be configured to rearrange the coefficient values of the quantized residual values. In addition, the rearranging unit 150 may be configured to change the 2-dimensional coefficient array into a 1-dimensional coefficient sequence through coefficient scanning. For example, the rearranging unit 150 may be configured to scan coefficients from the Direct Current (DC) coefficient to coefficients in the high-frequency region using a zig-zag scan or a diagonal scan to output a 1-dimensional coefficient sequence. Depending on the size of the transform unit and the intra prediction mode, the zig-zag scan may be replaced by a vertical scan, i.e., scanning the two-dimensional coefficient array in the column direction, or a horizontal scan, i.e., scanning the two-dimensional block-shaped coefficients in the row direction. In other words, the scan mode to be utilized may be selected from among the zig-zag, diagonal, vertical, and horizontal scans according to the size of the transform unit and the intra prediction mode.
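The reordering can be modeled with a short sketch. Real codecs use precomputed scan tables per block size and per sub-block, so the code below is only an illustration of scanning from the DC coefficient toward the high-frequency region.

```python
def diagonal_scan(coeffs):
    """coeffs: 2-D list (rows x cols) -> 1-D list from DC toward high frequency."""
    rows, cols = len(coeffs), len(coeffs[0])
    # Sort positions by anti-diagonal index (x + y), then by row within a diagonal.
    order = sorted((x + y, y, x) for y in range(rows) for x in range(cols))
    return [coeffs[y][x] for (_, y, x) in order]

block = [[9, 3, 0, 0],
         [4, 1, 0, 0],
         [2, 0, 0, 0],
         [0, 0, 0, 0]]
print(diagonal_scan(block))  # [9, 3, 4, 0, 1, 2, ...] — the DC coefficient comes first
```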
The entropy encoder 155 is configured to encode the one-dimensional quantized transform coefficients output from the rearranging unit 150 using various encoding techniques, such as Context-based Adaptive Binary Arithmetic Coding (CABAC) and Exponential Golomb coding, to generate a bitstream. The entropy encoder 155 is configured to encode information related to block splitting (e.g., CTU size, CU split flag, QT split flag, MTT split type, and MTT split direction) so that the video decoding apparatus can split blocks in the same manner as the video encoding apparatus. In addition, the entropy encoder 155 is configured to encode information on the prediction type indicating whether the current block is encoded by intra prediction or inter prediction, and to encode intra prediction information (i.e., information on the intra prediction mode) or inter prediction information (i.e., information on the reference picture index and the motion vector) according to the prediction type.
The inverse quantizer 160 may be configured to inverse-quantize the quantized transform coefficient output from the quantizer 145 to generate a transform coefficient. The inverse transformer 165 is configured to transform the transform coefficients output from the inverse quantizer 160 from the frequency domain to the spatial domain and reconstruct a residual block. The adder 170 is configured to add the reconstructed residual block and the prediction block generated by the predictor 120 to reconstruct the current block. The reconstructed pixels in the current block are used as reference pixels in performing intra prediction of a subsequent block.
The filtering unit 180 is configured to filter the reconstructed pixels to reduce blocking artifacts, ringing artifacts, and blurring artifacts caused by block-based prediction and transform/quantization. The filtering unit 180 may include a deblocking filter 182 and a Sample Adaptive Offset (SAO) filter 184. The deblocking filter 182 is configured to filter the boundaries between reconstructed blocks to remove blocking artifacts caused by block-wise encoding/decoding, and the SAO filter 184 is configured to perform additional filtering on the deblocking-filtered video. The SAO filter 184 is a filter for compensating for the difference between reconstructed pixels and original pixels caused by lossy coding.
The reconstructed block filtered through the deblocking filter 182 and the SAO filter 184 is stored in the memory 190. Once all blocks in a picture are reconstructed, the reconstructed picture can be used as a reference picture for inter prediction of blocks in a subsequent picture to be encoded.
Fig. 4 is an exemplary functional block diagram of a video decoding apparatus capable of implementing the techniques of this disclosure. Hereinafter, the video decoding apparatus and its elements will be described with reference to fig. 4.
The video decoding apparatus may include: an entropy decoder 410, a reordering unit 415, an inverse quantizer 420, an inverse transformer 430, a predictor 440, an adder 450, a filtering unit 460, and a memory 470. Similar to the video encoding apparatus of fig. 1, each element of the video decoding apparatus may be implemented in hardware, software, or a combination of hardware and software. In addition, the function of each element may be implemented in software, and the microprocessor may be implemented to perform the software function corresponding to each element.
The entropy decoder 410 is configured to determine a current block to be decoded by decoding a bitstream generated by a video encoding apparatus and extracting information related to block division, and extract prediction information required to reconstruct the current block, information regarding a residual signal, and the like. The entropy decoder 410 is configured to extract information about a CTU size from a Sequence Parameter Set (SPS) or a Picture Parameter Set (PPS), determine the size of the CTU, and partition a picture into the CTUs of the determined size. Then, the decoder is configured to determine the CTU as the highest layer (i.e., root node) of the tree structure, and extract partitioning information about the CTU to partition the CTU using the tree structure.
For example, when a CTU is split using the QTBTTT structure, a first flag (QT_split_flag) related to QT splitting is extracted to split each node into four nodes of a sub-layer. For a node corresponding to a leaf node of QT, a second flag (MTT_split_flag) related to MTT splitting and information on the splitting direction (vertical/horizontal) and/or the splitting type (binary/ternary) are extracted, thereby splitting the corresponding leaf node in the MTT structure. Thus, each node below a leaf node of QT is recursively split in a BT or TT structure.
As another example, when a CTU is split using the QTBTTT structure, a CU split flag (split_cu_flag) indicating whether a CU is split may first be extracted. When the corresponding block is split, a first flag (QT_split_flag) may be extracted. In the splitting operation, zero or more recursive MTT splits may occur for each node after zero or more recursive QT splits. For example, a CTU may directly undergo MTT splitting without QT splitting, or may undergo only QT splitting multiple times. As another example, when a CTU is split using the QTBT structure, a first flag (QT_split_flag) related to QT splitting is extracted, and each node is split into four nodes of a lower layer. Then, a split flag (split_flag) indicating whether a node corresponding to a leaf node of QT is further split with BT, along with split direction information, is extracted.
Once the current block to be decoded is determined through tree structure partitioning, the entropy decoder 410 is configured to extract information on a prediction type indicating whether the current block is intra-predicted or inter-predicted. When the prediction type information indicates intra prediction, the entropy decoder 410 is configured to extract a syntax element for intra prediction information (intra prediction mode) of the current block. When the prediction type information indicates inter prediction, the entropy decoder 410 is configured to extract syntax elements for the inter prediction information, that is, information indicating a motion vector and a reference picture referred to by the motion vector.
The entropy decoder 410 is configured to extract information on an encoding mode of the residual block (e.g., information on whether to encode the residual block or encode only sub-blocks of the residual block, information indicating a partition type selected to partition the residual block into sub-blocks, information identifying encoded residual sub-blocks, a quantization parameter, etc.) from the bitstream. The entropy decoder 410 is further configured to extract information regarding transform coefficients of the quantized current block as information regarding a residual signal.
The rearranging unit 415 may be configured to change the sequence of one-dimensional quantized transform coefficients entropy decoded by the entropy decoder 410 into a 2-dimensional coefficient array (i.e., a block) in the reverse order of the coefficient scanning performed by the video encoding apparatus. The inverse quantizer 420 is configured to inverse quantize the quantized transform coefficients. The inverse transformer 430 is configured to inverse-transform the inverse-quantized transform coefficients from the frequency domain to the spatial domain based on information on a coding mode of the residual block to reconstruct a residual signal, thereby generating a reconstructed residual block of the current block.
When the information on the encoding mode of the residual block indicates that the residual block of the current block has been encoded by the video encoding apparatus, the inverse transformer 430 uses the size of the current block (and thus the size of the residual block to be reconstructed) as a transform unit of the inverse-quantized transform coefficient to perform inverse transformation, thereby generating a reconstructed residual block of the current block.
When the information on the encoding mode of the residual block indicates that only one sub-block of the residual block has been encoded by the video encoding apparatus, the inverse transformer 430 uses the size of the transformed sub-block as a transform unit of the inverse-quantized transform coefficient to perform inverse transformation, thereby reconstructing a residual signal of the transformed sub-block, and fills the residual signal of the untransformed sub-block with a value of "0" to generate a reconstructed residual block of the current block.
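A compact sketch of this zero-filling behavior, assuming a 1:1 split and that the coded sub-block's residual has already been inverse-transformed; the helper below is illustrative, not the normative reconstruction process.

```python
import numpy as np

def reconstruct_residual(sub, h, w, vertical_split, first_half):
    """Place the inverse-transformed sub-block residual; zero-fill the rest."""
    residual = np.zeros((h, w), dtype=np.int32)  # untransformed half stays "0"
    if vertical_split:                           # left/right sub-blocks
        cols = slice(0, w // 2) if first_half else slice(w // 2, w)
        residual[:, cols] = sub
    else:                                        # top/bottom sub-blocks
        rows = slice(0, h // 2) if first_half else slice(h // 2, h)
        residual[rows, :] = sub
    return residual
```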
The predictor 440 may include an intra predictor 442 and an inter predictor 444. The intra predictor 442 is activated when the prediction type of the current block is intra prediction, and the inter predictor 444 is activated when the prediction type of the current block is inter prediction. The intra predictor 442 is configured to determine an intra prediction mode of the current block among a plurality of intra prediction modes based on syntax elements of the intra prediction mode extracted from the entropy decoder 410, and predict the current block using reference pixels around the current block according to the intra prediction mode.
The inter predictor 444 is configured to determine the motion vector of the current block and the reference picture referenced by the motion vector using the syntax elements for the inter prediction mode extracted by the entropy decoder 410, and to predict the current block based on the motion vector and the reference picture. The adder 450 is configured to reconstruct the current block by adding the residual block output from the inverse transformer 430 to the prediction block output from the inter predictor 444 or the intra predictor 442. Pixels in the reconstructed current block are used as reference pixels when intra-predicting a block to be decoded subsequently.
The filtering unit 460 may include a deblocking filter 462 and an SAO filter 464. The deblocking filter 462 deblocks boundaries between reconstructed blocks to remove blocking artifacts caused by block-by-block decoding. The SAO filter 464 is configured to perform additional filtering on the reconstructed block after deblocking filtering, applying a corresponding offset so as to compensate for the difference between the reconstructed pixels and the original pixels caused by lossy coding. The reconstructed block filtered through the deblocking filter 462 and the SAO filter 464 is stored in the memory 470. When all blocks in one picture are reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in pictures to be encoded subsequently.
In the present invention, a method of controlling various coding tools (i.e., a coding tool control method) is proposed. The coding tools operating according to the method of the present invention may include: a coding tool configured to skip the transform/inverse transform operation for residual samples; a coding tool configured to selectively apply one or more of various transform kernels to residual samples to which a transform operation is applied; a coding tool configured to acquire prediction information for a current block from another block located in the current image containing the current block; a coding tool configured to map prediction samples and scale residual samples using a piecewise linear model (first coding tool); and a coding tool configured to apply differential coding to residual samples (second coding tool).
In the following, a coding tool operating according to the method of the present invention will be referred to as a "target coding tool". The target coding tools may correspond to coding tools for screen content coding (SCC). Operating a target coding tool may refer to enabling/disabling the target coding tool. Additionally, operating a target coding tool may refer to turning the target coding tool on or off. The operation of the target coding tools may be performed for each combination or group of one or more coding tools included in the target coding tools.
An enable flag defined at a high level of the bitstream may be utilized to control whether the target coding tool is enabled/disabled and whether the target coding tool is turned on/off. The enable flag may indicate whether a target coding tool is enabled, or may indicate whether SCC is applied to a target picture to be encoded/decoded. The latter case is provided in consideration of the fact that "the coding tool configured to skip the transform/inverse transform operation for residual samples" and "the coding tool configured to acquire prediction information on the current block from another block located in the current image including the current block" are highly likely to be applied when the video has screen content characteristics. In addition, the latter case is provided in consideration of the fact that "the coding tool configured to selectively apply one or more of various transform kernels to residual samples to which a transform operation is applied" is highly likely to be applied when the video does not have screen content characteristics.
Hereinafter, whether a target coding tool is enabled and whether SCC is applied will be collectively referred to as "whether the target coding tool is enabled". The high level of the bitstream at which the enable flag is defined is higher than the other levels (low levels) of the bitstream described below. The high level may be the SPS level, the PPS level, the picture level (including the picture header), the slice level (including the slice header), the tile level (including the tile header), the block level, etc.
Fig. 5 illustrates an exemplary block diagram of a video decoding apparatus capable of controlling a target coding tool using the enable flag, and fig. 6 illustrates a flowchart of an example of a method of controlling the target coding tool. As shown in fig. 5, the video decoding apparatus may include: an entropy decoder 410, an acquisition unit 510, and an execution unit 520.
The video encoding device may be configured to determine whether one or more of the target coding tools are enabled, and set the value of the enable flag according to the result. In addition, the video encoding apparatus may be configured to encode the enable flag at a high level of the bitstream to signal it to the video decoding apparatus. An enable flag equal to 1 indicates that the target coding tool is enabled, and an enable flag equal to 0 indicates that the target coding tool is not enabled. The entropy decoder 410 may be configured to decode the enable flag from a high level of the bitstream (S610).
The video encoding device may be configured to determine whether to apply or execute the target coding tool. In addition, the video encoding apparatus may be configured to determine, based on the value of the enable flag, whether to signal an application flag indicating whether to apply or execute the target coding tool. For example, when the enable flag is equal to 1, the application flag may not be signaled. When the enable flag is equal to 0, the application flag may be encoded at a low level of the bitstream and signaled to the video decoding apparatus. As another example, when the enable flag is equal to 1, the application flag may be encoded at a low level of the bitstream and signaled to the video decoding apparatus. When the enable flag is equal to 0, the application flag may not be signaled. An application flag equal to 1 indicates that the target coding tool is applied, and an application flag equal to 0 indicates that the target coding tool is not applied.
The application flag may indicate whether to apply all coding tools included in the target coding tools, or may indicate whether to apply one or more coding tools included in the target coding tools. In the latter case, the application flag may be divided into a first application flag indicating whether to apply the first coding tool and a second application flag indicating whether to apply the second coding tool.
The acquisition unit 510 may be configured to set the application flag to a predetermined value according to the value of the enable flag, or to obtain the value of the application flag by decoding it from a low level of the bitstream (S620). For example, when the enable flag is equal to 1, the application flag may be implicitly set to 1. When the enable flag is equal to 0, the application flag may be decoded from a low level of the bitstream. As another example, when the enable flag is equal to 1, the application flag may be decoded from a low level of the bitstream. When the enable flag is equal to 0, the value of the application flag may be implicitly set to 0.
The video encoding apparatus may be configured to encode the encoding target by executing the target coding tool when the target coding tool is applied, but may not execute the target coding tool when it is not applied. The execution unit 520 may be configured to execute the target coding tool when the acquired value of the application flag indicates that the target coding tool is applied, and not to execute the target coding tool when the acquired value indicates that it is not applied (S630).
Two examples of obtaining the value of the application flag from the value of the enable flag are shown in figs. 7 and 8, respectively. As shown in fig. 7, the acquisition unit 510 may be configured to determine whether the value of the enable flag is equal to 1 or 0 (S710). When the enable flag is equal to 1, the application flag may not be signaled by the video encoding apparatus, and thus the acquisition unit 510 may be configured to set the application flag to the predetermined value (1) indicating that the target coding tool is applied (S720). In contrast, when the enable flag is equal to 0, the application flag is signaled by the video encoding apparatus, and thus the acquisition unit 510 may be configured to acquire its value by decoding the application flag from a low level of the bitstream (S730).
The execution unit 520 may be configured to determine whether the acquired value of the application flag is equal to 1 or 0 (S740). The execution unit 520 may be configured to execute the target coding tool when the application flag is equal to 1 (S750), and not to execute the target coding tool when the application flag is equal to 0 (S760).
As shown in fig. 8, the acquisition unit 510 may be configured to determine whether the value of the enable flag is equal to 1 or 0 (S810). When the enable flag is equal to 1, the application flag is signaled by the video encoding apparatus, and thus the acquisition unit 510 may be configured to acquire its value by decoding the application flag from a low level of the bitstream (S820). In contrast, when the enable flag is equal to 0, the application flag may not be signaled by the video encoding apparatus, and thus the acquisition unit 510 may be configured to set the application flag to the predetermined value (0) indicating that the target coding tool is not applied (S830).
The execution unit 520 may be configured to determine whether the acquired value of the application flag is equal to 1 or 0 (S840). The execution unit 520 may be configured to execute the target coding tool when the application flag is equal to 1 (S850), and not to execute the target coding tool when the application flag is equal to 0 (S860).
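The two signaling conventions of figs. 7 and 8 can be condensed into a single sketch. The decode_flag callable stands in for low-level entropy decoding; its name and interface are assumptions, not actual syntax.

```python
def get_application_flag(enable_flag, decode_flag, implicit_when_enabled):
    """Derive the application flag per the Fig. 7 or Fig. 8 convention."""
    if implicit_when_enabled:                 # Fig. 7 convention
        # enable_flag == 1: not signaled, implicitly 1 (S720)
        # enable_flag == 0: decoded from the low level (S730)
        return 1 if enable_flag == 1 else decode_flag()
    # Fig. 8 convention:
    # enable_flag == 1: decoded from the low level (S820)
    # enable_flag == 0: not signaled, implicitly 0 (S830)
    return decode_flag() if enable_flag == 1 else 0

# The execution unit then runs the target coding tool only when the derived
# application flag equals 1 (S750/S850).
```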
As described above, according to the coding tool control method of the present invention, whether the application flag defined at a low level is signaled is determined according to the value of the enable flag defined at a high level, so the operation of the target coding tool can be controlled at a high level. Accordingly, bit efficiency can be improved. In addition, the improvement in bit efficiency can reduce the bit rate required for various contents such as game broadcasting, 360-degree video streaming, VR/AR video, and online lectures. As a result, the load on the network can be reduced, and improved energy efficiency and fast decoding can be achieved for a video playback apparatus (video decoding apparatus) that decodes such contents.
Hereinafter, an example of applying the coding tool control method to each target coding tool will be described one by one.
Coding tool configured to skip transform/inverse transform operations
The transform of residual samples is a technique of converting residual samples from the pixel domain to the frequency domain, taking visual perception into account for effective video compression, and the inverse transform of residual samples is a technique of converting residual samples from the frequency domain back to the pixel domain.
However, for unnatural images such as screen content, the efficiency of such transform/inverse transform techniques may be low. In that case, the transform/inverse transform may be skipped (transform skip). When the transform/inverse transform of the residual samples is skipped, only scaling (quantization/inverse quantization) may be performed on the residual samples, or only entropy encoding/decoding may be performed without scaling.
In the conventional encoding/decoding method, the size of a transform block is set to 4×4, 8×8, 16×16, or 32×32, and a transform or a transform skip may be applied to the transform block. When the transform is applied to the transform block, the video decoding apparatus may be configured to inverse-quantize the quantized transform coefficients (TransCoeffLevel[x][y]) and inverse-transform the inverse-quantized transform coefficients (d[x][y]) from the frequency domain to the spatial domain to reconstruct the residual samples (r[x][y]). In addition, the video decoding apparatus may be configured to shift the reconstructed residual samples according to the bit depth of the picture to derive the shifted residual samples.
In the conventional encoding/decoding method, the transform skip may be applied to a transform block having a size of 4×4, or may be applied to transform blocks of different sizes according to an additional syntax element. When the transform skip is applied to the transform block, the video decoding apparatus may be configured to inverse-quantize the quantized transform coefficients (TransCoeffLevel[x][y]) and perform a shift operation on the inverse-quantized transform coefficients (d[x][y]) to reconstruct the residual samples (r[x][y]). In addition, the video decoding apparatus may be configured to shift the reconstructed residual samples according to the bit depth of the picture to derive the shifted residual samples. In particular, the shift operation performed on the inverse-quantized transform coefficients is applied in place of the transform technique.
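A rough sketch of the transform-skip path described above, applied to the already inverse-quantized coefficients d[x][y]: a shift replaces the inverse transform, followed by a bit-depth-dependent shift. The concrete shift amounts below follow the familiar pattern of this tool family but are assumptions, not the exact normative values.

```python
def reconstruct_transform_skip(d, bit_depth, log2_w, log2_h):
    """d: inverse-quantized coefficients d[y][x] -> shifted residual samples."""
    ts_shift = 5 + (log2_w + log2_h) // 2   # applied in place of the inverse transform
    bd_shift = max(20 - bit_depth, 0)       # bit-depth-dependent shift
    rnd = (1 << (bd_shift - 1)) if bd_shift > 0 else 0
    return [[((v << ts_shift) + rnd) >> bd_shift for v in row] for row in d]
```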
When the flag indicating whether to apply the rotation technique to the transform-skipped residual samples indicates that the rotation technique is applied, the transform-skipped residual samples may be rotated by 180 degrees. Accordingly, the video decoding apparatus may be configured to scan the residual samples in the opposite direction or in the opposite order in consideration of the symmetry (rotation).
According to the conventional encoding/decoding method, a syntax element (transform skip flag) indicating whether a transform technique is applied to a transform block is signaled for each transform unit, and accordingly bit efficiency is reduced. In addition, according to the conventional encoding/decoding method, a syntax element (presence flag) indicating whether a transform skip flag is present in a transform unit is additionally signaled, which accordingly further reduces bit efficiency.
As described above, when the enable flag (e.g., pic_scc_tool_enabled_flag) proposed in the present invention is signaled at a high level, signaling of the presence flag may be skipped while the transform skip flag is signaled. For example, when the enable flag is equal to 1, the application flags (the transform skip flag and the presence flag) may not be signaled but may be implicitly set to 1. When the enable flag is equal to 0, the application flags may be decoded from a low level of the bitstream. As another example, when the enable flag is equal to 1, the application flags may be decoded from a low level of the bitstream. When the enable flag is equal to 0, the application flags may not be signaled and may be implicitly set to 0. Accordingly, signaling of the application flags may be skipped, thereby improving bit efficiency.
Coding tool configured to selectively apply transform kernels
When the transform technique is applied to residual samples, the DCT-II transform kernel (transform type) is typically applied. However, in order to apply a more suitable transform according to the various characteristics of the residual samples, a coding tool configured to selectively apply the one or two best transform kernels among several transform kernels to the residual samples may be employed.
The transform kernels that can be used in such coding tools are shown in table 1.
TABLE 1
(Table 1, listing the available transform kernels, appears as an image in the original publication.)
Syntax elements of the coding tool configured to selectively apply transform kernels may be encoded by the video encoding apparatus and signaled to the video decoding apparatus at the block level. When the coding tool is applied, syntax elements for selecting the transform kernel in the horizontal direction and the transform kernel in the vertical direction (a horizontal flag and a vertical flag) may be signaled. The transform kernel applied in the horizontal direction and the transform kernel applied in the vertical direction may be selected differently by the horizontal flag and the vertical flag. Table 2 shows the mapping between the application flag, the horizontal flag, and the vertical flag.
TABLE 2
(Table 2, showing the mapping between the application flag, the horizontal flag, and the vertical flag, appears as an image in the original publication.)
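Since Table 2 survives only as an image, the sketch below shows the analogous kernel-selection mapping used in VVC-style designs, offered as an assumption about the table's intended contents rather than a reproduction of it: an index formed from the application, horizontal, and vertical flags selects the horizontal/vertical kernel pair.

```python
DCT2, DST7, DCT8 = 0, 1, 2  # kernel identifiers (illustrative)

# mts_idx -> (horizontal kernel, vertical kernel); index 0 corresponds to the
# application flag being off, i.e., the default DCT-II in both directions.
MTS_KERNELS = {
    0: (DCT2, DCT2),
    1: (DST7, DST7),
    2: (DCT8, DST7),  # horizontal flag selects DCT-VIII horizontally
    3: (DST7, DCT8),  # vertical flag selects DCT-VIII vertically
    4: (DCT8, DCT8),
}
```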
As described above, when the enable flag proposed in the present invention is signaled at a high level, if the transform technique is skipped, the signaling of the application flag, the horizontal flag, and the vertical flag may be skipped. This is because, when the enable flag is equal to 1, there is a high probability that the transform technique is skipped in the relevant image.
For example, when the enable flag is equal to 1, the application flag may not be signaled but may be implicitly set to 1. When the enable flag is equal to 0, the application flag may be decoded from a low level of the bitstream. As another example, when the enable flag is equal to 1, the application flag may be decoded from a low level of the bitstream. When the enable flag is equal to 0, the application flag may not be signaled and may be implicitly set to 0. In both examples, the horizontal flag and the vertical flag are not signaled when the application flag is equal to 0, but may be signaled when the application flag is equal to 1. Accordingly, signaling of the application flag, the horizontal flag, and the vertical flag may be skipped, thereby improving bit efficiency.
Some exemplary embodiments of the present invention may further determine whether to skip the transform for DUAL_TREE_CHROMA. Generally, in the case of DUAL_TREE_CHROMA, transform skip is not applied. In the present invention, on the other hand, tu_cbf_cb or tu_cbf_cr, the cbf flags for the chroma components, may be checked. Then, when chroma residual samples are present and the size of the transform block does not exceed the maximum size for which transform skip is allowed, the transform skip method applied to the luma component may be applied to the chroma components.
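A small sketch of that chroma condition, using the syntax element names from the text; the maximum transform-skip size is passed in as a parameter since its derivation is not described here.

```python
def chroma_transform_skip_allowed(tu_cbf_cb, tu_cbf_cr, tb_w, tb_h, max_ts_size):
    """Transform skip may extend to chroma only if chroma residuals exist and
    the transform block fits within the maximum transform-skip size."""
    has_chroma_residual = bool(tu_cbf_cb) or bool(tu_cbf_cr)
    fits = tb_w <= max_ts_size and tb_h <= max_ts_size
    return has_chroma_residual and fits
```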
Coding tool configured to obtain prediction information about a current block from another block located in a current image
In the related art, a syntax element indicating whether an encoding tool is enabled is encoded and signaled at the SPS level, and, when the encoding tool is enabled, a syntax element indicating whether the current block is encoded by the encoding tool (an application flag) is signaled at the block level.
With the conventional method described above, the application flag is signaled for each block, which reduces bit efficiency. As described above, when a high-level flag (pic_scc_tool_enabled_flag) signals whether SCC tools are used for a high-level unit including the current block, the application flag need not be signaled for each block when SCC is not applied, and bit efficiency can thus be improved.
For example, when the enable flag is equal to 1, the application flag may not be signaled, but may be implicitly set to 1. When the enable flag is equal to 0, the application flag may be decoded from a low level of the bitstream. As another example, when the enable flag is equal to 1, the application flag may be decoded from a low level of the bitstream. When the enable flag is equal to 0, the application flag may not be signaled and may be implicitly set to 0.
According to an exemplary embodiment, the enable flag may include an enable flag defined at the slice header level and an enable flag (pic_scc_tool_enabled_flag) defined at a level higher than the slice header level. When the enable flag defined at the higher level is equal to 0, the enable flag defined at the slice header level may be encoded and signaled by the video encoding apparatus and decoded by the video decoding apparatus. In particular, when the enable flag defined at the slice header level is not present in the bitstream, the signaling and decoding of the application flag may be skipped.
First coding tool
The first coding tool represents a coding tool configured to map prediction samples and scale residual samples using a piecewise linear model. The first coding tool may additionally be applied to each block prior to in-loop filtering. For video with screen content characteristics, the first encoding tool may exhibit high compression efficiency.
Fig. 9 shows an exemplary block diagram illustrating the first encoding tool. The first encoding tool may include two main operations. One operation maps luma prediction samples in-loop based on an adaptive piecewise linear model, and the other scales chroma residual samples according to the values of luma reconstructed samples. In fig. 9, the inverse quantization/inverse transform, the luma sample reconstruction, and the intra prediction of luma are processed in the mapped domain. The first coding tool is applied to chroma residual sample scaling, mapping, and luma inter prediction. The remaining blocks are functions performed in the original, unmapped domain.
The mapping of luma prediction samples may include an operation of adjusting the dynamic range to improve compression efficiency by reallocating the codewords of the luma prediction samples, and an operation of inversely mapping luma reconstructed samples from the mapped domain to the unmapped domain. In particular, luma reconstructed samples may be derived by summing the mapped luma prediction samples and the luma residual samples, which are signaled in the mapped domain by the video encoding apparatus.
Whether to enable the first encoding tool may be determined based on enable flags defined at two or more levels in the bitstream. One of the enable flags may be signaled at a higher level than the other enable flag. The enable flag signaled at a higher level is referred to as a "first enable flag", and the enable flag signaled at a lower level is referred to as a "second enable flag".
A first enable flag equal to 1 may indicate that the first coding tool is enabled at a higher level, and a first enable flag equal to 0 may indicate that the first coding tool is not enabled at a higher level. When the first enable flag is equal to 1, a second enable flag may be encoded and signaled. A second enable flag equal to 1 indicates that the first coding tool is enabled at a lower level, and a second enable flag equal to 0 may indicate that the first coding tool is not enabled at a lower level.
Whether to apply the first encoding tool may be determined based on an application flag (a first application flag) defined at a level lower than that of the enable flags in the bitstream. When the second enable flag is equal to 1, the first application flag may be signaled, or it may otherwise be set to a predetermined value (e.g., 1). In another example, the first application flag may be signaled when the second enable flag is equal to 0, or otherwise set to a predetermined value (e.g., 0).
In order to determine whether to apply the first coding tool, a method of encoding, signaling, and decoding the first application flag for each lower level may be undesirable in terms of bit efficiency. The present invention is directed to improving the bit efficiency of a first coding tool by defining an enable flag (second enable flag) indicating whether the first coding tool is enabled at a high level of a bitstream.
Fig. 10 illustrates an exemplary block diagram of a video decoding apparatus capable of controlling a first encoding tool using a second enable flag, and fig. 11 illustrates an exemplary method of controlling the first encoding tool using the second enable flag. As shown in fig. 10, the video decoding apparatus may include an entropy decoder 410, an acquisition unit 510, an execution unit 520, and a derivation unit 1010. The execution unit 520 may comprise a section determination unit 522 and a sample derivation unit 524. The derivation unit 1010 may include a first derivation unit 1012, a second derivation unit 1014, and a factor derivation unit 1016.
The video encoding apparatus may be configured to determine whether the first encoding tool is enabled for a sequence including the current block, and set the value of the first enable flag according to the result of the determination. The first enable flag may be encoded and signaled to the video decoding apparatus at the SPS level of the bitstream. In addition, when the first enable flag is equal to 1, the video encoding apparatus may be configured to determine whether the first encoding tool is enabled for an image including the current block, and set the value of the second enable flag according to the result of the determination. The second enable flag may be encoded and signaled to the video decoding apparatus at the picture level (including the picture header) of the bitstream.

In addition, when the second enable flag is equal to 1, the video encoding apparatus may be configured to determine whether to apply the first encoding tool to a slice including the current block, and set the value of the first application flag according to the result of the determination. The first application flag may be encoded and signaled to the video decoding apparatus at the slice level (including the slice header) of the bitstream.
The entropy decoder 410 may be configured to decode a first enable flag from an SPS level of the bitstream (S1110), and determine whether to enable the first encoding tool based on a value of the first enable flag (S1120). In addition, when the first enable flag is equal to 1, the entropy decoder 410 may be configured to decode a second enable flag from an image level of the bitstream (S1130), and determine whether the first encoding tool is enabled based on a value of the second enable flag (S1140).
When the second enable flag is equal to 1, the obtaining unit 510 may be configured to obtain a value of the first application flag by decoding the first application flag from a slice level of the bitstream (S1160). In this example, when the second enable flag is equal to 0, the first application flag may not be signaled and may be implicitly set to 0. In some example embodiments, when the second enable flag is equal to 0, the obtaining unit 510 may be configured to obtain a value of the first application flag by decoding the first application flag from a slice level of the bitstream (S1160). In this example, when the second enable flag is equal to 1, the first application flag may not be signaled, but may be implicitly set to 1.
The execution unit 520 may be configured to execute the first encoding tool when the first application flag is equal to 1 (S1170), and not execute the first encoding tool when the first application flag is equal to 0 (S1180). When the first enable flag is equal to 0 in S1120 or the second enable flag is equal to 0 in S1140, the first encoding tool is not executed (S1180).
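For illustration, the S1110 to S1180 control flow may be sketched as follows, assuming the variant in which the first application flag is decoded when the second enable flag is equal to 1. The per-level reader objects are hypothetical stand-ins for the entropy decoder 410 and the acquisition unit 510.

```python
# Sketch of the S1110-S1180 control flow for the first coding tool.

def first_tool_applies(sps, picture_header, slice_header):
    first_enable_flag = sps.read_flag()                 # S1110
    if first_enable_flag == 0:                          # S1120
        return False                                    # S1180: not executed
    second_enable_flag = picture_header.read_flag()     # S1130
    if second_enable_flag == 0:                         # S1140
        return False                                    # S1180
    first_application_flag = slice_header.read_flag()   # S1160
    return first_application_flag == 1                  # S1170 / S1180
```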
Hereinafter, detailed operations performed in the first encoding tool will be described.
Luma prediction sample mapping
The piecewise linear model may represent a relationship between the dynamic range of the input signal and the dynamic range of the output signal. The dynamic range of the input signal is divided into a preset number of equal sections and a piecewise linear model for each section is represented based on the number of codewords assigned to each section. For example, when the bit depth of the input picture is 10 bits and the preset number of sections is 16, 64 codewords may be allocated to each of the 16 sections in general.
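For illustration, the uniform allocation in this example can be checked with a short sketch (assuming, as described above, that the dynamic range [0, 2^bit_depth) is split into equal sections):

```python
# Worked example of the uniform allocation described above: for a
# 10-bit input and 16 equal sections, each first section spans
# 2**10 / 16 = 64 codewords.

bit_depth = 10
num_sections = 16
codewords_per_section = (1 << bit_depth) // num_sections
assert codewords_per_section == 64

# Boundaries (pivots) of the first sections: [0, 64, 128, ..., 1024].
pivots = [k * codewords_per_section for k in range(num_sections + 1)]
```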
The mapping of luma prediction samples is a method by which the dynamic range of inter-predicted luma prediction samples is divided into a preset number of sections, and the codewords of the luma prediction samples are reallocated by applying the piecewise linear model corresponding to the section to which each luma prediction sample belongs, thereby mapping luma prediction samples from the domain before mapping (the original domain) to the mapped domain. Luma prediction samples mapped in this way based on the piecewise linear model are referred to as "mapped luma prediction samples." Since intra prediction is performed in the mapped domain, the mapping is not applied to blocks encoded in intra mode.
The video encoding apparatus is configured to perform mapping on the luma prediction samples based on the piecewise linear model (generate mapped luma prediction samples), and to encode and signal, to the video decoding apparatus, information about the sections (second sections) to which the mapped luma prediction samples may belong.
The information on the second section may include "index information (section index information) on a section that can be used for the first coding tool in the second section" and "codeword number information". The section index information may include index information on a section having a smallest index and index information on a section having a largest index among sections that can be used for the first encoding tool. The codeword number information may include information (absolute value and sign of difference) indicating a difference (codeword delta) between the number of codewords allocated to or included in the original section (first section) and the number of codewords allocated to each of the second sections that can be used for the first coding tool. Hereinafter, the absolute value of the difference will be referred to as "codeword absolute value", and the sign of the difference will be referred to as "codeword sign".
As shown in fig. 12, the entropy decoder 410 may be configured to decode information about the second section from the bitstream (S1210). The derivation unit 1010 may be configured to derive a piecewise linear model of each of the first sections based on the information about the second sections and the bit depth of the picture (i.e., a mapping relationship between the first sections and the second sections may be derived) (S1220). In particular, the piecewise linear model, that is to say the mapping relation, may comprise a "scaling factor" which represents a scaling relation between the number of codewords (number of bits) allocated to each of the first sections and the number of codewords (number of bits) allocated to each of the second sections. The scaling factor may represent a relationship between a length of each of the first sections and a length of each of the second sections.
The section determining unit 522 may be configured to determine a section (a first target section) to which a luma prediction sample belongs among the first sections, and the sample deriving unit 524 may be configured to derive a mapped luma prediction sample by mapping the luma prediction sample to a mapped section using the piecewise linear model corresponding to the first target section (S1240). In other words, in S1240, the sample derivation unit 524 is configured to derive the mapped luma prediction sample by applying the mapping relationship between the first target section and the mapped section to the luma prediction sample. In particular, the mapped section may be the section of the second sections corresponding to the first target section. The correspondence between each of the first sections and each of the second sections may be established by an index assigned to each section or by the position of each section in order.
Fig. 13 shows an example of a specific method of deriving a scaling factor included in the mapping relationship. The first derivation unit 1012 may be configured to derive the number of codewords allocated to each of the first sections based on the bit depth (S1310). As described above, the same number of codewords may be allocated to or included in each of the first sections. In other words, the first sections may have the same size.
The second deriving unit 1014 may be configured to derive the number of codewords allocated to each of the second sections based on the number of codewords allocated to each of the first sections and the information on the second sections (S1320). Specifically, the codeword delta corresponding to each second section may be derived by applying the codeword sign to the codeword absolute value, and the number of codewords allocated to the second section may be derived by summing the number of codewords allocated to each of the first sections and the codeword delta. The factor derivation unit 1016 may be configured to derive the scaling factor based on the number of codewords allocated to each of the first sections and the number of codewords allocated to each of the second sections (S1330).
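For illustration, S1310 to S1330 may be sketched as follows. The sign convention (sign applied to the codeword absolute value) and the floating-point ratio are assumptions made for clarity; an actual codec would typically use a fixed-point formulation.

```python
# Sketch of S1310-S1330: deriving per-section scaling factors from
# the signaled codeword information.

def derive_scaling_factors(bit_depth, num_sections, cw_abs, cw_sign):
    org_cw = (1 << bit_depth) // num_sections            # S1310
    scales = []
    for k in range(num_sections):
        delta = -cw_abs[k] if cw_sign[k] else cw_abs[k]  # assumed convention
        mapped_cw = org_cw + delta                       # S1320
        scales.append(mapped_cw / org_cw)                # S1330
    return scales
```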
Inverse mapping of luma reconstructed samples
Inverse mapping is the operation of inverse mapping of the mapped luma reconstructed samples to the unmapped domain. The mapped luma reconstructed samples may be derived by summing the mapped luma prediction samples and luma residual samples, which are signaled in the mapped domain by the video coding apparatus.
As shown in fig. 14, the deriving unit 1010 may be configured to derive an inverse piecewise linear model of each of the second sections based on the bit depth and the information on the second sections (i.e., the deriving unit 1010 may derive an inverse mapping relationship between the first section and the second section) (S1410). For example, the deriving unit 1010 may be configured to derive the inverse scaling factor based on the number of codewords allocated to each of the first sections and the number of codewords allocated to each of the second sections. In particular, the inverse scaling factor is a parameter representing an inverse scaling relationship between each of the first sections and each of the second sections, and may be used to define an inverse piecewise linear model or inverse mapping relationship.
The inverse piecewise linear model may have an inverse relationship to the piecewise linear model used to map the luma prediction samples. The inverse mapping relationship may have an inverse relationship to the mapping relationship used to map the luma prediction samples. The section determining unit 522 may be configured to determine a second target section (S1420). The second target section may be a section to which the luma reconstructed samples in the second section belong (a section to which luma reconstructed samples before inverse mapping belong).
The sample derivation unit 524 may be configured to inversely map the luma reconstructed samples to an inverse-mapped section using the inverse piecewise linear model corresponding to the second target section, thereby deriving or generating inversely mapped luma reconstructed samples (S1430). In other words, the sample derivation unit 524 may be configured to apply the inverse mapping relationship between the second target section and the inverse-mapped section to the luma reconstructed samples (inversely mapping the luma reconstructed samples) to derive the inversely mapped luma reconstructed samples. In particular, the inverse-mapped section may be the section of the first sections to which the inversely mapped luma reconstructed samples belong, that is, the section corresponding to the second target section. The correspondence between each of the first sections and each of the second sections may be established by an index assigned to each section or by the position of each section in order.
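For illustration, the inverse mapping of S1410 to S1430 may be sketched as follows, assuming that pivot (section boundary) arrays for the first sections (`orig_pivots`) and the second sections (`mapped_pivots`) have already been derived from the codeword counts:

```python
# Sketch of S1410-S1430: inverse mapping of a mapped luma
# reconstructed sample back to the original (unmapped) domain.

def inverse_map_luma(y_mapped, mapped_pivots, orig_pivots):
    # S1420: find the second target section containing the sample.
    k = 0
    while k + 2 < len(mapped_pivots) and y_mapped >= mapped_pivots[k + 1]:
        k += 1
    # S1430: apply the inverse linear model of section k.
    mapped_len = mapped_pivots[k + 1] - mapped_pivots[k]
    orig_len = orig_pivots[k + 1] - orig_pivots[k]
    inv_scale = orig_len / mapped_len if mapped_len else 0.0
    return orig_pivots[k] + (y_mapped - mapped_pivots[k]) * inv_scale
```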
Chroma residual sample scaling
The scaling of the chroma residual samples refers to a method of scaling the chroma residual samples according to the values of the luma reconstructed samples. When the first application flag is equal to 1 and the chroma flag is equal to 1, the scaling of chroma residual samples, described below, may be performed. In addition, the scaling of chroma residual samples may be performed when the size of the chroma residual samples (the chroma residual block) exceeds a preset value (e.g., 4).
The video encoding apparatus may be configured to determine whether to perform scaling on chroma residual samples of the current block, and set a value of a chroma flag according to a result of the determination. The chroma flag may indicate whether scaling of chroma residual samples is enabled and/or applied to a current picture containing the current block. A chroma flag equal to 1 indicates that scaling of chroma residual samples is enabled and/or applied, while a chroma flag equal to 0 may indicate that scaling of chroma residual samples is not enabled and/or applied. When the second enable flag is equal to 1, the chroma flag may be encoded at the picture level or slice level of the bitstream and signaled to the video decoding apparatus.
As shown in fig. 15, when the second enable flag is equal to 1 (S1510), the entropy decoder 410 may be configured to decode a chroma flag from an image level of the bitstream (S1520). When the application flag is equal to 1(S1530) and the chroma flag is equal to 1(S1540), the performing unit 520 may be configured to derive the scaled chroma residual samples by performing scaling on the chroma residual samples (S1550). The scaling of the chroma residual samples may be performed using chroma scaling information decoded from the bitstream.
Chroma scaling information, which compensates for the correlation between the luma component and the chroma component, is used to adjust the dynamic range of the second sections, mapped for the luma prediction samples, to be suitable for the scaling of chroma residual samples. When the second enable flag is equal to 0 in S1510, the application flag is equal to 0 in S1530, or the chroma flag is equal to 0 in S1540, the scaling of chroma residual samples may be skipped (S1560).
The video encoding apparatus may be configured to perform scaling on the chroma residual samples and derive the difference between the number of codewords included in each of the second sections before the scaling and the number of codewords included in each of the second sections after the scaling. In addition, the video encoding apparatus may be configured to encode and signal information on the derivation result (the difference value), i.e., the chroma scaling information, to the video decoding apparatus. The chroma scaling information may include a delta chroma codeword size (the magnitude of the difference) and a delta chroma codeword sign (the sign of the difference).
The entropy decoder 410 may be configured to decode chroma scaling information from the bitstream (S1610). The derivation unit 1010 may be configured to derive a scaling relationship based on the bit depth, the information on the second section, and the chroma scaling information (S1620). For example, the deriving unit 1010 may be configured to derive a delta chroma codeword (a difference between the number of codewords included in each of the second sections and the number of codewords included in each of the third sections) based on the delta chroma codeword size and the delta chroma codeword symbols, and adjust a dynamic range of the second sections by summing the number of codewords included in each of the second sections and the delta chroma codeword, thereby deriving a chroma scaling relationship between the second sections.
The chroma scaling relationship derived by the derivation unit 1010 may have an inverse relationship to the chroma scaling relationship used by the video coding device to scale the chroma residual samples. Accordingly, the chroma scaling relationship derived by the deriving unit 1010 may be an inverse chroma scaling relationship. The section determining unit 522 may be configured to derive an average value of the luma reconstructed sample located at the left side of the chroma residual sample (current block) and the luma reconstructed sample located above the chroma residual sample (current block) (S1630), and determine a section to which the average value belongs in the second section (S1640). The sample derivation unit 524 may be configured to generate or derive scaled chroma residual samples by scaling the chroma residual samples based on a chroma scaling relationship corresponding to a section to which the average value belongs (S1650).
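For illustration, S1630 to S1650 may be sketched as follows, assuming the inverse chroma scaling factor of each second section (`inv_chroma_scale`) has already been derived in S1610 to S1620:

```python
# Sketch of S1630-S1650: scaling a chroma residual sample based on
# the average of the neighboring luma reconstructed samples.

def scale_chroma_residual(c_res, left_luma, above_luma,
                          mapped_pivots, inv_chroma_scale):
    # S1630: average of the left and above luma reconstructed samples.
    neighbors = list(left_luma) + list(above_luma)
    avg = sum(neighbors) // len(neighbors)
    # S1640: section of the second sections to which the average belongs.
    k = 0
    while k + 2 < len(mapped_pivots) and avg >= mapped_pivots[k + 1]:
        k += 1
    # S1650: scale the residual with the section's (inverse) factor.
    return c_res * inv_chroma_scale[k]
```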
Second coding tool
The second encoding tool is an encoding tool configured to apply differential encoding to the residual samples to improve compression performance of the residual samples to which the transform skip mode is applied. When a Transform Unit (TU) is encoded in a transform skip mode for lossy compression, differential encoding techniques may be applied to residual samples after intra prediction and inter prediction. In the transform skip mode, the differential encoding technique may provide more improved compression performance by reducing the total energy of the residual components used for entropy encoding.
In horizontal differential encoding, a current sample is predicted using the residual sample in the nearest column to its left among the residual samples encoded by the video encoding apparatus. After horizontal differential encoding is applied to the residual samples of an M×N block, the residual samples may be represented by equation 1 (where 0 ≤ i < M). In other words, the residual sample at location (i, j) is modified by subtracting the residual sample at location (i, j−1) from it.
Equation 1

$$\tilde{r}_{i,j}=\begin{cases}Q(r_{i,j}), & j=0\\ Q(r_{i,j})-Q(r_{i,j-1}), & 1\le j\le N-1\end{cases}\qquad 0\le i\le M-1$$

In equation 1, $(i, j)$ denotes the i-th row and j-th column, $Q(r_{i,j})$ represents the residual sample at position $(i, j)$, and $\tilde{r}_{i,j}$ represents the modified residual sample.
As shown in equation 1, in horizontal differential encoding, the video encoding apparatus entropy-encodes the modified residual samples and then transmits them to the video decoding apparatus. Each sample is reconstructed and used to predict the residual sample in the next column. The horizontal prediction operation may be performed sequentially on all columns of the block.
In vertical differential encoding, a current sample is predicted using the residual sample in the nearest row above it among the residual samples encoded by the video encoding apparatus. After vertical differential encoding is applied to the residual samples of an M×N block, the residual samples may be represented by equation 2 (where 0 ≤ j < N).
Equation 2

$$\tilde{r}_{i,j}=\begin{cases}Q(r_{i,j}), & i=0\\ Q(r_{i,j})-Q(r_{i-1,j}), & 1\le i\le M-1\end{cases}\qquad 0\le j\le N-1$$
As shown in equation 2, in vertical differential encoding, the video encoding apparatus entropy-encodes the modified residual samples and then transmits them to the video decoding apparatus. Each sample is reconstructed and used to predict the residual sample in the next row. The vertical prediction operation may be performed sequentially on all rows of the block.
When horizontal differential encoding is applied, the video decoding apparatus is configured to reconstruct the residual samples as shown in equation 3. In other words, the residual samples in the residual block reconstructed from the bitstream by the video decoding apparatus are modified according to horizontal differential encoding. The target residual sample to be modified in the reconstructed residual block is modified by adding to it the residual sample located to its left in the same row.
Equation 3

$$Q(r_{i,j})=\sum_{k=0}^{j}\tilde{r}_{i,k},\qquad 0\le i\le M-1,\; 0\le j\le N-1$$
The video decoding apparatus may be configured to reconstruct the residual samples of the j-th column by sequentially adding the reconstructed residual samples. The horizontal reconstruction operation may be sequentially performed on all columns of the block.
When vertical differential encoding is applied, the video decoding apparatus reconstructs the residual samples as shown in equation 4. In other words, the residual samples in the residual block reconstructed from the bitstream by the video decoding apparatus are modified according to vertical differential encoding. The target residual sample to be modified in the reconstructed residual block is modified by adding to it the residual sample located above it in the same column.
Equation 4

$$Q(r_{i,j})=\sum_{k=0}^{i}\tilde{r}_{k,j},\qquad 0\le i\le M-1,\; 0\le j\le N-1$$
The video decoding apparatus may be configured to reconstruct the residual samples of the i-th row by sequentially adding the reconstructed residual samples. The vertical reconstruction operation may be performed sequentially on all rows of the block.

Whether to execute the second encoding tool may be determined by a second application flag defined at the block level. However, with conventional methods, bit efficiency may be reduced because the second application flag is encoded, signaled, and decoded on a block-by-block basis.
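For illustration, equations 1 to 4 may be combined into the following sketch, which applies horizontal or vertical differential encoding to an M×N block of quantized residual samples and reconstructs it by cumulative addition:

```python
# Sketch of equations 1-4: differential encoding of quantized
# residual samples and the matching reconstruction.

def diff_encode(q, horizontal=True):
    m, n = len(q), len(q[0])
    out = [row[:] for row in q]
    for i in range(m):
        for j in range(n):
            if horizontal and j > 0:
                out[i][j] = q[i][j] - q[i][j - 1]   # equation 1
            elif not horizontal and i > 0:
                out[i][j] = q[i][j] - q[i - 1][j]   # equation 2
    return out

def diff_decode(r, horizontal=True):
    m, n = len(r), len(r[0])
    q = [row[:] for row in r]
    for i in range(m):
        for j in range(n):
            if horizontal and j > 0:
                q[i][j] += q[i][j - 1]              # equation 3
            elif not horizontal and i > 0:
                q[i][j] += q[i - 1][j]              # equation 4
    return q
```

In this sketch, `diff_decode(diff_encode(q))` returns the original block, mirroring the encoder-decoder symmetry of equations 1 to 4.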
As described above, since the second encoding tool is applied to transform-skipped residual samples, it is highly likely to be applied to video having screen content characteristics. Therefore, in view of these characteristics, the present invention aims to define an enable flag indicating whether the second encoding tool is enabled (or whether the video corresponds to screen content) at a level higher than the block level, and to control whether the second application flag is signaled according to the value of the enable flag, thereby improving bit efficiency.
The video encoding apparatus may be configured to determine whether the second encoding tool is enabled (whether the video corresponds to screen content), and set a value of the enable flag according to a result of the determination. In addition, the video encoding device may be configured to encode the enable flag and signal it to the video decoding device at a high level of the bitstream (e.g., SPS level). The video decoding apparatus may be configured to decode an enable flag from a high level of the bitstream (S1710), and determine whether a value of the decoded enable flag is equal to 1 or 0 (S1720).
The video encoding apparatus may be configured to determine whether to apply the second encoding tool to the current block, and determine whether to signal the second application flag based on the value of the enable flag. For example, the video encoding apparatus may be configured to encode and signal the second application flag to the video decoding apparatus when the enable flag is equal to 1, and not to signal the second application flag when the enable flag is equal to 0. In this case, the video decoding apparatus may be configured to decode the second application flag from the bitstream when the enable flag is equal to 1, since the flag is signaled (S1730), and not to decode it when the enable flag is equal to 0, since the flag is not signaled. When the second application flag is not decoded, it may be implicitly set to a value (0) indicating that the second coding tool is not applied.

As another example, the video encoding apparatus may not signal the second application flag when the enable flag is equal to 1, and may be configured to signal it when the enable flag is equal to 0. In this case, the video decoding apparatus may be configured to decode the second application flag from the bitstream when the enable flag is equal to 0, since the flag is signaled, and not to decode it when the enable flag is equal to 1, since the flag is not signaled. When the second application flag is not decoded, it may be implicitly set to a value (1) indicating that the second coding tool is applied.
The video decoding apparatus may be configured to determine whether the value of the decoded or implicitly set second application flag is equal to 1 or 0 (S1740), execute the second encoding tool when the second application flag is equal to 1 (S1750), and not execute the second encoding tool when the second application flag is equal to 0 (S1760). According to an exemplary embodiment, the enable flag may include pic_scc_tool_enabled_flag, indicating whether the picture corresponds to screen content, and a flag (slice flag) indicating whether the second encoding tool is enabled on a slice-by-slice basis.
When the current picture corresponds to screen content (pic_scc_tool_enabled_flag is equal to 1), the slice flag may be encoded and signaled from the video encoding apparatus to the video decoding apparatus and decoded by the video decoding apparatus. In particular, whether to signal and decode the second application flag may be determined according to the value of the slice flag.
For example, the video encoding apparatus may be configured to encode and signal the second application flag to the video decoding apparatus when the slice flag is equal to 1, and not to signal the second application flag when the slice flag is equal to 0. In this case, the video decoding apparatus may be configured to decode the second application flag from the bitstream when the slice flag is equal to 1, since the flag is signaled (S1730), and not to decode it when the slice flag is equal to 0, since the flag is not signaled. When the second application flag is not decoded, it may be implicitly set to a value (0) indicating that the second coding tool is not applied.

As another example, the video encoding apparatus may not signal the second application flag when the slice flag is equal to 1, and may signal it when the slice flag is equal to 0. In this case, the video decoding apparatus may be configured to decode the second application flag from the bitstream when the slice flag is equal to 0, since the flag is signaled, and not to decode it when the slice flag is equal to 1, since the flag is not signaled. When the second application flag is not decoded, it may be implicitly set to a value (1) indicating that the second coding tool is applied.
Although exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications and changes are possible without departing from the spirit and scope of the invention. For the sake of brevity and clarity, exemplary embodiments have been described. Accordingly, it should be understood by those of ordinary skill that the scope of the exemplary embodiments is not limited by the exemplary embodiments explicitly described above, but is included in the claims and their equivalents.
Cross Reference to Related Applications
The present application claims priority to Korean Patent Application No. 10-2019-0074231, filed on June 21, 2019, Korean Patent Application No. 10-2019-0079652, filed on July 2, 2019, and Korean Patent Application No. 10-2020-0075560, filed on June 22, 2020, the entire contents of which are incorporated herein by reference.

Claims (20)

1. A video decoding method, comprising:
decoding, from a high level of the bitstream, an enable flag indicating whether one or more encoding tools are enabled, the one or more encoding tools including a first encoding tool configured to encode sample values with a luma component mapping based on a piecewise-linear model;
obtaining a value of an application flag indicating whether to apply the one or more coding tools, according to a value of the enable flag, either by setting the application flag to a predetermined value or by decoding the application flag from a low level of the bitstream, the application flag including a first application flag indicating whether to apply the first coding tool; and
executing the one or more coding tools when the value of the application flag is a value indicating that the one or more coding tools are applied,
wherein, when executing the first encoding tool according to the value of the first application flag, executing the one or more encoding tools comprises:
generating mapped luma prediction samples from the luma prediction samples based on a piecewise-linear model corresponding to the luma prediction samples, and generating luma reconstructed samples by adding luma residual samples reconstructed from the bitstream to the mapped luma prediction samples; and
the luminance reconstructed samples are inverse mapped using an inverse piecewise linear model having an inverse relationship to the piecewise linear model.
2. The method of claim 1, wherein obtaining comprises:
when the enable flag indicates that the one or more encoding tools are enabled, setting the application flag to a value indicating that the one or more encoding tools are applied; and
when the enable flag indicates that the one or more encoding tools are not enabled, the application flag is decoded from a lower level of the bitstream.
3. The method of claim 1, wherein obtaining comprises:
decoding the application flag from a low level of the bitstream when the enable flag indicates that one or more encoding tools are enabled; and
when the enable flag indicates that the one or more encoding tools are not enabled, the application flag is set to a value indicating that the one or more encoding tools are not applied.
4. The method of claim 1, wherein the enable flag comprises:
a first enable flag decoded from a sequence parameter set level of the bitstream; and
a second enabling flag, which is decoded from an image level of the bitstream when the first enabling flag indicates that the first encoding tool is enabled,
wherein the first application flag is decoded from a slice level of the bitstream when the second enable flag indicates that the first encoding tool is enabled.
5. The method of claim 4, further comprising:
decoding information on a second section to which the mapped luma prediction samples belong from the bitstream; and
deriving piecewise linear models respectively corresponding to the first sections to which luma prediction samples are allowed to belong based on a bit depth of a current picture including the current block and information on the second sections,
wherein the mapped luma prediction samples are derived by determining a first target section in the first section to which the luma prediction samples belong, and mapping the luma prediction samples to a mapped section using a piecewise linear model corresponding to the first target section, the mapped section being a section of the second section corresponding to the first target section.
6. The method of claim 5, wherein each of the piecewise linear models includes a scaling factor indicative of a scaling relationship between a number of codewords assigned to each of the first sections and a number of codewords assigned to each of the second sections, wherein deriving the piecewise linear model comprises:
deriving a number of codewords allocated to each of the first sections based on the bit depth, and deriving a number of codewords allocated to each of the second sections based on the number of codewords allocated to each of the first sections and information on the second sections; and
the scaling factor is derived based on the number of codewords assigned to each of the first sections and the number of codewords assigned to each of the second sections.
7. The method of claim 5, further comprising:
deriving inverse piecewise linear models respectively corresponding to the second sections based on the bit depths and the information about the second sections,
wherein the reverse mapping comprises:
determining a second target section to which the luma reconstructed samples belong in the second section; and
the luma reconstructed samples are inversely mapped to an inversely mapped section, which is a section of the first section corresponding to the second target section, using an inverse piecewise linear model corresponding to the second target section to derive inversely mapped luma reconstructed samples.
8. The method of claim 5, further comprising:
decoding, from an image level of the bitstream, a chroma flag indicating whether scaling of chroma residual samples is enabled when the second enable flag indicates that the first encoding tool is enabled; and
the chroma residual samples of the current block are scaled based on chroma scaling information decoded from the bitstream according to the value of the chroma flag.
9. The method of claim 8, wherein scaling chroma residual samples comprises:
deriving chroma scaling relationships respectively corresponding to the second sections based on the bit depth, the information about the second sections, and the chroma scaling information;
determining a section of the second sections to which the average value of the luma reconstructed samples located above and to the left of the current block belongs; and
the chroma residual samples are scaled based on a chroma scaling relationship corresponding to the segment to which the average belongs.
10. The method of claim 1, wherein the encoding tool further comprises:
a second encoding tool configured to apply differential encoding to residual samples of the current block,
wherein the enable flag is decoded from a sequence parameter set level of the bitstream,
wherein, when the enable flag indicates that the second encoding tool is enabled, the application flag includes a second application flag decoded from a block level of the bitstream, the second application flag indicating whether the second encoding tool is applied.
11. A video decoding apparatus, comprising:
an entropy decoder configured to decode, from a high level of a bitstream, an enable flag indicating whether one or more coding tools are enabled, the one or more coding tools including a first coding tool configured to code sample values with a luma component mapping based on a piecewise linear model;
an acquisition unit configured to acquire a value of an application flag indicating whether to apply one or more coding tools by setting the application flag to a predetermined value or decoding the application flag from a lower level of a bitstream according to a value of an enable flag, wherein the application flag includes a first application flag indicating whether to apply a first coding tool; and
an execution unit configured to execute the one or more encoding tools when the value of the application flag is a value indicating that the one or more encoding tools are applied,
wherein, when executing the first encoding tool according to the value of the first application flag, the execution unit is configured to:
generating mapped luma prediction samples from the luma prediction samples based on a piecewise linear model corresponding to the luma prediction samples;
generating luma reconstructed samples by adding luma residual samples reconstructed from the bitstream to the mapped luma prediction samples; and
the luminance reconstructed samples are inverse mapped using an inverse piecewise linear model having an inverse relationship to the piecewise linear model.
12. The apparatus of claim 11, wherein the obtaining unit is configured to:
when the enable flag indicates that the one or more encoding tools are enabled, setting the application flag to a value indicating that the one or more encoding tools are applied; and
when the enable flag indicates that the one or more encoding tools are not enabled, the application flag is decoded from a lower level of the bitstream.
13. The apparatus of claim 11, wherein the obtaining unit is configured to:
decoding the application flag from a low level of the bitstream when the enable flag indicates that one or more encoding tools are enabled; and
when the enable flag indicates that the one or more coding tools are not enabled, the application flag is set to a value indicating that the one or more coding tools are not applied.
14. The apparatus of claim 11, wherein the enable flag comprises:
a first enable flag decoded from a sequence parameter set level of the bitstream; and
a second enabling flag, which is decoded from an image level of the bitstream when the first enabling flag indicates that the first encoding tool is enabled,
wherein the first application flag is decoded from a slice level of the bitstream when the second enable flag indicates that the first encoding tool is enabled.
15. The apparatus of claim 14, further comprising:
a deriving unit configured to derive piecewise linear models respectively corresponding to first sections to which luma prediction samples are allowed to belong based on a bit depth of a current picture including the current block and information on a second section,
wherein the execution unit includes:
a section determination unit configured to determine a first target section to which a luma prediction sample belongs in a first section; and
a sample derivation unit configured to derive a mapped luma prediction sample by mapping the luma prediction sample to a mapped section using a piecewise linear model corresponding to a first target section, the mapped section being a section of a second section corresponding to the first target section.
16. The apparatus of claim 15, wherein each of the piecewise linear models includes a scaling factor indicative of a scaling relationship between a number of codewords assigned to each of the first sections and a number of codewords assigned to each of the second sections,
wherein the derivation unit includes:
a first derivation unit configured to derive a number of codewords allocated to each of the first sections based on a bit depth;
a second derivation unit configured to derive a number of codewords allocated to each of the second sections based on the number of codewords allocated to each of the first sections and information on the second sections; and
a factor derivation unit configured to derive a scaling factor based on the number of codewords allocated to each of the first sections and the number of codewords allocated to each of the second sections.
17. The apparatus according to claim 15, wherein the deriving unit is configured to derive inverse piecewise linear models respectively corresponding to the second sections based on the bit depth and the information on the second sections,
wherein the section determining unit is configured to determine a second target section of the second sections to which the luma reconstructed samples belong; and
wherein the sample derivation unit is configured to derive the inverse mapped luma reconstructed samples by inverse mapping the luma reconstructed samples to an inverse mapped section by using an inverse piecewise linear model corresponding to the second target section, the inverse mapped section being a section of the first section corresponding to the second target section.
18. The apparatus of claim 15, wherein the entropy decoder is configured to decode a chroma flag indicating whether scaling of chroma residual samples is enabled from an image level of the bitstream when the second enable flag indicates that the first encoding tool is enabled, and wherein the execution unit is configured to scale the chroma residual samples of the current block based on chroma scaling information decoded from the bitstream according to a value of the chroma flag.
19. The apparatus of claim 18, wherein the deriving unit is configured to derive chroma scaling relationships respectively corresponding to the second sections based on the bit depth, the information on the second sections, and the chroma scaling information,
wherein the section determining unit is configured to determine a section of the second sections to which the average value of the luma reconstructed samples located above and to the left of the current block belongs, and
Wherein the sample derivation unit is configured to scale chroma residual samples based on a chroma scaling relationship corresponding to a section to which the average value belongs.
20. The apparatus of claim 11, wherein the encoding tools further comprise second encoding tools configured to apply differential encoding to residual samples of a current block,
wherein the enable flag is decoded from a sequence parameter set level of the bitstream,
wherein, when the enable flag indicates that the second encoding tool is enabled, the application flag includes a second application flag decoded from a block level of the bitstream, the second application flag indicating whether the second encoding tool is applied.
CN202080045151.2A 2019-06-21 2020-06-22 Method for controlling coding tool Active CN114128281B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
KR20190074231 2019-06-21
KR10-2019-0074231 2019-06-21
KR20190079652 2019-07-02
KR10-2019-0079652 2019-07-02
PCT/KR2020/008045 WO2020256510A1 (en) 2019-06-21 2020-06-22 Method and device for controlling coding tools

Publications (2)

Publication Number Publication Date
CN114128281A (en) 2022-03-01
CN114128281B CN114128281B (en) 2024-06-18

Family

ID=74088305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080045151.2A Active CN114128281B (en) 2019-06-21 2020-06-22 Method for controlling coding tool

Country Status (2)

Country Link
KR (1) KR20200145773A (en)
CN (1) CN114128281B (en)

Citations (9)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6184712B1 (en) * 1999-02-25 2001-02-06 Xilinx, Inc. FPGA configurable logic block with multi-purpose logic/memory circuit
CN104185991A (en) * 2011-11-09 2014-12-03 弗兰霍菲尔运输应用研究公司 Inter-layer prediction between layers of different dynamic sample value range
CN104471943A (en) * 2012-07-20 2015-03-25 高通股份有限公司 Parameter sets in video coding
CN104685874A (en) * 2012-07-27 2015-06-03 摩托罗拉移动有限责任公司 Devices and methods for processing of partition mode in high efficiency video coding
CN104782125A (en) * 2012-11-08 2015-07-15 佳能株式会社 Method, apparatus and system for encoding and decoding the transform units of a coding unit
CN107925773A (en) * 2015-06-11 2018-04-17 英迪股份有限公司 Use the method and its device of Adaptive deblocking filter coding and decoding image
CN107025880A (en) * 2017-04-14 2017-08-08 西安诺瓦电子科技有限公司 Image display control method and device and display screen control system
WO2019006300A1 (en) * 2017-06-29 2019-01-03 Dolby Laboratories Licensing Corporation Integrated image reshaping and video coding
US10148907B1 (en) * 2017-10-27 2018-12-04 Avago Technologies International Sales Pte. Limited System and method of luminance processing in high dynamic range and standard dynamic range conversion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BENJAMIN BROSS ET AL: "Versatile Video Coding (Draft 5)", JVET Meeting, page 43 *
JIANLE CHEN ET AL: "Algorithm description for Versatile Video Coding and Test Model 5 (VTM 5)", JVET Meeting, pages 69-71 *
TAORAN LU ET AL: "AHG16: Simplification of Reshaper Implementation", JVET Meeting *
TAORAN LU ET AL: "CE12: Mapping functions (test CE12-1 and CE12-2)", JVET Meeting *

Also Published As

Publication number Publication date
KR20200145773A (en) 2020-12-30
CN114128281B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
EP2842315B1 (en) Mode-dependent coefficient scanning and directional transforms for different colour sampling formats
CN112106364B (en) Video signal processing method and apparatus using reference samples
CN113924777A (en) Intra prediction method and apparatus for dividing prediction unit into sub-units and predicting the prediction unit
US11973966B2 (en) Method and apparatus for efficiently coding residual blocks
US11962777B2 (en) Inverse quantization device and method used in video decoding device
CN114128294A (en) Method and apparatus for intra prediction encoding of video data
CN114270826A (en) Method and apparatus for intra prediction encoding of video data
US20230188709A1 (en) Method and apparatus for patch book-based encoding and decoding of video data
CN113841397A (en) Image encoding and decoding method and device
US20230048262A1 (en) Decoding device and method for predicting block partitioned into random shape
CN114128281B (en) Method for controlling coding tool
CN113841403A (en) Inverse quantization apparatus and method used in image decoding apparatus
CN113853789A (en) Method and apparatus for parallel encoding and decoding of moving image data
US11770533B2 (en) Method and apparatus for controlling coding tools
CN113574877B (en) Method and apparatus for efficiently decoding residual block
EP4068776A1 (en) Decoding device and method for predicting block partitioned into random shapes
US20220286692A1 (en) Video encoding and decoding using differential encoding
US20220286686A1 (en) Video encoding and decoding using differential modulation
US20240007645A1 (en) Video encoding and decoding method using adaptive reference pixel selection
CN114270842A (en) Video encoding and decoding with differential encoding
KR20220071129A (en) Method for Encoding and Decoding Video Using Adaptive Reference Sample Selection
CN114762328A (en) Video encoding and decoding with differential modulation
CN114009031A (en) Method for restoring chrominance block and apparatus for decoding image
KR20220017373A (en) Method and apparatus for patchbook-based encoding and decoding of video data
CN114097233A (en) Method and apparatus for intra prediction encoding of video data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant