WO2021137445A1

WO2021137445A1 - Method for determining transform kernels for video signal processing and apparatus therefor

Info

Publication number: WO2021137445A1
Application number: PCT/KR2020/017198
Authority: WO
Inventors: 이범식
Original assignee: (주)휴맥스; 조선대학교 산학협력단
Priority date: 2019-12-31
Filing date: 2020-11-27
Publication date: 2021-07-08

Abstract

A video signal decoding apparatus comprises a processor. The processor: determines a transform kernel for width direction transform of a current block and a transform kernel for height direction transform of the current block on the basis of preset conditions; and acquires a residual signal for the current block by using the transform kernels, wherein the preset conditions are conditions based on whether or not an intra subblock partitioning (ISP) prediction method is applied to the current block and whether or not low frequency non-separable transform (LFNST) is applied to the current block.

Description

Method for determining transform kernel for video signal processing and apparatus therefor

The present invention relates to a video signal processing method and an apparatus therefor, and more particularly, to a method for determining a transform kernel for video signal processing and an apparatus therefor.

Compression coding refers to a series of signal processing techniques for transmitting digitized information through a communication line or storing it in a form suitable for a storage medium. Targets of compression encoding include audio, video, and text. In particular, a technique for performing compression encoding on an image is called video compression. Compression encoding of a video signal is performed by removing redundant information in consideration of spatial correlation, temporal correlation, stochastic correlation, and the like. However, due to the recent development of various media and data transmission media, a method and apparatus for processing a video signal with higher efficiency are required.

An object of the present specification is to increase coding efficiency of a video signal by providing a video signal processing method and an apparatus therefor.

The present specification provides a video signal processing apparatus.

Specifically, a video signal decoding apparatus includes a processor, wherein the processor determines a transform kernel for horizontal transformation of the current block and a transform kernel for vertical transformation of the current block based on a preset condition, and obtains a residual signal for the current block using the transform kernels, wherein the preset condition is a condition based on whether an intra subblock partitioning (ISP) prediction method is applied to the current block and whether a low frequency non-separable transform (LFNST) is applied to the current block.

In addition, in the present specification, a video signal encoding apparatus includes a processor, wherein the processor determines a transform kernel for horizontal transformation of the current block and a transform kernel for vertical transformation of the current block based on a preset condition, and obtains a transform block for the current block using the transform kernels, wherein the preset condition is a condition based on whether an intra subblock partitioning (ISP) prediction method is applied to the current block and whether a low frequency non-separable transform (LFNST) is applied to the current block.

In addition, in the present specification, in a non-transitory computer-readable medium storing a bitstream, the bitstream is encoded through an encoding method including: determining, based on a preset condition, a transform kernel for horizontal transformation of the current block and a transform kernel for vertical transformation of the current block; and obtaining a transform block for the current block by using the transform kernels, wherein the preset condition is a condition based on whether an intra subblock partitioning (ISP) prediction method is applied to the current block and whether a low frequency non-separable transform (LFNST) is applied to the current block.

In the present specification, when the ISP prediction method is applied and the LFNST is applied, both the transform kernel for horizontal transformation of the current block and the transform kernel for vertical transformation of the current block are Discrete Cosine Transform (DCT) type 2 (DCT-2) transform kernels.

In addition, in the present specification, when the ISP prediction method is not applied and the LFNST is not applied, the transform kernel is determined based on a horizontal size and a vertical size of the current block.

Also, in the present specification, when the horizontal size of the current block is less than 4 or greater than 16, the transform kernel for horizontal transformation of the current block is a DCT type 2 (DCT-2) transform kernel, and when the horizontal size is greater than or equal to 4 and less than or equal to 16, the transform kernel for horizontal transformation of the current block is not a DCT type 2 (DCT-2) transform kernel.

Also, in the present specification, when the vertical size of the current block is less than 4 or greater than 16, the transform kernel for vertical transformation of the current block is a DCT type 2 (DCT-2) transform kernel, and when the vertical size is greater than or equal to 4 and less than or equal to 16, the transform kernel for vertical transformation of the current block is not a DCT type 2 (DCT-2) transform kernel.
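As a rough illustration of the conditions described above, the selection may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names select_kernels and kernel_for_length and the label "non-DCT-2" are hypothetical, the text above states only that the kernel is not DCT-2 for sizes from 4 to 16, and combinations of ISP and LFNST that the passages above do not cover simply return None.

```python
from typing import Optional, Tuple

DCT2 = "DCT-2"
# Hypothetical label: the text only says the kernel is "not DCT-2" here.
NOT_DCT2 = "non-DCT-2"


def kernel_for_length(size: int) -> str:
    """Size rule: DCT-2 when size < 4 or size > 16, otherwise not DCT-2."""
    return NOT_DCT2 if 4 <= size <= 16 else DCT2


def select_kernels(isp_applied: bool, lfnst_applied: bool,
                   width: int, height: int) -> Optional[Tuple[str, str]]:
    """Return (horizontal kernel, vertical kernel) for the current block."""
    if isp_applied and lfnst_applied:
        # ISP and LFNST both applied: DCT-2 in both directions.
        return (DCT2, DCT2)
    if not isp_applied and not lfnst_applied:
        # Neither applied: decide each direction from the block size.
        return (kernel_for_length(width), kernel_for_length(height))
    # Assumption: other combinations are outside the passages sketched here.
    return None


if __name__ == "__main__":
    print(select_kernels(True, True, 8, 8))
    print(select_kernels(False, False, 8, 32))
```

The horizontal and vertical kernels are decided independently, so a block whose width lies in the 4 to 16 range and whose height does not receives different kernels in the two directions.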

Also, in the present specification, when a sub-block transform (SBT) is applied to the current block, the preset condition is a condition based on a division direction of the current block. When the current block is divided in the vertical direction into two subblocks and SBT is applied to the left subblock among the two subblocks, the transform kernel for the horizontal direction of the left subblock is a DCT type 8 (DCT-8) transform kernel, and the transform kernel for the vertical direction of the left subblock is a Discrete Sine Transform (DST) type 7 (DST-7) transform kernel.

Also, in the present specification, when SBT is applied to the right subblock among the two subblocks, the transform kernel for the horizontal direction of the right subblock and the transform kernel for the vertical direction of the right subblock are both DST type 7 (DST-7) transform kernels.

Also, in the present specification, when the current block is horizontally divided and composed of two subblocks, and SBT is applied to an upper subblock among the two subblocks, a transform kernel for the horizontal direction of the upper subblock is a DST type 7 (DST-7) transform kernel, and the transform kernel for the vertical direction of the upper subblock is a DCT type 8 (DCT-8) transform kernel.

Also, in the present specification, when SBT is applied to the lower subblock among the two subblocks, the transform kernel for the horizontal direction of the lower subblock and the transform kernel for the vertical direction of the lower subblock are both DST type 7 (DST-7) transform kernels.
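The four SBT cases described above can be collected into a small lookup table. The following is only an illustrative sketch: the table name SBT_KERNELS, the function sbt_kernels, and the string keys are hypothetical, and the right-subblock case is read here as referring to the kernels of the right subblock itself.

```python
# (division direction, subblock position) -> (horizontal kernel, vertical kernel)
SBT_KERNELS = {
    ("vertical", "left"): ("DCT-8", "DST-7"),
    ("vertical", "right"): ("DST-7", "DST-7"),
    ("horizontal", "upper"): ("DST-7", "DCT-8"),
    ("horizontal", "lower"): ("DST-7", "DST-7"),
}


def sbt_kernels(division: str, position: str):
    """Look up the kernel pair for the subblock to which SBT is applied."""
    try:
        return SBT_KERNELS[(division, position)]
    except KeyError:
        raise ValueError(f"unsupported SBT case: {division}, {position}")


if __name__ == "__main__":
    for key, pair in SBT_KERNELS.items():
        print(key, "->", pair)
```

In this reading, DCT-8 appears only in the direction of the division and only for the left or upper subblock; every other entry is DST-7.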

The present specification has an effect that efficient video signal processing is possible by providing a method and an apparatus for determining a transform kernel for video signal decoding/encoding.

The effects obtainable in the present specification are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from the description below.

FIG. 1 is a schematic block diagram of a video signal encoding apparatus according to an embodiment of the present invention.

FIG. 2 is a schematic block diagram of a video signal decoding apparatus according to an embodiment of the present invention.

FIG. 3 shows an embodiment in which a Coding Tree Unit (CTU) is divided into Coding Units (CUs) within a picture.

FIG. 4 shows a method of signaling the division of a quad tree and a multi-type tree according to an embodiment of the present invention.

FIG. 5 shows a general directional intra prediction method according to an embodiment of the present invention.

FIG. 6 illustrates a matrix-based intra prediction method according to an embodiment of the present invention.

FIG. 7 illustrates index mapping of a low frequency non-separable transform (LFNST) according to an intra prediction mode according to an embodiment of the present invention.

FIG. 8 shows an MTS according to an ISP block size according to an embodiment of the present invention.

FIG. 9 shows that, when LFNST is not used and ISP is used according to an embodiment of the present invention, different transform kernels are applied according to block sizes.

FIG. 10 illustrates a primary transform kernel according to a block size when explicit primary transform kernel selection is applied according to an embodiment of the present invention.

FIG. 11 illustrates a transform kernel determined according to a subblock transform size and a partition type according to an embodiment of the present invention.

The terms used in this specification have been selected, as far as possible, from general terms that are currently widely used, in consideration of their functions in the present invention, but they may vary depending on the intention of those skilled in the art, custom, or the emergence of new technologies. Also, in certain cases, there are terms arbitrarily selected by the applicant, and in such cases their meaning will be described in the corresponding part of the description. Therefore, the terms used in this specification should be interpreted based on their actual meaning and the contents of the entire specification, rather than simply on their names.

Some terms herein may be interpreted as follows. Coding can be interpreted as encoding or decoding as the case may be. In the present specification, an apparatus for generating a video signal bitstream by encoding a video signal is referred to as an encoding apparatus or an encoder, and an apparatus for reconstructing a video signal by decoding a video signal bitstream is referred to as a decoding apparatus or a decoder. Also, in this specification, a video signal processing apparatus is used as a term that includes both an encoder and a decoder. Information is a term including all values, parameters, coefficients, elements, and the like, and the meaning may be interpreted differently in some cases, so the present invention is not limited thereto. The 'unit' is used to refer to a basic unit of image processing or a specific position of a picture, and refers to an image area including both a luma component and a chroma component. In addition, 'block' refers to an image region including a specific component among the luma component and the chroma components (i.e., Cb and Cr). However, terms such as 'unit', 'block', 'partition' and 'region' may be used interchangeably according to embodiments. Also, in the present specification, a unit may be used as a concept including all of a coding unit, a prediction unit, and a transform unit. A picture indicates a field or a frame, and according to embodiments, the terms may be used interchangeably.

FIG. 1 is a schematic block diagram of a video signal encoding apparatus according to an embodiment of the present invention. Referring to FIG. 1, the encoding apparatus 100 of the present invention includes a transform unit 110, a quantization unit 115, an inverse quantization unit 120, an inverse transform unit 125, a filtering unit 130, a prediction unit 150, and an entropy coding unit 160.

The transform unit 110 converts a residual signal that is a difference between the input video signal and the prediction signal generated by the prediction unit 150 to obtain a transform coefficient value. For example, a discrete cosine transform (DCT), a discrete sine transform (DST), or a wavelet transform may be used. In the discrete cosine transform and the discrete sine transform, the transform is performed by dividing the input picture signal into blocks. In the transform, the coding efficiency may vary according to the distribution and characteristics of values in the transform region. The quantization unit 115 quantizes the transform coefficient values output from the transform unit 110 .
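To illustrate how a transform concentrates the energy of a block, the following sketch applies a one-dimensional orthonormal DCT type 2 to a short residual row. It is a floating-point textbook form given only as an assumption-laden example; the function name dct2 is hypothetical, and practical codecs use integer approximations rather than this formula directly.

```python
import math


def dct2(x):
    """Orthonormal 1-D DCT type 2 of a sequence of samples."""
    n = len(x)
    out = []
    for k in range(n):
        # Scaling that makes the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


if __name__ == "__main__":
    # A uniform residual row: all energy lands in the first coefficient.
    print([round(c, 6) for c in dct2([4, 4, 4, 4])])
```

For a uniform input, only the first (DC) coefficient is non-zero, which is why the distribution of values in the transform region affects coding efficiency.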

In order to increase coding efficiency, the picture signal is not coded as it is; instead, a method is used in which a picture is predicted through the prediction unit 150 using an already coded region, and a reconstructed picture is obtained by adding the residual value between the original picture and the predicted picture to the predicted picture. In order to prevent mismatch between the encoder and the decoder, when the encoder performs prediction, information that is also available to the decoder should be used. To this end, the encoder performs a process of reconstructing the encoded current block. The inverse quantization unit 120 inversely quantizes the transform coefficient value, and the inverse transform unit 125 restores the residual value using the inverse quantized transform coefficient value. Meanwhile, the filtering unit 130 performs a filtering operation for improving the quality of the reconstructed picture and improving the encoding efficiency. For example, a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter may be included. The filtered picture is output or stored in a decoded picture buffer (DPB, 156) to be used as a reference picture.

The prediction unit 150 includes an intra prediction unit 152 and an inter prediction unit 154. The intra prediction unit 152 performs intra prediction within the current picture, and the inter prediction unit 154 performs inter prediction, which predicts the current picture using the reference picture stored in the decoded picture buffer 156. The intra prediction unit 152 performs intra prediction from reconstructed samples in the current picture, and transmits intra encoding information to the entropy coding unit 160. The intra encoding information may include at least one of an intra prediction mode, a Most Probable Mode (MPM) flag, and an MPM index. The inter prediction unit 154 may include a motion estimation unit 154a and a motion compensation unit 154b. The motion estimation unit 154a obtains a motion vector value of the current region by referring to a specific region of the reconstructed reference picture. The motion estimation unit 154a transfers motion information (reference picture index, motion vector information, etc.) on the reference region to the entropy coding unit 160. The motion compensation unit 154b performs motion compensation using the motion vector value transmitted from the motion estimation unit 154a. The inter prediction unit 154 transmits inter encoding information including motion information on the reference region to the entropy coding unit 160.

When the above picture prediction is performed, the transform unit 110 obtains a transform coefficient value by transforming a residual value between the original picture and the predicted picture. In this case, the transformation may be performed in units of a specific block within the picture, and the size of the specific block may vary within a preset range. The quantization unit 115 quantizes the transform coefficient values generated by the transform unit 110 and transmits the quantized values to the entropy coding unit 160 .

The entropy coding unit 160 entropy-codes the quantized transform coefficients, intra encoding information, inter encoding information, and the like to generate a video signal bitstream. In the entropy coding unit 160, a Variable Length Coding (VLC) scheme and an arithmetic coding scheme may be used. The variable length coding (VLC) scheme converts input symbols into consecutive codewords, the length of which may be variable. For example, symbols that occur frequently are expressed as short codewords, and symbols that do not occur frequently are expressed as long codewords. As the variable length coding scheme, a context-based adaptive variable length coding (CAVLC) scheme may be used. Arithmetic coding converts consecutive data symbols into a single fractional number, and can obtain the optimal number of fractional bits required to represent each symbol. Context-based Adaptive Binary Arithmetic Coding (CABAC) may be used as the arithmetic coding.
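The principle that frequent symbols receive short codewords can be illustrated with Huffman coding, used here only as a familiar example of a variable length code; it is not CAVLC and is not described in this specification. The function name huffman_code_lengths is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return a dict mapping each symbol to its Huffman codeword length."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


if __name__ == "__main__":
    print(huffman_code_lengths("aaaaaaabbbcd"))
```

In the sample string, the most frequent symbol receives the shortest codeword and the two rarest symbols receive the longest.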

The generated bitstream is encapsulated with a Network Abstraction Layer (NAL) unit as a basic unit. The NAL unit includes an integer number of coded coding tree units. In order to decode the bitstream in the video decoder, the bitstream must first be divided into NAL units, and then each divided NAL unit must be decoded. Meanwhile, information necessary for decoding a video signal bitstream may be transmitted through the Raw Byte Sequence Payload (RBSP) of a higher level set such as a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS).

Meanwhile, the block diagram of FIG. 1 shows the encoding apparatus 100 according to an embodiment of the present invention, and the separately displayed blocks illustrate the elements of the encoding apparatus 100 as logically distinguished from one another. Accordingly, the elements of the above-described encoding apparatus 100 may be mounted as one chip or a plurality of chips according to the design of the device. According to an embodiment, an operation of each element of the above-described encoding apparatus 100 may be performed by a processor (not shown).

FIG. 2 is a schematic block diagram of a video signal decoding apparatus 200 according to an embodiment of the present invention. Referring to FIG. 2, the decoding apparatus 200 of the present invention includes an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 225, a filtering unit 230, and a prediction unit 250.

The entropy decoding unit 210 entropy-decodes the video signal bitstream to extract transform coefficients for each region, intra-encoding information, inter-encoding information, and the like. The inverse quantizer 220 inverse quantizes the entropy-decoded transform coefficient, and the inverse transform unit 225 restores a residual value using the inverse quantized transform coefficient. The video signal processing apparatus 200 restores the original pixel value by adding the residual value obtained by the inverse transform unit 225 with the prediction value obtained by the prediction unit 250 .

Meanwhile, the filtering unit 230 improves picture quality by filtering the picture. This may include a deblocking filter for reducing block distortion and/or an adaptive loop filter for removing distortion from the entire picture. The filtered picture is output or stored in the decoded picture buffer DPB 256 to be used as a reference picture for the next picture.

The prediction unit 250 includes an intra prediction unit 252 and an inter prediction unit 254. The prediction unit 250 generates a prediction picture by using the encoding type decoded through the entropy decoding unit 210, transform coefficients for each region, intra/inter encoding information, and the like. In order to reconstruct a current block on which decoding is performed, a decoded area of the current picture including the current block or of other pictures may be used. A picture (or tile/slice) that uses only the current picture for reconstruction, that is, performs only intra prediction, is called an intra picture or an I picture (or tile/slice), and a picture (or tile/slice) that can perform both intra prediction and inter prediction is called an inter picture (or tile/slice). Among inter pictures (or tiles/slices), a picture (or tile/slice) using at most one motion vector and reference picture index to predict the sample values of each block is called a predictive picture or a P picture (or tile/slice), and a picture (or tile/slice) using up to two motion vectors and reference picture indexes is called a bi-predictive picture or a B picture (or tile/slice). In other words, a P picture (or tile/slice) uses at most one set of motion information to predict each block, and a B picture (or tile/slice) uses up to two sets of motion information to predict each block. Here, the motion information set includes one or more motion vectors and one reference picture index.

The intra prediction unit 252 generates a prediction block by using the intra encoding information and reconstructed samples in the current picture. As described above, the intra encoding information may include at least one of an intra prediction mode, a Most Probable Mode (MPM) flag, and an MPM index. The intra prediction unit 252 predicts pixel values of the current block by using the reconstructed pixels located on the left and/or above the current block as reference pixels. According to an embodiment, the reference pixels may be pixels adjacent to a left boundary and/or pixels adjacent to an upper boundary of the current block. According to another embodiment, the reference pixels may be pixels adjacent within a preset distance from a left boundary of the current block among pixels of a neighboring block of the current block and/or pixels adjacent within a preset distance from an upper boundary of the current block. In this case, the neighboring blocks of the current block may include at least one of a left (L) block, an above (A) block, a below left (BL) block, an above right (AR) block, or an above left (AL) block adjacent to the current block.

The inter prediction unit 254 generates a prediction block by using the reference picture stored in the decoded picture buffer 256 and the inter encoding information. The inter encoding information may include motion information (reference picture index, motion vector information, etc.) of the current block with respect to the reference block. Inter prediction may include L0 prediction, L1 prediction, and bi-prediction. L0 prediction means prediction using one reference picture included in the L0 picture list, and L1 prediction means prediction using one reference picture included in the L1 picture list. For each of these, one set of motion information (e.g., a motion vector and a reference picture index) may be required. In the bi-prediction method, a maximum of two reference regions may be used, and the two reference regions may exist in the same reference picture or in different pictures, respectively. That is, in the bi-prediction method, a maximum of two sets of motion information (e.g., a motion vector and a reference picture index) may be used, and the two motion vectors may correspond to the same reference picture index or to different reference picture indexes. In this case, the reference pictures may be temporally displayed (or output) before or after the current picture.

The inter prediction unit 254 may obtain the reference block of the current block by using the motion vector and the reference picture index. The reference block exists in the reference picture corresponding to the reference picture index. Also, a pixel value of a block specified by the motion vector or an interpolated value thereof may be used as a predictor of the current block. For motion prediction having sub-pel unit pixel accuracy, for example, an 8-tap interpolation filter may be used for a luma signal and a 4-tap interpolation filter may be used for a chroma signal. However, the interpolation filter for motion prediction in units of subpels is not limited thereto. As such, the inter prediction unit 254 performs motion compensation for predicting the texture of the current unit from the previously reconstructed picture using motion information.

A reconstructed video picture is generated by adding the prediction value output from the intra prediction unit 252 or the inter prediction unit 254 and the residual value output from the inverse transform unit 225 . That is, the video signal decoding apparatus 200 reconstructs the current block by using the prediction block generated by the prediction unit 250 and the residual obtained from the inverse transform unit 225 .
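The addition described above can be sketched for a single block as follows. This is a minimal illustration, assuming blocks are given as lists of rows of integers; the function name reconstruct_block, the bit_depth parameter, and the clipping to the valid sample range are assumptions added for the example and are not stated in the text above.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add a residual block to a prediction block, sample by sample."""
    max_val = (1 << bit_depth) - 1
    # Assumption: results are clipped to [0, 2^bit_depth - 1].
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


if __name__ == "__main__":
    prediction = [[100, 250], [3, 128]]
    residual = [[5, 10], [-7, 0]]
    print(reconstruct_block(prediction, residual))
```

With the sample values shown, the sums 260 and -4 fall outside the 8-bit range and are clipped to 255 and 0, respectively.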

Meanwhile, the block diagram of FIG. 2 shows the decoding apparatus 200 according to an embodiment of the present invention, and the separately displayed blocks illustrate the elements of the decoding apparatus 200 as logically distinguished from one another. Accordingly, the elements of the decoding apparatus 200 described above may be mounted as one chip or a plurality of chips according to the design of the device. According to an embodiment, the operation of each element of the above-described decoding apparatus 200 may be performed by a processor (not shown).

FIG. 3 shows an embodiment in which a Coding Tree Unit (CTU) is divided into Coding Units (CUs) within a picture. In the coding process of a video signal, a picture may be divided into a sequence of coding tree units (CTUs). A coding tree unit consists of an NXN block of luma samples and two blocks of corresponding chroma samples. A coding tree unit may be divided into a plurality of coding units. The coding unit refers to a basic unit for processing a picture in the process of processing the video signal described above, that is, intra/inter prediction, transformation, quantization, and/or entropy coding. The size and shape of the coding unit in one picture may not be constant. The coding unit may have a square or rectangular shape. The rectangular coding unit (or rectangular block) includes a vertical coding unit (or a vertical block) and a horizontal coding unit (or a horizontal block). In the present specification, a vertical block is a block having a height greater than a width, and a horizontal block is a block having a width greater than a height. Also, in the present specification, a non-square block may refer to a rectangular block, but the present invention is not limited thereto.

Referring to FIG. 3 , the coding tree unit is first divided into a quad tree (QT) structure. That is, in the quad tree structure, one node having a size of 2NX2N may be divided into four nodes having a size of NXN. In this specification, a quad tree may also be referred to as a quaternary tree. Quad tree partitioning can be performed recursively, and not all nodes need to be partitioned to the same depth.

Meanwhile, a leaf node of the aforementioned quad tree may be further divided into a multi-type tree (MTT) structure. According to an embodiment of the present invention, in the multi-type tree structure, one node may be divided into a binary or ternary tree structure by horizontal or vertical division. That is, in the multi-type tree structure, there are four partitioning structures: vertical binary partitioning, horizontal binary partitioning, vertical ternary partitioning, and horizontal ternary partitioning. According to an embodiment of the present invention, both a width and a height of a node in each tree structure may have a value of a power of two. For example, in a binary tree (BT) structure, a node having a size of 2NX2N may be divided into two NX2N nodes by vertical binary division and into two 2NXN nodes by horizontal binary division. In addition, in a ternary tree (TT) structure, a node of size 2NX2N may be divided into nodes of (N/2)X2N, NX2N, and (N/2)X2N by vertical ternary division, and into nodes of 2NX(N/2), 2NXN, and 2NX(N/2) by horizontal ternary division. This multi-type tree splitting can be performed recursively.
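The node sizes produced by the quad tree and multi-type tree divisions described above can be worked out with a short sketch. The function name child_sizes and the split labels ("QT", "BT_VER", and so on) are hypothetical names chosen for this illustration; sizes are given as (width, height).

```python
def child_sizes(width, height, split):
    """Return the (width, height) of each child node for a given split."""
    if split == "QT":
        # 2Nx2N -> four NxN nodes
        return [(width // 2, height // 2)] * 4
    if split == "BT_VER":
        # 2Nx2N -> two Nx2N nodes
        return [(width // 2, height)] * 2
    if split == "BT_HOR":
        # 2Nx2N -> two 2NxN nodes
        return [(width, height // 2)] * 2
    if split == "TT_VER":
        # 2Nx2N -> (N/2)x2N, Nx2N, (N/2)x2N
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    if split == "TT_HOR":
        # 2Nx2N -> 2Nx(N/2), 2NxN, 2Nx(N/2)
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    raise ValueError(f"unknown split type: {split}")


if __name__ == "__main__":
    for mode in ("QT", "BT_VER", "BT_HOR", "TT_VER", "TT_HOR"):
        print(mode, child_sizes(32, 32, mode))
```

For a 32X32 node (N = 16), vertical ternary division yields widths of 8, 16, and 8, matching the (N/2), N, (N/2) pattern in the text.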

A leaf node of a multi-type tree may be a coding unit. If the coding unit is not too large for the maximum transform length, the coding unit is used as a unit of prediction and transform without further splitting. Meanwhile, in the aforementioned quad tree and multi-type tree, at least one of the following parameters may be predefined or transmitted through an RBSP of a higher level set such as PPS, SPS, or VPS. 1) CTU size: the root node size of the quad tree, 2) minimum QT size (MinQtSize): the minimum allowed QT leaf node size, 3) maximum BT size (MaxBtSize): the maximum allowed BT root node size, 4) maximum TT size (MaxTtSize): the maximum allowed TT root node size, 5) maximum MTT depth (MaxMttDepth): the maximum allowed depth of MTT splitting from a leaf node of the QT, 6) minimum BT size (MinBtSize): the minimum allowed BT leaf node size, 7) minimum TT size (MinTtSize): the minimum allowed TT leaf node size.

FIG. 4 shows a method of signaling the division of a quad tree and a multi-type tree according to an embodiment of the present invention. Preset flags may be used to signal the division of the aforementioned quad tree and multi-type tree. Referring to FIG. 4, at least one of a flag 'qt_split_flag' indicating whether to split a quad tree node, a flag 'mtt_split_flag' indicating whether to split a multi-type tree node, a flag 'mtt_split_vertical_flag' indicating a split direction of a multi-type tree node, or a flag 'mtt_split_binary_flag' indicating a split shape of a multi-type tree node may be used.

According to an embodiment of the present invention, a coding tree unit is a root node of a quad tree, and may be first divided into a quad tree structure. In the quad tree structure, 'qt_split_flag' is signaled for each node 'QT_node'. When the value of 'qt_split_flag' is 1, the corresponding node is divided into four square nodes, and when the value of 'qt_split_flag' is 0, the corresponding node becomes a leaf node 'QT_leaf_node' of the quad tree.

Each quad tree leaf node 'QT_leaf_node' may be further divided into a multi-type tree structure. In the multi-type tree structure, 'mtt_split_flag' is signaled for each node 'MTT_node'. When the value of 'mtt_split_flag' is 1, the corresponding node is divided into a plurality of rectangular nodes, and when the value of 'mtt_split_flag' is 0, the corresponding node becomes the leaf node 'MTT_leaf_node' of the multi-type tree. When the multi-type tree node 'MTT_node' is split into a plurality of rectangular nodes (that is, when the value of 'mtt_split_flag' is 1), 'mtt_split_vertical_flag' and 'mtt_split_binary_flag' for the node 'MTT_node' may be additionally signaled. When the value of 'mtt_split_vertical_flag' is 1, vertical splitting of the node 'MTT_node' is indicated, and when the value of 'mtt_split_vertical_flag' is 0, horizontal splitting of the node 'MTT_node' is indicated. Also, when the value of 'mtt_split_binary_flag' is 1, the node 'MTT_node' is divided into two rectangular nodes, and when the value of 'mtt_split_binary_flag' is 0, the node 'MTT_node' is divided into three rectangular nodes.
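The flag semantics described above for a multi-type tree node can be summarized in a short sketch. The flag names follow the text; the function name mtt_split_mode and the returned labels are hypothetical and serve only this illustration.

```python
def mtt_split_mode(mtt_split_flag, mtt_split_vertical_flag=0, mtt_split_binary_flag=0):
    """Map the three multi-type tree flags to a descriptive split label."""
    if mtt_split_flag == 0:
        # No further split: the node is a leaf of the multi-type tree.
        return "MTT_leaf_node"
    # The direction and shape flags are only meaningful when the node is split.
    direction = "vertical" if mtt_split_vertical_flag == 1 else "horizontal"
    shape = "binary" if mtt_split_binary_flag == 1 else "ternary"
    return f"{direction}_{shape}"


if __name__ == "__main__":
    print(mtt_split_mode(0))
    print(mtt_split_mode(1, 1, 1))
    print(mtt_split_mode(1, 0, 0))
```

The two additional flags together select one of the four partitioning structures: vertical binary, horizontal binary, vertical ternary, or horizontal ternary.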

This specification relates to multiple transform selection (MTS) applied to a residual signal generated by a matrix-based prediction method and an intra sub-partition (ISP) intra prediction method in a video codec. The distribution of the residual signal may differ for each region. For example, a distribution of values of a residual signal within a specific region may vary according to a prediction method. When transform is performed on a plurality of different transform regions using the same transform kernel, coding efficiency may vary for each transform region according to the distribution and characteristics of values in the transform region. Accordingly, when a transform kernel used for transforming a specific transform block is adaptively selected from among a plurality of available transform kernels, coding efficiency may be further improved. That is, the encoder and the decoder may set a transform kernel other than the basic transform kernel to be additionally usable in transforming the video signal. In this specification, a method of adaptively selecting a transform kernel is referred to as adaptive multiple core transform (AMT) or multiple transform selection (MTS).

In addition, the matrix-based intra prediction (MIP) method in the present specification refers to an intra prediction method that, unlike the existing prediction methods that predict directionally from the pixels of the neighboring blocks, obtains a prediction signal from the pixels of the neighboring blocks using a predefined matrix and offset values, and generates a residual signal from that prediction signal.

FIG. 5 shows a general directional intra prediction method according to an embodiment of the present invention, and FIG. 6 shows a matrix-based intra prediction method according to an embodiment of the present invention.

Referring to FIG. 6, in order to generate a prediction signal for the current block, prediction is performed using a matrix B including pixels of neighboring blocks, a predefined matrix A_k, and an offset value o_k. Compared with the residual signal obtained through the existing directional prediction methods, the residual signal obtained through MIP has weak directionality and uniform signal characteristics. Therefore, for transforming this residual signal, it is advantageous to use a transform kernel such as DCT-2, which shows higher energy compaction performance when the input signal is uniform, rather than the DST-7 and DCT-8 transform kernels, which show strong compression performance for directional intra prediction.
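
The passage above does not spell out how B, A_k, and o_k are combined. The following sketch assumes the usual affine form, in which the prediction is obtained as A_k multiplied by B plus o_k, with B flattened into a vector of neighboring pixels and o_k treated as one offset per output sample. The function name and all numeric values are placeholders for illustration only.

```python
from typing import List


def mip_predict(a_k: List[List[float]], b: List[float], o_k: List[float]) -> List[float]:
    """Compute A_k * B + o_k (assumed affine form) for a flattened input B."""
    return [sum(a * x for a, x in zip(row, b)) + off
            for row, off in zip(a_k, o_k)]


# Placeholder values chosen only to make the arithmetic easy to follow.
A_k = [[0.5, 0.5], [0.25, 0.75]]
B = [100.0, 120.0]
o_k = [2.0, 2.0]
print(mip_predict(A_k, B, o_k))  # [112.0, 117.0]
```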

A low-frequency non-separable transform (LFNST) refers to a secondary transform technique. The LFNST kernel that is applied varies according to the intra prediction mode. LFNST is a secondary transform method that achieves higher compression performance by applying a predefined secondary transform kernel to the low-frequency region of the transform coefficients obtained by applying a primary transform, such as DCT-2 or DST-7, to the residual signal obtained through a prediction method. The LFNST kernel is a transform kernel obtained through offline training, and is defined and applied differently depending on the intra prediction mode and the size of the block to which the secondary transform is applied. That is, indexes for LFNST kernels are defined from 0 to 3, intra prediction modes are mapped to each index, and a total of two LFNST kernels are defined for and applied to each index. The LFNST index mapped according to the intra prediction mode is defined as shown in FIG. 7 and Table 1.

Sets 0 to 3 of FIG. 7 are the same as lfnstTrSetIdx 0 to 3 of Table 1.

Multiple transform selection (MTS) means using a plurality of transform kernels such as DCT-2, DST-7, and DCT-8. In other words, MTS means using either the pair of DCT-2 and DCT-2 transform kernels or a combination of DST-7 and DCT-8 for the horizontal and vertical directions of the transform block. The MTS transform kernel may be used differently according to the method of generating the intra prediction residual. That is, when an intra prediction residual is generated using MIP, only the pair of DCT-2 and DCT-2 transforms is used as the MTS transform kernel. Also, when the LFNST secondary transform is used, only the pair of DCT-2 and DCT-2 transforms is used as the MTS primary transform kernel.

Intra sub-block partitioning (ISP) refers to an intra prediction method in which a block is divided in a horizontal or vertical direction and intra prediction is performed on each resulting subblock. For example, a 4x4 block is divided into two 4x2 subblocks (horizontal direction) or into two 2x4 subblocks (vertical direction) to perform intra prediction. ISP increases compression efficiency because the prediction distance is short and the prediction accuracy is therefore high. Also, different transform kernels may be applied to each subblock. In the present invention, a method for determining the type of the transform kernel according to the block size when ISP and LFNST are used is proposed.

FIG. 8 shows an example of MTS application according to the proposed ISP block size. Referring to FIG. 8, when LFNST is used in the current block (i.e., LFNST = 1) and ISP is used (i.e., ISP_NO_SPLIT = 0), the transform kernel in the horizontal direction of the current block may be determined as DCT-2 (i.e., trTypeHor = 0), and the transform kernel in the vertical direction of the current block may be determined as DCT-2 (i.e., trTypeVer = 0). Meanwhile, when LFNST is not used in the current block (i.e., LFNST = 0) and ISP is not used (i.e., ISP_NO_SPLIT = 1), different MTS kernels may be used according to the current block size. Specifically, when the horizontal and vertical sizes of the current block are 4 or more and 16 or less, the transform kernel is determined to be an MTS kernel other than DCT-2, and when the horizontal and vertical sizes of the current block are less than 4 or greater than 16, the transform kernel may be determined as DCT-2.
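
The two cases described for FIG. 8 may be sketched as follows. This is only an illustration under stated assumptions: the value 0 for DCT-2 follows the trTypeHor and trTypeVer values given above, whereas the value 1 for DST-7, the choice of DST-7 as the "MTS kernel other than DCT-2", and the function names are assumptions of the sketch. The size test is applied per direction, as in the claims. Flag combinations not described in the passage return None.

```python
from typing import Optional, Tuple

# Kernel type codes. DCT-2 = 0 follows the text (trTypeHor = trTypeVer = 0);
# DST-7 = 1 is an assumption made for this sketch.
DCT2 = 0
DST7 = 1


def kernel_for_size(size: int) -> int:
    """Return a non-DCT-2 kernel for sizes from 4 to 16, and DCT-2 otherwise."""
    return DST7 if 4 <= size <= 16 else DCT2


def select_kernels(lfnst_used: bool, isp_used: bool,
                   width: int, height: int) -> Optional[Tuple[int, int]]:
    """Return (trTypeHor, trTypeVer) for the two cases described for FIG. 8."""
    if lfnst_used and isp_used:
        # LFNST = 1 and ISP used: DCT-2 in both directions.
        return (DCT2, DCT2)
    if not lfnst_used and not isp_used:
        # LFNST = 0 and ISP not used: decide each direction from its own size.
        return (kernel_for_size(width), kernel_for_size(height))
    # The passage does not describe the remaining combinations.
    return None


print(select_kernels(True, True, 8, 8))     # (0, 0)
print(select_kernels(False, False, 8, 32))  # (1, 0)
```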

FIG. 9 shows an example of using a transform kernel according to an ISP block size. Referring to FIG. 9, when the horizontal or vertical length of the block is 16 or less, for example, for 4x4, 4x8, 8x4, and 8x8 blocks as in the shaded part of FIG. 9, a transform kernel other than DCT-2, such as DST-7, may be used for the horizontal and vertical directions, and for other block sizes the DCT-2 MTS transform kernel may be used. For a smaller block, a kernel suitable for an intra prediction residual signal, such as DST-7, may be used. Otherwise, if the block size is large, DCT-2 may be used, since the residual signal has a uniform property. In FIG. 9, only blocks of 4x4, 4x8, 8x4, and 8x8 sizes are separately indicated. However, as described with reference to FIG. 8, it goes without saying that when the horizontal and vertical sizes of a block are 4 or more and 16 or less, a transform kernel other than DCT-2, such as DST-7, may also be used for blocks of 4x16, 8x16, 16x4, 16x8, and 16x16 sizes.
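
As a quick check of the size range, the following sketch enumerates the block sizes whose horizontal and vertical sizes are both 4 or more and 16 or less. The list of candidate side lengths is an assumption of the sketch and is not taken from FIG. 9.

```python
# Candidate side lengths considered in this sketch (an assumption).
SIDES = (4, 8, 16, 32, 64)


def uses_non_dct2(width: int, height: int) -> bool:
    """True when both sizes are 4 or more and 16 or less."""
    return 4 <= width <= 16 and 4 <= height <= 16


non_dct2_sizes = [f"{w}x{h}" for w in SIDES for h in SIDES if uses_non_dct2(w, h)]
print(non_dct2_sizes)
# ['4x4', '4x8', '4x16', '8x4', '8x8', '8x16', '16x4', '16x8', '16x16']
```

The result contains the four sizes indicated in FIG. 9 together with the five additional sizes named in the description.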

Also, in the present specification, for the case of explicit primary transform kernel selection (Explicit MTS) in which LFNST is not used and MIP is used, a method of applying a transform kernel according to the size of the current block (transform block) is proposed. Explicit MTS means a method of explicitly signaling which transform kernel is used in a transform block by transmitting information on the transform kernel used in the MTS.

Referring to FIG. 10, when explicit primary transform kernel selection is applied (i.e., Explicit MTS = 1), LFNST is not used in the transform block (i.e., LFNST = 0), and MIP is used (i.e., MIP = 1), the transform kernel may be determined according to the transform block size. When the horizontal or vertical length of the transform block is less than 16, that is, for 4x4, 4x8, 8x4, and 8x8 blocks like the shaded part of FIG. 9, a transform kernel other than DCT-2, such as DST-7, is used for the horizontal and vertical directions, and for other blocks the DCT-2 MTS transform kernel is used. According to FIG. 10, a kernel suitable for an intra prediction residual signal, such as DST-7, may be used for a smaller block, whereas DCT-2 may be used when the block size is large, since the residual signal then has a uniform property.
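
The selection described for FIG. 10 may be sketched as follows. The sketch assumes that "less than 16" must hold for both the horizontal and vertical lengths, which is consistent with the listed 4x4, 4x8, 8x4, and 8x8 examples but is not stated unambiguously in the description. The function name and the choice of DST-7 as the non-DCT-2 kernel are likewise assumptions.

```python
from typing import Optional, Tuple

DCT2, DST7 = "DCT-2", "DST-7"


def select_kernels_explicit_mts_mip(explicit_mts: bool, lfnst_used: bool,
                                    mip_used: bool, width: int,
                                    height: int) -> Optional[Tuple[str, str]]:
    """Return (horizontal kernel, vertical kernel) for the FIG. 10 case."""
    if not (explicit_mts and not lfnst_used and mip_used):
        # Only Explicit MTS = 1, LFNST = 0, MIP = 1 is covered by this sketch.
        return None
    if width < 16 and height < 16:
        # Smaller blocks (assumed: both sides below 16) use a non-DCT-2 kernel.
        return (DST7, DST7)
    # Larger blocks use DCT-2 in both directions.
    return (DCT2, DCT2)


print(select_kernels_explicit_mts_mip(True, False, True, 8, 4))    # ('DST-7', 'DST-7')
print(select_kernels_explicit_mts_mip(True, False, True, 16, 16))  # ('DCT-2', 'DCT-2')
```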

Referring to FIG. 11, when the horizontal and vertical lengths of the transform block are greater than 32 pixels, the transform kernels for the horizontal and vertical directions are determined to be DCT-2 and DCT-2, respectively, and when the horizontal and vertical lengths of the transform block are smaller than or equal to 32 pixels, the type of the transform kernel may be changed according to the division type of the subblock. For example, when the subblock to which the transform is applied is vertically divided and its position is on the left, the transform kernel for the horizontal direction may be determined as DCT-8, and the transform kernel for the vertical direction may be determined as DST-7. Also, when the subblock to which the transform is applied is vertically divided and its position is on the right, the transform kernel for the horizontal direction may be determined as DST-7, and the transform kernel for the vertical direction may be determined as DST-7. Also, when the subblock to which the transform is applied is horizontally divided and its position is on the upper side, the transform kernel for the horizontal direction may be determined as DST-7, and the transform kernel for the vertical direction may be determined as DCT-8. Also, when the subblock to which the transform is applied is horizontally divided and its position is on the lower side, the transform kernel for the horizontal direction may be determined as DST-7, and the transform kernel for the vertical direction may be determined as DST-7.
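
The subblock rules described for FIG. 11 can be summarized as a lookup table, as in the following sketch. The description does not state what happens when only one of the two lengths exceeds 32 pixels; the sketch assumes that DCT-2 is used in both directions in that case. The function name and the string labels are hypothetical.

```python
from typing import Tuple

DCT2, DST7, DCT8 = "DCT-2", "DST-7", "DCT-8"

# (division direction, subblock position) -> (horizontal kernel, vertical kernel)
SUBBLOCK_KERNELS = {
    ("vertical", "left"): (DCT8, DST7),
    ("vertical", "right"): (DST7, DST7),
    ("horizontal", "upper"): (DST7, DCT8),
    ("horizontal", "lower"): (DST7, DST7),
}


def select_subblock_kernels(division: str, position: str,
                            width: int, height: int) -> Tuple[str, str]:
    """Return (horizontal kernel, vertical kernel) for a transformed subblock."""
    if width > 32 or height > 32:
        # Assumption: either length above 32 pixels selects DCT-2 for both.
        return (DCT2, DCT2)
    return SUBBLOCK_KERNELS[(division, position)]


print(select_subblock_kernels("vertical", "left", 16, 16))     # ('DCT-8', 'DST-7')
print(select_subblock_kernels("horizontal", "lower", 64, 64))  # ('DCT-2', 'DCT-2')
```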

In the case of implementation by hardware, the method according to embodiments of the present invention may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of implementation by firmware or software, the method according to the embodiments of the present invention may be implemented in the form of a module, procedure, or function that performs the functions or operations described above. The software code may be stored in the memory and driven by the processor. The memory may be located inside or outside the processor, and data may be exchanged with the processor by various known means.

The above description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can be easily modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above are illustrative in all respects and should not be construed as limiting. For example, each component described as a single type may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the following claims rather than the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalent concepts should be interpreted as being included in the scope of the present invention.

Claims

1. A video signal decoding apparatus comprising a processor,

wherein the processor is configured to:

determine a transform kernel for horizontal transform of a current block and a transform kernel for vertical transform of the current block based on a preset condition, and

obtain a residual signal for the current block using the transform kernels,

wherein the preset condition is a condition based on whether an intra subblock partitioning (ISP) prediction method is applied to the current block and whether a low frequency non-separable transform (LFNST) is applied to the current block.
2. The apparatus of claim 1,

wherein, when the ISP prediction method is applied and the LFNST is applied,

the transform kernel for horizontal transform of the current block and the transform kernel for vertical transform of the current block are both discrete cosine transform (DCT) type 2 (DCT-2) transform kernels.
3. The apparatus of claim 1,

wherein, when the ISP prediction method is not applied and the LFNST is not applied,

the transform kernels are determined based on a horizontal size and a vertical size of the current block.
4. The apparatus of claim 3,

wherein, when the horizontal size of the current block is less than 4 or greater than 16,

the transform kernel for horizontal transform of the current block is a DCT type 2 (DCT-2) transform kernel, and

when the horizontal size of the current block is 4 or more and 16 or less,

the transform kernel for horizontal transform of the current block is not a DCT type 2 (DCT-2) transform kernel.
5. The apparatus of claim 3,

wherein, when the vertical size of the current block is less than 4 or greater than 16,

the transform kernel for vertical transform of the current block is a DCT type 2 (DCT-2) transform kernel, and

when the vertical size of the current block is 4 or more and 16 or less,

the transform kernel for vertical transform of the current block is not a DCT type 2 (DCT-2) transform kernel.
6. The apparatus of claim 1,

wherein the preset condition is a condition based on a division direction of the current block when a sub-block transform (SBT) is applied to the current block,

the current block is divided in the vertical direction and consists of two subblocks, and

when the SBT is applied to the left subblock among the two subblocks,

the transform kernel for the horizontal direction of the left subblock is a DCT type 8 (DCT-8) transform kernel, and

the transform kernel for the vertical direction of the left subblock is a discrete sine transform (DST) type 7 (DST-7) transform kernel.
7. The apparatus of claim 6,

wherein, when the SBT is applied to the right subblock among the two subblocks,

the transform kernel for the horizontal direction of the right subblock and the transform kernel for the vertical direction of the right subblock are both DST type 7 (DST-7) transform kernels.
8. The apparatus of claim 6,

wherein the current block is divided in the horizontal direction and consists of two subblocks, and

when the SBT is applied to the upper subblock among the two subblocks,

the transform kernel for the horizontal direction of the upper subblock is a DST type 7 (DST-7) transform kernel, and

the transform kernel for the vertical direction of the upper subblock is a DCT type 8 (DCT-8) transform kernel.
9. The apparatus of claim 6,

wherein, when the SBT is applied to the lower subblock among the two subblocks,

the transform kernel for the horizontal direction of the lower subblock and the transform kernel for the vertical direction of the lower subblock are both DST type 7 (DST-7) transform kernels.
10. A video signal encoding apparatus comprising a processor,

wherein the processor is configured to:

determine a transform kernel for horizontal transform of a current block and a transform kernel for vertical transform of the current block based on a preset condition, and

obtain a transform block for the current block using the transform kernels,

wherein the preset condition is a condition based on whether an intra subblock partitioning (ISP) prediction method is applied to the current block and whether a low frequency non-separable transform (LFNST) is applied to the current block.
11. The apparatus of claim 10,

wherein, when the ISP prediction method is applied and the LFNST is applied,

the transform kernel for horizontal transform of the current block and the transform kernel for vertical transform of the current block are both discrete cosine transform (DCT) type 2 (DCT-2) transform kernels.
12. The apparatus of claim 10,

wherein, when the ISP prediction method is not applied and the LFNST is not applied,

the transform kernels are determined based on a horizontal size and a vertical size of the current block.
13. The apparatus of claim 12,

wherein, when the horizontal size of the current block is less than 4 or greater than 16,

the transform kernel for horizontal transform of the current block is a DCT type 2 (DCT-2) transform kernel, and

when the horizontal size of the current block is 4 or more and 16 or less,

the transform kernel for horizontal transform of the current block is not a DCT type 2 (DCT-2) transform kernel.
14. The apparatus of claim 12,

wherein, when the vertical size of the current block is less than 4 or greater than 16,

the transform kernel for vertical transform of the current block is a DCT type 2 (DCT-2) transform kernel, and

when the vertical size of the current block is 4 or more and 16 or less,

the transform kernel for vertical transform of the current block is not a DCT type 2 (DCT-2) transform kernel.
15. The apparatus of claim 10,

wherein the preset condition is a condition based on a division direction of the current block when a sub-block transform (SBT) is applied to the current block,

the current block is divided in the vertical direction and consists of two subblocks, and

when the SBT is applied to the left subblock among the two subblocks,

the transform kernel for the horizontal direction of the left subblock is a DCT type 8 (DCT-8) transform kernel, and

the transform kernel for the vertical direction of the left subblock is a discrete sine transform (DST) type 7 (DST-7) transform kernel.
16. The apparatus of claim 15,

wherein, when the SBT is applied to the right subblock among the two subblocks,

the transform kernel for the horizontal direction of the right subblock and the transform kernel for the vertical direction of the right subblock are both DST type 7 (DST-7) transform kernels.
17. The apparatus of claim 15,

wherein the current block is divided in the horizontal direction and consists of two subblocks, and

when the SBT is applied to the upper subblock among the two subblocks,

the transform kernel for the horizontal direction of the upper subblock is a DST type 7 (DST-7) transform kernel, and

the transform kernel for the vertical direction of the upper subblock is a DCT type 8 (DCT-8) transform kernel.
18. The apparatus of claim 15,

wherein, when the SBT is applied to the lower subblock among the two subblocks,

the transform kernel for the horizontal direction of the lower subblock and the transform kernel for the vertical direction of the lower subblock are both DST type 7 (DST-7) transform kernels.
19. A non-transitory computer-readable medium storing a bitstream,

wherein the bitstream is encoded through an encoding method comprising:

determining a transform kernel for horizontal transform of a current block and a transform kernel for vertical transform of the current block based on a preset condition; and

obtaining a transform block for the current block using the transform kernels,

wherein the preset condition is a condition based on whether an intra subblock partitioning (ISP) prediction method is applied to the current block and whether a low frequency non-separable transform (LFNST) is applied to the current block.
20. The computer-readable medium of claim 19,

wherein, when the ISP prediction method is applied and the LFNST is applied,

the transform kernel for horizontal transform of the current block and the transform kernel for vertical transform of the current block are both discrete cosine transform (DCT) type 2 (DCT-2) transform kernels.