WO2021194052A1

WO2021194052A1 - Image decoding method and device

Info

Publication number: WO2021194052A1
Application number: PCT/KR2020/018466
Authority: WO
Inventors: 이선영
Original assignee: 주식회사 아틴스
Priority date: 2020-03-27
Filing date: 2020-12-16
Publication date: 2021-09-30

Abstract

Provided are an image decoding method and device. This specification provides an image decoding method comprising the steps of: acquiring a parameter indicating whether a multiple transform set is applicable to a block to be decoded, as well as information about the width of the block to be decoded and the height of the block to be decoded; determining the transform type of the block to be decoded on the basis of at least one of the parameter indicating whether a multiple transform set is applicable, or the information about the width and height of the block to be decoded, and setting a zero-out region of the block to be decoded; and inverse-transforming the block to be decoded on the basis of the zero-out region of the block to be decoded and the result of determining the transform type.

Description

Video decoding method and apparatus

The present invention relates to video coding technology, and more particularly, to a method of determining the type of a primary transform of a decoding object block during an image decoding process.

Recently, demand for high-resolution, high-quality images, such as high definition (HD) images and ultra high definition (UHD) images, has been increasing in various fields. Because the amount of information or bits to be transmitted grows as image data becomes higher in resolution and quality, transmission and storage costs increase when image data is transmitted or stored using a medium such as a conventional wired/wireless broadband line.

Since the HEVC video codec was standardized in 2013, immersive video and virtual reality services using 4K and 8K video have expanded, and standardization of Versatile Video Coding (VVC), a next-generation video codec that aims to at least double the compression performance of HEVC, is currently in progress. VVC is being developed by the Joint Video Experts Team (JVET), which was jointly organized by the video coding standardization groups ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group). A call for proposals for VVC standardization was announced at the 121st MPEG and 9th JVET meeting in Gwangju in January 2018, and video standardization began when a total of 23 organizations proposed video codec technologies at the 122nd MPEG and 10th JVET meeting in San Diego. At the 122nd MPEG and 10th JVET meeting, the video codec technologies proposed by each organization were technically reviewed and evaluated for objective compression performance and subjective quality, and the reference software VTM (VVC Test Model) 1.0 was released. As for the VVC standard, the Committee Draft (CD) was completed after the 127th MPEG and 15th JVET meeting in July 2019, and standardization is in progress with the goal of establishing the Final Draft International Standard (FDIS) in October 2020.

Whereas HEVC uses a coding structure split hierarchically by a quadtree, VVC adopts a partitioned block structure that combines QTBT (QuadTree Binary Tree) and TT (Ternary Tree). This allows the prediction residual signal to be generated or processed more flexibly than in HEVC, resulting in improved compression performance. In addition to this basic block structure, new technologies that were not used in existing codecs were adopted as standard technologies, such as the adaptive loop filter (ALF), affine motion prediction (AMP) as a motion prediction technology, and decoder-side motion vector refinement (DMVR). For transform and quantization, DCT-II, the transform kernel mainly used in existing video codecs, continues to be used, and the applicable block size has been extended to larger blocks. In addition, the DST-7 kernel, which was applied to small transform blocks such as 4×4 in HEVC, has been extended to large transform blocks, and a new transform kernel, DCT-8, has been added.

Meanwhile, in the HEVC standard, an image is encoded or decoded using a single transform type, so there is no need to transmit information on the transform type. In the new technology, however, multiple transform selection (MTS) using DCT-II, DCT-8, and DST-7 can be applied, so a technique is required for defining, during decoding, whether MTS is applied and which type of primary transform is applied.

An object of the present invention is to perform inverse transformation in a predefined method according to specific conditions.

Another technical object of the present invention is to perform decoding by applying a transform type optimized to a decoding object block.

According to an aspect of the present invention, there is provided an image decoding method performed by an image decoding apparatus. The image decoding method includes: obtaining a parameter indicating whether a multiple transform set (MTS) is applicable to a decoding object block, and information about a width and a height of the decoding object block; determining a transform type of the decoding object block based on at least one of the parameter indicating whether the MTS is applicable, or the information about the width and the height of the decoding object block; setting a zero-out region of the decoding object block based on at least one of the parameter indicating whether the MTS is applicable, or the information about the width and the height of the decoding object block; and inverse transforming the decoding object block based on the zero-out region of the decoding object block and the result of determining the transform type.

According to another aspect of the present invention, in the step of determining the transform type of the decoding object block, if at least one of the width or the height of the decoding object block has a value greater than 32, the decoding object block is determined to have been transformed using a default transform.

According to another aspect of the present invention, in the step of setting the zero-out region of the decoding object block, when one of the width or the height of the decoding object block has a value greater than 32, the region of the decoding object block in which the width or the height exceeds 32 is set as the zero-out region.
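The two size-based rules above (default transform and zero-out beyond 32) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `ZeroOutPlan`, `plan_zero_out`, and `apply_zero_out` are hypothetical, and the interaction with sps_mts_enabled_flag is left out because this passage does not spell it out.

```python
from dataclasses import dataclass

MAX_KEPT_SIZE = 32  # threshold stated in the text


@dataclass
class ZeroOutPlan:
    use_default_transform: bool  # True when width or height exceeds 32
    kept_width: int              # columns [0, kept_width) keep their coefficients
    kept_height: int             # rows [0, kept_height) keep their coefficients


def plan_zero_out(width, height):
    """Derive the default-transform decision and the kept (non-zero-out) area."""
    use_default = width > MAX_KEPT_SIZE or height > MAX_KEPT_SIZE
    return ZeroOutPlan(use_default,
                       min(width, MAX_KEPT_SIZE),
                       min(height, MAX_KEPT_SIZE))


def apply_zero_out(coeffs, plan):
    """Return a copy of coeffs (list of rows) with the zero-out region cleared."""
    return [
        [v if (x < plan.kept_width and y < plan.kept_height) else 0
         for x, v in enumerate(row)]
        for y, row in enumerate(coeffs)
    ]
```

For a 64x16 block, for example, the plan keeps columns 0 to 31 of every row and treats columns 32 to 63 as the zero-out region.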

According to another aspect of the present invention, a parameter indicating whether multiple transform sets can be applied to the decoding object block is sps_mts_enabled_flag.

According to another aspect of the present invention, there is provided an image decoding method performed by an image decoding apparatus. The image decoding method includes: obtaining at least one of information on whether a multiple transform set (MTS) is applied to a decoding object block, information on a prediction mode, information on whether a secondary transform is applied, information on whether prediction using a matrix is applied, and information on the size of the decoding object block; determining whether an implicit multiple transform set is applied to the decoding object block based on at least one of the information on whether the multiple transform set is applied to the decoding object block, the information on the prediction mode, the information on whether the secondary transform is applied, and the information on whether prediction using a matrix is applied; obtaining information on a transform type based on information on whether the implicit multiple transform set is applied to the decoding object block and the information on the size of the decoding object block; and performing an inverse transform based on the information on the transform type.

According to another aspect of the present invention, in the step of determining whether the implicit multiple transform set is applied, whether the implicit multiple transform set is applied is determined using the information on whether a multiple transform set (MTS) is applied to the decoding object block, the information on the prediction mode, the information on whether the secondary transform is applied, and the information on whether prediction using a matrix is applied.

According to another aspect of the present invention, the set of implicit multiple transforms includes one default transform and at least one extra transform.

According to another aspect of the present invention, in the step of obtaining information on the transform type based on the information on the size of the decoding object block, when the horizontal-axis length of the decoding object block is 4 or more and 16 or less, at least one of the extra transform types is applied to the decoding object block in the horizontal-axis direction.

According to another aspect of the present invention, in the step of obtaining information on the transform type based on the information on the size of the decoding object block, when the vertical-axis length of the decoding object block is 4 or more and 16 or less, at least one of the extra transform types is applied to the decoding object block in the vertical-axis direction.

According to another aspect of the present invention, the information on whether multiple transform set (MTS) is applied to the decoding object block includes at least one of sps_mts_enabled_flag and sps_explicit_mts_intra_enabled_flag.

According to another aspect of the present invention, the information on the prediction mode includes CuPredMode.

According to another aspect of the present invention, the information on whether the secondary transform is applied includes lfnst_idx.

According to another aspect of the present invention, information on whether prediction using the matrix is applied includes intra_mip_flag.

According to another aspect of the present invention, the information on the transformation type of the decoding object block includes information on a horizontal-axis transformation type and information on a vertical-axis transformation type, respectively.

According to another aspect of the present invention, the step of determining whether an implicit multiple transform set is applied to the decoding object block further includes checking whether the decoding object block is a luma block.
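The implicit-MTS aspects above can be read together as a two-step decision: first decide whether implicit MTS applies, then pick a transform type per axis from the block size. The sketch below is one possible reading under stated assumptions. The text only names the inputs (sps_mts_enabled_flag, sps_explicit_mts_intra_enabled_flag, CuPredMode, lfnst_idx, intra_mip_flag, and the luma check); the specific flag values tested here, the `MODE_INTRA` constant, and the choice of DCT-II as the default transform and DST-7 as the extra transform are assumptions made for illustration.

```python
MODE_INTRA = "MODE_INTRA"  # assumed CuPredMode value for intra-coded blocks


def implicit_mts_applies(sps_mts_enabled_flag,
                         sps_explicit_mts_intra_enabled_flag,
                         cu_pred_mode, lfnst_idx, intra_mip_flag, is_luma):
    """Assumed condition: MTS enabled, explicit intra MTS off, intra block,
    no secondary transform, no matrix-based prediction, luma block."""
    return (sps_mts_enabled_flag == 1
            and sps_explicit_mts_intra_enabled_flag == 0
            and cu_pred_mode == MODE_INTRA
            and lfnst_idx == 0
            and intra_mip_flag == 0
            and is_luma)


def implicit_transform_types(width, height, implicit):
    """Return (horizontal, vertical) transform types for the block."""
    default, extra = "DCT-II", "DST-7"  # assumed default / extra transforms
    if not implicit:
        return default, default
    # Size rule from the text: 4 <= length <= 16 selects an extra transform.
    hor = extra if 4 <= width <= 16 else default
    ver = extra if 4 <= height <= 16 else default
    return hor, ver
```

Under these assumptions an 8x32 intra luma block would use the extra transform horizontally and the default transform vertically.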

According to another aspect of the present invention, there is provided an image decoding apparatus including a memory and at least one processor. The at least one processor includes an inverse transform unit that: obtains at least one of information on whether a multiple transform set (MTS) is applied to a decoding object block, information on a prediction mode, information on whether a secondary transform is applied, information on whether prediction using a matrix is applied, and information on the size of the decoding object block; determines whether an implicit multiple transform set is applied to the decoding object block based on at least one of the information on whether the multiple transform set is applied to the decoding object block, the information on the prediction mode, the information on whether the secondary transform is applied, and the information on whether prediction using a matrix is applied; obtains information on a transform type based on information on whether the implicit multiple transform set is applied to the decoding object block and the information on the size of the decoding object block; and performs an inverse transform based on the information on the transform type.

According to the present invention, the inverse transformation can be performed in a predefined method according to specific conditions.

In addition, an effect of improving compression performance can be expected by performing decoding by applying a transform type optimized to a decoding target block.

FIG. 1 is a diagram schematically illustrating a configuration of a video encoding apparatus to which the present invention can be applied.

FIG. 2 illustrates an example of an image encoding method performed by a video encoding apparatus.

FIG. 3 is a diagram schematically illustrating a configuration of a video decoding apparatus to which the present invention can be applied.

FIG. 4 shows an example of an image decoding method performed by a decoding apparatus.

FIG. 5 shows a scan order of sub-blocks and coefficients for a diagonal scan scheme.

FIG. 6 shows an example of a 32x32 encoding target block after quantization.

FIG. 7 illustrates the remaining zero-out areas excluding mxn among the areas of the MxN decoding object block.

FIG. 8 illustrates a method of determining whether to apply an implicit MTS function according to an embodiment of the present invention.

FIG. 9 shows a method of deriving transformation information according to the width and height of the corresponding block of the implicit MTS according to an embodiment of the present invention.

FIG. 10 illustrates a method of performing an inverse transform based on a transform-related parameter according to an embodiment of the present invention.

FIG. 11 shows a valid MTS area marked with a thick line in a 32x32 decoding object block.

FIG. 12 shows a method of determining a valid MTS according to an embodiment of the present invention.

FIG. 13 illustrates a method of determining a valid MTS according to another embodiment of the present invention.

FIG. 14 illustrates a method of determining a valid MTS according to another embodiment of the present invention.

FIG. 15 illustrates a method of determining whether to apply an explicit MTS function according to an embodiment of the present invention.

FIG. 16 illustrates a method of performing an inverse transform based on a transform-related parameter according to another embodiment of the present invention.

Since the present invention can have various changes and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the invention to a specific embodiment. Terms commonly used herein are used only to describe specific embodiments, and are not intended to limit the technical spirit of the present invention. The singular expression includes the plural expression unless the context clearly dictates otherwise. As used herein, terms such as “comprise” or “have” are intended to designate that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, and it is to be understood that they do not preclude in advance the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

On the other hand, each component in the drawings described in the present invention is shown independently for the convenience of description regarding different characteristic functions, and does not mean that each component is implemented as separate hardware or separate software. For example, two or more components among each component may be combined to form one component, or one component may be divided into a plurality of components. Embodiments in which each component is integrated and/or separated are also included in the scope of the present invention without departing from the essence of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings. Hereinafter, the same reference numerals are used for the same components in the drawings, and repeated descriptions of the same components are omitted.

Meanwhile, the present invention relates to video/image coding. For example, the method/embodiment disclosed in the present invention may be applied to methods disclosed in the Versatile Video Coding (VVC) standard, the Essential Video Coding (EVC) standard, the AOMedia Video 1 (AV1) standard, the 2nd generation of audio video coding standard (AVS2), or a next-generation video/image coding standard (e.g., H.267, H.268, etc.).

In the present specification, a picture generally refers to a unit representing one image in a specific time period, and a slice is a unit constituting a part of a picture in coding. One picture may consist of a plurality of slices, and if necessary, a picture and a slice may be used interchangeably.

A pixel or pel may mean a minimum unit constituting one picture (or image). Also, as a term corresponding to a pixel, a 'sample' may be used. A sample may generally represent a pixel or a value of a pixel, may represent only a pixel/pixel value of a luma component, or may represent only a pixel/pixel value of a chroma component.

A unit represents a basic unit of image processing. The unit may include at least one of a specific region of a picture and information related to the region. A unit may be used interchangeably with terms such as a block or an area in some cases. In general, an MxN block may represent a set of samples or transform coefficients including M columns and N rows.

Referring to FIG. 1, the video encoding apparatus 100 includes a picture division unit 105, a prediction unit 110, a residual processing unit 120, an entropy encoding unit 130, an adder 140, a filter unit 150, and a memory 160. The residual processing unit 120 may include a subtraction unit 121, a transform unit 122, a quantization unit 123, a rearrangement unit 124, an inverse quantization unit 125, and an inverse transform unit 126.

The picture divider 105 may divide the input picture into at least one processing unit.

As an example, the processing unit may be referred to as a coding unit (CU). In this case, the coding unit may be recursively divided from a coding tree unit (CTU) according to a quad-tree binary-tree (QTBT) structure. For example, one coding unit may be divided into a plurality of coding units having a lower depth based on a quad tree structure and/or a binary tree structure. In this case, for example, a quad tree structure may be applied first and a binary tree structure may be applied later. Alternatively, the binary tree structure may be applied first. The coding procedure according to the present invention may be performed based on the final coding unit that is no longer divided. In this case, the maximum coding unit may be directly used as the final coding unit based on coding efficiency according to image characteristics, or, if necessary, the coding unit may be recursively divided into coding units having a lower depth so that a coding unit of an optimal size is used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transformation, and restoration, which will be described later.
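The recursive division described above can be sketched as a small recursion. This is a simplified illustration under assumptions: it models only the "quad tree first, binary tree later" ordering (once a binary split is taken, no further quad split is allowed), and the `decide` callback that chooses a split is hypothetical, standing in for whatever the encoder's mode decision or the parsed bitstream would supply.

```python
def partition(x, y, w, h, decide, allow_quad=True):
    """Recursively split a block and return the final coding units
    as (x, y, width, height) tuples.

    decide(x, y, w, h, allow_quad) is a hypothetical callback returning
    "quad", "bin_v" (left/right halves), "bin_h" (top/bottom halves) or "none".
    """
    mode = decide(x, y, w, h, allow_quad)
    if mode == "quad" and allow_quad:
        hw, hh = w // 2, h // 2
        leaves = []
        for dy in (0, hh):
            for dx in (0, hw):
                leaves += partition(x + dx, y + dy, hw, hh, decide, True)
        return leaves
    if mode == "bin_v":
        hw = w // 2
        return (partition(x, y, hw, h, decide, False)
                + partition(x + hw, y, hw, h, decide, False))
    if mode == "bin_h":
        hh = h // 2
        return (partition(x, y, w, hh, decide, False)
                + partition(x, y + hh, w, hh, decide, False))
    return [(x, y, w, h)]  # final coding unit: no further division


def example_decide(x, y, w, h, allow_quad):
    """Toy decision rule: quad-split the 128x128 CTU, then split the
    top-left 64x64 block vertically; everything else is a leaf."""
    if allow_quad and w == 128:
        return "quad"
    if (x, y, w, h) == (0, 0, 64, 64):
        return "bin_v"
    return "none"


leaves = partition(0, 0, 128, 128, example_decide)
```

With this toy rule the CTU ends up as two 32x64 units plus three 64x64 units, and the leaves tile the CTU exactly.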

As another example, the processing unit may include a coding unit (CU), a prediction unit (PU), or a transform unit (TU). A coding unit may be split from a largest coding unit (LCU) into coding units of a lower depth along a quad tree structure. In this case, the maximum coding unit may be directly used as the final coding unit based on coding efficiency according to image characteristics, or, if necessary, the coding unit may be recursively divided into coding units having a lower depth so that a coding unit of an optimal size is used as the final coding unit. When a smallest coding unit (SCU) is set, the coding unit cannot be divided into a coding unit smaller than the smallest coding unit. Herein, the final coding unit means a coding unit that serves as the basis for partitioning or dividing into prediction units or transform units. A prediction unit is a unit partitioned from a coding unit, and may be a unit of sample prediction. In this case, the prediction unit may be divided into sub-blocks. A transform unit may be divided along a quad tree structure from a coding unit, and may be a unit deriving a transform coefficient and/or a unit deriving a residual signal from the transform coefficient. Hereinafter, the coding unit may be referred to as a coding block (CB), the prediction unit may be referred to as a prediction block (PB), and the transform unit may be referred to as a transform block (TB). A prediction block or a prediction unit may mean a specific block-shaped region within a picture, and may include an array of prediction samples. In addition, a transform block or transform unit may mean a specific block-shaped region within a picture, and may include an array of transform coefficients or residual samples.

The prediction unit 110 may perform prediction on a processing target block (hereinafter, referred to as a current block) and generate a predicted block including prediction samples for the current block. A unit of prediction performed by the prediction unit 110 may be a coding block, a transform block, or a prediction block.

The prediction unit 110 may determine whether intra prediction or inter prediction is applied to the current block. For example, the prediction unit 110 may determine whether intra prediction or inter prediction is applied in units of CUs.

In the case of intra prediction, the prediction unit 110 may derive a prediction sample for the current block based on a reference sample outside the current block in a picture to which the current block belongs (hereinafter, referred to as a current picture). In this case, the prediction unit 110 may (i) derive a prediction sample based on an average or interpolation of neighboring reference samples of the current block, or (ii) derive the prediction sample based on a reference sample existing in a specific (prediction) direction with respect to the prediction sample among the neighboring reference samples of the current block. The case of (i) may be called a non-directional mode or a non-angular mode, and the case of (ii) may be called a directional mode or an angular mode. In intra prediction, a prediction mode may have, for example, 33 directional prediction modes and at least two non-directional modes. The non-directional modes may include a DC prediction mode and a planar mode. The prediction unit 110 may determine the prediction mode applied to the current block by using the prediction mode applied to the neighboring block.
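Case (i), prediction from an average of neighboring reference samples, can be illustrated with a DC-style predictor. The sketch below is a simplified assumption: it averages the top and left reference samples with integer rounding and fills the block with that value; real codecs add further rules (for example for non-square blocks or unavailable neighbors) that are not described in this passage. The function name `dc_predict` is hypothetical.

```python
def dc_predict(top, left, width, height):
    """Fill a width x height block with the rounded average of the
    reference samples above (top) and to the left (left) of the block."""
    refs = list(top[:width]) + list(left[:height])
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # rounded integer mean
    return [[dc] * width for _ in range(height)]


# Example: top row of 10s and left column of 20s around a 4x4 block.
pred = dc_predict([10, 10, 10, 10], [20, 20, 20, 20], 4, 4)
```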

In the case of inter prediction, the prediction unit 110 may derive a prediction sample for the current block based on a sample specified by a motion vector on a reference picture. The prediction unit 110 may derive a prediction sample for the current block by applying any one of a skip mode, a merge mode, and a motion vector prediction (MVP) mode. In the skip mode and merge mode, the prediction unit 110 may use motion information of a neighboring block as motion information of the current block. In the skip mode, unlike the merge mode, the difference (residual) between the predicted sample and the original sample is not transmitted. In the MVP mode, the motion vector of the current block may be derived by using the motion vector of the neighboring block as a motion vector predictor of the current block.

In the case of inter prediction, a neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block present in a reference picture. The reference picture including the temporal neighboring block may be referred to as a collocated picture (colPic). Motion information may include a motion vector and a reference picture index. Information such as prediction mode information and motion information may be (entropy) encoded and output in the form of a bitstream.

When motion information of a temporal neighboring block is used in the skip mode and the merge mode, the highest picture on a reference picture list may be used as a reference picture. Reference pictures included in the reference picture list may be sorted based on a picture order count (POC) difference between the current picture and the corresponding reference picture. The POC corresponds to the display order of the picture and may be distinguished from the coding order.
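The POC-based ordering mentioned above can be sketched as a sort on the absolute POC difference. This is a minimal sketch under assumptions: the tie-breaking rule (the original list order is kept for equal distances) and the function name `sort_reference_pictures` are illustrative, since the text does not describe how ties or multiple reference lists are handled.

```python
def sort_reference_pictures(current_poc, ref_pocs):
    """Order reference pictures by |POC difference| to the current picture.
    Python's sort is stable, so equal distances keep their input order."""
    return sorted(ref_pocs, key=lambda poc: abs(current_poc - poc))


# Current picture has POC 8; the nearest reference picture ends up first.
ordered = sort_reference_pictures(8, [0, 4, 16, 7])
```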

The subtraction unit 121 generates a residual sample that is a difference between an original sample and a predicted sample. When the skip mode is applied, the residual sample may not be generated as described above.

The transform unit 122 generates transform coefficients by transforming residual samples in units of transform blocks. The transform unit 122 may perform transform according to the size of the corresponding transform block and the prediction mode applied to the coding block or prediction block spatially overlapping the corresponding transform block. For example, if intra prediction is applied to the coding block or the prediction block overlapping the transform block, and the transform block is a 4×4 residual array, the residual sample is transformed using a Discrete Sine Transform (DST) transform kernel; in other cases, the residual sample may be transformed using a Discrete Cosine Transform (DCT) transform kernel.

The quantizer 123 may quantize the transform coefficients to generate a quantized transform coefficient.

The rearrangement unit 124 rearranges the quantized transform coefficients. The reordering unit 124 may rearrange the quantized transform coefficients in a block form into a one-dimensional vector form through a coefficient scanning method. Here, although the rearrangement unit 124 has been described as a separate configuration, the rearrangement unit 124 may be a part of the quantization unit 123 .
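The block-to-vector rearrangement can be illustrated with a diagonal scan. The sketch below is a simplified assumption: it walks each anti-diagonal of the whole block from bottom-left to top-right, whereas practical codecs typically scan in sub-blocks and may traverse in reverse order. The helper names are hypothetical. The inverse function shows how a decoder-side rearrangement unit could restore the two-dimensional block.

```python
def diagonal_scan_order(width, height):
    """Return (x, y) positions in diagonal scan order."""
    order = []
    for d in range(width + height - 1):
        # Walk one anti-diagonal from its bottom-left end to its top-right end.
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                order.append((x, y))
    return order


def scan_to_vector(block):
    """Flatten a 2-D coefficient block (list of rows) into a 1-D list."""
    height, width = len(block), len(block[0])
    return [block[y][x] for x, y in diagonal_scan_order(width, height)]


def vector_to_block(vec, width, height):
    """Inverse of scan_to_vector: rebuild the 2-D block from the 1-D list."""
    block = [[0] * width for _ in range(height)]
    for value, (x, y) in zip(vec, diagonal_scan_order(width, height)):
        block[y][x] = value
    return block
```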

The entropy encoding unit 130 may perform entropy encoding on the quantized transform coefficients. Entropy encoding may include, for example, an encoding method such as exponential Golomb, context-adaptive variable length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). The entropy encoding unit 130 may encode information necessary for video reconstruction (e.g., a value of a syntax element) other than the quantized transform coefficients, together with or separately from them. Entropy-encoded information may be transmitted or stored in the form of a bitstream in units of network abstraction layer (NAL) units.
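Of the methods listed above, exponential Golomb coding is compact enough to show directly. The sketch below implements the order-0 unsigned variant on bit strings; which variant and which syntax elements the apparatus actually uses is not stated in this passage, so treat this as an illustration of the coding idea only. The function names are hypothetical.

```python
def exp_golomb_encode(value):
    """Order-0 unsigned Exp-Golomb: (n-1) leading zeros followed by the
    n-bit binary representation of value + 1."""
    if value < 0:
        raise ValueError("unsigned Exp-Golomb needs a non-negative value")
    bits = bin(value + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def exp_golomb_decode(bitstring):
    """Decode one codeword from the start of bitstring.
    Returns (value, number_of_bits_consumed)."""
    zeros = 0
    while bitstring[zeros] == "0":
        zeros += 1
    code_num = int(bitstring[zeros:2 * zeros + 1], 2)
    return code_num - 1, 2 * zeros + 1
```

Small values get short codewords: 0 is coded as a single bit, 1 and 2 as three bits, 3 to 6 as five bits, and so on.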

The inverse quantization unit 125 inversely quantizes the values (quantized transform coefficients) quantized by the quantization unit 123, and the inverse transform unit 126 inversely transforms the values inversely quantized by the inverse quantization unit 125 to generate a residual sample.

The adder 140 reconstructs a picture by combining the residual sample and the prediction sample. A reconstructed block may be generated by adding the residual sample and the prediction sample in units of blocks. Here, the adder 140 has been described as a separate configuration, but the adder 140 may be a part of the prediction unit 110 . Meanwhile, the adder 140 may be referred to as a restoration unit or a restoration block generator.
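The relationship between the subtraction unit and the adder can be sketched in a few lines. This is a minimal sketch: the function names are hypothetical, and clipping the reconstructed samples to the valid range for an assumed bit depth is an addition not stated in the text.

```python
def compute_residual(original, predicted):
    """Residual sample = original sample - predicted sample, per position."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]


def reconstruct(residual, predicted, bit_depth=8):
    """Reconstructed sample = residual + prediction, clipped to the sample
    range for the given bit depth (the clipping is an assumption)."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(r + p, 0), max_val) for r, p in zip(rrow, prow)]
            for rrow, prow in zip(residual, predicted)]


original = [[100, 102], [98, 255]]
predicted = [[101, 100], [100, 250]]
residual = compute_residual(original, predicted)
restored = reconstruct(residual, predicted)
```

Without quantization loss, adding the residual back to the prediction restores the original block exactly.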

The filter unit 150 may apply a deblocking filter and/or a sample adaptive offset to a reconstructed picture. Artifacts of block boundaries in the reconstructed picture or distortion in the quantization process may be corrected through deblocking filtering and/or sample adaptive offset. The sample adaptive offset may be applied in units of samples, and may be applied after the process of deblocking filtering is completed. The filter unit 150 may apply an adaptive loop filter (ALF) to the reconstructed picture. ALF may be applied to the reconstructed picture after the deblocking filter and/or sample adaptive offset is applied.

The memory 160 may store a reconstructed picture (a decoded picture) or information required for encoding/decoding. Here, the reconstructed picture may be a reconstructed picture whose filtering procedure has been completed by the filter unit 150 . The stored reconstructed picture may be used as a reference picture for (inter) prediction of another picture. For example, the memory 160 may store (reference) pictures used for inter prediction. In this case, pictures used for inter prediction may be designated by a reference picture set or a reference picture list.

FIG. 2 illustrates an example of an image encoding method performed by a video encoding apparatus. Referring to FIG. 2, the image encoding method may include block partitioning, intra/inter prediction, transform, quantization, and entropy encoding. For example, the current picture may be divided into a plurality of blocks, a prediction block of the current block may be generated through intra/inter prediction, and a residual block of the current block may be generated by subtracting the prediction block from the input block of the current block. Thereafter, a coefficient block, i.e., transform coefficients of the current block, may be generated by transforming the residual block. The transform coefficients may be quantized, entropy encoded, and stored in a bitstream.

Referring to FIG. 3, the video decoding apparatus 300 may include an entropy decoding unit 310, a residual processing unit 320, a prediction unit 330, an adder 340, a filter unit 350, and a memory 360. Here, the residual processing unit 320 may include a rearrangement unit 321, an inverse quantization unit 322, and an inverse transform unit 323.

When a bitstream including video information is input, the video decoding apparatus 300 may reconstruct a video corresponding to a process in which the video information is processed by the video encoding apparatus.

For example, the video decoding apparatus 300 may perform video decoding using a processing unit applied in the video encoding apparatus. Accordingly, a processing unit block of video decoding may be, as an example, a coding unit, and may be a coding unit, a prediction unit, or a transform unit, as another example. A coding unit may be partitioned from the largest coding unit along a quad tree structure and/or a binary tree structure.

A prediction unit and a transform unit may be further used depending on the case, in which case a prediction block is a block derived or partitioned from a coding unit, and may be a unit of sample prediction. In this case, the prediction unit may be divided into sub-blocks. A transform unit may be divided along a quad tree structure from a coding unit, and may be a unit deriving a transform coefficient or a unit deriving a residual signal from a transform coefficient.

The entropy decoding unit 310 may parse the bitstream and output information necessary for video or picture restoration. For example, the entropy decoding unit 310 may decode information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and may output values of syntax elements required for video reconstruction and quantized values of transform coefficients related to the residual.

In more detail, the CABAC entropy decoding method receives a bin corresponding to each syntax element in the bitstream, determines a context model using the information of the syntax element to be decoded, the decoding information of the neighboring blocks and the decoding target block, or the information of the symbol/bin decoded in a previous step, predicts the probability of occurrence of the bin according to the determined context model, and performs arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element. In this case, after determining the context model, the CABAC entropy decoding method may update the context model by using the decoded symbol/bin information for the context model of the next symbol/bin.
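The context-model update step can be illustrated in isolation. The sketch below is not the CABAC state machine of any standard: it is a generic adaptive probability estimate, with the class name `ContextModel`, the initial probability, and the adaptation rate all chosen as assumptions for illustration, and the arithmetic decoding engine itself is omitted.

```python
class ContextModel:
    """Toy adaptive estimate of the probability that the next bin is 1."""

    def __init__(self, p_one=0.5, rate=1 / 16):
        self.p_one = p_one  # current estimate of P(bin == 1)
        self.rate = rate    # adaptation speed (assumed value)

    def update(self, bin_value):
        """Move the estimate toward the bin value that was just decoded."""
        self.p_one += self.rate * (bin_value - self.p_one)


# After a run of decoded 1-bins the model expects further 1-bins.
ctx = ContextModel()
for _ in range(50):
    ctx.update(1)
```

The point of the update is that each decoded bin sharpens the probability used for the next bin coded with the same context.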

Prediction-related information among the information decoded by the entropy decoding unit 310 is provided to the prediction unit 330, and the residual value on which entropy decoding has been performed by the entropy decoding unit 310, that is, the quantized transform coefficients, may be input to the rearrangement unit 321.

The reordering unit 321 may rearrange the quantized transform coefficients in a two-dimensional block form. The reordering unit 321 may perform reordering in response to coefficient scanning performed by the encoding apparatus. Here, although the rearrangement unit 321 has been described as a separate configuration, the rearrangement unit 321 may be a part of the inverse quantization unit 322 .

The inverse quantizer 322 may inverse quantize the quantized transform coefficients based on the (inverse) quantization parameter to output the transform coefficients. In this case, information for deriving the quantization parameter may be signaled from the encoding device.

The inverse transform unit 323 may inverse transform the transform coefficients to derive residual samples.

The prediction unit 330 may perform prediction on the current block and generate a predicted block including prediction samples for the current block. A unit of prediction performed by the prediction unit 330 may be a coding block, a transform block, or a prediction block.

The prediction unit 330 may determine whether to apply intra prediction or inter prediction based on the information on the prediction. In this case, a unit for determining which one of intra prediction and inter prediction is applied and a unit for generating a prediction sample may be different. In addition, units for generating prediction samples in inter prediction and intra prediction may also be different. For example, which one of inter prediction and intra prediction is to be applied may be determined in units of CUs. Also, for example, in inter prediction, a prediction mode may be determined in units of PUs and a prediction sample may be generated, and in intra prediction, a prediction mode may be determined in units of PUs and prediction samples may be generated in units of TUs.

In the case of intra prediction, the prediction unit 330 may derive a prediction sample for the current block based on neighboring reference samples in the current picture. The prediction unit 330 may derive a prediction sample for the current block by applying a directional mode or a non-directional mode based on the neighboring reference samples of the current block. In this case, a prediction mode to be applied to the current block may be determined by using the intra prediction mode of the neighboring block. Meanwhile, Matrix-based Intra Prediction (MIP), which performs prediction based on a matrix trained in advance, may be used. In this case, the number of MIP modes and the size of the matrix are defined for each block size. The reference samples are downsampled according to the size of the matrix, multiplied by the matrix determined by the mode number, and then interpolated to fit the prediction block size to generate the predicted values.

In the case of inter prediction, the prediction unit 330 may derive a prediction sample for the current block based on a sample specified on the reference picture by a motion vector on the reference picture. The prediction unit 330 may derive a prediction sample for the current block by applying any one of a skip mode, a merge mode, and an MVP mode. In this case, motion information necessary for inter prediction of the current block provided by the video encoding apparatus, for example, information about a motion vector, a reference picture index, etc., may be obtained or derived based on the information about the prediction.

In the case of the skip mode and the merge mode, motion information of a neighboring block may be used as motion information of the current block. In this case, the neighboring block may include a spatial neighboring block and a temporal neighboring block.

The prediction unit 330 may construct a merge candidate list with motion information of available neighboring blocks, and use information indicated by a merge index on the merge candidate list as a motion vector of the current block. The merge index may be signaled from the encoding device. The motion information may include a motion vector and a reference picture. When motion information of a temporal neighboring block is used in the skip mode and the merge mode, the highest picture on the reference picture list may be used as the reference picture.

In the skip mode, unlike the merge mode, the difference (residual) between the predicted sample and the original sample is not transmitted.

In the MVP mode, the motion vector of the current block may be derived by using the motion vector of the neighboring block as a motion vector predictor. In this case, the neighboring block may include a spatial neighboring block and a temporal neighboring block.

For example, when the merge mode is applied, a merge candidate list may be generated using a motion vector of a reconstructed spatial neighboring block and/or a motion vector corresponding to a Col block that is a temporal neighboring block. In the merge mode, the motion vector of the candidate block selected from the merge candidate list is used as the motion vector of the current block. The prediction information may include a merge index indicating a candidate block having an optimal motion vector selected from among candidate blocks included in the merge candidate list. In this case, the prediction unit 330 may derive the motion vector of the current block by using the merge index.
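The merge-mode selection described above can be sketched as follows. This is a minimal illustration only: the function names, the tuple representation of a motion vector, and the ordering of spatial candidates before the Col-block candidate are assumptions made for the example and are not specified in this description.

```python
from typing import List, Optional, Tuple

MotionVector = Tuple[int, int]  # (mv_x, mv_y); hypothetical representation


def build_merge_candidates(spatial: List[Optional[MotionVector]],
                           temporal: Optional[MotionVector]) -> List[MotionVector]:
    """Collect available spatial candidates, then the temporal (Col block) one.

    The candidate ordering is an assumption for illustration.
    """
    candidates = [mv for mv in spatial if mv is not None]
    if temporal is not None:
        candidates.append(temporal)
    return candidates


def merge_motion_vector(candidates: List[MotionVector],
                        merge_index: int) -> MotionVector:
    """In merge mode the selected candidate is used directly as the motion vector."""
    return candidates[merge_index]


candidates = build_merge_candidates([(1, 2), None, (3, -4)], (0, 5))
selected = merge_motion_vector(candidates, 2)
```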

As another example, when the Motion Vector Prediction (MVP) mode is applied, a motion vector predictor candidate list may be generated using a motion vector of a reconstructed spatial neighboring block and/or a motion vector corresponding to a Col block that is a temporal neighboring block. That is, a motion vector of a reconstructed spatial neighboring block and/or a motion vector corresponding to a Col block that is a temporal neighboring block may be used as a motion vector candidate. The prediction information may include a prediction motion vector index indicating an optimal motion vector selected from the motion vector candidates included in the list. In this case, the prediction unit 330 may select the motion vector predictor of the current block from among the motion vector candidates included in the motion vector candidate list by using the motion vector index. The prediction unit of the encoding apparatus may obtain a motion vector difference (MVD) between the motion vector of the current block and the motion vector predictor, encode it, and output it in the form of a bitstream. That is, the MVD may be obtained by subtracting the motion vector predictor from the motion vector of the current block. In this case, the prediction unit 330 may obtain the motion vector difference included in the prediction-related information, and derive the motion vector of the current block by adding the motion vector difference and the motion vector predictor. The prediction unit may also obtain or derive a reference picture index indicating a reference picture from the information about the prediction.
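The relationship between the motion vector, the motion vector predictor, and the MVD can be illustrated with the following sketch. The function names and the tuple representation are hypothetical; only the subtraction at the encoder and the addition at the decoder are taken from the description.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # (mv_x, mv_y); hypothetical representation


def encoder_mvd(mv: MotionVector, mvp: MotionVector) -> MotionVector:
    """Encoder side: MVD = motion vector minus motion vector predictor."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decoder_mv(mvp_candidates: List[MotionVector], mvp_index: int,
               mvd: MotionVector) -> MotionVector:
    """Decoder side: select the predictor by index, then add the MVD."""
    mvp = mvp_candidates[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


mvp_list = [(4, -2), (6, 0)]
true_mv = (7, 3)
mvd = encoder_mvd(true_mv, mvp_list[1])
recovered = decoder_mv(mvp_list, 1, mvd)
```

In this example the decoder recovers the encoder's motion vector exactly, because the same predictor is selected on both sides.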

The adder 340 may reconstruct the current block or the current picture by adding the residual sample and the prediction sample. The adder 340 may reconstruct the current picture by adding the residual sample and the prediction sample in units of blocks. When the skip mode is applied, since the residual is not transmitted, the prediction sample may be the reconstructed sample. Here, the adder 340 is described as a separate configuration, but the adder 340 may be a part of the prediction unit 330. Meanwhile, the adder 340 may be referred to as a reconstruction unit or a reconstructed block generator.
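A minimal sketch of the adder operation is shown below. The function name, the list-of-lists block representation, and the clipping of the sum to the sample range given by a `bit_depth` parameter are assumptions for illustration; the description itself states only that the residual and prediction samples are added and that the prediction sample is used as-is in skip mode.

```python
from typing import List, Optional

Block = List[List[int]]


def reconstruct_block(pred: Block, resid: Optional[Block] = None,
                      bit_depth: int = 8) -> Block:
    """Add residual samples to prediction samples.

    If no residual is available (e.g. skip mode), the prediction is returned.
    Clipping to [0, 2**bit_depth - 1] is an assumption, not from the text.
    """
    if resid is None:
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, resid)]


recon = reconstruct_block([[100, 250]], [[-5, 10]])
recon_skip = reconstruct_block([[7, 8]])
```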

The filter unit 350 may apply deblocking filtering, sample adaptive offset, and/or ALF to the reconstructed picture. In this case, the sample adaptive offset may be applied in units of samples and may be applied after deblocking filtering. ALF may be applied after deblocking filtering and/or sample adaptive offset.

The memory 360 may store a reconstructed picture (a decoded picture) or information necessary for decoding. Here, the reconstructed picture may be a reconstructed picture whose filtering procedure has been completed by the filter unit 350 . For example, the memory 360 may store pictures used for inter prediction. In this case, pictures used for inter prediction may be designated by a reference picture set or a reference picture list. The reconstructed picture may be used as a reference picture for other pictures. Also, the memory 360 may output the restored pictures according to the output order.

FIG. 4 shows an example of an image decoding method performed by a decoding apparatus. Referring to FIG. 4, the image decoding method may include entropy decoding, inverse quantization, inverse transform, and intra/inter prediction processes. For example, in the decoding apparatus, the reverse process of the encoding method may be performed. Specifically, quantized transform coefficients may be obtained through entropy decoding of the bitstream, and a coefficient block of the current block, i.e., transform coefficients, may be obtained through an inverse quantization process for the quantized transform coefficients. A residual block of the current block may be derived through inverse transform of the transform coefficients, and a reconstructed block may be derived by adding the prediction block of the current block, derived through intra/inter prediction, to the residual block of the current block.

Meanwhile, operators in the embodiments described below may be defined as shown in the following table.

Referring to Table 1, Floor(x) may represent the maximum integer value less than or equal to x, Log2(u) may represent the base-2 logarithm of u, and Ceil(x) may represent the minimum integer value greater than or equal to x. For example, Floor(5.93) represents 5, since the maximum integer value less than or equal to 5.93 is 5.

In addition, referring to Table 1, x>>y may represent an operator that right-shifts x by y bits, and x<<y may represent an operator that left-shifts x by y bits.
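The operators described above can be illustrated with Python's standard library, as in the short sketch below. The wrapper names `floor_op`, `ceil_op`, and `log2_op` are hypothetical and serve only to mirror the notation Floor(x), Ceil(x), and Log2(u).

```python
import math


def floor_op(x: float) -> int:
    """Floor(x): the maximum integer less than or equal to x."""
    return math.floor(x)


def ceil_op(x: float) -> int:
    """Ceil(x): the minimum integer greater than or equal to x."""
    return math.ceil(x)


def log2_op(u: float) -> float:
    """Log2(u): the base-2 logarithm of u."""
    return math.log2(u)


floor_example = floor_op(5.93)   # the example given in the text
right_shift_example = 13 >> 2    # x >> y: right shift of x by y bits
left_shift_example = 3 << 4      # x << y: left shift of x by y bits
```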

<Introduction>

The HEVC standard generally uses one transform type, DCT (discrete cosine transform). Accordingly, no separate process for determining the transform type is needed, and no information on the determined transform type needs to be transmitted. However, when the size of the current luma block is 4x4 and intra prediction is performed, a DST (discrete sine transform) transform type is exceptionally used.
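The HEVC rule stated above can be sketched as a simple selection function. The function name and its parameters are hypothetical and are used only to restate the rule in executable form.

```python
def hevc_transform_type(width: int, height: int,
                        is_luma: bool, is_intra: bool) -> str:
    """Return "DST" only for 4x4 intra-predicted luma blocks, otherwise "DCT"."""
    if is_luma and is_intra and width == 4 and height == 4:
        return "DST"
    return "DCT"


exception_case = hevc_transform_type(4, 4, is_luma=True, is_intra=True)
general_case = hevc_transform_type(8, 8, is_luma=True, is_intra=True)
```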

Information representing the position of a non-zero coefficient among quantized coefficients that have undergone transformation and quantization can be roughly classified into three types.

1. Position (x,y) of the last significant coefficient: the position of the last non-zero coefficient in the scan order within the encoding object block (hereinafter defined as the last position)

2. Coded sub-block flag: when a block to be coded is divided into a plurality of sub-blocks, a flag indicating whether each sub-block contains one or more non-zero coefficients (or a flag indicating whether all of its coefficients are zero)

3. Significant coefficient flag: a flag indicating whether each coefficient in one sub-block is non-zero or zero

Here, the position of the last significant coefficient is expressed by dividing it into an x-axis component and a y-axis component, and each component is expressed by dividing it into a prefix and a suffix. That is, the syntax indicating the non-zero positions of the quantized coefficients includes a total of six syntaxes below.

1. last_sig_coeff_x_prefix

2. last_sig_coeff_y_prefix

3. last_sig_coeff_x_suffix

4. last_sig_coeff_y_suffix

5. coded_sub_block_flag

6. sig_coeff_flag

The last_sig_coeff_x_prefix indicates the prefix of the x-axis component indicating the position of the last significant coefficient, and last_sig_coeff_y_prefix indicates the prefix of the y-axis component indicating the position of the last significant coefficient. In addition, last_sig_coeff_x_suffix indicates the suffix of the x component indicating the position of the last significant coefficient, and last_sig_coeff_y_suffix indicates the suffix of the y component indicating the position of the last significant coefficient.
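As an illustration of how a prefix and a suffix may be combined into one coordinate of the last position, the sketch below uses the combining rule of the HEVC specification. That rule is not stated in this description, so it should be read as an assumption brought in for the example; the function name is likewise hypothetical.

```python
def last_sig_coeff_pos(prefix: int, suffix: int = 0) -> int:
    """Combine a prefix and suffix into one coordinate (x or y) of the last position.

    Assumed rule (HEVC specification): no suffix is present for prefix values
    up to 3, in which case the coordinate equals the prefix.
    """
    if prefix <= 3:
        return prefix
    return (1 << ((prefix >> 1) - 1)) * (2 + (prefix & 1)) + suffix


last_x = last_sig_coeff_pos(5, 1)
last_y = last_sig_coeff_pos(3)
```

The same function is applied independently to the x-axis and y-axis components.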

On the other hand, coded_sub_block_flag is "0" if all coefficients in the corresponding sub-block are zero, and "1" if one or more non-zero coefficients exist. sig_coeff_flag is "0" for a zero coefficient and "1" for a non-zero coefficient. The coded_sub_block_flag syntax is transmitted only for sub-blocks that precede the position of the last significant coefficient in the scan order within the block to be coded. When coded_sub_block_flag is “1”, that is, when one or more non-zero coefficients exist, a sig_coeff_flag syntax is transmitted for each of the coefficients in the corresponding sub-block.
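The meaning of the two flags can be sketched as follows for a block stored as a list of rows. The function names and the 4x4 sub-block size default are assumptions for the example, and the sketch deliberately ignores the restriction that flags are transmitted only for sub-blocks preceding the last position in scan order.

```python
from typing import List

Block = List[List[int]]


def coded_sub_block_flags(block: Block, sb_size: int = 4) -> List[List[int]]:
    """Return 1 for each sub-block holding at least one non-zero coefficient."""
    height, width = len(block), len(block[0])
    flags = []
    for sy in range(0, height, sb_size):
        row = []
        for sx in range(0, width, sb_size):
            nonzero = any(block[y][x] != 0
                          for y in range(sy, sy + sb_size)
                          for x in range(sx, sx + sb_size))
            row.append(1 if nonzero else 0)
        flags.append(row)
    return flags


def sig_coeff_flags(block: Block, sx: int, sy: int, sb_size: int = 4) -> List[List[int]]:
    """Return 1 for each non-zero coefficient of the sub-block at (sx, sy)."""
    return [[1 if block[y][x] != 0 else 0 for x in range(sx, sx + sb_size)]
            for y in range(sy, sy + sb_size)]


# 8x8 example block with two non-zero coefficients.
example = [[0] * 8 for _ in range(8)]
example[0][0] = 9   # DC position, in the upper-left sub-block
example[6][5] = 3   # x = 5, y = 6, in the lower-right sub-block
sb_flags = coded_sub_block_flags(example)
sig_flags = sig_coeff_flags(example, 4, 4)
```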

The HEVC standard supports the following three types of scans for coefficients.

1) up-right diagonal

2) horizontal

3) vertical

When the encoding target block is encoded using the inter prediction method, the image encoding apparatus scans the coefficients of the corresponding block in the up-right diagonal method. When the encoding target block is encoded using the intra prediction method, the image encoding apparatus selects one of the three types according to the intra prediction mode and scans the coefficients of the corresponding block. The scanning may be performed by the reordering unit 124 in the image encoding apparatus of FIG. 1, and the coefficients in two-dimensional block form may be changed into a one-dimensional vector form through scanning.

FIG. 5 shows a scan order of sub-blocks and coefficients for a diagonal scan scheme.

Referring to FIG. 5, when the block of FIG. 5 is scanned in the diagonal scan method by the reordering unit 124 of the image encoding apparatus, the scan proceeds downward and then in the diagonally upward direction, and the last scan is performed for sub-block 16 at the lower right. That is, the reordering unit 124 scans sub-blocks 1, 2, 3, ..., 14, 15, and 16 in order to rearrange the quantized transform coefficients from the form of a two-dimensional block into the form of a one-dimensional vector. Similarly, the reordering unit 124 of the image encoding apparatus scans the coefficients in each sub-block in the same diagonal scan method as that used for the sub-blocks. For example, in sub-block 1, the scans are performed in the order of 0, 1, 2, ..., 13, 14, and 15.

However, when the scanned coefficients are stored in the bitstream, the stored order is the reverse of the scan order. That is, when the block of FIG. 10 is scanned by the reordering unit 124 of the image encoding apparatus, scans are performed in the order of coefficient 0 to 255, but the pixels are stored in the bitstream in order from the pixel at position 255 to the pixel at position 0.
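The diagonal scan and the reversed storage order can be sketched as follows for a single square block. The traversal within each diagonal (from the lower-left end toward the upper-right end) follows the usual up-right diagonal convention and is assumed to match the figure; the function names are hypothetical, and the two-level sub-block/coefficient nesting is omitted for brevity.

```python
from typing import List, Tuple


def up_right_diagonal_scan(size: int) -> List[Tuple[int, int]]:
    """Return (x, y) positions of a size x size block in up-right diagonal order."""
    order = []
    for d in range(2 * size - 1):          # d indexes the anti-diagonals
        for y in range(d, -1, -1):         # start low on the diagonal, move up-right
            x = d - y
            if x < size and y < size:
                order.append((x, y))
    return order


def scan_block(block: List[List[int]]) -> List[int]:
    """Convert a 2-D block into a 1-D vector following the scan order."""
    return [block[y][x] for x, y in up_right_diagonal_scan(len(block))]


def bitstream_order(scanned: List[int]) -> List[int]:
    """Coefficients are stored in the reverse of the scan order."""
    return scanned[::-1]


order_4x4 = up_right_diagonal_scan(4)
scanned = scan_block([[10, 30], [20, 40]])
stored = bitstream_order(scanned)
```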

FIG. 6 shows an example of a 32x32 encoding target block after being quantized by the video encoding apparatus. Here, the diagonal scan method is used, as an arbitrary choice, when the 32×32 block shown in FIG. 6 is scanned by the image encoding apparatus. In FIG. 6, a pixel indicated by a diagonal hatch indicates a non-zero coefficient, and a pixel indicated by an x indicates the last significant coefficient. All other (white) coefficients have a value of zero. Here, if the coded_sub_block_flag syntax is applied to the block of FIG. 6, coded_sub_block_flag information is required for the 24 sub-blocks existing before the last position in the scan order among a total of 64 sub-blocks, that is, for the sub-blocks indicated by the thick line in FIG. 6. Among the 24 sub-blocks, the coded_sub_block_flag value for the first sub-block including the DC value and for the 24th sub-block including the last position coefficient is "1", and the coded_sub_block_flag values for the remaining 22 sub-blocks are transmitted to the video decoding apparatus through the bitstream. At this time, for a sub-block including one or more non-zero coefficients among the 22 sub-blocks, the coded_sub_block_flag value is set to "1". In FIG. 6, the coded_sub_block_flag value of the 4th, 5th, 11th, and 18th sub-blocks, which are the sub-blocks including gray-marked pixels among the 22 sub-blocks excluding the first sub-block and the 24th sub-block, is set to "1".

1. Method of determining the primary transform type of a decoding target block

The present specification discloses a method of determining the type of the primary transform of a decoding object block during an image decoding process. That is, when a decoding object block is decoded by an image decoding apparatus, a process is required to determine which transform type the image encoding apparatus used for the primary transform of the block in its transform process. A primary transform type consists of one default transform and a plurality of extra transforms. Depending on a condition, the decoding target block uses either the default transform or a multiple transform set (MTS) including the default transform and the extra transforms. That is, the decoding object block may have been transformed in the transform process by using only the default transform or by using a multiple transform set including the default transform and the extra transforms. Conversely, from the viewpoint of the image decoding apparatus, decoding may be performed by determining whether the decoding object block uses only the default transform or a multiple transform set (MTS) including the default transform and the extra transforms. When the decoding object block uses the MTS, information about the transform actually used among the plurality of transforms is transmitted or derived. Here, the information about the transform actually used may exist separately as a horizontal axis transform type and a vertical axis transform type. That is, when the decoding object block has been transformed using MTS, the image decoding apparatus may perform decoding by receiving or determining which of the plurality of transform types was used for the transform.

According to an embodiment, DCT-II may be set as the default transform, and DST-7 and DCT-8 may be set as the extra transforms. In this case, DCT-II, the default transform, is supported up to a maximum size of 64×64, and DST-7 and DCT-8, the extra transforms, are supported up to a maximum size of 32×32. For example, when the size of the decoding object block is 64×64, one 64×64 DCT-II is applied in the transform process. That is, when at least one of the width and height of the decoding target block is greater than 32, the default transform is applied directly without applying the MTS. In other words, from the viewpoint of the video decoding apparatus, it is only necessary to determine whether the block was transformed using the MTS when both the width and height of the decoding object block are 32 or less. Conversely, when either the width or the height of the decoding target block is greater than 32, it may be determined that the block was transformed by the default transform. When the decoding object block is transformed by the default transform in this way, no MTS-related syntax information is transmitted. In the present invention, for convenience, the transform type value of DCT-II is set to “0”, the transform type value of DST-7 is set to “1”, and the transform type value of DCT-8 is set to “2”. However, the present invention is not limited thereto. Table 2 below defines the transform type assigned to each value of the trType syntax.
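The size condition and the trType value assignment described above can be sketched as follows. The constant and function names are hypothetical; the mapping 0/1/2 and the 32-sample limit are taken from the description.

```python
# trType values as assigned in the description (for convenience only).
TR_TYPE_NAMES = {0: "DCT-II", 1: "DST-7", 2: "DCT-8"}

MAX_MTS_SIZE = 32  # maximum size supported by the extra transforms


def mts_may_apply(width: int, height: int) -> bool:
    """MTS needs to be considered only if both dimensions are 32 or less.

    Otherwise the default transform (DCT-II) is applied directly.
    """
    return width <= MAX_MTS_SIZE and height <= MAX_MTS_SIZE


large_block = mts_may_apply(64, 64)
small_block = mts_may_apply(32, 16)
```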

Tables 3 and 4 show examples of the transform kernels of DST-7 and DCT-8 when the size of the decoding target block is 4x4.

Table 3 shows the coefficient values of the corresponding transform kernel when trType is "1" (DST-7) and the size of the decoding target block is 4x4, and Table 4 shows the coefficient values of the corresponding transform kernel when trType is "2" (DCT-8) and the size of the decoding target block is 4x4.
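For reference, integer kernels of this kind can be approximated from the textbook DST-VII and DCT-VIII basis functions, as in the sketch below. The scaling factor 64·sqrt(N), the rounding, and the row/column convention (row i is the i-th basis function, column j is the sample index) are assumptions made for the example; the resulting values are not claimed to be identical to those of Tables 3 and 4.

```python
import math
from typing import List


def dst7_kernel(n: int) -> List[List[int]]:
    """Integer approximation of the N-point DST-VII basis (assumed scaling)."""
    scale = 64 * math.sqrt(n) * math.sqrt(4.0 / (2 * n + 1))
    return [[round(scale * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1)))
             for j in range(n)]
            for i in range(n)]


def dct8_kernel(n: int) -> List[List[int]]:
    """Integer approximation of the N-point DCT-VIII basis (assumed scaling)."""
    scale = 64 * math.sqrt(n) * math.sqrt(4.0 / (2 * n + 1))
    return [[round(scale * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2)))
             for j in range(n)]
            for i in range(n)]


dst7_4x4 = dst7_kernel(4)
dct8_4x4 = dct8_kernel(4)
```

With these assumptions, the first basis function of the 4-point DST-7 increases with the sample index, whereas that of the 4-point DCT-8 decreases, which is why the two kernels suit residuals whose energy grows or shrinks away from the prediction boundary.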

A zero-out area may be included in the entire transform area of the decoding object block. A transform converts pixel-domain values into frequency-domain values. In this case, the upper-left frequency region is referred to as a low-frequency region and the lower-right frequency region is referred to as a high-frequency region. The low-frequency components reflect the general (average) characteristics of the corresponding block, and the high-frequency components reflect the sharp (unique) characteristics of the corresponding block. Therefore, the low-frequency region has many large values, and the high-frequency region has a few small values. Through the quantization process after the transform, most of the few small values in the high-frequency region become zero. Here, the remaining region other than the low-frequency region at the upper left, in which most values are zero, is referred to as a zero-out region, and the zero-out region may be excluded from the signaling process. The region of the decoding object block excluding the zero-out region is referred to as a valid region.

Referring to FIG. 7 , the upper left gray area represents the low frequency area, and the white area represents the high frequency zero-out.

As another example, when the decoding object block is 64x64, the upper left 32x32 area becomes a valid area, and the remaining areas except for this become a zero-out area, so that no signaling is performed.

In addition, when the size of the decoding object block is one of 64x64, 64x32, and 32x64, the upper left 32x32 area becomes an effective area, and the remaining part becomes a zero-out area. Since the decoder knows the size of the block when parsing the syntax for the quantized coefficients, the zero-out region is not signaled. That is, an area in which the width or height of the decoding object block is greater than 32 is set as the zero-out area. In this case, since the width or height of the block to be decoded is greater than 32, the transform used is DCT-II, which is the default transform. Since the extra transforms DST-7 and DCT-8 are supported only up to a maximum size of 32×32, MTS is not applied to a target block of this size.

In addition, when the size of the decoding object block is one of 32x32, 32x16, and 16x32 and MTS is applied to the decoding object block (for example, when DST-7 or DCT-8 is used), the upper left 16x16 area becomes an effective area, and the remaining part is set as a zero-out area. Here, the zero-out region may be signaled according to the location of the last significant coefficient and the scanning method. This is because the encoder signals the MTS index (mts_idx) value after signaling the syntax related to the quantized coefficients, so the decoder cannot know the transform type when it parses the syntax for the quantized coefficients. When the zero-out region is signaled as described above, the decoder can perform the transform only on the effective region after ignoring or removing the quantized coefficients corresponding to the zero-out region. Here, the information on the transform type actually used may exist separately as a horizontal axis transform type and a vertical axis transform type. For example, when the horizontal axis transform type of the decoding object block is DST-7 or DCT-8, the horizontal axis (width) effective area of the decoding object block becomes 16, and when the vertical axis transform type of the decoding object block is DST-7 or DCT-8, the vertical axis (height) effective area of the decoding object block becomes 16.

On the other hand, when the size of the decoding object block is one of 32x32, 32x16, and 16x32 and MTS is not applied to the decoding object block (e.g., when DCT-II, which is the default transform, is used), all regions are valid regions, no zero-out region exists, and the block is transformed using the default transform DCT-II. Here, the information about the transform actually used may exist separately as a horizontal axis transform type and a vertical axis transform type. For example, when the horizontal axis transform type of the decoding object block is DCT-II, the horizontal axis (width) effective area of the decoding object block becomes the width of the corresponding block, and when the vertical axis transform type of the decoding object block is DCT-II, the vertical axis (height) effective area of the decoding target block becomes the height of the corresponding block. That is, the entire area (width x height) of the decoding object block becomes the effective area.

On the other hand, when the decoding object block has a size smaller than 16, which is not defined above, all regions become valid regions, and there is no zero-out region. Whether MTS is applied to the decoding object block and the transform type value are determined by implicit and/or explicit embodiments.
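The effective-area rules described above can be summarized in the following sketch, which treats the horizontal and vertical axes independently. The function names and the use of `min()` to express the rules are assumptions for illustration; the limits of 32 samples for DCT-II and 16 samples for DST-7/DCT-8 are taken from the description.

```python
from typing import List, Tuple

DCT2, DST7, DCT8 = 0, 1, 2  # trType values as assigned in the description


def effective_size(dim: int, tr_type: int) -> int:
    """Effective length along one axis: at most 32 for DCT-II, at most 16 otherwise."""
    limit = 32 if tr_type == DCT2 else 16
    return min(dim, limit)


def effective_region(width: int, height: int,
                     tr_type_hor: int, tr_type_ver: int) -> Tuple[int, int]:
    """Return (effective width, effective height) of the upper-left valid region."""
    return effective_size(width, tr_type_hor), effective_size(height, tr_type_ver)


def apply_zero_out(coeffs: List[List[int]], eff_w: int, eff_h: int) -> List[List[int]]:
    """Set every coefficient outside the effective region to zero."""
    return [[c if (x < eff_w and y < eff_h) else 0 for x, c in enumerate(row)]
            for y, row in enumerate(coeffs)]


region_64 = effective_region(64, 64, DCT2, DCT2)
region_mts = effective_region(32, 16, DST7, DCT8)
zeroed = apply_zero_out([[1, 2], [3, 4]], 1, 1)
```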

In the present invention, syntax content for expressing the position of a non-zero coefficient among quantized coefficients that have undergone transformation and quantization is the same as that of the HEVC method. However, the coded_sub_block_flag syntax name is changed to sb_coded_flag and used. In addition, a method of scanning the quantized coefficients uses an up-right diagonal method.

In the present invention, MTS may be applied to a luma block (and is not applied to a chroma block). In addition, the MTS function may be turned on/off using a flag indicating whether MTS is used, that is, sps_mts_enabled_flag. When the MTS function is used, sps_mts_enabled_flag is set to on, and whether to explicitly use the MTS function may be set separately for intra prediction and inter prediction. That is, a flag sps_explicit_mts_intra_enabled_flag indicating whether MTS is used in intra prediction and a flag sps_explicit_mts_inter_enabled_flag indicating whether MTS is used in inter prediction may be separately set. In the present specification, for convenience, the three flags sps_mts_enabled_flag, sps_explicit_mts_intra_enabled_flag, and sps_explicit_mts_inter_enabled_flag indicating whether to use the MTS are described as being located in a sequence parameter set (SPS), but the present invention is not limited thereto. That is, each of the three flags may be set individually in at least one of decoding capability information (DCI), video parameter set (VPS), sequence parameter set (SPS), picture parameter set (PPS), picture header (PH), and slice header (SH). In addition, the three flags indicating whether MTS is used may be defined as high level syntax (HLS).

The use of MTS can be divided into an explicit method and an implicit method. In the explicit use of MTS, MTS-related information (e.g., information on the transform actually used) is transmitted when a specific condition is satisfied while a flag value indicating whether MTS is used in intra prediction and/or inter prediction is set to on in the SPS. That is, the image decoding apparatus may receive the MTS-related information, check which transform type was used to transform the decoding object block based on the received MTS-related information, and perform decoding accordingly. For example, in an environment where explicit MTS is used, the three flags may be set as follows.

1. sps_mts_enabled_flag = on

2. sps_explicit_mts_intra_enabled_flag = on

3. sps_explicit_mts_inter_enabled_flag = on

In the implicit use of MTS, MTS-related information (e.g., information on the transform actually used) is derived when a specific condition is satisfied while the value of sps_mts_enabled_flag among the three flags is set to on in the SPS. For example, in an environment in which implicit MTS is used, the three flags may be set as follows.

1. sps_mts_enabled_flag = on

2. sps_explicit_mts_intra_enabled_flag = off

3. sps_explicit_mts_inter_enabled_flag = off (either on or off; the value does not matter)

Hereinafter, an implicit MTS method and an explicit MTS method will be described through several embodiments.

2. First Embodiment (Implicit MTS)

The implicit MTS described in this embodiment may be used when the decoding object block has been encoded using the intra prediction method. That is, when the decoding object block is encoded by the video encoding apparatus using the intra prediction method, encoding and/or decoding may be performed by the video encoding apparatus and/or the video decoding apparatus using the implicit MTS. Meanwhile, whether to use the implicit MTS when decoding the decoding object block may be indicated by the implicitMtsEnabled parameter. The image decoding apparatus may determine whether to perform decoding using the implicit MTS by checking the value of the implicitMtsEnabled parameter. For example, when the implicit MTS is used for decoding, the implicitMtsEnabled parameter may have a value of 1; otherwise, it may have a value of 0. Meanwhile, in this specification, implicitMtsEnabled may be written as “implicit_MTS_enabled” in some cases.

Regarding the high level syntax (HLS) conditions for applying the implicit MTS, sps_mts_enabled_flag is a flag indicating whether MTS is applied regardless of whether it is implicit or explicit, and therefore should be set to "on" to apply the implicit MTS. On the other hand, the implicit MTS is used when the decoding object block has been encoded by the video encoding apparatus using the intra prediction method. Accordingly, the image decoding apparatus may determine whether to use the implicit MTS by checking the sps_explicit_mts_intra_enabled_flag value. However, sps_explicit_mts_intra_enabled_flag is set to “on” when the decoding object block has been encoded by the video encoding apparatus using the intra prediction method with explicit MTS applied. Accordingly, when the decoding object block has been encoded with implicit MTS by the video encoding apparatus, sps_explicit_mts_intra_enabled_flag is set to “off”. Meanwhile, as described above, the implicit MTS is used when the decoding object block has been encoded using the intra prediction method. Therefore, the value of sps_explicit_mts_inter_enabled_flag, which indicates explicit MTS for blocks encoded by the inter prediction method, does not matter. Meanwhile, since the implicit MTS can be used when a decoding object block has been encoded by the intra prediction method by an image encoding apparatus, it can be applied when CuPredMode has a MODE_INTRA value.

In summary, the conditions for the decoding object block to be decoded using the implicit MTS by the image decoding apparatus may be listed as follows.

1) sps_mts_enabled_flag is equal to 1

2) sps_explicit_mts_intra_enabled_flag is equal to 0

3) CuPredMode is equal to MODE_INTRA (intra prediction method)

Meanwhile, CuPredMode[ 0 ][ xTbY ][ yTbY ] indicating the prediction mode of the current position in the luma block may have a MODE_INTRA value.

Additional conditions for using implicit MTS are as follows.

4) lfnst_idx is equal to 0

5) intra_mip_flag is equal to 0

Here, the lfnst_idx value represents a secondary transform, and when lfnst_idx = 0, it means that the secondary transform is not used. The intra_mip_flag value indicates whether a prediction method using a matrix (matrix-based intra prediction: mip), which is one of the intra prediction methods, is used. When intra_mip_flag = 0, it means that prediction using a matrix is not used, and when intra_mip_flag = 1, it means that prediction using a matrix is used.

That is, the present embodiment describes a method of setting a primary transform type (or MTS) for a decoding object block that is predicted by a general intra prediction method and does not use a secondary transform. When all of the above five conditions are satisfied, the implicit MTS function may be activated (refer to FIG. 8).

FIG. 8 illustrates a method of determining whether to apply an implicit MTS function according to an embodiment of the present invention. Each of the steps of FIG. 8 may be performed in the image decoding apparatus.

Referring to FIG. 8, the image decoding apparatus determines whether sps_mts_enabled_flag has a value of 1, sps_explicit_mts_intra_enabled_flag has a value of 0, and CuPredMode has a MODE_INTRA value (S810). When all of the conditions of S810 are satisfied, the image decoding apparatus determines whether lfnst_idx has a value of 0 and intra_mip_flag has a value of 0 (S820). When the conditions of both S810 and S820 are satisfied, the implicit_MTS_enabled value is set to 1 (S830). Meanwhile, the image decoding apparatus sets the implicit_MTS_enabled value to 0 when any of the conditions of S810 or S820 is not satisfied (S840).
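The decision of FIG. 8 can be sketched as follows. This is only an illustrative sketch: the function name, the argument names, and the string constants standing in for the prediction modes are assumptions made for the example and are not taken from the specification.

```python
# Placeholder values for the prediction modes (assumed for this sketch).
MODE_INTRA = "MODE_INTRA"
MODE_INTER = "MODE_INTER"


def derive_implicit_mts_enabled(sps_mts_enabled_flag,
                                sps_explicit_mts_intra_enabled_flag,
                                cu_pred_mode, lfnst_idx, intra_mip_flag):
    """Return 1 when the five conditions for the implicit MTS hold, else 0."""
    # S810: MTS enabled, explicit intra MTS disabled, block is intra predicted.
    s810 = (sps_mts_enabled_flag == 1
            and sps_explicit_mts_intra_enabled_flag == 0
            and cu_pred_mode == MODE_INTRA)
    # S820: no secondary transform and no matrix-based intra prediction.
    s820 = lfnst_idx == 0 and intra_mip_flag == 0
    # S830 (set to 1) or S840 (set to 0).
    return 1 if (s810 and s820) else 0
```

For example, an intra block with lfnst_idx = 0 and intra_mip_flag = 0 yields 1 when sps_mts_enabled_flag is 1 and sps_explicit_mts_intra_enabled_flag is 0, and yields 0 as soon as any one of the five conditions fails.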

When the implicit MTS function for the decoding object block is activated (implicit_MTS_enabled = on), an MTS value (transform information actually used) is derived according to the width and height of the corresponding block (refer to FIG. 9 ). In this case, the transform should not be a sub-block transform (sbt) in which only a part of the target block undergoes a transform process. That is, the cu_sbt_flag value of the target block is “0”.

FIG. 9 shows a method of deriving transform information according to the width and height of the corresponding block under the implicit MTS according to an embodiment of the present invention. Each of the steps of FIG. 9 may be performed in the image decoding apparatus.

Referring to FIG. 9, the image decoding apparatus determines whether the implicit_MTS_enabled value is '1' (S910). At this time, although not shown in the drawing, the image decoding apparatus may additionally check whether cu_sbt_flag has a value of “0”. When cu_sbt_flag has a value of '1', it indicates that the decoding object block is transformed by a sub-block transform in which only a part of the object block undergoes the transform process. Conversely, when cu_sbt_flag has a value of '0', it indicates that the decoding object block is not transformed by such a sub-block transform. Accordingly, the operations according to FIG. 9 may be set to be performed only when cu_sbt_flag has a value of '0'.

When the implicit_MTS_enabled value is 1, it is determined whether the value of nTbW has a value of 4 or more and 16 or less (S920), and when the implicit_MTS_enabled value is not '1', the operation is terminated. nTbW represents the width of the corresponding transform block, and is used to determine whether an additional transform type, DST-7, can be used in the horizontal axis direction.

As a result of the determination in step S920, when the value of nTbW is 4 or more and 16 or less, trTypeHor is set to '1' (S930), and otherwise trTypeHor is set to '0' (S940). Here, nTbW represents the width of the corresponding transform block and is used to determine whether the additional transform type DST-7 can be used in the horizontal axis direction. When trTypeHor is set to '0', it may be determined that the corresponding transform block is transformed in the horizontal axis direction using the DCT-II transform, which is the default transform type. Meanwhile, when trTypeHor is set to '1', it may be determined that the corresponding transform block is transformed in the horizontal axis direction using the DST-7 transform, which is one of the additional transform types.

In addition, the image decoding apparatus determines whether the value of nTbH is 4 or more and 16 or less (S950), sets trTypeVer to '1' when the value of nTbH is 4 or more and 16 or less (S960), and sets trTypeVer to '0' when the value of nTbH is not 4 or more and 16 or less (S970). Here, nTbH represents the height of the corresponding transform block and is used to determine whether the additional transform type DST-7 can be used in the vertical axis direction. When trTypeVer is set to '0', it may be determined that the corresponding transform block is transformed in the vertical axis direction using the DCT-II transform, which is the default transform type. On the other hand, when trTypeVer is set to '1', it may be determined that the corresponding transform block is transformed in the vertical axis direction using the DST-7 transform, which is one of the additional transform types.
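The width and height tests of FIG. 9 can be sketched as below. The function name and the choice of returning None when the procedure does not apply are assumptions of this sketch; the optional cu_sbt_flag check corresponds to the additional check mentioned above that is not shown in the drawing.

```python
def derive_implicit_tr_types(implicit_mts_enabled, n_tb_w, n_tb_h, cu_sbt_flag=0):
    """Derive (trTypeHor, trTypeVer) for the implicit MTS.

    Returns None when the procedure does not apply (an assumption of
    this sketch; the text only says the operation is terminated).
    """
    # S910, plus the optional cu_sbt_flag check.
    if implicit_mts_enabled != 1 or cu_sbt_flag != 0:
        return None
    # S920 to S940: DST-7 (1) horizontally if 4 <= width <= 16, else DCT-II (0).
    tr_type_hor = 1 if 4 <= n_tb_w <= 16 else 0
    # S950 to S970: DST-7 (1) vertically if 4 <= height <= 16, else DCT-II (0).
    tr_type_ver = 1 if 4 <= n_tb_h <= 16 else 0
    return tr_type_hor, tr_type_ver
```

With these rules an 8x32 transform block gives (1, 0), that is, DST-7 in the horizontal direction and DCT-II in the vertical direction.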

FIG. 10 illustrates a method of performing an inverse transform based on a transform-related parameter according to an embodiment of the present invention. Each of the steps of FIG. 10 may be performed by an image decoding apparatus, for example, by an inverse transform unit of the decoding apparatus.

Referring to FIG. 10, the image decoding apparatus obtains sps_mts_enabled_flag, sps_explicit_mts_intra_enabled_flag, CuPredMode[0][xTbY][yTbY], lfnst_idx, IntraMipFlag[x0, y0], nTbW, and nTbH (S1010). The parameters sps_mts_enabled_flag, sps_explicit_mts_intra_enabled_flag, CuPredMode[0][xTbY][yTbY], lfnst_idx, and IntraMipFlag[x0, y0] are described in detail above and are used to determine whether the decoding object block can apply the implicit MTS. In addition, nTbW and nTbH respectively indicate the width and height of the corresponding transform block, and are used to determine whether the additional transform type DST-7 can be used.

Next, the image decoding apparatus sets implicit_MTS_enabled based on the values of sps_mts_enabled_flag, sps_explicit_mts_intra_enabled_flag, CuPredMode[0][xTbY][yTbY], lfnst_idx, and IntraMipFlag[x0, y0] (S1020). In this case, implicit_MTS_enabled may be set by performing the process of FIG. 8.

Next, the image decoding apparatus sets trTypeHor and trTypeVer based on the implicit_MTS_enabled, nTbW, and nTbH values (S1030). In this case, trTypeHor and trTypeVer may be set by performing the process of FIG. 9.

Next, the image decoding apparatus performs an inverse transform based on trTypeHor and trTypeVer (S1040). The inverse transform applied according to trTypeHor and trTypeVer may be configured according to Table 2. For example, when trTypeHor is “1” and trTypeVer is “0”, DST-7 may be applied in the horizontal axis direction and DCT-II may be applied in the vertical axis direction.

Meanwhile, although not shown in the drawing, in order to set whether or not to use the implicit MTS from the viewpoint of the video encoding apparatus, sps_mts_enabled_flag, sps_explicit_mts_intra_enabled_flag, CuPredMode[0][xTbY][yTbY], lfnst_idx, IntraMipFlag[x0, y0], nTbW, and nTbH can be set.

3. Second embodiment (explicit MTS)

This embodiment describes a transform method applied to a decoding object block when the MTS function is explicitly activated in high level syntax (HLS). Regarding the HLS conditions for applying the explicit MTS, sps_mts_enabled_flag is a flag indicating whether MTS is applied, regardless of whether it is implicit or explicit, so it must be set to "on" for the explicit MTS to be applied. Meanwhile, since the explicit MTS can be applied both when the decoding object block is encoded by the intra prediction method and when it is encoded by the inter prediction method, sps_explicit_mts_intra_enabled_flag and/or sps_explicit_mts_inter_enabled_flag is set to “on” when the explicit MTS is applied. These conditions can be summarized as follows.

1) sps_mts_enabled_flag = on

2) sps_explicit_mts_intra_enabled_flag = on

3) sps_explicit_mts_inter_enabled_flag = on

Here, when the decoding object block is encoded by the intra prediction method, the condition sps_explicit_mts_intra_enabled_flag = "on" is checked, and when the decoding object block is encoded by the inter prediction method, the sps_explicit_mts_inter_enabled_flag = "on" condition is checked.

Additional conditions for using explicit MTS are as follows.

4) lfnst_idx is equal to 0 (see implicit MTS)

5) transform_skip_flag is equal to 0

6) intra_subpartitions_mode_flag is equal to 0

7) cu_sbt_flag is equal to 0 (see implicit MTS)

8) valid MTS area

9) The target block has a width and height of 32 or less

Here, the lfnst_idx value represents a secondary transform, and when lfnst_idx = 0, it means that the secondary transform is not used.

The transform_skip_flag value indicates whether the transform process is omitted, and transform_skip_flag = 0 indicates that the transform is performed normally without omitting the transform process. The intra_subpartitions_mode_flag value indicates whether the target block is divided into a plurality of sub-blocks that each undergo prediction, transform, and quantization, which is one of the intra prediction methods. That is, when the flag value (intra_subpartitions_mode_flag) is “0”, general intra prediction is performed without dividing the target block into sub-blocks. In addition, the use of MTS may be limited by the supported size of the extra transforms (DST-7 and DCT-8), which, as described above, are supported up to 32x32. That is, MTS can be used when both the width and the height of the target block are 32 or less. If either the width or the height exceeds 32, MTS cannot be used, and DCT-II, which is the default transform (*), is performed.

cu_sbt_flag indicates whether a sub-block transform (sbt), in which only a part of the target block undergoes the transform process, is applied. That is, when the cu_sbt_flag value is “0”, the transform is not a sub-block transform (sbt) in which only a part of the target block undergoes the transform process.

Hereinafter, a valid area (hereinafter, a valid MTS area) will be described in detail.

FIG. 11 shows an example of an effective MTS area marked with a thick line in a 32x32 decoding object block.

Referring to FIG. 11, the upper left 16x16 area excluding the DC coefficient, that is, excluding the 1x1 (DC) area, is the effective MTS area. For example, if the positions of all non-zero coefficients in the target block belong to the effective MTS area, MTS is applicable; if one or more non-zero coefficients lie outside the effective MTS area, MTS cannot be applied, and DCT-II, which is the default transform (*), is performed. This is the same concept as the zero-out area described above. That is, if a 32x32 target block uses MTS (i.e., DST-7 or DCT-8), the upper left 16x16 area becomes the valid area, and the rest of the 32x32 target block becomes a zero-out area. If all non-zero coefficients are located in the upper left 16x16 region, MTS (i.e., DST-7 or DCT-8) can be applied. As an exception, however, if there is only one non-zero coefficient in the block and its position is DC (1x1), MTS cannot be applied and DCT-II, which is the default transform (*), is performed.

As a result, in this embodiment, in order to determine whether the MTS is applicable or not, the valid MTS area needs to be checked, and the following two conditions need to be checked for the valid MTS area.

(a) whether the block is other than a block having a single non-zero coefficient located at the DC (1x1) position

(b) whether all non-zero coefficients in the block are located in the upper left 16x16 region
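As a conceptual illustration, the two conditions can be evaluated directly on an array of coefficients, as in the sketch below. The decoder described in this embodiment does not scan the coefficients in this way but derives the same information from syntax elements; the function name, the [y][x] indexing, and the treatment of a block without any non-zero coefficient (condition (a) is reported as not satisfied) are assumptions of this sketch.

```python
def check_valid_mts_area(coeffs, valid_size=16):
    """Return (cond_a, cond_b) for a 2D coefficient array indexed [y][x]."""
    nonzero = [(x, y)
               for y, row in enumerate(coeffs)
               for x, c in enumerate(row)
               if c != 0]
    # (a): at least one non-zero coefficient lies outside the DC position,
    # i.e. the block is not the "single non-zero coefficient at DC" case.
    cond_a = any(pos != (0, 0) for pos in nonzero)
    # (b): every non-zero coefficient lies in the upper left 16x16 region.
    cond_b = all(x < valid_size and y < valid_size for x, y in nonzero)
    return cond_a, cond_b
```

A 32x32 block whose only non-zero coefficient is the DC coefficient gives (False, True), so MTS is not applicable even though nothing lies in the zero-out area.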

In order to confirm condition (a), last position information may be used. Here, the last position refers to the position of the last non-zero coefficient, that is, the last significant coefficient, in the scan order within the target block. As an example, information about the last sub-block, that is, the sub-block including the last position, may be used. For example, if the position of the last sub-block is not (0, 0), condition (a) may be satisfied. In other words, if the index of the last sub-block in the sub-block scan order within the target block is not “0” (greater than 0), condition (a) may be satisfied. Alternatively, if the index of the last sub-block is “0”, information about the last scan position, which indicates the relative position of the last position within the sub-block, may be used. For example, if the last scan position in the scan order of the coefficients in the sub-block is not “0” (greater than 0), condition (a) may be satisfied (refer to FIG. 12). In addition, as mentioned above, the MTS of the present invention is applied to a luma block.

In order to confirm condition (b), information on the sub-blocks including one or more non-zero coefficients may be used. Whether a sub-block includes one or more non-zero coefficients can be confirmed from the sb_coded_flag value of that sub-block. When the flag value is “1” (sb_coded_flag = 1), one or more non-zero coefficients are located in the corresponding sub-block, and when sb_coded_flag = 0, all coefficients in the corresponding sub-block are zero. That is, if all sub-blocks whose sb_coded_flag value is “1” in the target block are located within (0, 0) to (3, 3), condition (b) may be satisfied. Conversely, if even one of the sub-blocks having an sb_coded_flag value of “1” in the target block is located outside (0, 0) to (3, 3), condition (b) cannot be satisfied. In other words, if even one of the sub-blocks with an sb_coded_flag value of “1” in the target block has an x-coordinate or a y-coordinate greater than 3, condition (b) cannot be satisfied (see FIG. 13). As another embodiment, when the first sub-block whose sb_coded_flag value is “1” and whose x-coordinate or y-coordinate is greater than 3 is found in the sub-block scan order in the target block, condition (b) may be set to false, and the process of checking the sub-blocks having an sb_coded_flag value of “1” that follow in the scan order may be omitted (refer to FIG. 14). In addition, as mentioned above, the MTS of the present invention is applied to a luma block.

FIG. 12 shows a method of determining a valid MTS area according to an embodiment of the present invention. The embodiment of FIG. 12 relates to a method of confirming condition (a) among the two conditions for checking the valid MTS area described above. Each of the steps of FIG. 12 may be performed in the image decoding apparatus.

Referring to FIG. 12, the image decoding apparatus sets MtsDcOnlyFlag to “1” (S1210). MtsDcOnlyFlag may indicate whether there is one non-zero coefficient in the block and its position is DC. For example, if there is one non-zero coefficient in the block and its position is DC, MtsDcOnlyFlag has a value of “1”, and in other cases, MtsDcOnlyFlag has a value of “0”. In this case, the image decoding apparatus may apply MTS when MtsDcOnlyFlag has a value of “0”. MtsDcOnlyFlag is initialized to “1” in step S1210 so that, in the steps below, it is reset to “0” only when the block is confirmed not to be a block having a single non-zero coefficient at the DC position; otherwise the flag remains “1” and MTS is not applied.

Next, the image decoding apparatus determines whether the target block is a luma block (S1220). This determination is made because, as described above, MTS is applied only to a luma block.

Next, the image decoding apparatus determines whether the index of the last sub-block is greater than 0 (S1230), and when it is greater than 0, sets MtsDcOnlyFlag to “0” (S1240) and ends the process.

As a result of the determination in step S1230, if the last sub-block is not greater than 0, it is determined whether the last scan position is greater than 0 (S1250).

As a result of the determination in step S1250, if the last scan position is greater than 0, MtsDcOnlyFlag is set to “0” (S1240), and the process is terminated.

As a result of the determination in step S1250, if the last scan position is not greater than 0, the process is terminated.

According to this embodiment, MtsDcOnlyFlag is set to “0” if the last sub-block is greater than 0 or the last scan position is greater than 0, otherwise MtsDcOnlyFlag is set to “1”.

Thereafter, when determining whether or not to apply MTS, MtsDcOnlyFlag is checked and, if it has a value of “1”, MTS is not applied and DCT-II, which is a default transformation, may be applied.
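The flow of FIG. 12 can be sketched as follows. The function and argument names are illustrative assumptions, and leaving the flag at its initial value for a non-luma block is also an assumption, since the text does not state what happens on that branch of S1220.

```python
def derive_mts_dc_only_flag(is_luma, last_sub_block, last_scan_pos):
    """Return MtsDcOnlyFlag from the last significant coefficient position."""
    mts_dc_only_flag = 1                      # S1210: assume "DC only"
    if is_luma:                               # S1220
        # S1230 / S1250: the last coefficient is not at the DC position
        # if it lies in a later sub-block or at a later scan position.
        if last_sub_block > 0 or last_scan_pos > 0:
            mts_dc_only_flag = 0              # S1240
    return mts_dc_only_flag
```

Because the last significant coefficient is the final non-zero coefficient in scan order, a last position at index 0 of sub-block 0 means that no non-zero coefficient exists beyond the DC position, which is why checking only the last position is sufficient here.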

FIG. 13 illustrates a method of determining an effective MTS area according to another embodiment of the present invention. The embodiment of FIG. 13 specifically shows a method of confirming condition (b) among the two conditions for confirming the valid MTS area described above. Each of the steps of FIG. 13 may be performed in the image decoding apparatus.

Referring to FIG. 13, the image decoding apparatus sets MtsZerooutFlag to “1” (S1305). MtsZerooutFlag indicates whether non-zero coefficients in the block exist in the zero-out region. For example, if at least one of the non-zero coefficients in the block exists in the zero-out region, MtsZerooutFlag has a value of “0”, and if none of the non-zero coefficients in the block exists in the zero-out region, MtsZerooutFlag has a value of “1”. In this embodiment, the initial value of MtsZerooutFlag is set to “1” on the assumption that no non-zero coefficient in the block exists in the zero-out region, and MtsZerooutFlag is set to “0” when the zero-out region condition and the non-zero coefficient condition are both satisfied. When MtsZerooutFlag has a value of “0”, the explicit MTS may not be applied.

Next, the image decoding apparatus sets the initial value of the variable i to the index of the last sub-block and repeats the processes of steps S1325 to S1350 below, decreasing the value of i by 1 each time, until the value of i becomes 0 (S1320). The purpose of repeating the routine of step S1320 is to check the sb_coded_flag values of all sub-blocks from the last sub-block to the first sub-block. As described above, when the flag value is “1”, one or more non-zero coefficients exist in the corresponding sub-block, and when the flag value is “0”, no non-zero coefficient exists in the corresponding sub-block. Therefore, referring to FIG. 11, when all sub-blocks whose sb_coded_flag value is “1” in the target block are located only within (0, 0) to (3, 3), that is, only within 0 to 8 based on the variable i, it can be determined that condition (b) for applying the explicit MTS is satisfied.

Next, the image decoding apparatus determines whether the conditions that the variable i is smaller than the last sub-block (i < last sub-block) and that the variable i is larger than 0 (i > 0) are simultaneously satisfied (S1325). For example, when the routine of step S1320 is first executed, the condition of step S1325 is not satisfied because the initial value of the variable i is set to the same value as that of the last sub-block.

As a result of the determination of step S1325, when the conditions that the variable i is smaller than the last sub-block (i < last sub-block) and that the variable i is larger than 0 (i > 0) are simultaneously satisfied, sb_coded_flag is parsed (S1330). If both conditions are not satisfied at the same time, sb_coded_flag is set to “1” (S1335).

At this time, the parsed sb_coded_flag indicates whether one or more non-zero coefficients exist in the corresponding sub-block. When one or more non-zero coefficients exist in the corresponding subblock, sb_coded_flag has a value of “1”, and when there is no non-zero coefficient in the corresponding subblock, sb_coded_flag has a value of “0”.

On the other hand, step S1335 is performed only when i indicates the last sub-block or the first sub-block. That is, since the last position coefficient is included in the last sub-block, the sb_coded_flag value is set to “1” for the last sub-block, and since a DC coefficient exists in the first sub-block, the sb_coded_flag value is likewise set to “1” for the first sub-block.

Next, the image decoding apparatus determines whether the corresponding block is a luma block (S1340). This determination is made because, as described above, MTS is applied only to a luma block.

As a result of the determination of step S1340, if the corresponding block is a luma block, it is determined whether the condition “sb_coded_flag && ( xSb > 3 || ySb > 3 )” is satisfied (S1345). If the condition of step S1345 is satisfied, MtsZerooutFlag is set to “0” (S1350).

According to the present embodiment, when even one non-zero coefficient is found in a sub-block outside the range of sub-blocks (0, 0) to (3, 3) in the target block, that is, in the zero-out area, MtsZerooutFlag is set to “0”, and it may be determined that the explicit MTS is not applicable.
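The loop of FIG. 13 can be sketched as follows. Bitstream parsing is not modeled: each entry of the hypothetical sub_blocks list carries the sub-block coordinates and the sb_coded_flag value that parsing would return, and the function and argument names are assumptions of this sketch.

```python
def derive_mts_zeroout_flag(sub_blocks, last_sub_block, is_luma=True):
    """sub_blocks[i] = (xSb, ySb, coded) in sub-block scan order.

    `coded` stands in for the value that parsing sb_coded_flag would yield.
    """
    mts_zeroout_flag = 1                              # S1305
    for i in range(last_sub_block, -1, -1):           # S1320: last .. 0
        x_sb, y_sb, coded = sub_blocks[i]
        if 0 < i < last_sub_block:                    # S1325
            sb_coded_flag = coded                     # parsed value
        else:
            sb_coded_flag = 1                         # first / last sub-block
        # Luma check, then the zero-out test on the sub-block coordinates.
        if is_luma and sb_coded_flag and (x_sb > 3 or y_sb > 3):
            mts_zeroout_flag = 0
    return mts_zeroout_flag
```

In this sketch every sub-block from the last one down to the first is visited, even after the flag has already been cleared.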

FIG. 14 illustrates a method of determining an effective MTS area according to another embodiment of the present invention. The embodiment of FIG. 14 specifically shows a method of confirming condition (b) among the two conditions for confirming the valid MTS area described above. However, whereas the embodiment of FIG. 13 checks the valid MTS area by checking the sb_coded_flag of all sub-blocks, the embodiment of FIG. 14 differs in that, once the first sub-block outside the valid MTS area is found, the subsequent sb_coded_flag values do not need to be checked. Each of the steps of FIG. 14 may be performed in the image decoding apparatus.

Referring to FIG. 14, the image decoding apparatus sets MtsZerooutFlag to “1” (S1405). MtsZerooutFlag indicates whether non-zero coefficients in the block exist in the zero-out region. For example, if at least one of the non-zero coefficients in the block exists in the zero-out region, MtsZerooutFlag has a value of “0”, and if none of the non-zero coefficients in the block exists in the zero-out region, MtsZerooutFlag has a value of “1”. In this embodiment, the initial value of MtsZerooutFlag is set to “1” on the assumption that no non-zero coefficient in the block exists in the zero-out region, and MtsZerooutFlag is set to “0” when the zero-out region condition and the non-zero coefficient condition are both satisfied. When MtsZerooutFlag has a value of “0”, the explicit MTS may not be applied.

Next, the image decoding apparatus sets the initial value of the variable i to the index of the last sub-block and repeats the processes of steps S1425 to S1450 below, decreasing the value of i by 1 each time, until the value of i becomes 0 (S1420). The purpose of repeating the routine of step S1420 is to check the sb_coded_flag values of all sub-blocks from the last sub-block to the first sub-block. As described above, when the sb_coded_flag value is “1”, one or more non-zero coefficients exist in the corresponding sub-block, and when the sb_coded_flag value is “0”, no non-zero coefficient exists in the corresponding sub-block. Therefore, referring to FIG. 11, when all sub-blocks having an sb_coded_flag value of “1” in the target block are located only within (0, 0) to (3, 3), that is, only within 0 to 8 based on the variable i, it can be determined that condition (b) for applying the explicit MTS is satisfied.

Next, the image decoding apparatus determines whether the conditions that the variable i is smaller than the last sub-block (i < last sub-block) and that the variable i is larger than 0 (i > 0) are simultaneously satisfied (S1425). For example, when the routine of step S1420 is first executed, the condition of step S1425 is not satisfied because the initial value of the variable i is set to the same value as that of the last sub-block.

As a result of the determination in step S1425, when the conditions that the variable i is smaller than the last sub-block (i < last sub-block) and that the variable i is larger than 0 (i > 0) are simultaneously satisfied, sb_coded_flag is parsed (S1430). If both conditions are not satisfied at the same time, sb_coded_flag is set to “1” (S1435).

Meanwhile, step S1435 is performed only when i indicates the last sub-block or the first sub-block. That is, since the last position coefficient is included in the last sub-block, the sb_coded_flag value is set to “1” for the last sub-block, and since a DC coefficient exists in the first sub-block, the sb_coded_flag value is likewise set to “1” for the first sub-block.

Next, the video decoding apparatus determines whether a condition of “MtsZerooutFlag && luma block” is satisfied (S1440).

As a result of the determination of step S1440, if the condition “MtsZerooutFlag && luma block” is satisfied, it is further determined whether the condition “sb_coded_flag && ( xSb > 3 || ySb > 3 )” is satisfied (S1445), and if that condition is satisfied, MtsZerooutFlag is set to “0” (S1450).

As a result of the determination in step S1440, if the condition of “MtsZerooutFlag && luma block” is not satisfied, the process in the corresponding sub-block is terminated.

According to this embodiment, once the MtsZerooutFlag value is set to “0” for a given variable i, that is, in the corresponding sub-block, the determination of step S1440 yields false in the next iteration (variable i-1), so the sb_coded_flag value no longer needs to be checked against the zero-out area.
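The early-termination behavior of FIG. 14 can be sketched as below. As an illustration aid, the hypothetical function also counts how many times the position test of step S1445 is reached; this counter, the function name, and the sub_blocks input format are assumptions of the sketch and are not part of the described method.

```python
def derive_mts_zeroout_flag_early_exit(sub_blocks, last_sub_block, is_luma=True):
    """sub_blocks[i] = (xSb, ySb, coded) in sub-block scan order.

    Returns (MtsZerooutFlag, number of times the S1445 test was evaluated).
    """
    mts_zeroout_flag = 1                               # S1405
    position_checks = 0
    for i in range(last_sub_block, -1, -1):            # S1420
        x_sb, y_sb, coded = sub_blocks[i]
        # S1425 / S1430 / S1435: parsed value, or 1 for first and last.
        sb_coded_flag = coded if 0 < i < last_sub_block else 1
        if mts_zeroout_flag and is_luma:               # S1440
            position_checks += 1
            if sb_coded_flag and (x_sb > 3 or y_sb > 3):   # S1445
                mts_zeroout_flag = 0                   # S1450
    return mts_zeroout_flag, position_checks
```

When the last sub-block already lies outside (0, 0) to (3, 3), the position test runs once and is skipped for all remaining sub-blocks, while a block that stays inside the valid area is tested once per sub-block.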

On the other hand, when the decoding target block satisfies both conditions (a) and (b), the use of the explicit MTS is confirmed, and the transform information actually used in the corresponding block is transmitted in the form of an index (mts_idx). When not all of the conditions are satisfied, DCT-II, which is the default transform (*), is used (see FIG. 15). Table 5 shows the transform types of the horizontal axis and the vertical axis according to the mts_idx value.

In Table 5, trTypeHor means a transform type on the horizontal axis, and trTypeVer means a transform type on the vertical axis. Values of transform types in Table 5 mean trType values in Table 2. For example, when the value of mts_idx is "2", DCT-8 (2) may be used as the horizontal axis transform and DST-7 (1) may be used as the vertical axis transformation (transform).
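The lookup described here can be sketched as a small table. Only the entries for mts_idx 0 and 2 and the trType numbering (0: DCT-II, 1: DST-7, 2: DCT-8) are stated in this text; the entries for mts_idx 1, 3, and 4 are filled in on the assumption that Table 5 follows the usual VVC assignment, and the names used are illustrative.

```python
# trType values as used in the text: 0 = DCT-II, 1 = DST-7, 2 = DCT-8.
TR_TYPE_NAME = {0: "DCT-II", 1: "DST-7", 2: "DCT-8"}

# mts_idx -> (trTypeHor, trTypeVer).
# Rows 0 and 2 are stated in the text; rows 1, 3 and 4 are an assumption
# based on the usual VVC assignment, since Table 5 is not reproduced here.
MTS_IDX_TO_TR_TYPES = {
    0: (0, 0),
    1: (1, 1),
    2: (2, 1),
    3: (1, 2),
    4: (2, 2),
}


def transforms_for_mts_idx(mts_idx):
    """Return the (horizontal, vertical) transform names for an mts_idx."""
    tr_type_hor, tr_type_ver = MTS_IDX_TO_TR_TYPES[mts_idx]
    return TR_TYPE_NAME[tr_type_hor], TR_TYPE_NAME[tr_type_ver]
```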

In the present invention, every statement above that the default transform (*) DCT-II is used, performed, or applied may be replaced with the expression “the mts_idx value is derived as “0””. This is because, when the mts_idx value is "0", DCT-II (0) is set for both the horizontal axis and the vertical axis transforms.

In the present invention, the binarization method of mts_idx uses a truncated rice (TR) method. The cMax value, which is a parameter value for TR, is "4", and the cRiceParam value is "0". Table 6 shows the codewords of the MTS index.

Referring to Table 6, when the mts_idx value is “0”, the corresponding codeword is “0”; when the mts_idx value is “1”, the corresponding codeword is “10”; when the mts_idx value is “2”, the corresponding codeword is “110”; when the mts_idx value is “3”, the corresponding codeword is “1110”; and when the mts_idx value is “4”, the corresponding codeword is “1111”.
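The codewords above can be reproduced with a short sketch. With cRiceParam equal to 0, the truncated Rice code has no suffix and reduces to a truncated unary code with cMax = 4; the function names and the use of bit strings are assumptions made for illustration, and entropy coding of the individual bins is not modeled.

```python
def binarize_mts_idx(mts_idx, c_max=4):
    """Truncated unary bin string for mts_idx (TR with cRiceParam = 0)."""
    if not 0 <= mts_idx <= c_max:
        raise ValueError("mts_idx out of range")
    # mts_idx ones, followed by a terminating zero unless the maximum
    # value is reached, in which case the terminator is omitted.
    return "1" * mts_idx + ("0" if mts_idx < c_max else "")


def debinarize_mts_idx(bits, c_max=4):
    """Inverse mapping: count leading ones, stopping at a zero or at cMax."""
    value = 0
    for bit in bits:
        if bit == "0" or value == c_max:
            break
        value += 1
    return value
```

Omitting the terminating zero for the largest value is what makes the last codeword “1111” rather than “11110”.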

FIG. 15 illustrates a method of determining whether to apply an explicit MTS function according to an embodiment of the present invention. Each of the steps of FIG. 15 may be performed in the image decoding apparatus.

Referring to FIG. 15, the image decoding apparatus determines whether the condition "(sps_explicit_mts_intra_enabled_flag && CuPredMode = MODE_INTRA) || (sps_explicit_mts_inter_enabled_flag && CuPredMode = MODE_INTER)" is satisfied (S1510).

sps_explicit_mts_intra_enabled_flag is a flag indicating whether to use the explicit MTS for intra prediction, and sps_explicit_mts_inter_enabled_flag is a flag indicating whether to use the explicit MTS for inter prediction. sps_explicit_mts_intra_enabled_flag has a value of “1” when the explicit MTS is used for intra prediction, and a value of “0” otherwise. sps_explicit_mts_inter_enabled_flag has a value of “1” when the explicit MTS is used for inter prediction, and a value of “0” otherwise.

CuPredMode indicates by which prediction method the decoding object block is encoded. When the decoding object block is encoded by the intra prediction method, CuPredMode has a MODE_INTRA value, and when the decoding object block is encoded by the inter prediction method, CuPredMode has a MODE_INTER value.

Therefore, when the decoding object block uses intra prediction and the explicit MTS, “sps_explicit_mts_intra_enabled_flag && CuPredMode = MODE_INTRA” has a value of “1”, and when the decoding object block uses inter prediction and the explicit MTS, “sps_explicit_mts_inter_enabled_flag && CuPredMode = MODE_INTER” has a value of “1”. Accordingly, in step S1510, by checking the values of sps_explicit_mts_intra_enabled_flag, sps_explicit_mts_inter_enabled_flag, and CuPredMode, it may be determined whether the decoding object block uses the explicit MTS.

When the condition of step S1510 is satisfied, the image decoding apparatus determines whether the condition "lfnst_idx = 0 && transform_skip_flag = 0 && cbW <= 32 && cbH <= 32 && intra_subpartitions_mode_flag = 0 && cu_sbt_flag = 0" is satisfied (S1520).

The transform_skip_flag value indicates whether transform skip is applied to the current block. That is, it indicates whether to omit the transformation process for the current block. When transform_skip_flag = 0, it indicates that transform skip is not applied to the current block.

cbW and cbH represent the width and height of the current block, respectively. As described above, the default transform DCT-II is supported up to a maximum size of 64×64, and the extra transforms DST-7 and DCT-8 are supported up to a maximum size of 32×32. For example, when the size of the decoding object block is 64×64, one 64×64 DCT-II is applied in the transform process. That is, when at least one of the width and height of the decoding target block is greater than 32, the default transform (*) is applied directly without applying the MTS. Therefore, in order for MTS to be applied, both cbW and cbH must have a value of 32 or less.

intra_subpartitions_mode_flag indicates whether an intra subpartition mode is applied. The intra sub-partition mode is one of intra-picture prediction methods, and indicates that the target block is divided into a plurality of sub-blocks and subjected to prediction, transformation, and quantization processes. That is, when the corresponding flag value (intra_subpartitions_mode_flag) is “0”, it means that general intra prediction is performed without dividing the target block into sub-blocks.

cu_sbt_flag indicates whether sub-block transform (sbt) in which only a part of the target block undergoes a transformation process is applied. That is, when the cu_sbt_flag value is “0”, it means that the sub-block transform (sbt), which undergoes the transformation process of only a part of the target block, is not applied.

Accordingly, it can be determined whether the decoding object block can apply the explicit MTS through whether the condition of step S1520 is satisfied.

If the condition of step S1510 is not satisfied, the image decoding apparatus sets the value of mts_idx to “0” (S1530), and ends the process.

When the condition of step S1520 is satisfied, the image decoding apparatus determines whether the condition of “MtsZeroOutFlag=1 && MtsDcOnlyFlag=0” is satisfied (S1540).

MtsZeroOutFlag indicates whether any non-zero coefficient in the block exists in the zero-out region. MtsZeroOutFlag has a value of “0” when at least one of the non-zero coefficients in the block is in the zero-out region, and may have a value of “1” when none of the non-zero coefficients in the block is in the zero-out region. In this case, the value of MtsZeroOutFlag may be determined by performing the process of FIG. 13 or FIG. 14.
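The semantics of this flag can be sketched as follows. This is not the process of FIG. 13 or FIG. 14; it is a simplified illustration that assumes the zero-out region is everything outside a top-left area of keep_w × keep_h coefficients. The function name and the parameters keep_w and keep_h are hypothetical.

```python
from typing import Iterable, Tuple


def mts_zero_out_flag(nonzero_positions: Iterable[Tuple[int, int]],
                      keep_w: int, keep_h: int) -> int:
    """Return 1 if no non-zero coefficient lies in the zero-out region, else 0.

    nonzero_positions: (x, y) positions of the non-zero coefficients in the block.
    keep_w, keep_h: assumed size of the top-left area that is NOT zeroed out.
    """
    for x, y in nonzero_positions:
        # A position outside the kept top-left area is inside the zero-out region.
        if x >= keep_w or y >= keep_h:
            return 0
    return 1
```

For instance, with an assumed kept area of 16×16, a block whose non-zero coefficients all lie at positions below 16 in both directions yields a flag value of 1.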

MtsDcOnlyFlag indicates whether there is one non-zero coefficient in the block and the position is DC. If there is one non-zero coefficient in the block and the position is DC, the MtsDcOnlyFlag has a value of “1”, and in other cases, MtsDcOnlyFlag has a value of “0”. In this case, the value of MtsDcOnlyFlag may be determined by performing the process of FIG. 17 .
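As a hedged illustration of these semantics (not the process of the referenced figure), the flag could be computed from the positions of the non-zero coefficients. The function name is hypothetical, and the sketch assumes that each position appears at most once and that the DC position is (0, 0).

```python
from typing import Iterable, Tuple


def mts_dc_only_flag(nonzero_positions: Iterable[Tuple[int, int]]) -> int:
    """Return 1 if the block has exactly one non-zero coefficient at DC, else 0."""
    positions = list(nonzero_positions)
    # Exactly one non-zero coefficient, located at the DC position (0, 0).
    return 1 if positions == [(0, 0)] else 0
```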

Meanwhile, if the condition of step S1520 is not satisfied, the image decoding apparatus sets the value of mts_idx to “0” (S1530), and ends the process.

When the condition of step S1540 is satisfied, the image decoding apparatus parses mts_idx (S1550), and ends the process. In this case, the transform type of the horizontal axis and the vertical axis according to the value of mts_idx may be allocated according to Table 5. In this case, the values of the transform type in Table 5 mean the trType values in Table 2. For example, when the mts_idx value is “2”, DCT-8 may be applied as a transform on the horizontal axis and DST-7 may be applied as a transform on the vertical axis.

Also, even when the condition of step S1540 is not satisfied, the image decoding apparatus sets the value of mts_idx to “0” (S1530) and ends the process.
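The decision flow of steps S1520 to S1550 can be summarized in a short sketch. The exact condition of step S1520 is defined together with FIG. 15 and is not reproduced here; the combination below, in particular the pairing of the SPS flags with CuPredMode and the check lfnst_idx == 0, is an assumption assembled from the parameter descriptions. The function name, the MODE_INTRA and MODE_INTER constants, and the parse_mts_idx callback are hypothetical.

```python
from typing import Callable

MODE_INTRA = "MODE_INTRA"  # hypothetical encodings of CuPredMode
MODE_INTER = "MODE_INTER"


def derive_mts_idx(*, sps_explicit_mts_intra_enable_flag: int,
                   sps_explicit_mts_inter_enable_flag: int,
                   cu_pred_mode: str, lfnst_idx: int, transform_skip_flag: int,
                   cb_w: int, cb_h: int, intra_subpartitions_mode_flag: int,
                   cu_sbt_flag: int, mts_zero_out_flag: int, mts_dc_only_flag: int,
                   parse_mts_idx: Callable[[], int]) -> int:
    """Return mts_idx: parsed from the bitstream when allowed, otherwise 0."""
    explicit_enabled = (
        (sps_explicit_mts_intra_enable_flag == 1 and cu_pred_mode == MODE_INTRA)
        or (sps_explicit_mts_inter_enable_flag == 1 and cu_pred_mode == MODE_INTER)
    )
    # S1520 (assumed form): can explicit MTS be applied to this block at all?
    s1520 = (explicit_enabled and lfnst_idx == 0 and transform_skip_flag == 0
             and cb_w <= 32 and cb_h <= 32
             and intra_subpartitions_mode_flag == 0 and cu_sbt_flag == 0)
    if not s1520:
        return 0  # S1530
    # S1540: MtsZeroOutFlag = 1 && MtsDcOnlyFlag = 0
    if not (mts_zero_out_flag == 1 and mts_dc_only_flag == 0):
        return 0  # S1530
    return parse_mts_idx()  # S1550
```

Passing the parser as a callback keeps the sketch independent of any particular bitstream reader; mts_idx is read only when both conditions hold.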

FIG. 16 illustrates a method of performing an inverse transform based on a transform-related parameter according to another embodiment of the present invention. Each of the steps of FIG. 16 may be performed by an image decoding apparatus, for example, by an inverse transform unit of the decoding apparatus.

Referring to FIG. 16, the image decoding apparatus acquires values of sps_explicit_mts_intra_enable_flag, sps_explicit_mts_inter_enable_flag, CuPredMode, lfnst_idx, transform_skip_flag, cbW, cbH, intra_subpartitions_mode_flag, and cu_sbt_flag (S1610). What each of sps_explicit_mts_intra_enable_flag, sps_explicit_mts_inter_enable_flag, CuPredMode, lfnst_idx, transform_skip_flag, cbW, cbH, intra_subpartitions_mode_flag, and cu_sbt_flag indicates is described in detail in the description of FIG. 15 above, and these parameters are used to determine whether explicit MTS is applicable to the decoding object block.

Next, the image decoding apparatus obtains MtsZeroOutFlag and MtsDcOnlyFlag values (S1620). In this case, MtsZeroOutFlag may be obtained by performing the process of FIG. 13 or FIG. 14, and MtsDcOnlyFlag may be obtained by performing the process of FIG. 12.

Next, the image decoding apparatus obtains an mts_idx value based on the parameters obtained in steps S1610 and S1620 (S1630). That is, the image decoding apparatus obtains the mts_idx value based on sps_explicit_mts_intra_enable_flag, sps_explicit_mts_inter_enable_flag, CuPredMode, lfnst_idx, transform_skip_flag, cbW, cbH, intra_subpartitions_mode_flag, cu_sbt_flag, MtsZeroOutFlag, and MtsDcOnlyFlag. In this case, mts_idx may be obtained by performing the process of FIG. 15.

Next, the image decoding apparatus performs inverse transform based on mts_idx (S1640). An inverse transform applied according to the mts_idx value may be configured according to Tables 5 and 2. For example, when the mts_idx value is “2”, DCT-8 may be applied in the horizontal axis direction and DST-7 may be applied in the vertical axis direction.
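The selection of the transform pair in step S1640 can be sketched as a table lookup. Tables 2 and 5 are not reproduced in this part of the description, so the contents below are assumptions: only the row for mts_idx = 2 (DCT-8 horizontally, DST-7 vertically) is confirmed by the text, and the remaining rows follow the mapping commonly used with VVC. All names are hypothetical.

```python
from typing import Dict, Tuple

# Assumed content of Table 2: trType value -> transform kernel.
TR_TYPE_NAME: Dict[int, str] = {0: "DCT-II", 1: "DST-7", 2: "DCT-8"}

# Assumed content of Table 5: mts_idx -> (horizontal trType, vertical trType).
MTS_IDX_TO_TR_TYPE: Dict[int, Tuple[int, int]] = {
    0: (0, 0),
    1: (1, 1),
    2: (2, 1),  # confirmed by the text: DCT-8 horizontal, DST-7 vertical
    3: (1, 2),
    4: (2, 2),
}


def transforms_for_mts_idx(mts_idx: int) -> Tuple[str, str]:
    """Return the (horizontal, vertical) transform names for a given mts_idx."""
    tr_type_hor, tr_type_ver = MTS_IDX_TO_TR_TYPE[mts_idx]
    return TR_TYPE_NAME[tr_type_hor], TR_TYPE_NAME[tr_type_ver]
```

An inverse transform unit would then apply the horizontal and vertical kernels returned by this lookup to the block.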

Meanwhile, although not shown in the drawings, from the viewpoint of an image encoding apparatus, sps_explicit_mts_intra_enable_flag, sps_explicit_mts_inter_enable_flag, CuPredMode, lfnst_idx, transform_skip_flag, cbW, cbH, intra_subpartitions_mode_flag, cu_sbt_flag, MtsZeroOutFlag, and MtsDcOnlyFlag may be used in a corresponding manner.

In the above embodiments, the methods are described on the basis of a flowchart as a series of steps or blocks, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or concurrently with, other steps as described above. In addition, those skilled in the art will understand that the steps shown in the flowchart are not exhaustive and that other steps may be included or that one or more steps in the flowchart may be deleted without affecting the scope of the present invention.

Embodiments described in this document may be implemented and performed on a processor, microprocessor, controller, or chip. For example, the functional units shown in each figure may be implemented and performed on a computer, a processor, a microprocessor, a controller, or a chip. In this case, information for implementation (e.g., information on instructions) or an algorithm may be stored in a digital storage medium.

In addition, the decoding device and the encoding device to which the present invention is applied may be included in a multimedia broadcasting transmission/reception device, a mobile communication terminal, a home cinema video device, a digital cinema video device, a surveillance camera, a video conversation device, a real-time communication device such as a video communication device, a mobile streaming device, a storage medium, a camcorder, a video on demand (VoD) service providing device, an over-the-top (OTT) video device, an Internet streaming service providing device, a three-dimensional (3D) video device, a videophone video device, a transportation terminal (e.g., a vehicle terminal, an airplane terminal, a ship terminal, etc.), and a medical video device, and may be used to process a video signal or a data signal. For example, the OTT video device may include a game console, a Blu-ray player, an Internet-connected TV, a home theater system, a smartphone, a tablet PC, a digital video recorder (DVR), and the like.

In addition, the processing method to which the present invention is applied may be produced in the form of a program executed by a computer, and may be stored in a computer-readable recording medium. Multimedia data having a data structure according to the present invention may also be stored in a computer-readable recording medium. The computer-readable recording medium includes all types of storage devices and distributed storage devices in which computer-readable data is stored. The computer-readable recording medium may include, for example, a Blu-ray Disc (BD), a Universal Serial Bus (USB) device, ROM, PROM, EPROM, EEPROM, RAM, CD-ROM, magnetic tape, a floppy disk, and an optical data storage device. In addition, the computer-readable recording medium includes a medium implemented in the form of a carrier wave (e.g., transmission through the Internet). In addition, the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted through a wired/wireless communication network.

In addition, an embodiment of the present invention may be implemented as a computer program product by program code, and the program code may be executed in a computer according to an embodiment of the present invention. The program code may be stored on a carrier readable by a computer.

Claims

An image decoding method performed by an image decoding apparatus, comprising:

obtaining a parameter indicating whether a multiple transform set (MTS) is applicable to a decoding object block, and information about a width of the decoding object block and a height of the decoding object block;

determining a transform type of the decoding object block based on at least one of the parameter indicating whether the multiple transform set (MTS) is applicable to the decoding object block and the information about the width and the height of the decoding object block;

setting a zero-out region of the decoding object block based on at least one of the parameter indicating whether the multiple transform set (MTS) is applicable to the decoding object block and the information about the width and the height of the decoding object block; and

performing an inverse transform on the decoding object block based on the zero-out region of the decoding object block and a result of determining the transform type.
The method of claim 1, wherein in the step of determining a transform type of the decoding object block,

when at least one of the width or the height of the decoding object block has a value greater than 32, it is determined that the decoding object block is transformed using a default transform.
The method of claim 1, wherein in the step of setting a zero-out region of the decoding object block,

when one of the width or the height of the decoding object block has a value greater than 32, a region in which the width or the height of the decoding object block exceeds 32 is set as the zero-out region.
The method of claim 1, wherein the parameter indicating whether the multiple transform set is applicable to the decoding object block is sps_mts_enabled_flag.
An image decoding method performed by an image decoding apparatus, comprising:

obtaining at least one of information on whether a multiple transform set (MTS) is applied to a decoding object block, information on a prediction mode, information on whether a secondary transform is applied, information on whether prediction using a matrix is applied, and information on a size of the decoding object block;

determining whether an implicit multiple transform set is applied to the decoding object block based on at least one of the information on whether the multiple transform set is applied to the decoding object block, the information on the prediction mode, the information on whether the secondary transform is applied, and the information on whether prediction using a matrix is applied;

obtaining information on a transform type based on information on whether the implicit multiple transform set is applied to the decoding object block and the information on the size of the decoding object block; and

performing an inverse transform based on the information on the transform type.
The method of claim 5, wherein determining whether the implicit multiple transform set is applied comprises:

determining whether the implicit multiple transform set is applied using the information on whether the multiple transform set (MTS) is applied to the decoding object block, the information on the prediction mode, the information on whether the secondary transform is applied, and the information on whether prediction using a matrix is applied.
The method of claim 5, wherein the implicit multiple transform set includes one default transform and at least one extra transform.
The method of claim 7, wherein the obtaining of information on a transform type based on information on a size of the decoding object block comprises:

when the horizontal axis length of the decoding object block is 4 or more and 16 or less, applying at least one of the extra transform types to the decoding object block in the horizontal axis direction.
The method of claim 7, wherein the obtaining of information on a transform type based on information on a size of the decoding object block comprises:

when the vertical axis length of the decoding object block is 4 or more and 16 or less, applying at least one of the extra transform types to the decoding object block in the vertical axis direction.
The method of claim 5, wherein the information on whether multiple transform set (MTS) is applied to the decoding object block comprises at least one of sps_mts_enabled_flag and sps_explicit_mts_intra_enabled_flag.
The method of claim 5, wherein the information on the prediction mode includes CuPredMode.
The method of claim 5, wherein the information on whether the secondary transform is applied includes lfnst_idx.
The method of claim 5, wherein the information on whether prediction using the matrix is applied includes intra_mip_flag.
The method of claim 5, wherein the information on the transform type of the decoding object block includes information on a horizontal transform type and information on a vertical transform type.
The method of claim 5, wherein the determining of whether the implicit multiple transform set is applied to the decoding object block additionally includes checking whether the decoding object block is a luma block.
An image decoding apparatus comprising a memory and at least one processor,

wherein the at least one processor comprises an inverse transform unit configured to:

obtain at least one of information on whether a multiple transform set (MTS) is applied to a decoding object block, information on a prediction mode, information on whether a secondary transform is applied, information on whether prediction using a matrix is applied, and information on a size of the decoding object block;

determine whether an implicit multiple transform set is applied to the decoding object block based on at least one of the information on whether the multiple transform set is applied to the decoding object block, the information on the prediction mode, the information on whether the secondary transform is applied, and the information on whether prediction using a matrix is applied;

obtain information on a transform type based on information on whether the implicit multiple transform set is applied to the decoding object block and the information on the size of the decoding object block; and

perform an inverse transform based on the information on the transform type.