WO2020050651A1

WO2020050651A1 - Multiple transform selection-based image coding method and device therefor

Info

Publication number: WO2020050651A1
Application number: PCT/KR2019/011486
Authority: WO
Inventors: 살레히파메흐디; 김승환; 구문모; 임재현
Original assignee: 엘지전자 주식회사
Priority date: 2018-09-05
Filing date: 2019-09-05
Publication date: 2020-03-12

Abstract

An image decoding method according to the present invention comprises: acquiring prediction mode information, residual information, and transform index information from a bitstream; deriving a prediction mode for the current block on the basis of the prediction mode information; deriving prediction samples by performing intra-prediction for the current block when the prediction mode is an intra-prediction mode; deriving quantized transform coefficients for the current block on the basis of the residual information; deriving transform coefficients by performing dequantization on the basis of the quantized transform coefficients; deriving a horizontal transform kernel and a vertical transform kernel on the basis of the transform index information; deriving residual samples for the current block by performing inverse transformation of the transform coefficients on the basis of the horizontal transform kernel and the vertical transform kernel; and producing a reconstruction picture on the basis of the prediction samples and the residual samples.

Description

Image coding method and apparatus based on multiple transform selection

The present invention relates to an image coding technique, and more particularly, to an image coding method and apparatus based on multiple transform selection in an image coding system.

Recently, demand for high-resolution, high-quality images/video, such as 4K, 8K, or higher Ultra High Definition (UHD) images/video, has been increasing in various fields. As image/video data becomes high-resolution and high-quality, the amount of information or bits to be transmitted increases relative to existing image/video data. Therefore, when image data is transmitted over a medium such as a conventional wired/wireless broadband line, or when image/video data is stored on an existing storage medium, transmission cost and storage cost increase.

In addition, interest in and demand for immersive media such as VR (Virtual Reality) and AR (Artificial Reality) content or holograms have recently been increasing, and broadcasting of images/video having characteristics different from those of real-world images, such as game images, is also increasing.

Accordingly, a high-efficiency image/video compression technology is required to effectively compress, transmit, store, and reproduce information of high-resolution, high-quality images/video having the various characteristics described above.

An object of the present invention is to provide a method and apparatus for improving image coding efficiency.

Another object of the present invention is to provide a method and apparatus for increasing transform efficiency.

Another object of the present invention is to provide an image coding method and apparatus based on multiple transform selection.

Another object of the present invention is to provide a method and apparatus for coding information on multiple transform selection that can increase coding efficiency.

According to an embodiment of the present invention, an image decoding method performed by a decoding apparatus is provided. The method includes: obtaining prediction mode information, residual information, and transform index information from a bitstream; deriving a prediction mode for a current block based on the prediction mode information; deriving prediction samples by performing intra prediction on the current block when the prediction mode is an intra prediction mode; deriving quantized transform coefficients for the current block based on the residual information; deriving transform coefficients by performing inverse quantization based on the quantized transform coefficients; deriving a horizontal transform kernel and a vertical transform kernel based on the transform index information; deriving residual samples for the current block by performing an inverse transform on the transform coefficients based on the horizontal transform kernel and the vertical transform kernel; and generating a reconstructed picture based on the prediction samples and the residual samples. The deriving of the horizontal transform kernel and the vertical transform kernel includes selecting, from a plurality of transform combinations, the transform combination corresponding to the transform index information, and deriving the horizontal transform kernel and the vertical transform kernel included in the selected transform combination.

According to another embodiment of the present invention, an image encoding method performed by an encoding apparatus is provided. The method includes: deriving a prediction mode for a current block; deriving prediction samples by performing intra prediction on the current block when the prediction mode is an intra prediction mode; deriving residual samples for the current block based on the prediction samples; deriving a horizontal transform kernel and a vertical transform kernel to be applied to the residual samples of the current block; generating transform index information based on the horizontal transform kernel and the vertical transform kernel; deriving transform coefficients for the current block by performing a transform on the residual samples based on the horizontal transform kernel and the vertical transform kernel; deriving quantized transform coefficients by performing quantization based on the transform coefficients; generating residual information based on the quantized transform coefficients; and encoding image information including prediction mode information, the residual information, and the transform index information. The transform index information indicates, among a plurality of transform combinations, the transform combination that includes the horizontal transform kernel and the vertical transform kernel.
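The kernel derivation step in the decoding method can be read as a table lookup keyed by the transform index. The following is a minimal sketch under stated assumptions: the table contents (four combinations built from DST-7 and DCT-8), the name `MTS_COMBINATIONS`, and the function `derive_kernels` are hypothetical and are not taken from this document.

```python
# Hypothetical table of transform combinations. Each entry is
# (horizontal kernel, vertical kernel). The real combinations and their
# order are defined by the embodiments, not by this sketch.
MTS_COMBINATIONS = [
    ("DST7", "DST7"),
    ("DCT8", "DST7"),
    ("DST7", "DCT8"),
    ("DCT8", "DCT8"),
]


def derive_kernels(transform_index):
    """Select the combination indicated by the transform index and return
    its horizontal and vertical kernels."""
    if not 0 <= transform_index < len(MTS_COMBINATIONS):
        raise ValueError("transform index out of range")
    horizontal, vertical = MTS_COMBINATIONS[transform_index]
    return horizontal, vertical


hor, ver = derive_kernels(1)
print(hor, ver)  # DCT8 DST7
```

Keeping the combinations in one table means the encoder only has to signal an index rather than two separate kernel identifiers.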

According to the present invention, overall image / video compression efficiency can be improved.

According to the present invention, it is possible to reduce the amount of data to be transmitted for residual processing through an efficient transform, and to increase residual coding efficiency.

According to the present invention, when applying multiple transform selection, since different transform kernels can be applied in the horizontal and vertical directions according to the transform efficiency, overall coding efficiency can be improved.
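As an illustration of applying different kernels in the horizontal and vertical directions, the sketch below performs a separable 2-D transform with a DST-VII kernel horizontally and a DCT-II kernel vertically. It is a floating-point sketch under assumptions: the kernel pair, the block values, and all function names are chosen for illustration, and practical codecs use scaled integer approximations of these kernels.

```python
import math


def dct2(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def dst7(n):
    """Orthonormal DST-VII matrix; row k is the k-th basis function."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * k + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward(residual, hor, ver):
    """C = V * X * H^T: vertical kernel on columns, horizontal on rows."""
    return matmul(matmul(ver, residual), transpose(hor))


def inverse(coeffs, hor, ver):
    """X = V^T * C * H, valid because both kernels are orthonormal."""
    return matmul(matmul(transpose(ver), coeffs), hor)


# Illustrative 4x4 residual block (values are made up).
residual = [[4, 3, 2, 1], [3, 3, 2, 1], [2, 2, 1, 0], [1, 1, 0, 0]]
hor, ver = dst7(4), dct2(4)
coeffs = forward(residual, hor, ver)
restored = inverse(coeffs, hor, ver)
```

Because each kernel is orthonormal, the inverse transform is just the transposed kernel, so `restored` matches `residual` up to floating-point error.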

FIG. 1 schematically shows an example of a video/image coding system to which the present invention can be applied.

FIG. 2 is a diagram schematically illustrating a configuration of a video/image encoding apparatus to which the present invention can be applied.

FIG. 3 is a diagram schematically illustrating a configuration of a video/image decoding apparatus to which the present invention can be applied.

FIG. 4 is a diagram schematically illustrating a multiple transform technique according to the present invention.

FIG. 5 is a flowchart illustrating a process of determining a transform combination according to whether multiple transform selection (MTS or EMT) is applied according to an embodiment of the present invention.

FIGS. 6 and 7 are diagrams for explaining a non-separable secondary transform (NSST) according to an embodiment of the present invention.

FIGS. 8 and 9 are diagrams for explaining RST according to an embodiment of the present invention.

FIG. 10 is a flowchart illustrating an encoding process in which multiple transform selection is performed according to an embodiment of the present invention.

FIG. 11 is a flowchart illustrating a decoding process in which multiple transform selection is performed according to an embodiment of the present invention.

FIG. 12 is a flowchart illustrating a process of encoding a multiple transform selection flag (AMT flag) and a multiple transform index (AMT index) according to an embodiment of the present invention.

FIG. 13 is a flowchart illustrating a decoding process of applying a horizontal transform or a vertical transform to a row or column based on a multiple transform selection flag (AMT flag) and a multiple transform index (AMT index) according to an embodiment of the present invention.

FIG. 14 shows three forward scan orders that can be applied to a 4x4 transform coefficient block (4x4 block, coefficient group (CG)) in the HEVC standard.

FIGS. 15 and 16 are diagrams illustrating mapping of transform coefficients according to a diagonal scan order according to an embodiment of the present invention.

FIG. 17 is a flowchart schematically illustrating a decoding method for performing an inverse transform according to an embodiment of the present invention.

FIG. 18 shows related components of a decoding apparatus for performing an inverse transform according to an embodiment of the present invention.

FIG. 19 is a flowchart schematically illustrating a video/image encoding method by an encoding apparatus according to an embodiment of the present invention.

FIG. 20 is a flowchart schematically illustrating a video/image decoding method by a decoding apparatus according to an embodiment of the present invention.

FIG. 21 shows an example of a content streaming system to which the invention disclosed in this document can be applied.

The present invention may be modified in various ways and may have various embodiments, and specific embodiments will be illustrated in the drawings and described in detail. However, this is not intended to limit the invention to the specific embodiments. Terms used in this specification are used only to describe specific embodiments and are not intended to limit the technical spirit of the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Meanwhile, each configuration in the drawings described in the present invention is shown independently for convenience of describing its distinct characteristic function; this does not mean that each configuration is implemented as separate hardware or separate software. For example, two or more of the configurations may be combined to form one configuration, or one configuration may be divided into a plurality of configurations. Embodiments in which the configurations are integrated and/or separated are also included in the scope of the present invention without departing from the spirit of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. Hereinafter, the same reference numerals are used for the same components in the drawings, and duplicate descriptions for the same components may be omitted.

This document relates to video/image coding. For example, the methods/embodiments disclosed in this document may be applied to a method disclosed in the versatile video coding (VVC) standard, the essential video coding (EVC) standard, the AOMedia Video 1 (AV1) standard, the 2nd generation of audio video coding standard (AVS2), or a next-generation video/image coding standard (e.g., H.267 or H.268).

In this document, various embodiments of video / image coding are proposed, and the above embodiments may be performed in combination with each other unless otherwise specified.

In this document, a video may refer to a set of images over time. A picture generally refers to a unit representing one image in a specific time period, and a slice/tile is a unit constituting a part of a picture in coding. A slice/tile may include one or more coding tree units (CTUs). One picture may be composed of one or more slices/tiles. One picture may be composed of one or more tile groups. One tile group may include one or more tiles. A brick may represent a rectangular region of CTU rows within a tile in a picture. A tile may be partitioned into multiple bricks, each of which consists of one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. A brick scan is a specific sequential ordering of CTUs partitioning a picture in which the CTUs are ordered consecutively in CTU raster scan in a brick, bricks within a tile are ordered consecutively in a raster scan of the bricks of the tile, and tiles in a picture are ordered consecutively in a raster scan of the tiles of the picture. A tile is a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. The tile column is a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements in the picture parameter set. The tile row is a rectangular region of CTUs having a height specified by syntax elements in the picture parameter set and a width equal to the width of the picture. A tile scan is a specific sequential ordering of CTUs partitioning a picture in which the CTUs are ordered consecutively in CTU raster scan in a tile, whereas tiles in a picture are ordered consecutively in a raster scan of the tiles of the picture. A slice includes an integer number of bricks of a picture that may be exclusively contained in a single NAL unit. A slice may consist of either a number of complete tiles or only a consecutive sequence of complete bricks of one tile. Tile groups and slices may be used interchangeably in this document. For example, a tile group/tile group header in this document may be referred to as a slice/slice header.
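The tile scan definition can be made concrete with a short sketch that lists CTU raster addresses in tile scan order. The inputs `col_widths` and `row_heights` are hypothetical stand-ins for the tile column widths and tile row heights (in CTUs) that the picture parameter set would specify; bricks are not modeled here.

```python
def tile_scan_order(col_widths, row_heights):
    """Return CTU raster addresses listed in tile scan order.

    col_widths / row_heights give tile column widths and tile row heights
    in CTUs (hypothetical inputs standing in for PPS syntax elements).
    """
    pic_width = sum(col_widths)
    order = []
    y0 = 0
    for h in row_heights:                    # tiles in raster order
        x0 = 0
        for w in col_widths:
            for y in range(y0, y0 + h):      # CTU raster scan inside a tile
                for x in range(x0, x0 + w):
                    order.append(y * pic_width + x)
            x0 += w
        y0 += h
    return order


# A picture of 4x2 CTUs split into two tile columns of width 2.
print(tile_scan_order([2, 2], [2]))  # [0, 1, 4, 5, 2, 3, 6, 7]
```

With a single tile covering the whole picture, the tile scan reduces to the plain CTU raster scan.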

A pixel or pel may mean the minimum unit constituting one picture (or image). In addition, "sample" may be used as a term corresponding to a pixel. A sample may generally represent a pixel or a pixel value, may represent only a pixel/pixel value of a luma component, or may represent only a pixel/pixel value of a chroma component. Alternatively, a sample may mean a pixel value in the spatial domain, or a transform coefficient in the frequency domain when the pixel value is transformed into the frequency domain.

A unit may represent a basic unit of image processing. The unit may include at least one of a specific region of a picture and information related to the region. One unit may include one luma block and two chroma (e.g., Cb, Cr) blocks. The unit may be used interchangeably with terms such as block or area depending on the case. In the general case, an MxN block may include samples (or a sample array) of M columns and N rows, or a set (or array) of transform coefficients.

In this document, "/" and "," should be interpreted as "and/or". For example, "A/B" may mean "A and/or B", and "A, B" may mean "A and/or B". Further, "A/B/C" may mean "at least one of A, B, and/or C", and "A, B, C" may likewise mean "at least one of A, B, and/or C".

Additionally, "or" in this document should be interpreted as "and/or". For example, "A or B" may mean 1) only A, 2) only B, or 3) both A and B. In other words, "or" in this document may mean "additionally or alternatively".

Referring to FIG. 1, a video / image coding system may include a first device (source device) and a second device (receiving device). The source device may transmit the encoded video / image information or data to a receiving device through a digital storage medium or network in the form of a file or streaming.

The source device may include a video source, an encoding apparatus, and a transmitter. The receiving device may include a receiver, a decoding apparatus, and a renderer. The encoding apparatus may be called a video/image encoding apparatus, and the decoding apparatus may be called a video/image decoding apparatus. The transmitter may be included in the encoding apparatus. The receiver may be included in the decoding apparatus. The renderer may include a display unit, and the display unit may be configured as a separate device or an external component.

The video source may acquire a video/image through a process of capturing, synthesizing, or generating the video/image. The video source may include a video/image capture device and/or a video/image generation device. The video/image capture device may include, for example, one or more cameras, a video/image archive including previously captured video/images, and the like. The video/image generation device may include, for example, a computer, a tablet, and a smartphone, and may (electronically) generate the video/image. For example, a virtual video/image may be generated through a computer or the like, and in this case, the video/image capture process may be replaced by a process of generating related data.

The encoding apparatus may encode the input video/image. The encoding apparatus may perform a series of procedures such as prediction, transform, and quantization for compression and coding efficiency. The encoded data (encoded video/image information) may be output in the form of a bitstream.

The transmitter may transmit the encoded video/image information or data output in the form of a bitstream to the receiver of the receiving device through a digital storage medium or a network in the form of a file or streaming. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD. The transmitter may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcast/communication network. The receiver may receive/extract the bitstream and deliver it to the decoding apparatus.

The decoding apparatus may decode a video / image by performing a series of procedures such as inverse quantization, inverse transformation, and prediction corresponding to the operation of the encoding apparatus.

The renderer can render the decoded video / image. The rendered video / image may be displayed through the display unit.

FIG. 2 is a diagram schematically illustrating a configuration of a video/image encoding apparatus to which the present invention can be applied. Hereinafter, the video encoding apparatus may include an image encoding apparatus.

Referring to FIG. 2, the encoding apparatus 200 may be configured to include an image partitioner (image partitioning unit) 210, a predictor (prediction unit) 220, a residual processor (residual processing unit) 230, an entropy encoder (entropy encoding unit) 240, an adder 250, a filter (filtering unit) 260, and a memory 270. The prediction unit 220 may include an inter prediction unit 221 and an intra prediction unit 222. The residual processing unit 230 may include a transform unit 232, a quantization unit 233, an inverse quantization unit 234, and an inverse transform unit 235. The residual processing unit 230 may further include a subtraction unit 231. The adder 250 may be called a reconstructor or a reconstructed block generator. The above-described image partitioning unit 210, prediction unit 220, residual processing unit 230, entropy encoding unit 240, adder 250, and filtering unit 260 may be configured by one or more hardware components (for example, an encoder chipset or processor). Also, the memory 270 may include a decoded picture buffer (DPB) and may be configured by a digital storage medium. The hardware component may further include the memory 270 as an internal/external component.

The image partitioning unit 210 may divide the input image (or picture or frame) input to the encoding apparatus 200 into one or more processing units. For example, the processing unit may be called a coding unit (CU). In this case, the coding unit may be recursively divided according to a quad-tree binary-tree ternary-tree (QTBTTT) structure from a coding tree unit (CTU) or a largest coding unit (LCU). For example, one coding unit may be divided into a plurality of coding units of deeper depth based on a quad-tree structure, a binary-tree structure, and/or a ternary-tree structure. In this case, for example, the quad-tree structure may be applied first, and the binary-tree structure and/or the ternary-tree structure may be applied later. Alternatively, the binary-tree structure may be applied first. The coding procedure according to the present invention may be performed based on a final coding unit that is no longer split. In this case, the largest coding unit may be used directly as the final coding unit based on coding efficiency according to image characteristics, or, if necessary, the coding unit may be recursively divided into coding units of deeper depth, and a coding unit of optimal size may be used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transform, and reconstruction, which will be described later. As another example, the processing unit may further include a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may each be split or partitioned from the above-described final coding unit. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving a transform coefficient and/or a unit for deriving a residual signal from the transform coefficient.

The unit may be used interchangeably with terms such as block or area depending on the case. In the general case, an MxN block may represent samples of M columns and N rows or a set of transform coefficients. A sample may generally represent a pixel or a pixel value, and may indicate only a pixel/pixel value of a luma component or only a pixel/pixel value of a chroma component. "Sample" may be used as a term corresponding to a pixel or pel of one picture (or image).
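The recursive division into final coding units can be sketched as follows. This is a simplified illustration under assumptions: blocks are `(x, y, width, height)` tuples, the ternary split uses a 1:2:1 ratio, and `toy_decision` is a hypothetical stand-in for the encoder's actual split decision, which this passage does not specify.

```python
def split(block, mode):
    """Split block (x, y, w, h) into sub-blocks according to mode."""
    x, y, w, h = block
    if mode == "QT":    # four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_V":  # two halves side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "BT_H":  # two halves stacked
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TT_V":  # three parts with 1:2:1 widths (assumed ratio)
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "TT_H":  # three parts with 1:2:1 heights (assumed ratio)
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(mode)


def partition(block, choose_mode):
    """Recursively split until choose_mode returns None; return final CUs."""
    mode = choose_mode(block)
    if mode is None:
        return [block]
    leaves = []
    for sub in split(block, mode):
        leaves.extend(partition(sub, choose_mode))
    return leaves


def toy_decision(block):
    """Hypothetical stand-in for the encoder's split decision."""
    x, y, w, h = block
    if w == 128:
        return "QT"
    if w == 64 and h == 64 and x == 0 and y == 0:
        return "TT_V"
    return None


cus = partition((0, 0, 128, 128), toy_decision)
print(len(cus))  # 6
```

Here the 128x128 CTU is first quad-split, then only the top-left 64x64 block is ternary-split, which gives six final coding units that together cover the CTU.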

The encoding apparatus 200 may generate a residual signal (residual block, residual sample array) by subtracting the prediction signal (predicted block, prediction sample array) output from the inter prediction unit 221 or the intra prediction unit 222 from the input image signal (original block, original sample array), and the generated residual signal is transmitted to the transform unit 232. In this case, as illustrated, the unit that subtracts the prediction signal (predicted block, prediction sample array) from the input image signal (original block, original sample array) within the encoding apparatus 200 may be referred to as the subtraction unit 231. The prediction unit may perform prediction on a block to be processed (hereinafter referred to as a current block) and generate a predicted block including prediction samples for the current block. The prediction unit may determine whether intra prediction or inter prediction is applied in units of the current block or CU. As described later in the description of each prediction mode, the prediction unit may generate various information regarding prediction, such as prediction mode information, and transmit it to the entropy encoding unit 240. The prediction information may be encoded by the entropy encoding unit 240 and output in the form of a bitstream.

The intra prediction unit 222 may predict the current block by referring to samples in the current picture. The referenced samples may be located in the neighborhood of the current block or may be located apart from it, depending on the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes depending on the granularity of the prediction direction. However, this is an example, and more or fewer directional prediction modes may be used depending on the setting. The intra prediction unit 222 may determine the prediction mode applied to the current block using the prediction modes applied to neighboring blocks.
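As a small illustration of a non-directional mode, the sketch below fills a block with the rounded mean of the reference samples above and to the left of it. This is a simplified reading of DC mode under assumptions: the function name, the choice of reference samples, and the rounding rule are illustrative and are not specified by this document.

```python
def dc_predict(top, left, width, height):
    """Fill a width x height block with the rounded mean of the reference
    samples above and to the left (simplified DC mode)."""
    refs = list(top[:width]) + list(left[:height])
    dc = (sum(refs) + len(refs) // 2) // len(refs)
    return [[dc] * width for _ in range(height)]


# Made-up neighboring samples of a 4x4 block.
pred = dc_predict(top=[100, 102, 104, 106], left=[98, 99, 101, 102],
                  width=4, height=4)
print(pred[0])  # [102, 102, 102, 102]
```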

The inter prediction unit 221 may derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. In this case, to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, subblocks, or samples based on the correlation of motion information between a neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter prediction, the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture. The reference picture including the reference block and the reference picture including the temporal neighboring block may be the same or different. The temporal neighboring block may be called a collocated reference block, a collocated CU (colCU), or the like, and the reference picture including the temporal neighboring block may be called a collocated picture (colPic). For example, the inter prediction unit 221 may construct a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and/or reference picture index of the current block. Inter prediction may be performed based on various prediction modes. For example, in the case of the skip mode and the merge mode, the inter prediction unit 221 may use motion information of a neighboring block as motion information of the current block. In the skip mode, unlike the merge mode, the residual signal may not be transmitted. In the case of the motion vector prediction (MVP) mode, the motion vector of the current block may be indicated by using the motion vector of a neighboring block as a motion vector predictor and signaling a motion vector difference.
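The MVP mode can be summarized as adding a signaled difference to a predictor chosen from a candidate list. A minimal sketch, assuming a hypothetical candidate list and a hypothetical function name `reconstruct_mv`; candidate derivation from neighboring blocks is not modeled.

```python
def reconstruct_mv(candidates, mvp_index, mvd):
    """Motion vector = selected predictor + signaled difference."""
    mvp_x, mvp_y = candidates[mvp_index]
    return (mvp_x + mvd[0], mvp_y + mvd[1])


# Hypothetical candidate list built from neighboring blocks.
candidates = [(4, -2), (6, 0)]
mv = reconstruct_mv(candidates, mvp_index=1, mvd=(-1, 3))
print(mv)  # (5, 3)
```

Only the candidate index and the difference need to be signaled, which is typically cheaper than coding the full motion vector.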

The prediction unit 220 may generate a prediction signal based on various prediction methods described below. For example, the prediction unit may apply intra prediction or inter prediction for prediction of one block, and may also apply intra prediction and inter prediction simultaneously. This may be called combined inter and intra prediction (CIIP). Also, the prediction unit may be based on an intra block copy (IBC) prediction mode or a palette mode for prediction of a block. The IBC prediction mode or the palette mode may be used for image/video coding of content such as games, for example, screen content coding (SCC). IBC basically performs prediction within the current picture, but may be performed similarly to inter prediction in that a reference block is derived within the current picture. That is, IBC may use at least one of the inter prediction techniques described in this document. The palette mode may be regarded as an example of intra coding or intra prediction. When the palette mode is applied, sample values in a picture may be signaled based on information on a palette table and a palette index.

The prediction signal generated through the prediction unit (including the inter prediction unit 221 and/or the intra prediction unit 222) may be used to generate a reconstructed signal or may be used to generate a residual signal. The transform unit 232 may generate transform coefficients by applying a transform technique to the residual signal. For example, the transform technique may include at least one of DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), KLT (Karhunen-Loève Transform), GBT (Graph-Based Transform), or CNT (Conditionally Non-linear Transform). Here, GBT means a transform obtained from a graph when relationship information between pixels is represented as the graph. CNT means a transform obtained based on a prediction signal generated using all previously reconstructed pixels. Also, the transform process may be applied to square pixel blocks of the same size, or may be applied to non-square blocks of variable size.

The quantization unit 233 quantizes the transform coefficients and transmits them to the entropy encoding unit 240, and the entropy encoding unit 240 may encode the quantized signal (information about the quantized transform coefficients) and output it as a bitstream. Information about the quantized transform coefficients may be called residual information. The quantization unit 233 may rearrange the block-form quantized transform coefficients into a one-dimensional vector form based on a coefficient scan order, and may generate information about the quantized transform coefficients based on the quantized transform coefficients in the one-dimensional vector form. The entropy encoding unit 240 may perform various encoding methods such as exponential Golomb coding, context-adaptive variable length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). The entropy encoding unit 240 may encode information necessary for video/image reconstruction (e.g., values of syntax elements) together with or separately from the quantized transform coefficients. The encoded information (e.g., encoded video/image information) may be transmitted or stored in units of network abstraction layer (NAL) units in the form of a bitstream. The video/image information may further include information regarding various parameter sets such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS). Also, the video/image information may further include general constraint information. In this document, information and/or syntax elements transmitted/signaled from the encoding apparatus to the decoding apparatus may be included in the video/image information. The video/image information may be encoded through the above-described encoding procedure and included in the bitstream. The bitstream may be transmitted over a network or stored on a digital storage medium. Here, the network may include a broadcasting network and/or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD. The signal output from the entropy encoding unit 240 may be transmitted by a transmitting unit (not shown) and/or stored by a storage unit (not shown), each of which may be configured as an internal/external element of the encoding apparatus 200, or the transmitting unit may be included in the entropy encoding unit 240.

The quantized transform coefficients output from the quantization unit 233 may be used to generate a prediction signal. For example, a residual signal (residual block or residual samples) may be reconstructed by applying inverse quantization and inverse transformation to the quantized transform coefficients through the inverse quantization unit 234 and the inverse transformation unit 235. The adder 250 may generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the reconstructed residual signal to the prediction signal output from the inter prediction unit 221 or the intra prediction unit 222. If there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block. The adder 250 may be called a reconstruction unit or a reconstructed block generation unit. The generated reconstructed signal may be used for intra prediction of the next processing target block in the current picture, or may be used for inter prediction of the next picture after filtering as described below.
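The addition performed by the adder can be sketched as follows. This is a minimal illustration, not the apparatus itself: the function name `reconstruct_block`, the `bit_depth` parameter, and the clipping to the valid sample range are assumptions added for the example and are not stated in the text above.

```python
def reconstruct_block(pred, resid=None, bit_depth=8):
    """Add residual samples to prediction samples, sample by sample.

    pred and resid are 2-D lists of integers with the same shape.
    Clipping to [0, 2**bit_depth - 1] is an assumption of this sketch.
    """
    if resid is None:
        # No residual (e.g. skip mode): the predicted block is used as is.
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


pred = [[100, 250], [3, 128]]
resid = [[5, 10], [-7, 0]]
recon = reconstruct_block(pred, resid)
```

The same addition is performed on the decoder side, which is why the encoder reconstructs from the quantized coefficients rather than from the original residual.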

Meanwhile, LMCS (luma mapping with chroma scaling) may be applied during picture encoding and / or reconstruction.

The filtering unit 260 may improve subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filtering unit 260 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture, and the modified reconstructed picture may be stored in the memory 270, specifically in the DPB of the memory 270. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, and the like. The filtering unit 260 may generate various information regarding filtering, as described later in the description of each filtering method, and transmit it to the entropy encoding unit 240. The filtering information may be encoded by the entropy encoding unit 240 and output in the form of a bitstream.

The modified reconstructed picture transmitted to the memory 270 may be used as a reference picture in the inter prediction unit 221. When inter prediction is applied in this way, prediction mismatch between the encoding apparatus 200 and the decoding apparatus can be avoided, and encoding efficiency can be improved.

The DPB of the memory 270 may store the modified reconstructed picture for use as a reference picture in the inter prediction unit 221. The memory 270 may store motion information of a block from which motion information in the current picture is derived (or encoded) and/or motion information of blocks in a picture that has already been reconstructed. The stored motion information may be transmitted to the inter prediction unit 221 to be used as motion information of a spatial neighboring block or motion information of a temporal neighboring block. The memory 270 may store reconstructed samples of blocks reconstructed in the current picture, and may transmit the reconstructed samples to the intra prediction unit 222.

Referring to FIG. 3, the decoding apparatus 300 includes an entropy decoder 310, a residual processor 320, a predictor 330, an adder 340, a filter 350, and a memory 360. The prediction unit 330 may include an inter prediction unit 332 and an intra prediction unit 331. The residual processing unit 320 may include a dequantizer 321 and an inverse transformer 322. The entropy decoding unit 310, the residual processing unit 320, the prediction unit 330, the adding unit 340, and the filtering unit 350 described above may be configured by one hardware component (e.g., a decoder chipset or processor) according to an embodiment. Also, the memory 360 may include a decoded picture buffer (DPB), and may be configured by a digital storage medium. The hardware component may further include the memory 360 as an internal/external component.

When a bitstream including video/image information is input, the decoding apparatus 300 may reconstruct an image in correspondence with the process by which the video/image information was processed in the encoding apparatus of FIG. 2. For example, the decoding apparatus 300 may derive units/blocks based on block partitioning related information obtained from the bitstream. The decoding apparatus 300 may perform decoding using the processing unit applied in the encoding apparatus. Thus, the processing unit of decoding may be, for example, a coding unit, and the coding unit may be partitioned from a coding tree unit or a largest coding unit along a quad tree structure, a binary tree structure, and/or a ternary tree structure. One or more transform units can be derived from the coding unit. The reconstructed video signal decoded and output by the decoding device 300 may then be reproduced through a reproduction device.

The decoding apparatus 300 may receive the signal output from the encoding apparatus of FIG. 2 in the form of a bitstream, and the received signal may be decoded through the entropy decoding unit 310. For example, the entropy decoding unit 310 may parse the bitstream to derive information (e.g., video/image information) necessary for image reconstruction (or picture reconstruction). The video/image information may further include information regarding various parameter sets such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS). Also, the video/image information may further include general constraint information. The decoding apparatus may decode a picture further based on the information on the parameter sets and/or the general constraint information. Signaled/received information and/or syntax elements described later in this document may be decoded through the decoding procedure and obtained from the bitstream. For example, the entropy decoding unit 310 may decode information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and may output values of syntax elements required for image reconstruction and quantized values of transform coefficients for the residual. In more detail, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in the bitstream, determine a context model using the information of the syntax element to be decoded, the decoding information of neighboring blocks and of the block to be decoded, or the information of symbols/bins decoded in a previous step, predict the probability of occurrence of the bin according to the determined context model, and perform arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element.
At this time, after determining the context model, the CABAC entropy decoding method may update the context model using the information of the decoded symbol/bin for the context model of the next symbol/bin. Among the information decoded by the entropy decoding unit 310, information about prediction is provided to the prediction unit (the inter prediction unit 332 and the intra prediction unit 331), and the residual values on which entropy decoding has been performed by the entropy decoding unit 310, that is, the quantized transform coefficients and related parameter information, may be input to the residual processing unit 320. The residual processing unit 320 may derive a residual signal (residual block, residual samples, residual sample array). Also, information related to filtering among the information decoded by the entropy decoding unit 310 may be provided to the filtering unit 350. Meanwhile, a receiving unit (not shown) that receives the signal output from the encoding device may be further configured as an internal/external element of the decoding device 300, or the receiving unit may be a component of the entropy decoding unit 310. Meanwhile, the decoding device according to this document may be called a video/image/picture decoding device, and the decoding device may be divided into an information decoder (video/image/picture information decoder) and a sample decoder (video/image/picture sample decoder). The information decoder may include the entropy decoding unit 310, and the sample decoder may include at least one of the inverse quantization unit 321, the inverse transformation unit 322, the addition unit 340, the filtering unit 350, the memory 360, the inter prediction unit 332, and the intra prediction unit 331.
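Of the coding methods named above, exponential Golomb coding is simple enough to sketch in a few lines. The example below decodes one unsigned 0-th order Exp-Golomb code word; representing the bitstream as a string of '0'/'1' characters and the function name `decode_ue` are assumptions made for illustration only, and the sketch says nothing about which syntax elements of this document use this code.

```python
def decode_ue(bits, pos=0):
    """Decode one unsigned 0-th order Exp-Golomb code word.

    bits is a string of '0'/'1' characters (an illustrative stand-in for a
    real bit reader). Returns (value, position of the next unread bit).
    """
    leading_zeros = 0
    while bits[pos] == "0":
        leading_zeros += 1
        pos += 1
    pos += 1  # skip the '1' that terminates the zero prefix
    suffix = bits[pos:pos + leading_zeros]
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos + leading_zeros


# Two code words back to back: '00100' (value 3) followed by '011' (value 2).
first, nxt = decode_ue("00100011")
second, end = decode_ue("00100011", nxt)
```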

The inverse quantization unit 321 may inverse quantize the quantized transform coefficients to output transform coefficients. The inverse quantization unit 321 may rearrange the quantized transform coefficients in a two-dimensional block form. In this case, the reordering may be performed based on the coefficient scan order performed by the encoding device. The inverse quantization unit 321 may perform inverse quantization on the quantized transform coefficients using a quantization parameter (for example, quantization step size information), and obtain transform coefficients.
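The two steps described above (rearranging the scanned coefficients into a two-dimensional block and scaling them) can be sketched as follows. The diagonal scan used here, the single scalar `step_size`, and both function names are assumptions for illustration; an actual codec derives the scale from the quantization parameter with integer arithmetic.

```python
def diagonal_scan(width, height):
    """One possible coefficient scan order: positions (y, x) by anti-diagonal.

    This particular order is an illustrative assumption.
    """
    positions = [(y, x) for y in range(height) for x in range(width)]
    return sorted(positions, key=lambda p: (p[0] + p[1], p[1]))


def dequantize(levels_1d, scan_order, width, height, step_size):
    """Place scanned levels back into a 2-D block and scale by the step size."""
    block = [[0] * width for _ in range(height)]
    for level, (y, x) in zip(levels_1d, scan_order):
        block[y][x] = level * step_size
    return block


scan = diagonal_scan(2, 2)
coeffs = dequantize([4, -2, 1, 0], scan, 2, 2, step_size=10)
```

The decoder must use the same scan order as the encoder, otherwise coefficients land at the wrong frequency positions.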

The inverse transform unit 322 inversely transforms the transform coefficients to obtain a residual signal (residual block, residual sample array).

The prediction unit may perform prediction on the current block and generate a predicted block including prediction samples for the current block. The prediction unit may determine whether intra prediction or inter prediction is applied to the current block based on information about the prediction output from the entropy decoding unit 310, and may determine a specific intra / inter prediction mode.

The prediction unit 330 may generate a prediction signal based on various prediction methods described below. For example, the prediction unit may apply intra prediction or inter prediction for prediction of one block, and may also apply intra prediction and inter prediction at the same time. This can be called combined inter and intra prediction (CIIP). Also, the prediction unit may be based on an intra block copy (IBC) prediction mode or a palette mode for prediction of a block. The IBC prediction mode or the palette mode may be used for video/image coding of content such as games, for example, screen content coding (SCC). IBC basically performs prediction within the current picture, but may be performed similarly to inter prediction in that a reference block is derived within the current picture. That is, IBC can use at least one of the inter prediction techniques described in this document. The palette mode can be regarded as an example of intra coding or intra prediction. When the palette mode is applied, information on the palette table and the palette index may be signaled by being included in the video/image information.

The intra prediction unit 331 may predict the current block by referring to samples in the current picture. The referenced samples may be located in the neighborhood of the current block or may be located apart depending on a prediction mode. In intra prediction, prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The intra prediction unit 331 may determine a prediction mode applied to the current block using a prediction mode applied to neighboring blocks.

The inter prediction unit 332 may derive the predicted block for the current block based on the reference block (reference sample array) specified by the motion vector on the reference picture. At this time, to reduce the amount of motion information transmitted in the inter prediction mode, motion information may be predicted in units of blocks, subblocks, or samples based on the correlation of motion information between a neighboring block and a current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter prediction, the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block present in the reference picture. For example, the inter prediction unit 332 may construct a motion information candidate list based on neighboring blocks, and derive a motion vector and / or reference picture index of the current block based on the received candidate selection information. Inter prediction may be performed based on various prediction modes, and information on the prediction may include information indicating a mode of inter prediction for the current block.

The adder 340 may generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the obtained residual signal to the prediction signal (predicted block, prediction sample array) output from the prediction unit (including the inter prediction unit 332 and/or the intra prediction unit 331). If there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block.

The adding unit 340 may be called a restoration unit or a restoration block generation unit. The generated reconstructed signal may be used for intra prediction of a next processing target block in a current picture, may be output through filtering as described below, or may be used for inter prediction of a next picture.

Meanwhile, LMCS (luma mapping with chroma scaling) may be applied in a picture decoding process.

The filtering unit 350 may improve subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filtering unit 350 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture, and the modified reconstructed picture may be transferred to the memory 360, specifically to the DPB of the memory 360. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, and the like.

The (corrected) reconstructed picture stored in the DPB of the memory 360 may be used as a reference picture in the inter prediction unit 332. The memory 360 may store motion information of a block from which motion information in a current picture is derived (or decoded) and / or motion information of blocks in a picture that has already been reconstructed. The stored motion information may be transmitted to the inter prediction unit 332 to be used as motion information of a spatial neighboring block or motion information of a temporal neighboring block. The memory 360 may store reconstructed samples of blocks reconstructed in the current picture, and may transmit the reconstructed samples to the intra prediction unit 331.

In the present specification, the embodiments described for the filtering unit 260, the inter prediction unit 221, and the intra prediction unit 222 of the encoding device 200 may be applied identically or correspondingly to the filtering unit 350, the inter prediction unit 332, and the intra prediction unit 331 of the decoding device 300, respectively.

As described above, in performing video coding, prediction is performed to improve compression efficiency. Through this, a predicted block including prediction samples for the current block, which is the block to be coded, can be generated. Here, the predicted block includes prediction samples in the spatial domain (or pixel domain). The predicted block is derived identically in the encoding device and the decoding device, and the encoding device can improve image coding efficiency by signaling to the decoding device information about the residual between the original block and the predicted block (residual information), rather than the original sample values of the original block themselves. The decoding apparatus may derive a residual block including residual samples based on the residual information, generate a reconstructed block including reconstructed samples by combining the residual block and the predicted block, and generate a reconstructed picture including the reconstructed blocks.

The residual information may be generated through transform and quantization procedures. For example, the encoding device may derive a residual block between the original block and the predicted block, derive transform coefficients by performing a transform procedure on the residual samples (residual sample array) included in the residual block, derive quantized transform coefficients by performing a quantization procedure on the transform coefficients, and signal the related residual information to the decoding apparatus (via a bitstream). Here, the residual information may include value information of the quantized transform coefficients, location information, a transform technique, a transform kernel, quantization parameters, and the like. The decoding apparatus may perform an inverse quantization/inverse transformation procedure based on the residual information and derive residual samples (or residual blocks). The decoding apparatus may generate a reconstructed picture based on the predicted block and the residual block. The encoding apparatus may also derive a residual block by inverse quantizing/inverse transforming the quantized transform coefficients, for reference in inter prediction of a subsequent picture, and generate a reconstructed picture based thereon.

On the other hand, according to the present invention, in performing the transformation, the vertical component and the horizontal component can be separated and separately transformed. In this case, the transform kernel for the vertical direction and the transform kernel for the horizontal direction may be selected separately. This can be called multiple transform selection (MTS).
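A separable transform with independently chosen horizontal and vertical kernels can be sketched as below. This is an illustration under stated assumptions: floating-point orthonormal DCT-II and DST-VII matrices are used, the helper names (`dct2_matrix`, `dst7_matrix`, `forward_2d`, `inverse_2d`) are hypothetical, and a practical codec would use scaled integer kernels instead.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II kernel; row i is the i-th basis vector."""
    return [[(math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
             for j in range(n)] for i in range(n)]


def dst7_matrix(n):
    """Orthonormal DST-VII kernel; row i is the i-th basis vector."""
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_2d(resid, hor, ver):
    """C = V * X * H^T: vertical kernel on columns, horizontal kernel on rows."""
    return matmul(matmul(ver, resid), transpose(hor))


def inverse_2d(coeff, hor, ver):
    """X = V^T * C * H, valid because both kernels are orthonormal."""
    return matmul(matmul(transpose(ver), coeff), hor)


resid = [[1, 2, 3, 4], [5, 6, 7, 8], [0, -1, 2, -3], [4, 4, 4, 4]]
hor, ver = dst7_matrix(4), dct2_matrix(4)   # kernels chosen independently
recon = inverse_2d(forward_2d(resid, hor, ver), hor, ver)
```

Because the kernels are applied along rows and columns separately, the decoder only needs to know which kernel was used in each direction to invert the transform.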

Referring to FIG. 4, the transform unit may correspond to the transform unit in the encoding apparatus of FIG. 2 described above, and the inverse transform unit may correspond to the inverse transform unit in the encoding apparatus of FIG. 2 described above or the inverse transform unit in the decoding apparatus of FIG. 3.

The transform unit may derive (primary) transform coefficients by performing a primary transform based on residual samples (residual sample array) in the residual block (S410). Here, the primary transform may be referred to as a core transform. The primary transform may be based on multiple transform selection (MTS), and may be referred to as a multiple core transform when multiple transforms are applied as the primary transform.

The transform unit may derive (secondary) transform coefficients by performing a second transform based on the (primary) transform coefficients (S420). (Secondary) transform coefficients can be referred to as modified transform coefficients.

The first-order transform is a transformation from the spatial domain to the frequency domain, and the second-order transform means transforming into a more compressed expression by using a correlation existing between (first-order) transform coefficients. Here, the second order transform may include a non-separable transform. In this case, the secondary transform may be called a non-separable secondary transform (NSST) or a reduced secondary transform (RST).

Since the secondary transform is performed to further improve the transform performance, the transform unit may perform the secondary transform selectively. The embodiment of FIG. 4 is described on the premise that the secondary (inverse) transform is performed, but the secondary transform may be omitted.

The transform unit may transmit the (secondary) transform coefficients derived by performing the secondary transform to the quantization unit. The quantization unit may derive quantized transform coefficients by performing quantization on the (secondary) transform coefficients. Then, the quantized transform coefficients may be encoded and signaled to the decoding device, and may also be transmitted to the inverse quantization/inverse transform unit in the encoding device.

When the secondary transform is omitted, (primary) transform coefficients, which are outputs of the primary transform, may be derived as quantized transform coefficients through a quantization unit. And the quantized transform coefficients are encoded and signaled to a decoding apparatus, and can also be transmitted to an inverse quantization / inverse transform unit in the encoding apparatus.

The inverse transform unit may perform a series of procedures in the reverse order of the procedures performed by the above-described transform unit. The inverse transform unit receives (inverse quantized) transform coefficients, performs a secondary (inverse) transform to derive (primary) transform coefficients (S450), and performs a primary (inverse) transform on the (primary) transform coefficients to obtain the residual block (residual samples) (S460).

Here, the primary transform coefficients may be referred to as modified transform coefficients from the viewpoint of the inverse transform unit. As described above, the encoding device and the decoding device can generate a reconstructed block based on the residual block and the predicted block, and generate a reconstructed picture based on the reconstructed block.

When the secondary (inverse) transform is omitted, the inverse transform unit may receive the (inverse quantized) transform coefficients and perform the primary (inverse) transform to obtain the residual block (residual samples). As described above, the encoding device and the decoding device can generate a reconstructed block based on the residual block and the predicted block, and generate a reconstructed picture based on the reconstructed block.

On the other hand, as described above, the transform may be applied in several stages. As described above with reference to FIG. 4, two stages may be applied, a primary transform and a secondary transform, or more transform stages may be added depending on the algorithm. The primary transform may be performed using a DCT (Discrete Cosine Transform) and/or a DST (Discrete Sine Transform) transform type. In an embodiment, DCT type 2 may be applied as the primary transform as in HEVC, or DST type 7 may be applied in specific cases. For example, in the intra prediction mode, DST type 7 may be applied only in a specific case such as a 4x4 block. In another embodiment, multiple transform selection may be applied to the primary transform, and in this case, a combination of multiple transforms may be applied. A primary transform based on multiple transform selection may be referred to as an explicit multiple transform (EMT). For example, in the explicit multiple transform, a combination of transform types such as DST type 7 (DST7), DCT type 8 (DCT8), DST type 1 (DST1), DCT type 5 (DCT5), and DCT type 2 (DCT2) may be used.
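For reference, the basis functions of several of the transform types named above are commonly written as follows for an N-point transform, with i the basis index and j the sample index (i, j = 0, ..., N-1). These are the standard orthonormal definitions and are given here as background; the text above does not spell them out, and integer implementations use scaled approximations.

```latex
\begin{aligned}
\text{DCT2:}\quad T_i(j) &= \omega_0 \sqrt{\tfrac{2}{N}}\,
  \cos\!\left(\frac{\pi\, i\,(2j+1)}{2N}\right),
  \qquad \omega_0 = \begin{cases}\sqrt{1/2} & i = 0\\ 1 & i \neq 0\end{cases}\\[4pt]
\text{DST7:}\quad T_i(j) &= \sqrt{\tfrac{4}{2N+1}}\,
  \sin\!\left(\frac{\pi\,(2i+1)(j+1)}{2N+1}\right)\\[4pt]
\text{DCT8:}\quad T_i(j) &= \sqrt{\tfrac{4}{2N+1}}\,
  \cos\!\left(\frac{\pi\,(2i+1)(2j+1)}{4N+2}\right)\\[4pt]
\text{DST1:}\quad T_i(j) &= \sqrt{\tfrac{2}{N+1}}\,
  \sin\!\left(\frac{\pi\,(i+1)(j+1)}{N+1}\right)
\end{aligned}
```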

Table 1 and Table 2 below exemplarily show a combination of transforms used in multiple transform selection (explicit multiple transform). Table 1 shows the combinations of multiple transform selections applied in the intra prediction mode, and Table 2 shows the combinations of multiple transform selections applied in the inter prediction mode.

Referring to Table 1, when the intra prediction mode is applied, a transform set may be configured according to the intra prediction mode, and each transform set may include a plurality of transform combination candidates. For example, the transform sets may consist of five sets (Set0 to Set4) according to the intra prediction mode, and each transform set (Set0 to Set4) may include transform combination candidates with index values 0 to 3. Each transform combination candidate may consist of a horizontal transform applied to rows and a vertical transform applied to columns, and the types of the horizontal transform and the vertical transform may be determined based on a combination of DST7, DCT8, DST1, and DCT5.

Referring to Table 2, when the inter prediction mode is applied, a transform combination may be configured differently according to whether multiple transform selection is applied to the corresponding block (e.g., EMT_CU_flag). For example, if multiple transform selection is not applied to the corresponding block (for example, when EMT_CU_flag is 0), a transform combination set that applies DCT2 to both the horizontal transform and the vertical transform can be used. Alternatively, when multiple transform selection is applied to the corresponding block (for example, when EMT_CU_flag is 1), a transform combination set including four transform combination candidates may be used. In this case, the transform combination set may include transform combination candidates with index values 0 to 3, and the types of the horizontal transform and the vertical transform may be determined for each transform combination candidate based on a combination of DST7 and DCT8.

According to an embodiment of the present invention, whether to apply multiple transform selection may be determined in block units (e.g., in CU units in the case of HEVC) using a syntax element indicating whether multiple transform selection is applied to the current block. As an example, EMT_CU_flag may be used as the syntax element.

When EMT_CU_flag is 0 in the intra prediction mode, it may be determined that multiple transform selection is not applied to the current block. In this case, DCT2 or 4x4 DST7 may be applied as in the case of using a single transform (e.g., HEVC). When EMT_CU_flag is 1 in the intra prediction mode, it may be determined that multiple transform selection is applied to the current block. In this case, the multiple transform combinations presented in Table 1 above can be applied. The possible multiple transform combinations may vary depending on the intra prediction mode, as shown in Table 1 above. For example, when the intra prediction mode is 14, 15, 16, 17, 18, 19, 20, 21, or 22, a total of four combinations may be allowed by applying DST7 and DCT5 in the horizontal direction and DST7 and DCT8 in the vertical direction. Therefore, it is necessary to separately signal which of the four combinations to apply. To this end, 2 bits of index information may be used; for example, one of the four transform combinations may be signaled through the 2-bit EMT_TU_index syntax element.

In the inter prediction mode, if EMT_CU_flag is 0, DCT2 may be applied as shown in Table 2, and if EMT_CU_flag is 1, multiple transform combinations shown in Table 2 may be applied. For example, four possible combinations can be used by applying DST7 and DCT8 as shown in Table 2 above.

More specifically, referring to FIG. 5, the decoding apparatus may acquire and parse (entropy decode) the EMT_CU_flag syntax element (S500). Then, the decoding apparatus may determine whether to apply the multiple transform selection according to the result value of the parsed EMT_CU_flag (S510).

When EMT_CU_flag is 0, the decoding apparatus determines that multiple transform selection is not applied, and may perform transformation by applying DCT2 to the current block (S515).

When EMT_CU_flag is 1, the decoding apparatus determines that multiple transform selection is applied, and may determine whether the number of non-zero transform coefficients among the transform coefficients in the current block is less than or equal to a specific threshold value (e.g., 2) (S520).

When the number of non-zero transform coefficients is less than or equal to the specific threshold, the decoding apparatus omits parsing of EMT_TU_index, sets the EMT_TU_index value to 0, and may perform the transform by applying DST7 to the current block as shown in Table 1 above (S525).

If the number of non-zero transform coefficients is greater than the specific threshold, the decoding apparatus may parse (entropy decode) the EMT_TU_index syntax element (S530).

The decoding apparatus may perform a transformation by determining a combination of horizontal and vertical transformations for the current block according to the parsed EMT_TU_index value (S535). In this case, multiple transforms may be performed by selecting horizontal transforms and vertical transforms corresponding to EMT_TU_index values based on the transform combinations shown in Tables 1 and 2 above.
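The decision flow of FIG. 5 (S500 to S535) can be summarized in a short sketch. Tables 1 and 2 are not reproduced in this text, so the dictionary `INTER_COMBOS` below is a hypothetical stand-in: the source states only that the four candidates combine DST7 and DCT8 and that index 0 corresponds to DST7, so the ordering of the remaining pairs is an assumption, as are the function and parameter names.

```python
# Hypothetical stand-in for Table 2 (inter). Only "index 0 -> DST7" and
# "combinations of DST7 and DCT8" come from the text; the rest is assumed.
INTER_COMBOS = {
    0: ("DST7", "DST7"),
    1: ("DCT8", "DST7"),
    2: ("DST7", "DCT8"),
    3: ("DCT8", "DCT8"),
}


def select_transforms(emt_cu_flag, num_nonzero, parse_tu_index,
                      combos=INTER_COMBOS, threshold=2):
    """Return (horizontal, vertical) kernel names following the flow of FIG. 5.

    parse_tu_index is a callable that entropy-decodes EMT_TU_index; it is
    only invoked when the index is actually present in the bitstream.
    """
    if emt_cu_flag == 0:
        return ("DCT2", "DCT2")        # S510 -> S515
    if num_nonzero <= threshold:
        return combos[0]               # S520 -> S525: index inferred as 0
    return combos[parse_tu_index()]    # S530 -> S535


def must_not_parse():
    raise AssertionError("EMT_TU_index must not be parsed on this path")


no_mts = select_transforms(0, 10, must_not_parse)
inferred = select_transforms(1, 2, must_not_parse)
parsed = select_transforms(1, 5, lambda: 3)
```

Skipping the index when few coefficients are non-zero saves the 2 bits of signaling in blocks where the choice of kernel has little effect.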

Meanwhile, in applying multiple transform selection, the block size to which multiple transform selection is applied may be limited. For example, the block size may be limited to 64x64, and multiple transform selection may not be applied to blocks larger than 64x64.

As described with reference to FIG. 4, in performing the transformation, after applying the first transform, the second transform may be additionally applied. Here, the second transform may use a non-separable secondary transform (NSST) or a reduced secondary transform (RST).

NSST is applied only in the intra prediction mode, and each intra prediction mode has a transform set applicable. Table 3 below shows an example in which a transform set for each intra prediction mode is allocated in NSST.

In one embodiment, the transform sets in NSST may be established using the symmetry of the prediction direction. For example, since intra prediction modes 52 and 16 are symmetric about intra prediction mode 34 (diagonal direction), the same transform set can be applied to them, as shown in Table 3 above. In this way, intra prediction modes that are symmetric to each other in prediction direction may be grouped and assigned the same transform set. However, when applying the transform to intra prediction modes that are symmetric to each other about the diagonal direction (e.g., modes 52 and 16), the input data may be transposed before the transform is applied for one of the two modes (e.g., a vertical-direction mode such as mode 52).

The intra prediction mode may include two non-directional (or non-angular) intra prediction modes and 65 directional (or angular) intra prediction modes. In some cases, intra prediction mode 67 may be further used, and intra prediction mode 67 may represent a linear model (LM) mode. When these intra prediction modes are used, a total of 35 transform sets may be configured as shown in Table 3 above. Here, since the non-directional planar mode (0) and DC mode (1) have no symmetry, each of them has its own transform set, and each of these transform sets may include two transforms. For the remaining directional modes, each transform set may include three transforms. Therefore, the total number of possible transforms is (2x2 + 33x3) = 103.
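The grouping by symmetry and the resulting counts can be checked with a short sketch. Table 3 is not reproduced in this text, so the mapping below is an assumption: directional modes above 34 are folded onto 68 minus the mode number (so that 52 maps to 16), and the transpose is applied on that side. The function name `nsst_set_key` is hypothetical, and the actual set indices in Table 3 may be numbered differently.

```python
def nsst_set_key(intra_mode):
    """Map an intra mode (0..66) to (representative mode, transpose_input).

    Modes mirrored about diagonal mode 34 share one transform set; which of
    the two mirrored modes gets the transposed input is an assumption here.
    """
    if intra_mode <= 1:
        return intra_mode, False       # planar (0) and DC (1): no symmetry
    if intra_mode > 34:
        return 68 - intra_mode, True   # mirror about mode 34
    return intra_mode, False


set_keys = {nsst_set_key(m)[0] for m in range(67)}
num_sets = len(set_keys)
# Two transforms for each non-directional set, three for each directional set.
num_transforms = sum(2 if key <= 1 else 3 for key in set_keys)
```

With 2 non-directional sets and 33 directional sets this reproduces the 35 sets and the 2x2 + 33x3 = 103 transforms stated above.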

NSST is not applied to the entire block to which the first transform is applied (eg, TU in the case of HEVC), but can be applied only to the top-left 8x8 region of the block. Of course, it can be applied to the entire area for blocks of size 8x8 or less.

That is, if the size of the block is 8x8 or more, 8x8 NSST is applied, and if it is less than 8x8, 4x4 NSST is applied. In the latter case, 4x4 NSST may be applied after dividing the block into 4x4 blocks. Both 8x8 NSST and 4x4 NSST follow the transform set configuration of Table 3 described above. As non-separable transforms, 8x8 NSST receives 64 data elements and outputs 64 data elements, and 4x4 NSST has 16 inputs and 16 outputs.
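The size rule can be sketched as a small helper. Two points are assumptions of this sketch rather than statements of the text: "8x8 or more" is read as both width and height being at least 8, and for smaller blocks the 4x4 sub-blocks are taken only from within the top-left 8x8 region. The function name `nsst_regions` is hypothetical.

```python
def nsst_regions(width, height):
    """Return (kernel_size, [(x, y), ...]) giving where NSST is applied.

    Each (x, y) is the top-left corner of a kernel_size x kernel_size region.
    """
    if min(width, height) >= 8:
        # 8x8 NSST on the top-left 8x8 region only (64 inputs, 64 outputs).
        return 8, [(0, 0)]
    # Otherwise 4x4 NSST on each 4x4 sub-block (16 inputs, 16 outputs each),
    # restricted here to the top-left 8x8 area (assumption).
    corners = [(x, y)
               for y in range(0, min(height, 8), 4)
               for x in range(0, min(width, 8), 4)]
    return 4, corners


large = nsst_regions(16, 16)
small = nsst_regions(4, 4)
narrow = nsst_regions(4, 16)
```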

Both 8x8 NSST and 4x4 NSST can be configured with a hierarchical combination of Givens rotations. The matrix corresponding to one Givens rotation may be equal to Equation 1.

The calculation for a Givens rotation based on Equation 1 may be illustrated as in FIG. 6. FIG. 6 is a diagram of the matrix product of Equation 1. As shown in FIG. 6, since one Givens rotation rotates two data elements, a total of 32 or 8 Givens rotations is required to process 64 (for 8x8 NSST) or 16 (for 4x4 NSST) data elements, respectively. Therefore, a bundle of 32 or 8 Givens rotations forms one Givens rotation layer.

FIG. 7 shows a process in which four Givens rotation layers are sequentially processed for the 4x4 NSST. As shown in FIG. 7, the output data of one Givens rotation layer is passed as input data to the next Givens rotation layer through a predetermined permutation (shuffling). As shown in FIG. 7, the permutation pattern is determined regularly, and in the case of 4x4 NSST, one round is formed by combining the four Givens rotation layers and the corresponding permutations. In the case of 8x8 NSST, 6 Givens rotation layers and the corresponding permutations form a round. 4x4 NSST goes through 2 rounds, and 8x8 NSST goes through 4 rounds. Different rounds use the same permutation pattern, but the Givens rotation angles applied are different. Therefore, the angle data for all Givens rotations constituting each transform should be stored.

As the last step, one more permutation is performed on the data output through the Givens rotation layers, and the permutation information is stored separately for each transform. In the forward NSST, this permutation is performed last, and in the inverse NSST, the corresponding inverse permutation is applied first. In the inverse NSST, the Givens rotation layers and permutations applied in the forward NSST are performed in reverse order, and each Givens rotation is performed with the negated angle.
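The layered structure can be sketched as follows for the 4x4 case (16 data elements, 8 rotations per layer, 2 rounds of 4 layers). Equation 1 is assumed to be the standard 2x2 rotation matrix; the angle values, the shuffling pattern, and the final permutation are arbitrary placeholders rather than the ones of FIG. 7, and the inverse is assumed to rotate by the negated angle, which is the mathematical inverse of a rotation.

```python
import math

def givens_layer(data, angles, sign=1.0):
    """Rotate each pair (data[2k], data[2k+1]) by sign * angles[k]."""
    out = []
    for k, theta in enumerate(angles):
        c, s = math.cos(sign * theta), math.sin(sign * theta)
        a, b = data[2 * k], data[2 * k + 1]
        out += [c * a - s * b, s * a + c * b]
    return out

def permute(data, perm):
    return [data[p] for p in perm]

def inverse_perm(perm):
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv

def forward(data, layers, shuffle, final_perm):
    for angles in layers:                      # rotation layer, then shuffle
        data = permute(givens_layer(data, angles), shuffle)
    return permute(data, final_perm)           # final permutation comes last

def inverse(coeffs, layers, shuffle, final_perm):
    data = permute(coeffs, inverse_perm(final_perm))   # undo final permutation first
    for angles in reversed(layers):                    # layers in reverse order
        data = permute(data, inverse_perm(shuffle))
        data = givens_layer(data, angles, sign=-1.0)   # negated angles
    return data

N = 16                                         # 4x4 NSST operates on 16 values
# Placeholder data: 2 rounds x 4 layers, 8 rotation angles per layer.
layers = [[0.1 * (r + 1) + 0.05 * k for k in range(N // 2)] for r in range(8)]
shuffle = [(5 * i) % N for i in range(N)]      # hypothetical shuffling pattern
final_perm = list(reversed(range(N)))          # hypothetical final permutation

x = [float(i) for i in range(N)]
y = forward(x, layers, shuffle, final_perm)
x_rec = inverse(y, layers, shuffle, final_perm)
```

Because every layer is a set of plane rotations and every permutation only reorders elements, the whole chain is orthogonal, so the energy of `y` equals that of `x` and `x_rec` matches `x` up to floating-point error.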

In performing the secondary transform as described above, the NSST or a reduced secondary transform (RST) described later may be used.

Assuming that an orthogonal matrix representing one transform has an NxN form, a reduced transform (RT) keeps only R of the N transform basis vectors (where R < N). The matrix for the forward RT that produces the transform coefficients is given by Equation 2.

Since the matrix for the inverse RT is the transpose of the forward RT matrix, the application of the forward RT and the inverse RT can be schematically illustrated as in FIG. 8.
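As an illustration of the matrix shapes involved, the following sketch builds an orthonormal NxN matrix, keeps R rows as the forward RT, and uses its transpose as the inverse RT. The DCT-II basis, the sizes N = 8 and R = 3, and the sample input are assumptions chosen for the example; the actual RT matrices are not specified in this passage.

```python
import math

def dct2_matrix(n):
    """Orthonormal n x n DCT-II matrix; each row is one transform basis vector."""
    rows = []
    for i in range(n):
        scale = math.sqrt((1.0 if i == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

N, R = 8, 3                              # example sizes (assumed)
full = dct2_matrix(N)
forward_rt = full[:R]                    # R x N: keep R of the N basis vectors
inverse_rt = transpose(forward_rt)       # N x R: transpose of the forward RT

x = [10.0, 11.0, 12.0, 13.0, 13.0, 12.0, 11.0, 10.0]
coeffs = matvec(forward_rt, x)           # R transform coefficients
approx = matvec(inverse_rt, coeffs)      # N reconstructed values (approximation)
```

Since the kept rows are orthonormal, the forward RT followed by the inverse RT is an orthogonal projection: the reconstruction error equals the energy carried by the discarded basis vectors.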

In one embodiment, RT may be applied to the upper-left 8x8 block of a block composed of transform coefficients that have undergone the primary transform (hereinafter, a transform coefficient block). In this case, the RT may be referred to as 8x8 RST. When the R value in Equation 2 is set to 16, the forward 8x8 RST has a 16x64 matrix form and the inverse 8x8 RST has a 64x16 matrix form. Also, the same transform set configuration as in Table 3 above can be applied to 8x8 RST. That is, the corresponding 8x8 RST may be applied according to the transform set in Table 3. Since one transform set is composed of two or three transforms depending on the intra prediction mode, it may be configured to select one of up to four transforms, including the case where the secondary transform is not applied (this case can be regarded as an identity matrix). Assuming that indexes 0, 1, 2, and 3 are assigned to the four transforms (for example, index 0 may be assigned to the identity matrix, that is, the case where the secondary transform is not applied), the applied transform may be designated by signaling a syntax element called the NSST index for each transform coefficient block. That is, through the NSST index, 8x8 NSST may be designated for the upper-left 8x8 block in the NSST configuration, and 8x8 RST may be designated in the RST configuration.

When the forward 8x8 RST of Equation 2 is applied, 16 valid transform coefficients are generated, so the 64 input values constituting the 8x8 region are reduced to 16 output values. From the perspective of the two-dimensional region, valid transform coefficients fill only a quarter of the region. Therefore, the 16 output values obtained by applying the forward 8x8 RST can be placed in the upper-left area of the transform coefficient block.

FIG. 9 is a view showing a transform coefficient scan order. When positions are numbered from 1 in the forward scan order, FIG. 9 shows the scanning of the 17th to the 64th coefficients. Since FIG. 9 shows the reverse scan, the scan proceeds from the 64th coefficient to the 17th (see the arrow direction).

Referring to FIG. 9, the upper-left 4x4 region of the transform coefficient block is a region of interest (ROI) in which valid transform coefficients are placed, and the rest of the region is left empty. The empty area may be filled with a value of 0 by default. If a valid non-zero transform coefficient is found outside the ROI region of FIG. 9, it is certain that 8x8 RST has not been applied, so the corresponding NSST index coding can be omitted. Conversely, if no non-zero transform coefficient is found outside the ROI region of FIG. 9 (for example, when 8x8 RST is applied and the region other than the ROI is filled with 0), 8x8 RST may have been applied, so the NSST index can be coded. Because such conditional NSST index coding requires checking for the presence of non-zero transform coefficients, it can be performed after the residual coding process.
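The conditional check described above can be sketched as a small predicate over the upper-left 8x8 region. The function name and the list-of-rows representation are assumptions made for this example.

```python
def nsst_index_may_be_coded(coeffs8x8, roi=4):
    """Return False if any non-zero coefficient lies outside the top-left ROI.

    coeffs8x8: 8 rows of 8 coefficient values (upper-left 8x8 region).
    roi: side length of the region that 8x8 RST is allowed to fill.
    """
    for y, row in enumerate(coeffs8x8):
        for x, value in enumerate(row):
            outside_roi = x >= roi or y >= roi
            if outside_roi and value != 0:
                return False      # 8x8 RST cannot have been applied
    return True                   # 8x8 RST may have been applied

# Example: all coefficients confined to the top-left 4x4 region.
inside_only = [[1 if (x < 4 and y < 4) else 0 for x in range(8)] for y in range(8)]

# Example: one stray non-zero coefficient outside the ROI.
stray = [row[:] for row in inside_only]
stray[5][2] = 7
```

Here `nsst_index_may_be_coded(inside_only)` is true, so the NSST index would be coded, whereas the stray coefficient in the second example rules out 8x8 RST and the index coding can be skipped.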

Hereinafter, an encoding / decoding process in which multiple transform selection (MTS) is performed will be described. In this specification, multiple transform selection (MTS) may be referred to as an explicit multiple transform (EMT) or an adaptive multiple transform (AMT).

In addition, although the present specification basically describes embodiments in which transforms are applied separately in the horizontal and vertical directions, a transform combination may also be composed of non-separable transforms. Alternatively, it may be composed of a mixture of separable transforms and non-separable transforms. In this case, if a non-separable transform is used, selection of a transform per row/column or per horizontal/vertical direction becomes unnecessary, and the transform combinations of Tables 1 and 2 above can be used only when a separable transform is selected.

In addition, the schemes proposed in this specification can be applied regardless of whether the transform is a primary transform or a secondary transform. That is, there is no restriction that they should be applied to only one of the two, and they can be applied to both. Here, the primary transform may mean a transform for initially transforming a residual block, and the secondary transform may mean a transform applied to the block generated as a result of the primary transform.

Referring to FIG. 10, the encoding device may determine a transform group corresponding to the current block (S1010). Here, the transform group may refer to the transform groups of Table 1 and Table 2, but the present invention is not limited thereto and may be composed of other transform combinations.

The encoding apparatus may perform transformation on candidate transformation combinations available in the transformation group (S1020).

As a result of performing the transforms, the encoding apparatus may determine or select the transform combination having the smallest rate-distortion (RD) cost (S1030).

The encoding device may encode a transform combination index corresponding to the selected transform combination (S1040).
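Steps S1020 to S1040 amount to an exhaustive search over the candidate combinations. The sketch below assumes a caller-supplied cost function; the candidate list and the cost values are placeholders for illustration and are not the contents of Tables 1 and 2.

```python
def select_transform_combination(residual, candidates, rd_cost):
    """Return (index, combination) with the smallest RD cost (S1020, S1030)."""
    best_index = min(range(len(candidates)),
                     key=lambda i: rd_cost(residual, candidates[i]))
    return best_index, candidates[best_index]

# Hypothetical (horizontal, vertical) candidates for one transform group.
candidates = [("DST7", "DST7"), ("DCT8", "DST7"), ("DST7", "DCT8"), ("DCT8", "DCT8")]

# Toy cost table standing in for a real transform + quantization + rate estimate.
toy_costs = {candidates[0]: 12.5, candidates[1]: 9.0,
             candidates[2]: 10.25, candidates[3]: 14.0}

def toy_rd_cost(residual, combination):
    return toy_costs[combination]

best_index, best = select_transform_combination(None, candidates, toy_rd_cost)
# best_index is the transform combination index that would be encoded (S1040).
```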

Referring to FIG. 11, the decoding apparatus may determine a transform group for the current block (S1110).

The decoding apparatus may parse the transform combination index (S1120). Here, the transform combination index may correspond to any one of a plurality of transform combinations in the transform group.

The decoding apparatus may derive a transform combination corresponding to the transform combination index (S1130). Here, the transform combination may mean the transform combinations described in Tables 1 and 2, but the present invention is not limited to this, and other transform combinations may also be used.

The decoding apparatus may perform an inverse transform on the current block based on the transform combination (S1140). If the transform combination consists of a row transform and a column transform, the row transform may be applied first and then the column transform. However, the present invention is not limited to this; the order may be reversed, or, when the combination is composed of non-separable transforms, the non-separable transform may be applied directly.

Meanwhile, in another embodiment, a process of determining a transform group and a process of parsing a transform combination index may be performed simultaneously.

Referring to FIG. 12, the encoding apparatus may determine whether adaptive multiple transforms (AMT) are applied to the current block (S1210).

If AMT is applied, the encoding device may encode the AMT flag to 1 (S1220).

Then, the encoding device may determine the AMT index based on at least one of a prediction mode, a horizontal transform, and a vertical transform of the current block (S1230). Here, the AMT index refers to an index indicating any one of a plurality of transform combinations for each intra prediction mode, and the AMT index may be transmitted for each transform unit.

When the AMT index is determined, the encoding device may encode the AMT index (S1240).

Meanwhile, when AMT is not applied, the encoding apparatus may encode the AMT flag to 0 (S1250).

Referring to FIG. 13, the decoding apparatus may parse the AMT flag from the bitstream (S1310). Here, the AMT flag may indicate whether adaptive multiple transforms (AMT) are applied to the current block.

The decoding device may check whether AMT is applied to the current block based on the AMT flag (S1320). For example, it is possible to check whether the AMT flag is 1.

If the AMT flag is 1, the decoding apparatus may check whether the number of non-zero transform coefficients is greater than a threshold (S1330). For example, the threshold may be set to 2, and it may be set differently based on the block size or the size of the transform unit.

If the number of non-zero transform coefficients is greater than the threshold, the decoding apparatus may parse the AMT index (S1340). Here, the AMT index refers to an index indicating any one of a plurality of transform combinations for each intra prediction mode or inter prediction mode, and the AMT index may be transmitted for each transform unit. Alternatively, the AMT index may mean an index indicating any one of the transform combinations defined in a preset transform combination table, where the preset transform combination table may mean Tables 1 and 2, but the present invention is not limited thereto.

The decoding apparatus may derive or determine the horizontal transform and the vertical transform based on at least one of the AMT index or prediction mode (S1350).

Alternatively, the decoding device may derive a transform combination corresponding to the AMT index. For example, the decoding apparatus may derive or determine the horizontal transform and vertical transform corresponding to the AMT index.

Meanwhile, when the number of non-zero transform coefficients is not greater than the threshold, the decoding apparatus may apply a preset vertical inverse transform for each column (S1360). For example, the vertical inverse transform may be an inverse transform of DST7. Then, the decoding apparatus may apply a predetermined horizontal inverse transform for each row (S1370). For example, the horizontal inverse transform may be an inverse transform of DST7.

That is, when the number of non-zero transform coefficients is not greater than the threshold, a transform kernel preset in the encoding device or the decoding device may be used. For example, a widely used transform kernel may be used rather than one defined in a transform combination table such as Table 1 or Table 2.

Meanwhile, when the AMT flag is 0, the decoding apparatus may apply a predetermined vertical inverse transform for each column (S1380). For example, the vertical inverse transform may be an inverse transform of DCT2. Then, the decoding apparatus may apply a predetermined horizontal inverse transform for each row (S1390). For example, the horizontal inverse transform may be an inverse transform of DCT2.

That is, when the AMT flag is 0, a transform kernel preset in the encoding device or the decoding device may be used. For example, a widely used transform kernel may be used rather than one defined in a transform combination table such as Table 1 or Table 2.
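The decision flow of FIG. 13 can be summarized in a few lines. This is a simplified sketch: it assumes the comparison in S1330 is "greater than the threshold", ignores the dependence on the prediction mode, and uses a hypothetical lookup table in place of Tables 1 and 2; `parse_amt_index` is a stand-in for the bitstream parsing step.

```python
def derive_inverse_kernels(amt_flag, num_nonzero, parse_amt_index, table, threshold=2):
    """Return (horizontal, vertical) inverse-transform kernel names."""
    if amt_flag == 0:
        return ("DCT2", "DCT2")             # S1380, S1390: preset kernels
    if num_nonzero > threshold:             # S1330
        amt_index = parse_amt_index()       # S1340: parsed only in this branch
        return table[amt_index]             # S1350
    return ("DST7", "DST7")                 # S1360, S1370: preset kernels

# Hypothetical transform combination table indexed by the AMT index.
example_table = [("DST7", "DST7"), ("DCT8", "DST7"), ("DST7", "DCT8"), ("DCT8", "DCT8")]

kernels_flag_off = derive_inverse_kernels(0, 10, lambda: 3, example_table)
kernels_few_coeffs = derive_inverse_kernels(1, 2, lambda: 3, example_table)
kernels_indexed = derive_inverse_kernels(1, 5, lambda: 2, example_table)
```

Note that the AMT index is read from the bitstream only in the branch where the coefficient count exceeds the threshold, which mirrors the conditional parsing in the flowchart.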

Hereinafter, the design and associated optimization methods of RST that can be applied to a 4x4 block from the above-described RST structure will be described. Naturally, some concepts can be applied not only to 4x4 RST, but also to 8x8 RST or other forms of transformation.

In applying RST, the present invention proposes an RST that can be applied to a 4x4 block.

In one embodiment of the present invention, a non-separable transform or RST that can be applied to one 4x4 block, that is, a 4x4 transform block to be transformed, is a 16x16 transform. That is, if the data elements constituting the 4x4 block are arranged in a line in row-first or column-first order, a 16x1 vector is obtained, and the non-separable transform or RST can be applied to that vector. The forward 16x16 transform is composed of 16 row-direction transform basis vectors, and taking the inner product of the 16x1 vector with each transform basis vector yields the transform coefficient for that basis vector. Obtaining the transform coefficients for all 16 transform basis vectors is the same as multiplying the 16x16 non-separable transform or RST matrix by the input 16x1 vector. The transform coefficients obtained by the matrix multiplication form a 16x1 vector, and the statistical characteristics may differ for each transform coefficient. For example, when the 16x1 transform coefficient vector is composed of the 0th to 15th elements, the variance of the 0th element may be greater than that of the 15th element. That is, the closer an element is to the front, the larger its variance and thus its energy may be.

By applying the inverse 16x16 non-separable transform or the inverse RST to the 16x1 transform coefficient vector (ignoring effects such as quantization or integer arithmetic), the original 4x4 block signal before the transform can be restored. If the forward 16x16 non-separable transform is an orthonormal transform, the corresponding inverse 16x16 transform can be obtained by taking the transpose of the matrix for the forward 16x16 transform. Simply put, multiplying the inverse 16x16 non-separable transform matrix by the 16x1 transform coefficient vector yields data in the form of a 16x1 vector, and arranging it in the row-first or column-first order that was originally used restores the 4x4 block signal.

As described above, the elements constituting the 16x1 transform coefficient vector may have different statistical characteristics. As in the previous example, if the transform coefficients located toward the front (closer to the 0th element) have larger energy, a signal fairly close to the original signal can be restored by applying the inverse transform to only some of the leading transform coefficients, without using all of them. For example, when the inverse 16x16 non-separable transform is composed of 16 column basis vectors, only L column basis vectors may be kept to form a 16xL matrix, and only the L more important transform coefficients may be kept (an Lx1 vector, whose elements may be the leading ones as in the previous example). Multiplying the 16xL matrix by the Lx1 vector then restores a 16x1 vector that differs little from the original 16x1 input vector. As a result, since only L coefficients are involved in data restoration, it suffices to obtain an Lx1 transform coefficient vector rather than a 16x1 transform coefficient vector. That is, the L significant transform coefficients can be obtained by selecting the L corresponding row-direction transform vectors from the forward 16x16 non-separable transform matrix to construct an Lx16 transform and multiplying it by the 16x1 input vector.

Here, L has a range of 1 <= L < 16. In general, the L transform basis vectors may be selected from the 16 by any method, but from the viewpoint of encoding and decoding, it may be advantageous in terms of coding efficiency to select transform basis vectors of high importance in terms of signal energy, as in the example presented above.
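The flattening, the Lx16 forward transform, and the 16xL inverse transform can be sketched as below. The 16x16 matrix used here is only a stand-in (the Kronecker product of a 4-point orthonormal Hadamard matrix with itself) so that the example runs; it is not an RST kernel from this specification, and taking the first L rows is an assumed selection rule.

```python
def hadamard4():
    h = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]
    return [[0.5 * v for v in row] for row in h]      # orthonormal 4x4

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def flatten(block):                                   # row-first order
    return [v for row in block for v in row]

def unflatten(vec, width=4):
    return [vec[i:i + width] for i in range(0, len(vec), width)]

T16 = kron(hadamard4(), hadamard4())                  # stand-in 16x16 orthonormal matrix

def forward_rst(block, L):
    """Multiply the L x 16 matrix (first L basis vectors) by the 16x1 input."""
    return matvec(T16[:L], flatten(block))

def inverse_rst(coeffs):
    """Multiply the 16 x L matrix (transpose of the forward one) by the Lx1 input."""
    L = len(coeffs)
    return unflatten(matvec(transpose(T16[:L]), coeffs))

block = [[3.0, 1.0, 4.0, 1.0], [5.0, 9.0, 2.0, 6.0],
         [5.0, 3.0, 5.0, 8.0], [9.0, 7.0, 9.0, 3.0]]
exact = inverse_rst(forward_rst(block, 16))           # all 16 coefficients kept
flat = [[5.0] * 4 for _ in range(4)]
flat_rec = inverse_rst(forward_rst(flat, 1))          # a single coefficient kept
```

With all 16 basis vectors the block is recovered exactly (up to rounding). For the constant block, all of the energy falls on the first basis vector of this stand-in matrix, so even L = 1 reconstructs it, which illustrates why keeping only the high-energy coefficients can be sufficient.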

In addition, in applying RST, the present invention proposes a method of setting an application area of 4x4 RST and arranging transform coefficients.

In one embodiment of the present invention, 4x4 RST may be applied as a secondary transform, in which case it is applied to a block to which a primary transform such as DCT type 2 has already been applied. Assuming that the size of the block to which the primary transform is applied is NxN, 4x4 RST may be applied when NxN is equal to or larger than 4x4. Accordingly, examples of applying 4x4 RST to an NxN block are as follows.

1) 4x4 RST can be applied only to some areas, not all areas of NxN. For example, it can be applied only to the top-left MxM region (M <= N).

2) After the region to which the second transform is applied is divided into 4x4 blocks, 4x4 RST may be applied to each divided block.

3) The above 1) and 2) can be mixed and applied. For example, after dividing the upper left MxM area into 4x4 blocks, 4x4 RST may be applied to the divided area.

As a specific example, the secondary transform may be applied only to the upper-left 8x8 region: when the NxN block is equal to or larger than 8x8, 8x8 RST is applied, and when the NxN block is smaller than 8x8 (4x4, 8x4, 4x8), the block is divided into 4x4 blocks as in 2) above and 4x4 RST is applied to each of them.
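The specific example above can be written as a small planning function. It assumes that "equal to or larger than 8x8" means both dimensions are at least 8 and that, otherwise, the upper-left region of at most 8x8 samples is tiled with 4x4 blocks; the function name and the (x, y, size) tuples are illustrative.

```python
def rst_plan(width, height):
    """Return (kernel name, list of (x, y, size)) for the secondary transform."""
    if width >= 8 and height >= 8:
        return "8x8 RST", [(0, 0, 8)]            # one 8x8 RST on the top-left region
    region_w, region_h = min(width, 8), min(height, 8)
    blocks = [(x, y, 4)
              for y in range(0, region_h, 4)
              for x in range(0, region_w, 4)]    # tile the region with 4x4 blocks
    return "4x4 RST", blocks

plan_16x16 = rst_plan(16, 16)
plan_4x4 = rst_plan(4, 4)
plan_8x4 = rst_plan(8, 4)
plan_4x8 = rst_plan(4, 8)
```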

When L transform coefficients (1 <= L < 16) are generated after applying 4x4 RST, there is a degree of freedom in how to place the L transform coefficients (i.e., how to map the transform coefficients within the target block). However, since a predetermined order exists when the transform coefficients are read and processed in the residual coding part, coding performance may vary depending on how the L transform coefficients are arranged in the 2D block. Residual coding in HEVC starts coding from the position farthest from the DC position. This increases coding performance by taking advantage of the fact that the farther a position is from the DC position, the more likely the quantized coefficient value is to be zero or close to zero. Therefore, it may be advantageous in terms of coding performance to arrange the L transform coefficients so that the more important, higher-energy coefficients are coded later in the residual coding order.

FIG. 14 shows three forward scan orders that can be applied to a 4x4 transform coefficient block (a 4x4 block, or coefficient group (CG)) in the HEVC standard. (a) is a diagonal scan, (b) is a horizontal scan, and (c) is a vertical scan.

In residual coding, the reverse of the scan order in FIG. 14 is followed; that is, coefficients are coded in the order from 16 to 1. Since the three scan orders illustrated in FIG. 14 are selected according to the intra prediction mode, the scan order for the L transform coefficients may likewise be determined according to the intra prediction mode.

FIGS. 15 and 16 are diagrams illustrating the mapping of transform coefficients according to a diagonal scan order according to an embodiment of the present invention. FIGS. 15 and 16 show an example of placing valid transform coefficients in the diagonal scan order when 4x4 RST is applied to a 4x8 block.

In one embodiment, when the upper-left 4x8 block is divided into 4x4 blocks, 4x4 RST is applied to each, and the L value is 8 (that is, only 8 of the 16 transform coefficients are kept), the transform coefficients may be positioned according to the diagonal scan order as shown in FIG. 15. For example, as shown in FIG. 15, transform coefficients may be mapped to half of each 4x4 block, and a value of 0 may be filled in by default at the positions marked with X.

Accordingly, the corresponding residual coding (e.g., residual coding in existing HEVC) can be applied under the assumption that L transform coefficients are arranged in each 4x4 block according to the scan order shown in FIG. 14 and that the remaining (16 - L) positions of each 4x4 block are filled with 0.
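The placement of L coefficients followed by zero filling can be sketched as follows. The scan generated here is the up-right diagonal scan used for 4x4 coefficient groups in HEVC (each anti-diagonal is traversed from its lower-left end to its upper-right end); that FIG. 14 (a) matches this order is an assumption, as are the function names.

```python
def diagonal_scan_4x4():
    """Forward up-right diagonal scan as a list of (x, y) positions."""
    order = []
    for d in range(7):                          # anti-diagonals with x + y == d
        for y in range(min(d, 3), -1, -1):      # lower-left to upper-right
            x = d - y
            if x <= 3:
                order.append((x, y))
    return order

def place_coefficients(coeffs, scan):
    """Put the L coefficients on the first L scan positions; the rest stay 0."""
    block = [[0] * 4 for _ in range(4)]
    for value, (x, y) in zip(coeffs, scan):
        block[y][x] = value
    return block

scan = diagonal_scan_4x4()
block = place_coefficients([1, 2, 3, 4, 5, 6, 7, 8], scan)   # L = 8
for row in block:
    print(row)
```

With L = 8 exactly half of the 16 positions receive a coefficient and the other half remain zero, corresponding to the positions marked with X.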

In another embodiment, the L transform coefficients placed in each of the two 4x4 blocks as shown in FIG. 16 (a) may be combined and arranged in one 4x4 block as shown in FIG. 16 (b). In particular, when the L value is 8, the transform coefficients of the two 4x4 blocks completely fill one 4x4 block, so no transform coefficients are left in the other 4x4 block. Therefore, since most residual coding is unnecessary for the 4x4 block emptied in this way, in the case of HEVC the corresponding coded_sub_block_flag can be coded as 0. Here, coded_sub_block_flag applied to HEVC (or VVC) is flag information for specifying the position of a sub-block that is a 4x4 array for 16 transform coefficient levels in the current transform block, and may be signaled as "0" for a 4x4 block in which no residual remains.

In addition, various methods are possible for mixing the transform coefficients of two 4x4 blocks. In general, they can be combined in any order, but the following methods are practical examples.

(1) The transform coefficients of the two 4x4 blocks are mixed alternately in the scan order. That is, when the transform coefficients of the upper block in FIG. 16 (a) are denoted, in scan order, as c^u_0, c^u_1, c^u_2, ... and the transform coefficients of the lower block as c^l_0, c^l_1, c^l_2, ..., they can be mixed one by one as c^u_0, c^l_0, c^u_1, c^l_1, c^u_2, c^l_2, .... Of course, the order of c^u_k and c^l_k can be exchanged so that c^l_0 is placed first.

(2) The transform coefficients for the first 4x4 block can be arranged first, followed by the transform coefficients for the second 4x4 block. In other words, they can be arranged by concatenation as c^u_0, c^u_1, c^u_2, ..., c^l_0, c^l_1, c^l_2, .... Naturally, the order can also be changed to c^l_0, c^l_1, c^l_2, ..., c^u_0, c^u_1, c^u_2, ....
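The two mixing rules, alternating and concatenating, can be sketched with plain lists. The labels u0, u1, ... and l0, l1, ... are placeholder names for the scan-ordered coefficients of the first and second 4x4 blocks.

```python
def interleave(first, second):
    """Method (1): take coefficients alternately from the two blocks."""
    mixed = []
    for a, b in zip(first, second):
        mixed += [a, b]
    return mixed

def concatenate(first, second):
    """Method (2): all coefficients of one block, then all of the other."""
    return list(first) + list(second)

upper = ["u0", "u1", "u2", "u3"]     # placeholder labels, scan order
lower = ["l0", "l1", "l2", "l3"]

alternating = interleave(upper, lower)
alternating_swapped = interleave(lower, upper)   # lower block's coefficient first
joined = concatenate(upper, lower)
joined_swapped = concatenate(lower, upper)
```

Swapping the argument order gives the variants in which the second block's coefficients come first.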

Hereinafter, a method of coding an NSST index for 4x4 RST will be described. The first method is a case where the NSST index is coded after residual coding, and the second method is a case where the NSST index is coded before residual coding.

In coding the NSST index, in the present invention, the NSST index can be coded after residual coding.

As illustrated in FIG. 15, when 4x4 RST is applied, positions L+1 through 16 in the transform coefficient scan order of each 4x4 block may be filled with 0. Therefore, if a non-zero value occurs at any of positions L+1 through 16 in either of the two 4x4 blocks, 4x4 RST has not been applied. If 4x4 RST, like NSST, has a structure in which one transform is selected from a prepared transform set and applied, an index indicating which transform to apply (which may be referred to as a transform index, an RST index, or an NSST index) may be signaled.

Suppose that the decoding apparatus obtains the NSST index through bitstream parsing and that this parsing is performed after residual coding. In this case, if residual coding reveals at least one non-zero transform coefficient among positions L+1 through 16, it is certain that 4x4 RST has not been applied, as described above, so the NSST index may be configured not to be parsed. Accordingly, in this method, the NSST index is selectively parsed only when necessary, thereby reducing signaling cost.

For example, if 4x4 RST is applied to multiple 4x4 blocks within a specific region as shown in FIG. 15 (the same 4x4 RST may be applied to all of them, or different 4x4 RSTs may be applied), the 4x4 RSTs (the same or separate ones) applied to all of the 4x4 blocks may be designated through one NSST index. Since one NSST index determines both whether 4x4 RST is applied and which 4x4 RST is applied to all of the 4x4 blocks, whether a non-zero transform coefficient exists at positions L+1 through 16 is examined for all of the 4x4 blocks during residual coding, and if a non-zero transform coefficient exists at a disallowed position (positions L+1 through 16) in even one 4x4 block, the NSST index may be configured not to be coded.

These NSST indices may be signaled separately for luma blocks and chroma blocks. In the case of chroma blocks, separate NSST indices may be signaled for Cb and Cr, or the NSST index may be signaled only once so that one NSST index is shared.

When one NSST index is shared by Cb and Cr, the 4x4 RST indicated by the same NSST index may be applied (the 4x4 RST itself may be the same for Cb and Cr, or the NSST index may be the same while Cb and Cr have individual 4x4 RSTs). In this case, to apply the conditional signaling described above to the shared NSST index, it is checked whether a non-zero transform coefficient exists at positions L+1 through 16 in all 4x4 blocks for Cb and Cr, and if a non-zero transform coefficient is found, signaling of the NSST index may be omitted.

As another example, when the transform coefficients of two 4x4 blocks are combined as shown in FIG. 16, whether to signal the NSST index may be determined after checking whether a non-zero transform coefficient appears at a position where no valid transform coefficient exists when 4x4 RST is applied. In particular, when the L value is 8 as shown in FIG. 16 and one 4x4 block therefore has no valid transform coefficients when 4x4 RST is applied (the block indicated by X in FIG. 16 (b)), the coded_sub_block_flag of that block may be checked, and if its value is 1, the NSST index may not be signaled.
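The conditional parsing performed after residual coding reduces to one check over all 4x4 blocks that share the NSST index. In this sketch each block is given as its 16 coefficients in forward scan order; the function name and data layout are assumptions. For an index shared by Cb and Cr, the blocks of both components would be passed together.

```python
def should_parse_nsst_index(blocks_in_scan_order, L):
    """True only if positions L+1..16 are zero in every 4x4 block."""
    return all(all(v == 0 for v in coeffs[L:]) for coeffs in blocks_in_scan_order)

L = 8
block_a = [9, 4, 0, 3, 1, 0, 0, 2] + [0] * 8      # only the first 8 positions used
block_b = [5, 0, 0, 0, 0, 0, 0, 0] + [0] * 8
block_c = [5, 0, 0, 0, 0, 0, 0, 0, 0, 1] + [0] * 6   # non-zero at the 10th position

parse_when_clean = should_parse_nsst_index([block_a, block_b], L)
parse_when_violated = should_parse_nsst_index([block_a, block_c], L)
```

A single non-zero coefficient at a disallowed position in any block is enough to skip the NSST index, since 4x4 RST cannot have been applied.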

In addition, when coding the NSST index, in the present invention, the NSST index can be coded before residual coding.

When the NSST index is coded before residual coding according to an embodiment of the present invention, whether 4x4 RST is applied is known in advance, so residual coding can be omitted for positions where the transform coefficient is certain to be 0.

Here, whether or not 4x4 RST is applied may be made known through the NSST index value (for example, if the NSST index is 0, 4x4 RST is not applied), or may be signaled through a separate syntax element. For example, if the separate syntax element is an NSST flag, the NSST flag is parsed first to determine whether 4x4 RST is applied, and if the NSST flag value is 1, residual coding can be omitted for positions where a valid transform coefficient cannot exist.

In the case of HEVC, when residual coding is performed, the position of the last non-zero coefficient in the TU is coded first. If the NSST index is coded after the last non-zero transform coefficient position is coded, and the position of the last non-zero transform coefficient turns out to be a position where a non-zero transform coefficient cannot occur under the assumption that 4x4 RST is applied, the NSST index need not be coded and 4x4 RST may be configured not to be applied. For example, at the positions indicated by X in FIG. 15, no valid transform coefficient is located when 4x4 RST is applied (for example, a value of 0 may be filled in), so when the last non-zero transform coefficient is located in the area indicated by X, coding of the NSST index can be omitted. If the last non-zero transform coefficient is not located in the area indicated by X, coding of the NSST index can be performed.

When the NSST index is coded conditionally after the last non-zero transform coefficient position is coded (as described above, coding of the NSST index is omitted if the position of the last non-zero transform coefficient is not allowed under the assumption that 4x4 RST is applied), and it is thereby known whether 4x4 RST is applied, the remaining residual coding can be processed in the following two ways.

(1) In case 4x4 RST is not applied, general residual coding may be maintained. That is, coding is performed under the assumption that a non-zero transform coefficient may exist at any position from the last non-zero transform coefficient position to the DC position.

(2) In the case where 4x4 RST is applied, no transform coefficient exists at certain positions or in certain 4x4 blocks (for example, the X positions in FIG. 15, which may be filled with 0 by default), so residual coding may be omitted for those positions or blocks. For example, when a position indicated by X in FIG. 15 is reached, coding of sig_coeff_flag (a flag, used in HEVC and VVC, indicating whether a non-zero transform coefficient exists at the corresponding position) may be omitted. In addition, when the transform coefficients of two blocks are combined as shown in FIG. 16, coding of coded_sub_block_flag (which exists in HEVC) can be omitted for the 4x4 block that is emptied to 0, its value can be inferred as 0, and the entire 4x4 block can be filled with zeros without separate coding.

On the other hand, when the NSST index is coded after the last non-zero transform coefficient position is coded, NSST index coding may be omitted and 4x4 RST may be configured not to be applied if the x position (Px) and the y position (Py) of the last non-zero transform coefficient are less than Tx and Ty (specific threshold values), respectively. For example, Tx = 1 and Ty = 1 means that NSST index coding is omitted when the last non-zero transform coefficient is at the DC position. The method of determining whether to code the NSST index through comparison with the thresholds can be applied differently to luma and chroma. For example, different Tx and Ty may be applied to luma and chroma, or the thresholds may be applied to luma (or chroma) and not to chroma (or luma).

Of course, the two methods of omitting NSST index coding (omitting it when the last non-zero transform coefficient is located in an area where no valid transform coefficient can exist, and omitting it when the X and Y coordinates of the last non-zero transform coefficient are each less than a certain threshold) may be applied together. For example, the threshold check on the position coordinates of the last non-zero transform coefficient may be performed first, followed by the check of whether the last non-zero transform coefficient is located in a region where no valid transform coefficient can exist, or the order may be reversed.

The method of coding the NSST index before the residual coding described above can also be applied to 8x8 RST. That is, if the last non-zero transform coefficient is located in a region other than the upper left 4x4 in the upper left 8x8 region, coding for the NSST index can be omitted, otherwise coding for the NSST index can be performed. In addition, if the X and Y coordinate values for the last non-zero transform coefficient position are less than a certain threshold, coding for the NSST index can be omitted. Of course, both methods can be applied together.
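Both omission rules, applied to the 8x8 RST case, can be combined in one predicate evaluated right after the last non-zero coefficient position is known. The function name, the set-based representation of disallowed positions, and the default thresholds Tx = Ty = 1 are assumptions for this sketch.

```python
def nsst_index_coded_from_last_pos(last_x, last_y, disallowed, tx=1, ty=1):
    """Decide whether the NSST index is coded, given the last non-zero position."""
    if last_x < tx and last_y < ty:
        return False      # threshold rule: e.g. only a DC coefficient is present
    if (last_x, last_y) in disallowed:
        return False      # RST would have zeroed this position, so it was not applied
    return True

# For 8x8 RST: positions of the top-left 8x8 region outside its top-left 4x4.
disallowed_8x8 = {(x, y) for x in range(8) for y in range(8) if x >= 4 or y >= 4}

dc_only = nsst_index_coded_from_last_pos(0, 0, disallowed_8x8)
outside_roi = nsst_index_coded_from_last_pos(5, 2, disallowed_8x8)
inside_roi = nsst_index_coded_from_last_pos(2, 3, disallowed_8x8)
```

The two checks are independent, so they can be evaluated in either order, as noted in the text.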

In addition, in coding the NSST index, in the present invention, different RST index coding and residual coding schemes may be applied to luma and chroma when RST is applied.

The method in which NSST index coding is performed after residual coding (method 1) and the method in which NSST index coding is performed before residual coding (method 2) may be applied differently to luma and chroma.

For example, luma follows the method described in method 2, and method 1 can be applied to chroma. Alternatively, conditional NSST index coding may be applied to luma according to method 1 or method 2, conditional NSST index coding may not be applied to chroma, and vice versa. That is, NSST index coding may be applied conditionally according to method 1 or method 2 to chroma, and conditional NSST index coding may not be applied to luma.

Hereinafter, an optimization method for the case of applying multiple transforms as the primary transform will be described.

In applying multiple transforms, in the present invention, multiple transforms may be applied based on a method of reduced transform (RT). This may also be referred to as terms such as Reduced Explicit Multiple Transform (REMT) or Reduced Adaptive Multiple Transform (RAMT).

As described above, when a combination of multiple transforms (DCT2, DST7, DCT8, DST1, DCT5, etc.) is selectively used for the primary transform, as in multiple transform selection (also called Explicit Multiple Transform or Adaptive Multiple Transform), the reduced transform may be applied in order to reduce complexity. Specifically, the worst-case complexity can be significantly reduced by applying the transform only to a predefined region rather than performing the full transform in all cases.

For example, when the primary transform is applied to an MxM pixel block based on the reduced transform (RT) method described above, only the computation for an RxR transform block (M >= R) may be performed instead of obtaining an MxM transform block. As a result, valid non-zero coefficients exist only in the RxR region, and the transform coefficients in the other regions may be regarded as zero without being computed. Table 4 below shows three examples of a reduced adaptive multiple transform (RAMT) using a predefined R value for each primary transform size.
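The idea of computing only the RxR region can be illustrated with a minimal sketch. It assumes an orthonormal DCT2 as a stand-in kernel and evaluates the 2-D transform directly for clarity; the function names are hypothetical, and the R values actually used are those of the tables in this document, not the values used here.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II matrix; row i holds the i-th basis function."""
    rows = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def reduced_transform(block, r):
    """Forward 2-D transform of an MxM block, computing only the RxR region."""
    m = len(block)
    t = dct2_matrix(m)
    # Coefficients outside the upper-left RxR region are never computed
    # and are simply regarded as zero.
    coeffs = [[0.0] * m for _ in range(m)]
    for u in range(r):              # vertical frequency index
        for v in range(r):          # horizontal frequency index
            acc = 0.0
            for y in range(m):
                for x in range(m):
                    acc += t[u][y] * block[y][x] * t[v][x]
            coeffs[u][v] = acc
    return coeffs
```

With M = 8 and R = 4, only 16 of the 64 coefficients are evaluated, and those 16 equal the corresponding coefficients of the full 8x8 transform.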

In applying the above-described RAMT (or REMT), in the present invention, the RT factor (reduced transform factor) (R) may be determined depending on the corresponding primary transform.

For example, when the primary transform is DCT2, the amount of computation is relatively small compared to other primary transforms, so the loss in coding performance can be minimized by not using RT for small blocks or by using a relatively large R value. For example, different RT factors may be used for DCT2 and for the other transforms, as shown in Table 5 below. Table 5 shows an example of RAMT using different RT factors for each transform size.

In addition, in applying multiple transforms, in the present invention, an EMT (or AMT) core transform may be selected according to the intra prediction mode. As shown in Table 1 and Table 2, when EMT_CU_Flag = 1 (or AMT_CU_Flag = 1), one of four EMT index combinations (0, 1, 2, 3) may be selected through the 2-bit EMT_TU_index, and the transform type to be applied to the corresponding primary transform may be selected based on the given EMT index. Table 6 below is an example of a mapping table for selecting the transform type applied to the primary transform in the horizontal and vertical directions based on the EMT_index value.

In the present invention, statistics of primary transforms occurring according to the intra prediction mode are analyzed, and based on this, a more efficient EMT core transform mapping method is proposed. First, the following Table 7 shows the distribution of EMT_TU_index for each intra prediction mode as a percentage (%).

With 67 intra prediction modes, the horizontal (Hor) mode in Table 7 above denotes modes 2 to 33, and the vertical (Ver) mode denotes the directional modes 34 to 66.

As can be seen from Table 7, in the horizontal (Hor) mode (2 <= mode <= 33), EMT_TU_index = 2 shows a much higher probability than EMT_TU_index = 1. Therefore, the present invention proposes a mapping table as shown in Table 8 below.

Table 8 shows an example in which a different mapping is used for the horizontal mode group. As described above, in the method of deriving the primary transform based on EMT_TU_index, a different mapping table may be used depending on the intra prediction direction.

Also, in the present invention, the EMT_TU_index values available for each intra prediction mode need not be the same and may be defined differently. For example, as shown in Table 7, the probability of occurrence of EMT_TU_index = 3 is relatively low in the planar mode (and likewise of EMT_TU_index > 1 in the angular modes), so more efficient encoding is possible by excluding the index values with a low probability of occurrence. Table 9 below shows an example in which the available EMT_TU_index values depend on the intra prediction mode.

In the present invention, the following two encoding methods are proposed to more efficiently encode the values of EMT_TU_index distributed differently for each of the above-described intra prediction modes.

1) When the EMT (AMT) TU index value is binarized, it can be encoded using a truncated unary method rather than a fixed length binarization method. Table 10 below shows examples of fixed length and truncated unary binarization.

2) When encoding the EMT TU index value through context modeling, a context model may be determined using information of an intra prediction mode. Table 11 shows three embodiments (method 1, method 2, method 3) in which intra prediction modes are mapped according to context. In particular, the context modeling method for each intra prediction mode specified in the present invention may be considered together with other factors such as block size.
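The difference between the two binarizations named in item 1) above can be sketched as follows. This is a minimal sketch assuming four index values (0 to 3) and the common convention of a run of ones terminated by a zero; the bit polarity and the function names are assumptions and may differ from Table 10.

```python
def fixed_length_bins(value, num_bits=2):
    """Fixed-length binarization: every value uses the same number of bins."""
    return format(value, "0{}b".format(num_bits))


def truncated_unary_bins(value, c_max=3):
    """Truncated unary binarization with maximum value c_max.

    The value is written as `value` ones followed by a terminating zero;
    the terminating zero is dropped when value equals c_max.
    """
    bins = "1" * value
    if value < c_max:
        bins += "0"
    return bins


for v in range(4):
    print(v, fixed_length_bins(v), truncated_unary_bins(v))
```

Under this convention, index 0 costs one bin instead of two, which pays off when small index values are the most probable ones.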

Hereinafter, the present invention proposes a process of performing a transform by applying multiple transform selection (MTS) in the AMT (or EMT) scheme, together with a method for determining the syntax elements for applying multiple transform selection and the transform kernel (transform type) used for the multiple transform.

In one embodiment of the present invention, a syntax element indicating whether multiple transform selection is available may be used in performing the transform. By using such a syntax element, the encoding apparatus can explicitly signal to the decoding apparatus whether the transform can be performed using multiple transform selection for the current block to be coded. Table 12 below shows an example of a syntax table that signals, in a sequence parameter set, information indicating whether multiple transform selection is available. Table 13 shows an example of a semantics table that defines the information represented by the syntax elements of Table 12.

Referring to Table 12 and Table 13, the sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag syntax element may be used as information indicating whether multiple transform selection is available in performing the transform. For example, sps_mts_intra_enabled_flag may be information indicating whether a transform based on multiple transform selection is available for an intra coding block, and sps_mts_inter_enabled_flag may be information indicating whether a transform based on multiple transform selection is available for an inter coding block. Here, the intra coding block refers to a block coded in intra prediction mode, and the inter coding block refers to a block coded in inter prediction mode.

In one embodiment, the encoding apparatus may set and signal, through sps_mts_intra_enabled_flag, whether a transform based on multiple transform selection is available for an intra coding block, and the decoding apparatus may decode the signaled sps_mts_intra_enabled_flag to determine whether multiple transform selection is available for the intra coding block. Alternatively, the encoding apparatus may set and signal, through sps_mts_inter_enabled_flag, whether a transform based on multiple transform selection is available for an inter coding block, and the decoding apparatus may decode the signaled sps_mts_inter_enabled_flag to determine whether multiple transform selection is available for the inter coding block. As described above, when it is determined based on sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag that multiple transform selection is available for the corresponding intra coding block or inter coding block, information indicating whether multiple transform selection is applied (e.g., cu_mts_flag) or information indicating the transform kernel used in multiple transform selection (e.g., mts_idx), both described later, may be additionally signaled.

Here, the sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag syntax element is shown in Table 12 as being signaled at the sequence level (i.e., sequence parameter set), but it may also be signaled at the slice level (i.e., slice header), the picture level (i.e., picture parameter set), or the like.

In addition, in one embodiment of the present invention, when sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag signaled at a higher level (e.g., the sequence parameter set) as in Table 12 above indicates that a transform based on multiple transform selection is available, information indicating whether multiple transform selection is applied to the corresponding block may be additionally signaled at a lower level (e.g., residual coding syntax, transform unit syntax, etc.). Table 14 below shows an example of a syntax table that additionally signals, at a lower level (e.g., transform unit syntax), information indicating whether multiple transform selection is applied (e.g., cu_mts_flag), based on a syntax element explicitly signaled at a higher level (e.g., sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag). Table 15 shows an example of a semantics table defining the information represented by the syntax element of Table 14.

In addition, in one embodiment of the present invention, information indicating the transform kernel used in multiple transform selection may be signaled based on the information indicating whether multiple transform selection is available (e.g., sps_mts_intra_enabled_flag, sps_mts_inter_enabled_flag) or the information indicating whether multiple transform selection is applied (e.g., cu_mts_flag), as shown in Tables 12 and 14. Table 16 below shows an example of a syntax table that signals information indicating the transform kernel applied in multiple transform selection. Table 17 shows an example of a semantics table that defines the information represented by the syntax element of Table 16.

Referring to Tables 16 and 17, the mts_idx syntax element may be used as information indicating the transform kernel used in multiple transform selection. The mts_idx syntax element may be set to an index value indicating the one combination applied to the current block among the specific combinations configured for the horizontal transform and the vertical transform used in the multiple transform, such as the transform sets described above.

For example, when syntax elements such as sps_mts_intra_enabled_flag, sps_mts_inter_enabled_flag, and cu_mts_flag explicitly indicate that multiple transform selection is applied to the current block, the mts_idx syntax element may be transmitted at a level that signals the information required to perform the transform of the current block, such as the residual coding syntax or the transform unit syntax. The decoding apparatus may obtain the mts_idx syntax element from the encoding apparatus, derive the transform kernels (horizontal transform kernel and vertical transform kernel) applied to the current block based on the index value indicated by mts_idx, and perform the multiple transform.

In this case, the combinations of the horizontal transform kernel and the vertical transform kernel used for multiple transform selection may be preset, and each combination may correspond to an index value of mts_idx. Accordingly, the decoding apparatus may select the combination corresponding to the index value of mts_idx from among the preset combinations of horizontal and vertical transform kernels, and derive the horizontal transform kernel and the vertical transform kernel of the selected combination as the transform kernel set to be applied to the current block.

FIG. 17 is a flowchart schematically illustrating a decoding method for performing inverse transform according to an embodiment of the present invention. The method disclosed in FIG. 17 shows a process of performing inverse transform based on the syntax elements described in Tables 12 to 17 above.

Referring to FIG. 17, the decoding apparatus may acquire sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag (S1700).

Here, the sps_mts_intra_enabled_flag syntax element indicates whether cu_mts_flag exists in the residual coding syntax of the intra coding unit. For example, if sps_mts_intra_enabled_flag = 0, cu_mts_flag is not present in the residual coding syntax of the intra coding unit, and if sps_mts_intra_enabled_flag = 1, cu_mts_flag is present in the residual coding syntax of the intra coding unit. And, the sps_mts_inter_enabled_flag syntax element indicates whether cu_mts_flag exists in the residual coding syntax of the inter coding unit. For example, if sps_mts_inter_enabled_flag = 0, cu_mts_flag is not present in the residual coding syntax of the inter coding unit, and if sps_mts_inter_enabled_flag = 1, cu_mts_flag is present in the residual coding syntax of the inter coding unit.

The decoding device may obtain cu_mts_flag based on sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag (S1710).

For example, when sps_mts_intra_enabled_flag = 1 or sps_mts_inter_enabled_flag = 1, the decoding device may acquire cu_mts_flag. Here, the cu_mts_flag syntax element indicates whether multiple transform selection (hereinafter referred to as 'MTS') is applied to the residual sample of the luma transform block. For example, if cu_mts_flag = 0, MTS is not applied to the residual sample of the luma transform block, and if cu_mts_flag = 1, MTS is applied to the residual sample of the luma transform block. As another example, at least one of the embodiments described in Table 25 or other embodiments described later with respect to cu_mts_flag may be applied.

The decoding apparatus may obtain mts_idx based on cu_mts_flag (S1720).

For example, when cu_mts_flag = 1, the decoding apparatus may obtain mts_idx. Here, the mts_idx syntax element indicates which transform kernel is applied to luma residual samples along the horizontal and / or vertical direction of the current transform block. For example, at least one of the embodiments described in Tables 18 to 25 or other embodiments described later with respect to mts_idx may be applied.

The decoding device may derive a transform kernel corresponding to mts_idx (S1730).

For example, the transform kernel corresponding to mts_idx may be defined by being divided into horizontal transform and vertical transform. As another example, different transformation kernels may be applied to the horizontal transformation and the vertical transformation. However, the present invention is not limited thereto, and the same transform kernel may be applied to horizontal transform and vertical transform.

Then, the decoding apparatus may perform inverse transform based on the transform kernel (S1740).
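The conditional parsing in steps S1700 to S1720 can be sketched as follows. This is a minimal sketch, not the normative syntax: `read_flag` and `read_index` are hypothetical callables standing in for entropy decoding of cu_mts_flag and mts_idx, and `is_intra` is an assumed input describing the prediction mode of the current block. The value -1 for an absent mts_idx follows the inference described in this document for the MTS index tables.

```python
def parse_mts(read_flag, read_index, is_intra,
              sps_mts_intra_enabled_flag, sps_mts_inter_enabled_flag):
    """Return (cu_mts_flag, mts_idx) for the current block."""
    # S1700: pick the sequence-level flag that matches the block type.
    if is_intra:
        enabled = sps_mts_intra_enabled_flag
    else:
        enabled = sps_mts_inter_enabled_flag

    # S1710: cu_mts_flag is present only when MTS is enabled; otherwise 0.
    cu_mts_flag = read_flag() if enabled else 0

    # S1720: mts_idx is present only when cu_mts_flag = 1; otherwise it is
    # treated as -1 (not signaled).
    mts_idx = read_index() if cu_mts_flag else -1

    return cu_mts_flag, mts_idx
```

The returned mts_idx would then be used to derive the transform kernels (S1730) before the inverse transform is performed (S1740).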

In addition, in this document, MTS may also be referred to as Adaptive Multiple Transform (AMT) or Explicit Multiple Transform (EMT). Similarly, mts_idx may also be referred to as AMT_idx, EMT_idx, AMT_TU_idx, EMT_TU_idx, etc., and is not limited to these expressions.

FIG. 18 shows related components of a decoding apparatus for performing inverse transform according to an embodiment of the present invention. The related components of the decoding apparatus disclosed in FIG. 18 may perform the method of FIG. 17 above. Also, the decoding apparatus of FIG. 18 may be the decoding apparatus disclosed in FIG. 3.

Referring to FIG. 18, the decoding apparatus 300 may include an element 1800 for obtaining a sequence parameter, an element 1810 for obtaining a multiple transform selection flag, an element 1820 for obtaining a multiple transform selection index, and an element 1830 for deriving a transform kernel.

The element 1800 for obtaining sequence parameters may obtain sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag. Here, the sps_mts_intra_enabled_flag syntax element indicates whether cu_mts_flag is present in the residual coding syntax of the intra coding unit, and the sps_mts_inter_enabled_flag syntax element indicates whether cu_mts_flag is present in the residual coding syntax of the inter coding unit. As a specific example, the description of FIG. 17 may be applied.

The element 1810 for obtaining the multiple transform selection flag may acquire cu_mts_flag based on sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag. For example, when sps_mts_intra_enabled_flag = 1 or sps_mts_inter_enabled_flag = 1, the element 1810 acquiring the multiple transform selection flag may acquire cu_mts_flag. Here, the cu_mts_flag syntax element indicates whether multiple transform selection (hereinafter referred to as 'MTS') is applied to the residual sample of the luma transform block. As a specific example, the description of FIG. 17 may be applied.

The element 1820 that acquires the multiple transform selection index may obtain mts_idx based on cu_mts_flag. For example, when cu_mts_flag = 1, the element 1820 obtaining a multiple transform selection index may obtain mts_idx. Here, the mts_idx syntax element indicates which transform kernel is applied to luma residual samples along the horizontal and / or vertical direction of the current transform block. As a specific example, the description of FIG. 17 may be applied.

The element 1830 that derives the transform kernel may derive the transform kernel corresponding to mts_idx.

Also, the decoding apparatus 300 may perform inverse transformation based on the transformation kernel.

Meanwhile, according to an embodiment of the present invention, the combinations of transform kernels used for multiple transform selection may be configured in various ways. Here, the combinations of transform kernels may be referred to as multiple transform selection candidates (hereinafter, MTS candidates). The combinations of transform kernels (that is, the MTS candidates) represent multiple transform kernel sets, and each multiple transform kernel set may be derived by combining a transform kernel type corresponding to the vertical transform kernel and a transform kernel type corresponding to the horizontal transform kernel. There may be a plurality of transform kernel types available for multiple transform selection; in this case, the transform kernel type corresponding to the vertical transform kernel may be one of the plurality of transform kernel types, and the transform kernel type corresponding to the horizontal transform kernel may likewise be one of the plurality of transform kernel types. In other words, the multiple transform kernel sets (that is, the MTS candidates) may be configured by combining a plurality of transform kernel types. For example, DST7, DCT8, DCT2, DST1, DCT5, and the like may be used as the transform kernel types available for multiple transform selection, and the multiple transform kernel sets (that is, the MTS candidates) may be configured by selecting and combining a plurality of these transform kernel types. The multiple transform kernel sets may be configured in various ways in consideration of transform efficiency.

In configuring MTS candidates, in one embodiment of the present invention, a plurality of MTS candidates may be configured by combining DST7 and DCT8 as transform kernel types, and an MTS index value (e.g., mts_idx) may be assigned to each of the plurality of MTS candidates.

In one embodiment, the first index value of the MTS index (e.g., index value 0) may be mapped to an MTS candidate (that is, a transform kernel set) in which the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel are both selected as DST7. The second index value (e.g., index value 1) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DST7 and the transform kernel type corresponding to the horizontal transform kernel is selected as DCT8. The third index value (e.g., index value 2) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DCT8 and the transform kernel type corresponding to the horizontal transform kernel is selected as DST7. The fourth index value (e.g., index value 3) may be mapped to an MTS candidate in which the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel are both selected as DCT8. The MTS candidates combined in this way may be represented, as in Table 18 below, by the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel according to the MTS index value.

In another embodiment, the first index value of the MTS index (e.g., index value 0) may be mapped to an MTS candidate (that is, a transform kernel set) in which the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel are both selected as DST7. The second index value (e.g., index value 1) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DCT8 and the transform kernel type corresponding to the horizontal transform kernel is selected as DST7. The MTS candidates combined in this way may be represented, as in Table 19 below, by the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel according to the MTS index value.

In another embodiment, the first index value of the MTS index (e.g., index value 0) may be mapped to an MTS candidate (that is, a transform kernel set) in which the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel are both selected as DST7. The second index value (e.g., index value 1) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DST7 and the transform kernel type corresponding to the horizontal transform kernel is selected as DCT8. The MTS candidates combined in this way may be represented, as in Table 20 below, by the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel according to the MTS index value.

Referring to Tables 18 to 20, a transform kernel type corresponding to the vertical transform kernel and a transform kernel type corresponding to the horizontal transform kernel are mapped according to the index value of the MTS index. Here, a transform kernel type value of 1 indicates DST7, and a value of 2 indicates DCT8. In some cases, the MTS index syntax element is not signaled. That is, when it is determined that the MTS-based transform is not available (e.g., when sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag is 0) or is not applied (e.g., when cu_mts_flag is 0), the MTS index information may not be present. In this case, the decoding apparatus may infer the value of the MTS index as -1, as shown in Tables 18 to 20 above, and use the corresponding transform kernel type 0 as the transform kernel type of the current block (that is, for both the vertical transform kernel and the horizontal transform kernel). Here, transform kernel type 0 may indicate DCT2.
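As a minimal sketch, the four-candidate mapping described for Table 18, together with the inferred index -1, can be written as a lookup table. The dictionary layout and the names `TABLE_18` and `derive_kernels` are illustrative assumptions; the kernel type values 0, 1, and 2 are those given in the text.

```python
# Transform kernel type values as used in the text.
DCT2, DST7, DCT8 = 0, 1, 2

# mts_idx -> (vertical kernel type, horizontal kernel type), following the
# description of Table 18. Index -1 covers the case where mts_idx is not
# signaled and DCT2 is used in both directions.
TABLE_18 = {
    -1: (DCT2, DCT2),
    0: (DST7, DST7),
    1: (DST7, DCT8),
    2: (DCT8, DST7),
    3: (DCT8, DCT8),
}


def derive_kernels(mts_idx, table=TABLE_18):
    """Return (vertical, horizontal) transform kernel types for mts_idx."""
    return table[mts_idx]
```

The two-candidate variants described for Tables 19 and 20 would use the same structure with only indices -1, 0, and 1 populated.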

Also, according to an embodiment, the present invention may configure MTS candidates in consideration of the directionality of the intra prediction mode.

In one embodiment, when 67 intra prediction modes are applied, the four MTS candidates shown in Table 18 above may be used for the two non-directional modes (e.g., DC mode and planar mode), the two MTS candidates shown in Table 19 above may be used for the horizontal group modes including modes with horizontal directionality (e.g., modes 2 to 34), and the two MTS candidates shown in Table 20 above may be used for the vertical group modes including modes with vertical directionality (e.g., modes 35 to 66).

In another embodiment, when 67 intra prediction modes are applied, three MTS candidates may be used for the two non-directional modes (e.g., DC mode and planar mode) as shown in Table 21 below, two MTS candidates may be used for the horizontal group modes including modes with horizontal directionality (e.g., modes 2 to 34) as shown in Table 22 below, and two MTS candidates may be used for the vertical group modes including modes with vertical directionality (e.g., modes 35 to 66) as shown in Table 23 below.

Table 21 below shows an example of the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel according to the MTS index value, as MTS candidates used for the two non-directional modes (e.g., DC mode and planar mode).

Referring to Table 21, the first index value (e.g., index value 0) of the MTS index may be mapped to an MTS candidate (that is, a transform kernel set) in which the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel are both selected as DST7. The second index value (e.g., index value 1) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DST7 and the transform kernel type corresponding to the horizontal transform kernel is selected as DCT8. The third index value (e.g., index value 2) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DCT8 and the transform kernel type corresponding to the horizontal transform kernel is selected as DST7.

Table 22 below shows an example of the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel according to the MTS index value, as MTS candidates used for the horizontal group modes (e.g., modes 2 to 34) including modes with horizontal directionality.

Referring to Table 22, the first index value (e.g., index value 0) of the MTS index may be mapped to an MTS candidate (that is, a transform kernel set) in which the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel are both selected as DST7. The second index value (e.g., index value 1) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DCT8 and the transform kernel type corresponding to the horizontal transform kernel is selected as DST7.

Table 23 below shows an example of the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel according to the MTS index value, as MTS candidates used for the vertical group modes (e.g., modes 35 to 66) including modes with vertical directionality.

Referring to Table 23, the first index value (e.g., index value 0) of the MTS index may be mapped to an MTS candidate (that is, a transform kernel set) in which the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel are both selected as DST7. The second index value (e.g., index value 1) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DST7 and the transform kernel type corresponding to the horizontal transform kernel is selected as DCT8.

In Tables 21 to 23, a transform kernel type value of 1 indicates DST7, and a value of 2 indicates DCT8. In some cases, the MTS index syntax element is not signaled. That is, when it is determined that the MTS-based transform is not available (e.g., when sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag is 0) or is not applied (e.g., when cu_mts_flag is 0), the MTS index information may not be present. In this case, the decoding apparatus may infer the value of the MTS index as -1, as shown in Tables 21 to 23 above, and use the corresponding transform kernel type 0 as the transform kernel type of the current block (that is, for both the vertical transform kernel and the horizontal transform kernel). Here, transform kernel type 0 may indicate DCT2.
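Selecting the candidate table by intra prediction mode group, as described for Tables 21 to 23, can be sketched as follows. This is an illustrative sketch: the table contents follow the prose descriptions above, while the mode numbers 0 and 1 for the two non-directional modes and all function and variable names are assumptions.

```python
# Transform kernel type values as used in the text.
DCT2, DST7, DCT8 = 0, 1, 2

# mts_idx -> (vertical, horizontal); -1 means mts_idx is not signaled.
TABLE_21 = {-1: (DCT2, DCT2), 0: (DST7, DST7), 1: (DST7, DCT8), 2: (DCT8, DST7)}
TABLE_22 = {-1: (DCT2, DCT2), 0: (DST7, DST7), 1: (DCT8, DST7)}
TABLE_23 = {-1: (DCT2, DCT2), 0: (DST7, DST7), 1: (DST7, DCT8)}


def mts_candidates_for_mode(intra_mode):
    """Pick the MTS candidate table for one of the 67 intra prediction modes."""
    if intra_mode in (0, 1):          # non-directional modes (assumed numbering)
        return TABLE_21
    if 2 <= intra_mode <= 34:         # horizontal group modes
        return TABLE_22
    if 35 <= intra_mode <= 66:        # vertical group modes
        return TABLE_23
    raise ValueError("intra_mode must be in the range 0..66")


def derive_kernels(intra_mode, mts_idx):
    """Return (vertical, horizontal) kernel types for the mode and index."""
    return mts_candidates_for_mode(intra_mode)[mts_idx]
```

Because the directional groups carry only two candidates, their MTS index needs fewer bins than in the four-candidate configuration.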

Of course, the present invention can configure MTS candidates for all intra prediction modes without considering the directionality of the intra prediction mode. In one embodiment, three MTS candidates may be configured for all intra prediction modes, and an MTS index value (eg, mts_idx) may be allocated to each of the three MTS candidates.

For example, the first index value of the MTS index (e.g., index value 0) may be mapped to an MTS candidate (that is, a transform kernel set) in which the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel are both selected as DST7. The second index value (e.g., index value 1) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DST7 and the transform kernel type corresponding to the horizontal transform kernel is selected as DCT8. The third index value (e.g., index value 2) may be mapped to an MTS candidate in which the transform kernel type corresponding to the vertical transform kernel is selected as DCT8 and the transform kernel type corresponding to the horizontal transform kernel is selected as DST7. The MTS candidates combined in this way may be represented, as in Table 24 below, by the transform kernel types corresponding to the vertical transform kernel and the horizontal transform kernel according to the MTS index value.

Referring to Table 24, the transform kernel type corresponding to the vertical transform kernel and the transform kernel type corresponding to the horizontal transform kernel are mapped according to the index value of the MTS index. Here, a transform kernel type value of 1 indicates DST7, and a value of 2 indicates DCT8. In some cases, the MTS index syntax element is not signaled. That is, when it is determined that the MTS-based transform is not available (e.g., when sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag is 0) or is not applied (e.g., when cu_mts_flag is 0), the MTS index information may not be present. In this case, the decoding apparatus may infer the value of the MTS index as -1, as shown in Table 24 above, and use the corresponding transform kernel type 0 as the transform kernel type of the current block (that is, for both the vertical transform kernel and the horizontal transform kernel). Here, transform kernel type 0 may indicate DCT2.

In addition, the present invention can configure MTS candidate (s) applied to all prediction modes (ie, intra prediction mode and inter prediction mode). In one embodiment, one MTS candidate may be configured for the intra prediction mode and the inter prediction mode, and an MTS index value (eg, mts_idx) may be allocated. In this case, since one MTS candidate is used, flag information can be used instead of the MTS index to reduce the number of bits.

For example, when the flag information (eg, cu_mts_flag) is 1, a transform kernel type indicated by one MTS candidate may be mapped. That is, when the flag information (eg, cu_mts_flag) is 1, both the transform kernel type corresponding to the vertical transform kernel and the transform kernel type corresponding to the horizontal transform kernel can be mapped to DST7.

Table 25 below shows an example of mapping the transform kernel type corresponding to the vertical transform kernel and the horizontal transform kernel based on the flag information (eg, cu_mts_flag).

Referring to Table 25, when the flag information (eg, cu_mts_flag) is 1, a value of 1 may be derived for both the transform kernel type corresponding to the vertical transform kernel and the transform kernel type corresponding to the horizontal transform kernel, regardless of the prediction mode (that is, intra prediction mode or inter prediction mode). Alternatively, when the flag information (eg, cu_mts_flag) is 0, a value of 0 may be derived for both the transform kernel type corresponding to the vertical transform kernel and the transform kernel type corresponding to the horizontal transform kernel. Here, a transform kernel type of 1 may mean that DST7 is used, and a transform kernel type of 0 may mean that DCT2 is used. As described above, in some cases, the flag information (eg, cu_mts_flag) may not be signaled. For example, the flag information (eg, cu_mts_flag) may not be signaled when it is determined that the MTS-based transform is not available (eg, when sps_mts_intra_enabled_flag or sps_mts_inter_enabled_flag is 0). In this case, the decoding device infers the flag information (eg, cu_mts_flag) to be 0, as shown in Table 25 above, and the corresponding transform kernel type 0 may be used as the transform kernel type of the current block (ie, for both the vertical transform kernel and the horizontal transform kernel).
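The flag-based mapping described for Table 25 reduces to a two-way choice. The following is a minimal sketch under that reading; the function name kernel_types_from_flag is hypothetical, and passing None is an assumed way to model a flag that was not signaled.

```python
def kernel_types_from_flag(cu_mts_flag=None):
    """Return (vertical, horizontal) transform kernel type values.

    None models a flag that is not signaled; it is then inferred to be 0.
    Kernel type 1 means DST7 and kernel type 0 means DCT2, as in the text.
    """
    flag = 0 if cu_mts_flag is None else cu_mts_flag
    kernel_type = 1 if flag == 1 else 0
    # The same type is used for both directions, regardless of prediction mode.
    return kernel_type, kernel_type
```

Because a single candidate exists, one bit per block is enough to select between DCT2 and DST7 in both directions.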

In the above-described embodiments, DST7 and DCT8 are used as the transform kernel types, but this is only an example. In the present invention, multiple transforms can be performed by configuring a transform kernel set for multiple transform selection using various transform kernel types (eg, DCT2, DCT4, DCT5, DCT7, DCT8, DST1, DST4, DST7, etc.).

Meanwhile, DCT / DST transform kernel types such as DCT2, DCT4, DCT5, DCT7, DCT8, DST1, DST4, and DST7 may be defined based on basis functions, and the basis functions may be represented as in Table 26 below. The transform kernel type described herein may also be referred to as a transform type.
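For orientation, the basis functions of the three transform types used in the embodiments above are commonly written as follows for an N-point transform, with i, j = 0, 1, ..., N-1. These are the widely used orthonormal definitions and are given here as an assumption; they are not a reproduction of Table 26.

```latex
\[
\begin{aligned}
\text{DCT2:}\quad & T_i(j) = \omega_0 \sqrt{\frac{2}{N}}\,
  \cos\!\left(\frac{\pi\, i\,(2j+1)}{2N}\right),
  \qquad \omega_0 = \begin{cases} \sqrt{1/2}, & i = 0 \\ 1, & i \neq 0 \end{cases} \\
\text{DCT8:}\quad & T_i(j) = \sqrt{\frac{4}{2N+1}}\,
  \cos\!\left(\frac{\pi\,(2i+1)(2j+1)}{4N+2}\right) \\
\text{DST7:}\quad & T_i(j) = \sqrt{\frac{4}{2N+1}}\,
  \sin\!\left(\frac{\pi\,(2i+1)(j+1)}{2N+1}\right)
\end{aligned}
\]
```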

The method disclosed in FIG. 19 may be performed by the encoding apparatus 200 disclosed in FIG. 2. Specifically, steps S1900 to S1910 of FIG. 19 may be performed by the prediction unit 220 and the intra prediction unit 222 illustrated in FIG. 2, step S1920 of FIG. 19 may be performed by the subtraction unit 231 illustrated in FIG. 2, steps S1930 to S1950 of FIG. 19 may be performed by the transform unit 232 illustrated in FIG. 2, step S1960 of FIG. 19 may be performed by the quantization unit 233 illustrated in FIG. 2, and steps S1970 to S1980 of FIG. 19 may be performed by the entropy encoding unit 240 illustrated in FIG. 2. In addition, the method disclosed in FIG. 19 may include the above-described embodiments herein. Therefore, in FIG. 19, a detailed description of contents overlapping with the above-described embodiments will be omitted or simplified.

Referring to FIG. 19, the encoding device may derive a prediction mode for a current block (S1900). That is, the encoding apparatus may determine whether to apply the intra prediction mode or the inter prediction mode to the current block, and derive the determined prediction mode of the current block.

The encoding device may perform prediction according to the prediction mode of the current block. At this time, when the prediction mode for the current block is an intra prediction mode, the encoding apparatus may perform intra prediction on the current block to derive prediction samples (S1910). Alternatively, when the prediction mode for the current block is the inter prediction mode, the encoding device may perform inter prediction on the current block and derive predicted samples of the current block as a prediction result.

The encoding apparatus may derive residual samples for the current block based on the prediction samples of the current block (S1920). That is, the encoding apparatus may derive residual samples based on a difference between original samples and prediction samples for the current block.

The encoding device may derive a horizontal transform kernel and a vertical transform kernel applied to residual samples of the current block (S1930), and generate transform index information based on the derived horizontal transform kernel and vertical transform kernel (S1940).

In one embodiment, the encoding apparatus may determine, in consideration of transform efficiency, whether to perform multiple transform selection (MTS; also called AMT or EMT), which applies separate transforms in the horizontal and vertical directions to the residual samples of the current block. If it is decided to apply multiple transform selection, the encoding apparatus may determine the transform type (transform kernel) applied to the horizontal direction and to the vertical direction, respectively.

More specifically, the encoding apparatus may perform transformation using each of a plurality of transform combinations, and select an optimal transform combination from the plurality of transform combinations based on a rate-distortion (RD) cost. Then, the encoding apparatus may generate transform index information corresponding to the selected optimal transform combination.
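The selection step above can be sketched as picking the candidate with the smallest cost. This is a minimal illustration, not the encoder's actual search: the cost function is a hypothetical callable supplied by the caller, and the numeric costs in the usage lines are made-up toy values.

```python
def select_best_combination(candidates, rd_cost):
    """Return (index, combination) with the smallest RD cost.

    candidates: dict mapping index value -> (vertical kernel, horizontal kernel)
    rd_cost:    hypothetical callable (vertical, horizontal) -> cost
    """
    best_idx = min(candidates, key=lambda idx: rd_cost(*candidates[idx]))
    return best_idx, candidates[best_idx]


# Toy usage: three candidates with illustrative, made-up costs.
candidates = {0: ("DCT2", "DCT2"), 1: ("DST7", "DST7"), 2: ("DST7", "DCT8")}
toy_costs = {("DCT2", "DCT2"): 12.5, ("DST7", "DST7"): 9.0, ("DST7", "DCT8"): 10.25}
best_idx, best_combo = select_best_combination(
    candidates, lambda ver, hor: toy_costs[(ver, hor)]
)
```

The returned index is what the encoder would then signal as transform index information.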

Here, the plurality of transform combinations are composed of combinations of a horizontal transform kernel and a vertical transform kernel, and may include the transform combination candidates disclosed in Tables 18 to 25 above. In other words, the plurality of transform combinations can be represented by transform kernel sets each consisting of a horizontal transform kernel and a vertical transform kernel, and each transform combination can be derived by combining a transform type corresponding to the horizontal transform kernel with a transform type corresponding to the vertical transform kernel. In this case, the transform type corresponding to the horizontal transform kernel and the transform type corresponding to the vertical transform kernel may each be one of a plurality of transform types. For example, DCT2, DCT4, DCT5, DCT7, DCT8, DST1, DST4, DST7, etc. may be used as transform types, and a plurality of transform combinations (combination candidates of the vertical transform kernel and the horizontal transform kernel) can be derived by combining a plurality of transform types (for example, DST7 and DCT8).

In one embodiment, a plurality of transform combinations can be derived by assigning the DST7 and DCT8 transform types to the horizontal transform kernel and the vertical transform kernel. Also, the plurality of transform combinations may be mapped to index values (ie, transform index information). For example, the plurality of transform combinations may be mapped to transform index information having first to fifth index values. In the transform combination corresponding to the first index value (eg, index value 0), both the vertical transform kernel and the horizontal transform kernel may be DCT type 2. In the transform combination corresponding to the second index value (eg, index value 1), both the vertical transform kernel and the horizontal transform kernel may be DST type 7. In the transform combination corresponding to the third index value (eg, index value 2), the vertical transform kernel may be DST type 7 and the horizontal transform kernel may be DCT type 8. In the transform combination corresponding to the fourth index value (eg, index value 3), the vertical transform kernel may be DCT type 8 and the horizontal transform kernel may be DST type 7. In the transform combination corresponding to the fifth index value (eg, index value 4), both the vertical transform kernel and the horizontal transform kernel may be DCT type 8.
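The five-entry mapping listed above can be written down directly as a table, together with the reverse lookup an encoder would need to turn a chosen kernel pair into an index value. The names TRANSFORM_COMBINATIONS and index_for are hypothetical and used only for this sketch.

```python
# Index value -> (vertical kernel, horizontal kernel), as listed in the text.
TRANSFORM_COMBINATIONS = {
    0: ("DCT2", "DCT2"),
    1: ("DST7", "DST7"),
    2: ("DST7", "DCT8"),
    3: ("DCT8", "DST7"),
    4: ("DCT8", "DCT8"),
}


def index_for(vertical, horizontal):
    """Return the index value whose combination matches the given kernels."""
    for idx, combo in TRANSFORM_COMBINATIONS.items():
        if combo == (vertical, horizontal):
            return idx
    raise ValueError(f"no transform combination for {(vertical, horizontal)}")
```

Note that mixed pairs such as DCT2 with DST7 have no index in this example and therefore raise an error.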

According to an embodiment, a plurality of transform combinations may be configured and mapped to transform index information in consideration of the directionality of the intra prediction mode. For example, when the intra prediction mode of the current block is a non-directional mode (eg, DC mode or planar mode), the transform combinations corresponding to the first to fifth index values may be configured as in the example above and applied when performing multiple transform selection. When the intra prediction mode of the current block is a directional mode (for example, a horizontal directional mode covering modes 2 to 34 or a vertical directional mode covering modes 35 to 66), transform combinations different from those in the above example may be configured and applied when performing multiple transform selection. The method of configuring a plurality of transform combinations in consideration of the directionality of the intra prediction mode has been described in detail through Tables 18 to 23 above.

For example, if the intra prediction mode of the current block is a non-directional mode (eg, DC mode or planar mode), the encoding device may perform vertical and horizontal transforms based on the transform combinations corresponding to the first to fifth index values and select an optimal transform combination from among them. In addition, the encoding device may generate transform index information by deriving the index value corresponding to the selected transform combination.

The encoding apparatus may perform transformation on residual samples of the current block based on the horizontal transform kernel and the vertical transform kernel, and derive transform coefficients for the current block (S1950).

In one embodiment, the encoding apparatus may perform horizontal and vertical transforms on the residual samples of the current block using the transform kernels (horizontal transform kernel and vertical transform kernel) selected as the optimal transform combination among the plurality of transform combinations. Here, the transform kernels selected as the optimal transform combination are the horizontal transform kernel and the vertical transform kernel indicated by the transform index information.
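A separable transform with distinct vertical and horizontal kernels can be sketched as two matrix products. The sketch below assumes floating-point arithmetic and the commonly used orthonormal forms of DST7 and DCT8; practical codecs use scaled integer approximations, and the helper names here are hypothetical.

```python
import math


def dst7(n):
    """n x n DST7 matrix; row i holds basis function T_i (orthonormal form)."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def dct8(n):
    """n x n DCT8 matrix; row i holds basis function T_i (orthonormal form)."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(residual, ver_kernel, hor_kernel):
    """coeffs = V * X * H^T: V acts on columns (vertical), H on rows (horizontal)."""
    return matmul(matmul(ver_kernel, residual), transpose(hor_kernel))


# Usage: a 2-row by 4-column residual block, vertical DST7 and horizontal DCT8.
residual = [[4, -2, 1, 0], [3, 5, -1, 2]]
coeffs = forward_transform(residual, dst7(2), dct8(4))
```

The vertical kernel size follows the block height and the horizontal kernel size follows the block width, so non-square blocks need no special handling.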

The encoding apparatus may derive quantized transform coefficients by performing quantization based on the transform coefficients of the current block (S1960). Then, the encoding apparatus may generate residual information based on quantized transform coefficients for the current block (S1970).

The encoding device may encode image information including prediction mode information, residual information, and transform index information for the current block (S1980).

Here, the prediction mode information is information about a prediction mode applied when the current block is predicted, and may be information about an intra prediction mode or an inter prediction mode.

Also, the residual information may include value information of quantized transform coefficients, location information, a transform technique, transform kernel, quantization parameter, and the like.

In addition, the transform index information may be an index value indicating a transform combination including a horizontal transform kernel and a vertical transform kernel applied to the current block among a plurality of transform combinations.

In encoding the video information, the encoding device may determine whether multiple transform selection is available for the current block, and generate the determined information as the multiple transform selection available flag information. At this time, the multiple transform selection available flag information may be defined as intra available flag information and inter available flag information according to the prediction mode of the current block. The intra-available flag information may be a sps_mts_intra_enabled_flag syntax element disclosed in Tables 12 and 13 above, and indicates whether a multiple transform selection-based transform is available for an intra coding block. The inter-available flag information may be a sps_mts_inter_enabled_flag syntax element disclosed in Tables 12 and 13 above, and indicates whether a multiple transform selection-based transform is available for an inter coding block.

That is, if it is determined that the current block is a block coded in the intra prediction mode and multiple transform selection is available, the encoding device may set the intra available flag information (eg, sps_mts_intra_enabled_flag) to a value of 1 and encode it as part of the video information. Alternatively, if it is determined that the current block is a block coded in the inter prediction mode and multiple transform selection is available, the encoding device may set the inter available flag information (eg, sps_mts_inter_enabled_flag) to a value of 1 and encode it as part of the video information. In addition, when it is determined that multiple transform selection is not available for the intra coding block, the encoding apparatus may set the intra available flag information (eg, sps_mts_intra_enabled_flag) to a value of 0 and encode it as part of the video information. Alternatively, if it is determined that multiple transform selection is not available for the inter coding block, the encoding device may set the inter available flag information (eg, sps_mts_inter_enabled_flag) to a value of 0 and encode it as part of the video information. At this time, the intra available flag information (eg, sps_mts_intra_enabled_flag) and the inter available flag information (eg, sps_mts_inter_enabled_flag) may be signaled at a sequence parameter set (SPS) level.

If it is determined that multiple transform selection is available for the current block (ie, the intra available flag information or the inter available flag information has a value of 1), the encoding device may encode and signal transform index information indicating the transform kernels (horizontal transform kernel and vertical transform kernel) applied to the current block. At this time, the transform index information may be signaled through residual coding syntax or transform unit syntax. In other words, when the intra available flag information or the inter available flag information indicates 1, a syntax element for the transform index information may exist in a bitstream including the encoded image information.

In addition, in encoding the video information, the encoding apparatus may determine whether to apply a transform based on multiple transform selection to the current block, and generate the determined information as multiple transform selection flag information. For example, the multiple transform selection flag information may be the cu_mts_flag syntax element disclosed in Tables 14 and 15 above. When the multiple transform selection flag information (eg, cu_mts_flag) is 1, it may indicate that a multiple transform selection-based transform is applied to the current block.

That is, when it is determined that the multiple transform selection-based transform is applied to the current block, the encoding apparatus may set the multiple transform selection flag information (eg, cu_mts_flag) to a value of 1 and encode it as part of the video information. Alternatively, if it is determined that the multiple transform selection-based transform is not applied to the current block, the encoding apparatus may set the multiple transform selection flag information (eg, cu_mts_flag) to a value of 0 and encode it as part of the video information.

At this time, when the multiple transform selection flag information (eg, cu_mts_flag) indicates 1, the encoding device may generate and signal transform index information representing the transform kernels (horizontal transform kernel and vertical transform kernel) applied to the current block. When the multiple transform selection flag information (eg, cu_mts_flag) indicates 0, the encoding device may not generate a syntax element for the transform index information, and thus may not signal it to the decoding device.

According to an embodiment, in encoding the multiple transform selection flag information, when the intra available flag information indicates 1 or the inter available flag information indicates 1, the encoding apparatus may encode the multiple transform selection flag information as part of the video information.
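The encoder-side signaling conditions described above can be sketched as a small decision function. This is an illustrative reading only: it assumes that the available flag matching the block's prediction mode gates cu_mts_flag, and that mts_idx is written only when cu_mts_flag is 1. The function name and the list-of-pairs output format are hypothetical.

```python
def syntax_elements_to_signal(is_intra, sps_mts_intra_enabled_flag,
                              sps_mts_inter_enabled_flag, use_mts, mts_idx):
    """Return the (name, value) pairs an encoder would write for one block."""
    # Assumption: the SPS-level flag matching the prediction mode applies.
    enabled = sps_mts_intra_enabled_flag if is_intra else sps_mts_inter_enabled_flag
    elements = []
    if enabled == 1:
        cu_mts_flag = 1 if use_mts else 0
        elements.append(("cu_mts_flag", cu_mts_flag))
        if cu_mts_flag == 1:
            # The transform index is only generated when MTS is applied.
            elements.append(("mts_idx", mts_idx))
    return elements
```

When the list comes back empty, nothing MTS-related is written for the block and the decoder falls back to inference.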

The method of encoding and signaling the syntax elements for the intra available flag information, the inter available flag information, the multiple transform selection flag information, and the transform index information may include the contents disclosed in Tables 12 to 17.

As described above, the encoded image information may be output in the form of a bitstream. The bitstream may be transmitted to a decoding device through a network or storage medium.

The method disclosed in FIG. 20 may be performed by the decoding apparatus 300 disclosed in FIG. 3. Specifically, step S2000 of FIG. 20 may be performed by the entropy decoding unit 310 illustrated in FIG. 3, steps S2010 to S2020 of FIG. 20 may be performed by the prediction unit 330 and the intra prediction unit 331 illustrated in FIG. 3, step S2030 of FIG. 20 may be performed by the entropy decoding unit 310 illustrated in FIG. 3, step S2040 of FIG. 20 may be performed by the inverse quantization unit 321 illustrated in FIG. 3, steps S2050 to S2060 of FIG. 20 may be performed by the inverse transform unit 322 illustrated in FIG. 3, and step S2070 of FIG. 20 may be performed by the adder 340 illustrated in FIG. 3. In addition, the method disclosed in FIG. 20 may include the embodiments described herein. Therefore, in FIG. 20, a detailed description of content overlapping with the above-described embodiments will be omitted or simplified.

Referring to FIG. 20, the decoding apparatus may obtain prediction mode information, residual information, and transform index information from a bitstream (S2000).

The decoding apparatus may derive the prediction mode for the current block based on the prediction mode information obtained from the bitstream (S2010).

The decoding apparatus may perform prediction on the current block according to the prediction mode information. At this time, when the prediction mode for the current block is an intra prediction mode, the decoding apparatus may perform intra prediction on the current block to derive prediction samples (S2020). Alternatively, when the prediction mode for the current block is an inter prediction mode, the decoding apparatus may perform inter prediction on the current block and derive predicted samples of the current block as a prediction result.

The decoding apparatus may derive quantized transform coefficients for the current block based on residual information obtained from the bitstream (S2030).

Here, the residual information may include value information of quantized transform coefficients, location information, a transform technique, a transform kernel, and quantization parameters.

The decoding apparatus may derive transform coefficients by performing inverse quantization based on the quantized transform coefficients of the current block (S2040).

The decoding apparatus may derive a horizontal transform kernel and a vertical transform kernel based on the transform index information obtained from the bitstream (S2050).

In one embodiment, the decoding apparatus may select a transform combination corresponding to transform index information from a plurality of transform combinations, and derive a horizontal transform kernel and a vertical transform kernel included in the selected transform combination.

For example, if the intra prediction mode of the current block is a non-directional mode (eg, a DC mode or a planar mode), the decoding apparatus may obtain, from the encoding device, transform index information generated based on the transform combinations corresponding to the first to fifth index values, and may derive, from among those transform combinations, the transform combination corresponding to the index value indicated by the transform index information. In one example, when the transform index information indicates the second index value, the decoding apparatus may derive the transform combination corresponding to the second index value. At this time, the transform combination may include a vertical transform kernel and a horizontal transform kernel that are both DST type 7.

The decoding apparatus may derive residual samples for the current block by performing inverse transform on transform coefficients of the current block based on the horizontal transform kernel and the vertical transform kernel (S2060).
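The inverse transform can be sketched as the transpose-based inverse of a separable forward transform. The sketch assumes floating-point arithmetic and an orthonormal DST7 kernel in both directions (the combination for the second index value); practical decoders use integer approximations, and the helper names are hypothetical.

```python
import math


def dst7(n):
    """n x n DST7 matrix; row i holds basis function T_i (orthonormal form)."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def inverse_transform(coeffs, ver_kernel, hor_kernel):
    """residual = V^T * C * H; valid because both kernels are orthonormal."""
    return matmul(matmul(transpose(ver_kernel), coeffs), hor_kernel)


# Round trip on a 4x4 block: forward (V * X * H^T), then inverse.
ver = hor = dst7(4)
residual = [[3, -1, 0, 2], [1, 4, -2, 0], [0, 0, 5, -3], [2, 1, 1, 1]]
coeffs = matmul(matmul(ver, residual), transpose(hor))
restored = inverse_transform(coeffs, ver, hor)
```

Without quantization the round trip recovers the residual up to floating-point error; in an actual codec the dequantized coefficients differ from the encoder's, so the recovered residual is approximate.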

The decoding apparatus may generate a reconstructed picture based on residual samples and prediction samples of the current block (S2070).

In one embodiment, the decoding apparatus may obtain reconstruction samples by combining residual samples of the current block and prediction samples. Based on these reconstruction samples, a current picture may be reconstructed. Thereafter, as described above, the decoding apparatus may apply an in-loop filtering procedure, such as deblocking filtering, SAO, and / or ALF procedure, to the reconstructed picture in order to improve subjective / objective image quality as needed.

According to an embodiment, the decoding apparatus may obtain information (that is, multiple transform selection available flag information) indicating whether multiple transform selection is available for the current block from the bitstream. At this time, the multiple transform selection available flag information may be defined as intra available flag information and inter available flag information according to the prediction mode of the current block. The intra-available flag information may be a sps_mts_intra_enabled_flag syntax element disclosed in Tables 12 and 13 above, and indicates whether a multiple transform selection-based transform is available for an intra coding block. The inter-available flag information may be a sps_mts_inter_enabled_flag syntax element disclosed in Tables 12 and 13 above, and indicates whether a multiple transform selection-based transform is available for an inter coding block.

For example, when the intra available flag information (eg, sps_mts_intra_enabled_flag) is 1 and the prediction mode of the current block is the intra prediction mode, the decoding apparatus may determine that multiple transform selection is available for the current block, and may further obtain transform index information from the bitstream. Alternatively, when the inter available flag information (eg, sps_mts_inter_enabled_flag) is 1 and the prediction mode of the current block is the inter prediction mode, the decoding apparatus may determine that multiple transform selection is available for the current block, and may further obtain transform index information from the bitstream. In this case, the decoding apparatus may perform an inverse transform by deriving the transform kernels (vertical and horizontal transform kernels) indicated by the transform index information obtained from the bitstream.

If the intra available flag information (eg, sps_mts_intra_enabled_flag) is 0 or the inter available flag information (eg, sps_mts_inter_enabled_flag) is 0, the transform index information is not explicitly signaled from the encoding device. In other words, a syntax element (eg, mts_idx) representing the transform index information may not be present in the bitstream. In this case, the transform index information can be inferred to be a preset index value. For example, when the transform combinations corresponding to the first to fifth index values are applied, the transform index information may be inferred to be the first index value. That is, when a syntax element (eg, mts_idx) indicating the transform index information is not obtained from the bitstream, the decoding apparatus may infer the transform index information to be the first index value and derive the transform combination (vertical and horizontal transform kernels) corresponding to the first index value. At this time, the transform combination corresponding to the first index value may include a vertical transform kernel and a horizontal transform kernel that are both DCT type 2.

The intra available flag information (eg, sps_mts_intra_enabled_flag) and the inter available flag information (eg, sps_mts_inter_enabled_flag) may be signaled at a sequence parameter set (SPS) level. The transform index information obtained when the intra available flag information or the inter available flag information is 1 may be signaled at a residual coding syntax level or a transform unit syntax level.

Also, according to an embodiment, the decoding apparatus may obtain, from the bitstream, information indicating whether multiple transform selection is applied to the current block (that is, multiple transform selection flag information). For example, the multiple transform selection flag information may be the cu_mts_flag syntax element disclosed in Tables 14 and 15 above. For example, when the multiple transform selection flag information (eg, cu_mts_flag) is 1, it may indicate that an inverse transform based on multiple transform selection is applied to the current block.

Accordingly, when the multiple transform selection flag information (eg, cu_mts_flag) is 1, the decoding apparatus may determine that the inverse transform based on multiple transform selection is applied to the current block, and may further obtain and decode transform index information from the bitstream.

If the multiple transform selection flag information (eg, cu_mts_flag) indicates 0, the transform index information is not explicitly signaled from the encoding device. In other words, a syntax element (eg, mts_idx) representing the transform index information may not be present in the bitstream. In this case, the transform index information can be inferred to be a preset index value. For example, when the transform combinations corresponding to the first to fifth index values are applied, the transform index information may be inferred to be the first index value. That is, when a syntax element (eg, mts_idx) indicating the transform index information is not obtained from the bitstream, the decoding apparatus may infer the transform index information to be the first index value and derive the transform combination (vertical and horizontal transform kernels) corresponding to the first index value. At this time, the transform combination corresponding to the first index value may include a vertical transform kernel and a horizontal transform kernel that are both DCT type 2.

In decoding the multiple transform selection flag information, if the intra available flag information indicates 1 or the inter available flag information indicates 1, the decoding apparatus may obtain and decode the multiple transform selection flag information from the bitstream. That is, when the intra available flag information indicates 1 or the inter available flag information indicates 1, the encoding device encodes the syntax elements related to the multiple transform selection flag information or the transform index information and signals them to the decoding device. In addition, the multiple transform selection flag information and the transform index information may be additionally obtained according to the intra available flag information or the inter available flag information.
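The decoder-side inference rules above can be sketched as follows. This is a minimal illustration that assumes the five-index example, in which the first index value (0) selects DCT type 2 in both directions; the dictionary standing in for parsed syntax elements and the function name derive_mts_idx are hypothetical.

```python
INFERRED_MTS_IDX = 0  # first index value: DCT type 2 for both kernels


def derive_mts_idx(syntax, is_intra, sps_mts_intra_enabled_flag,
                   sps_mts_inter_enabled_flag):
    """Return the transform index, inferring it when it was not signaled.

    syntax: dict of syntax elements parsed for the block (hypothetical form).
    """
    # Assumption: the SPS-level flag matching the prediction mode applies.
    enabled = sps_mts_intra_enabled_flag if is_intra else sps_mts_inter_enabled_flag
    if enabled != 1:
        return INFERRED_MTS_IDX  # MTS not available: nothing was signaled
    if syntax.get("cu_mts_flag", 0) != 1:
        return INFERRED_MTS_IDX  # MTS not applied: mts_idx is absent
    return syntax.get("mts_idx", INFERRED_MTS_IDX)
```

For instance, derive_mts_idx({"cu_mts_flag": 1, "mts_idx": 3}, True, 1, 0) yields 3, while the same call for an inter block yields the inferred value 0.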

The method of signaling the syntax elements for the intra available flag information, the inter available flag information, the multiple transform selection flag information, and the transform index information may include the contents disclosed in Tables 12 to 17.

Meanwhile, the method of performing the (inverse) transform based on the multiple transform selection described above in this specification may be implemented according to a specification as shown in Table 27 below.

In the above-described embodiments, the methods are described based on a flowchart as a series of steps or blocks, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps as described above. In addition, those skilled in the art will understand that the steps shown in the flowchart are not exclusive, and that other steps may be included or one or more steps in the flowchart may be deleted without affecting the scope of the present invention.

The above-described method according to the present invention may be implemented in software form, and the encoding device and / or decoding device according to the present invention may be included in a device that performs image processing, such as a TV, computer, smartphone, set-top box, or display device.

When the embodiments of the present invention are implemented in software, the above-described method may be implemented as a module (process, function, etc.) that performs the above-described functions. Modules may be stored in memory and executed by a processor. The memory may be internal or external to the processor, and may be connected to the processor by various well-known means. The processor may include an application-specific integrated circuit (ASIC), other chipsets, logic circuits, and / or data processing devices. The memory may include read-only memory (ROM), random access memory (RAM), flash memory, memory cards, storage media and / or other storage devices. That is, the embodiments described in the present invention may be implemented and performed on a processor, microprocessor, controller, or chip. For example, the functional units illustrated in each drawing may be implemented and performed on a computer, processor, microprocessor, controller, or chip. In this case, information for implementation (eg, information on instructions) or an algorithm may be stored in a digital storage medium.

In addition, the decoding device and the encoding device to which the present invention is applied may be included in a multimedia broadcast transmission / reception device, a mobile communication terminal, a home cinema video device, a digital cinema video device, a surveillance camera, a video communication device, a real-time communication device such as one for video communication, a mobile streaming device, a storage medium, a camcorder, a video-on-demand (VoD) service providing device, an OTT (over-the-top) video device, an Internet streaming service providing device, a three-dimensional (3D) video device, a VR (virtual reality) device, an AR (augmented reality) device, a video telephony device, a transportation terminal (eg, a vehicle (including self-driving vehicle) terminal, an airplane terminal, a ship terminal, etc.), and a medical video device, and may be used to process video signals or data signals. For example, the OTT (over-the-top) video device may include a game console, a Blu-ray player, an Internet-connected TV, a home theater system, a smartphone, a tablet PC, and a digital video recorder (DVR).

Further, the processing method to which the present invention is applied can be produced in the form of a computer-executed program, and can be stored in a computer-readable recording medium. Multimedia data having a data structure according to the present invention can also be stored in a computer-readable recording medium. The computer-readable recording medium includes all kinds of storage devices and distributed storage devices in which computer-readable data is stored. The computer-readable recording medium may include, for example, a Blu-ray Disc (BD), Universal Serial Bus (USB), ROM, PROM, EPROM, EEPROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device. In addition, the computer-readable recording medium includes media implemented in the form of a carrier wave (for example, transmission via the Internet). In addition, the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted through a wired or wireless communication network.

Further, an embodiment of the present invention may be implemented as a computer program product by program code, and the program code may be executed on a computer by an embodiment of the present invention. The program code can be stored on a computer readable carrier.

Referring to FIG. 21, a content streaming system to which the present invention is applied may largely include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.

The encoding server compresses content input from multimedia input devices such as a smartphone, a camera, and a camcorder into digital data to generate a bitstream, and transmits the bitstream to the streaming server. As another example, when multimedia input devices such as a smartphone, a camera, and a camcorder directly generate a bitstream, the encoding server may be omitted.

The bitstream may be generated by an encoding method or a bitstream generation method to which the present invention is applied, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.

The streaming server transmits multimedia data to a user device based on a user request made through a web server, and the web server serves as an intermediary that informs the user of the available services. When a user requests a desired service from the web server, the web server forwards the request to the streaming server, and the streaming server transmits multimedia data to the user. In this case, the content streaming system may include a separate control server, in which case the control server controls commands / responses between devices in the content streaming system.

The streaming server may receive content from a media storage and / or encoding server. For example, when content is received from the encoding server, the content may be received in real time. In this case, in order to provide a smooth streaming service, the streaming server may store the bitstream for a predetermined time.

Examples of the user device include a mobile phone, a smartphone, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a slate PC, a tablet PC, an ultrabook, a wearable device (e.g., a smartwatch, smart glasses, a head mounted display (HMD)), a digital TV, a desktop computer, digital signage, and the like.

Each server in the content streaming system can be operated as a distributed server, and in this case, data received by each server can be processed in a distributed manner.

Claims

A video decoding method performed by a decoding device, the method comprising:

Obtaining prediction mode information, residual information, and transform index information from the bitstream;

Deriving a prediction mode for the current block based on the prediction mode information;

If the prediction mode is an intra prediction mode, performing intra prediction on the current block to derive prediction samples;

Deriving quantized transform coefficients for the current block based on the residual information;

Deriving transform coefficients by performing inverse quantization based on the quantized transform coefficients;

Deriving a horizontal transform kernel and a vertical transform kernel based on the transform index information;

Deriving residual samples for the current block by performing an inverse transform on the transform coefficients based on the horizontal transform kernel and the vertical transform kernel; and

Generating a reconstructed picture based on the prediction samples and the residual samples,

Wherein deriving the horizontal transform kernel and the vertical transform kernel comprises

Selecting the transform combination corresponding to the transform index information from a plurality of transform combinations, and deriving the horizontal transform kernel and the vertical transform kernel included in the selected transform combination.
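As a non-limiting illustration of the inverse transform and reconstruction steps recited above, the following Python sketch applies a separable inverse transform using one kernel for the vertical direction and another for the horizontal direction. It assumes floating-point orthonormal definitions of DCT type 2, DST type 7, and DCT type 8; a practical codec would typically use scaled integer kernels. The function names (`kernel`, `forward_transform`, `inverse_transform`, `reconstruct`) are hypothetical and do not appear in the specification.

```python
import math

def kernel(name, n):
    """Return an n x n orthonormal basis matrix; each row is one basis function."""
    if name == "DCT2":
        return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
                 * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                 for i in range(n)] for k in range(n)]
    scale = math.sqrt(4.0 / (2 * n + 1))
    if name == "DST7":
        return [[scale * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
                 for i in range(n)] for k in range(n)]
    if name == "DCT8":
        return [[scale * math.cos(math.pi * (2 * k + 1) * (2 * i + 1) / (4 * n + 2))
                 for i in range(n)] for k in range(n)]
    raise ValueError("unknown kernel: " + name)

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def forward_transform(residual, horizontal, vertical):
    """Encoder side: columns use the vertical kernel, rows the horizontal kernel."""
    tv = kernel(vertical, len(residual))
    th = kernel(horizontal, len(residual[0]))
    return matmul(matmul(tv, residual), transpose(th))

def inverse_transform(coeffs, horizontal, vertical):
    """Decoder side: undo forward_transform (the kernels are orthonormal)."""
    tv = kernel(vertical, len(coeffs))
    th = kernel(horizontal, len(coeffs[0]))
    return matmul(matmul(transpose(tv), coeffs), th)

def reconstruct(prediction, residual):
    """Reconstructed samples are prediction samples plus residual samples."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```

Because the assumed kernels are orthonormal, `inverse_transform` recovers the input of `forward_transform` up to floating-point rounding when both are called with the same kernel pair.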
The method of claim 1,

The method further includes obtaining, from the bitstream, intra available flag information indicating whether multiple transform selection is available for an intra coding block and inter available flag information indicating whether multiple transform selection is available for an inter coding block,

When the intra available flag information is 1 or the inter available flag information is 1, the transform index information is obtained to derive the horizontal transform kernel and the vertical transform kernel applied to the current block.
The method of claim 2,

Further comprising, if the intra available flag information is 1 or the inter available flag information is 1, obtaining from the bitstream multiple transform selection flag information indicating whether multiple transform selection is applied to the current block,

Wherein, when the multiple transform selection flag information indicates 1, the transform index information is obtained from the bitstream and decoded.
The method of claim 1,

When the intra prediction mode of the current block is a non-directional mode, the plurality of transform combinations correspond to transform index information having first to fifth index values.
The method of claim 4,

In the transform combination corresponding to the first index value, the vertical transform kernel and the horizontal transform kernel are DCT type 2,

In the transform combination corresponding to the second index value, the vertical transform kernel and the horizontal transform kernel are DST type 7,

In the transform combination corresponding to the third index value, the vertical transform kernel is DST type 7 and the horizontal transform kernel is DCT type 8,

In the transform combination corresponding to the fourth index value, the vertical transform kernel is DCT type 8 and the horizontal transform kernel is DST type 7, and

In the transform combination corresponding to the fifth index value, the vertical transform kernel and the horizontal transform kernel are DCT type 8.
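The five transform combinations recited for a non-directional intra prediction mode can be pictured as a lookup table. The Python sketch below is illustrative only: mapping the first to fifth index values onto the integers 0 to 4, and the names `NON_DIRECTIONAL_COMBINATIONS` and `select_kernels`, are assumptions made for this example.

```python
# Each entry is (vertical kernel, horizontal kernel), in the order recited
# for the first to fifth index values. Using 0..4 as the index values is an
# assumption of this sketch.
NON_DIRECTIONAL_COMBINATIONS = [
    ("DCT2", "DCT2"),  # first index value
    ("DST7", "DST7"),  # second index value
    ("DST7", "DCT8"),  # third index value
    ("DCT8", "DST7"),  # fourth index value
    ("DCT8", "DCT8"),  # fifth index value
]

def select_kernels(transform_index):
    """Return (horizontal kernel, vertical kernel) for the given index."""
    if not 0 <= transform_index < len(NON_DIRECTIONAL_COMBINATIONS):
        raise ValueError("transform index out of range")
    vertical, horizontal = NON_DIRECTIONAL_COMBINATIONS[transform_index]
    return horizontal, vertical
```

For instance, under the assumed zero-based mapping, `select_kernels(2)` returns DCT type 8 as the horizontal kernel and DST type 7 as the vertical kernel, matching the third index value.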
The method of claim 3,

When the multiple transform selection flag information indicates 0, a syntax element indicating the transform index information does not exist in the bitstream.
The method of claim 6,

When a syntax element representing the transform index information does not exist in the bitstream,

The transform index information is inferred as a predetermined index value.
The method of claim 5,

When a syntax element representing the transform index information does not exist in the bitstream,

The transform index information is inferred as the first index value.
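The conditional parsing described in the preceding claims (available flags, then the multiple transform selection flag, then the transform index, with inference when the index is absent) can be sketched as follows. This is a minimal illustration under stated assumptions: the `SyntaxReader` class is a hypothetical stand-in for an entropy decoder that returns already-decoded syntax element values, and the inferred value 0 is assumed to represent the first index value.

```python
INFERRED_TRANSFORM_INDEX = 0  # assumed to stand for the "first index value"

class SyntaxReader:
    """Hypothetical reader that hands out pre-decoded syntax element values."""

    def __init__(self, values):
        self._values = list(values)
        self.consumed = 0

    def read(self):
        value = self._values[self.consumed]
        self.consumed += 1
        return value

def parse_transform_index(intra_available, inter_available, reader):
    """Return the transform index for the current block.

    The index is read only when multiple transform selection is available
    and the block-level flag indicates 1; otherwise it is inferred.
    """
    if not (intra_available == 1 or inter_available == 1):
        return INFERRED_TRANSFORM_INDEX  # nothing is read from the bitstream
    mts_flag = reader.read()
    if mts_flag != 1:
        return INFERRED_TRANSFORM_INDEX  # index syntax element is absent
    return reader.read()
```

In this sketch a block whose flag is 0 consumes exactly one syntax element, and a block for which neither available flag is 1 consumes none.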
A video encoding method performed by an encoding device, the method comprising:

Deriving a prediction mode for the current block;

If the prediction mode is an intra prediction mode, performing intra prediction on the current block to derive prediction samples;

Deriving residual samples for the current block based on the prediction samples;

Deriving a horizontal transform kernel and a vertical transform kernel applied to residual samples of the current block;

Generating transform index information based on the horizontal transform kernel and the vertical transform kernel;

Deriving transform coefficients for the current block by performing transform on the residual samples based on the horizontal transform kernel and the vertical transform kernel;

Deriving quantized transform coefficients by performing quantization based on the transform coefficients;

Generating residual information based on the quantized transform coefficients; and

Encoding image information including prediction mode information, the residual information, and the transform index information,

Wherein the transform index information indicates a transform combination including the horizontal transform kernel and the vertical transform kernel among a plurality of transform combinations.
The method of claim 9,

Wherein, in encoding the image information,

Intra available flag information indicating whether multiple transform selection is available for an intra coding block and inter available flag information indicating whether multiple transform selection is available for an inter coding block are included in the image information and encoded, and

When the intra available flag information indicates 1 or the inter available flag information indicates 1, a syntax element for the transform index information is present in a bitstream including the encoded image information.
The method of claim 10,

Wherein, in encoding the image information,

When the intra available flag information indicates 1 or the inter available flag information indicates 1, multiple transform selection flag information indicating whether multiple transform selection is applied to the current block is included in the image information and encoded, and

When the multiple transform selection flag information indicates 1, the transform index information is generated and signaled.
The method of claim 9,

When the intra prediction mode of the current block is a non-directional mode, the plurality of transform combinations correspond to transform index information having first to fifth index values.
The method of claim 12,

In the transform combination corresponding to the first index value, the vertical transform kernel and the horizontal transform kernel are DCT type 2,

In the transform combination corresponding to the second index value, the vertical transform kernel and the horizontal transform kernel are DST type 7,

In the transform combination corresponding to the third index value, the vertical transform kernel is DST type 7 and the horizontal transform kernel is DCT type 8,

In the transform combination corresponding to the fourth index value, the vertical transform kernel is DCT type 8 and the horizontal transform kernel is DST type 7, and

In the transform combination corresponding to the fifth index value, the vertical transform kernel and the horizontal transform kernel are DCT type 8.
The method of claim 11,

When the multiple transform selection flag information indicates 0, a syntax element for the transform index information is not signaled to a decoding device.
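The encoder-side signaling recited in the encoding claims can be sketched in the same spirit. The Python example below is a hedged illustration, not the claimed method itself: the element names `mts_flag` and `transform_index`, the zero-based index values, and the functions `index_for_kernels` and `write_transform_syntax` are hypothetical, and the choice of kernels (for example by rate-distortion comparison) is left outside the sketch.

```python
# (vertical kernel, horizontal kernel) per index value for a non-directional
# intra mode; the zero-based numbering is an assumption of this sketch.
COMBINATIONS = [
    ("DCT2", "DCT2"),
    ("DST7", "DST7"),
    ("DST7", "DCT8"),
    ("DCT8", "DST7"),
    ("DCT8", "DCT8"),
]

def index_for_kernels(vertical, horizontal):
    """Generate transform index information from the chosen kernel pair."""
    return COMBINATIONS.index((vertical, horizontal))

def write_transform_syntax(intra_available, inter_available,
                           mts_applied, vertical, horizontal):
    """Return the list of (name, value) syntax elements to signal."""
    elements = []
    if intra_available == 1 or inter_available == 1:
        elements.append(("mts_flag", 1 if mts_applied else 0))
        if mts_applied:
            # The index is signaled only when the block-level flag is 1.
            elements.append(("transform_index",
                             index_for_kernels(vertical, horizontal)))
    return elements
```

When the flag is written as 0, no index element is produced, which mirrors the decoder-side inference of the transform index.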