WO2024039209A1

WO2024039209A1 - Video signal processing method and apparatus therefor

Info

Publication number: WO2024039209A1
Application number: PCT/KR2023/012220
Authority: WO
Inventors: 김경용; 김동철; 손주형; 곽진삼
Original assignee: 주식회사 윌러스표준기술연구소
Priority date: 2022-08-17
Filing date: 2023-08-17
Publication date: 2024-02-22

Abstract

A processor of a video signal decoding apparatus may determine a first prediction mode of a current block, generate a prediction block of the current block on the basis of the first prediction mode, generate a residual block of the current block on the basis of a transform matrix set determined on the basis of a second prediction mode, and reconstruct the current block on the basis of the prediction block and the residual block.

Description

Video signal processing method and device therefor

The present invention relates to a method and device for processing video signals, and more particularly, to a method and device for processing video signals for encoding or decoding video signals.

Compression encoding refers to a series of signal processing technologies for transmitting digitized information through communication lines or storing it in a form suitable for storage media. Targets of compression coding include audio, video, and text. In particular, the technology for performing compression coding on video is called video compression. Compression coding of video signals is accomplished by removing redundant information in consideration of spatial correlation, temporal correlation, and probabilistic correlation. However, with the recent development of various media and data transmission media, more efficient video signal processing methods and devices are required.

The purpose of this specification is to increase the coding efficiency of video signals by providing a video signal processing method and apparatus for the same.

This specification provides a video signal processing method and a device therefor.

In this specification, a video signal decoding apparatus includes a processor, wherein the processor determines a first prediction mode of a current block, generates a prediction block of the current block based on the first prediction mode, generates a residual block of the current block based on a set of transform matrices determined based on a second prediction mode, and reconstructs the current block based on the prediction block and the residual block.

In this specification, a video signal encoding device includes a processor, wherein the processor determines a first prediction mode of a current block, generates a prediction block of the current block based on the first prediction mode, generates a residual block of the current block based on a set of transform matrices determined based on a second prediction mode, and reconstructs the current block based on the prediction block and the residual block.

In the present specification, in a computer-readable non-transitory storage medium storing a bitstream, the bitstream is decoded by a decoding method, the decoding method comprising: determining a first prediction mode of a current block; generating a prediction block of the current block based on the first prediction mode; generating a residual block of the current block based on a set of transform matrices determined based on a second prediction mode; and reconstructing the current block based on the prediction block and the residual block.

Additionally, in this specification, the residual block may be generated based on at least one of a transform matrix set of a multiple transform set (MTS) and a transform matrix set of a low frequency non-separable transform (LFNST).

Additionally, in this specification, the first prediction mode and the second prediction mode may be different prediction modes.

Additionally, in this specification, the first prediction mode may be one of a planar mode based on the horizontal direction or a planar mode based on the vertical direction.

In addition, in this specification, the planar mode based on the horizontal direction is a prediction mode based on the value of the block at the (-1, y) position and the value of the block at the (W, -1) position, and the planar mode based on the vertical direction is a prediction mode based on the value of the block at the (x, -1) position and the value of the block at the (-1, H) position, where the position of the upper left block of the current block may be (0, 0) and the position of the prediction block may be (x, y).

Additionally, in this specification, i) the first prediction mode may be a planar mode based on the horizontal direction and the second prediction mode may be a vertical angle mode, or ii) the first prediction mode may be a planar mode based on the horizontal direction and the second prediction mode may be a horizontal angle mode.

Additionally, in this specification, the first prediction mode may be a linear prediction mode.

Additionally, in this specification, the first prediction mode may be indicated by a syntax element included in the bitstream.

This specification provides a method for efficiently processing video signals.

The effects that can be obtained in this specification are not limited to the effects mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the description below.

Figure 1 is a schematic block diagram of a video signal encoding device according to an embodiment of the present invention.

Figure 2 is a schematic block diagram of a video signal decoding device according to an embodiment of the present invention.

Figure 3 shows an embodiment in which a coding tree unit is divided into coding units within a picture.

Figure 4 shows one embodiment of a method for signaling splitting of quad trees and multi-type trees.

Figures 5 and 6 show the intra prediction method according to an embodiment of the present invention in more detail.

Figure 7 is a diagram showing the positions of neighboring blocks used to construct a motion candidate list in inter prediction.

Figure 8 is a diagram showing the process of generating a prediction block using DIMD according to an embodiment of the present invention.

Figure 9 is a diagram showing the positions of surrounding pixels used to derive directional information according to an embodiment of the present invention.

Figure 10 is a diagram showing a method for mapping a directional mode according to an embodiment of the present invention.

Figure 11 is a diagram showing a histogram for deriving an intra prediction directional mode according to an embodiment of the present invention.

Figure 12 is a diagram showing a method of signaling DIMD mode according to an embodiment of the present invention.

Figure 13 is a diagram showing a method of signaling syntax elements related to intra prediction mode depending on whether DIMD mode is used according to an embodiment of the present invention.

FIG. 14 is a diagram illustrating a method of generating a prediction sample for restoring a current block according to an embodiment of the present invention.

Figure 15 is a diagram showing a method for determining an intra prediction mode according to an embodiment of the present invention.

Figure 16 shows a syntax structure including syntax elements related to DIMD according to an embodiment of the present invention.

Figure 17 is a diagram showing intra prediction directional mode and weight information for neighboring blocks of the current block according to an embodiment of the present invention.

Figure 18 is a diagram showing a method of determining DIMD combination information according to an embodiment of the present invention.

Figure 19 is a diagram showing a method of generating a prediction sample using intra prediction directional mode information and weights according to an embodiment of the present invention.

Figures 20 and 21 are diagrams showing pixel values of neighboring blocks used when deriving an intra prediction directional mode according to an embodiment of the present invention.

Figure 22 is a diagram showing a method of configuring an MPM list including the intra prediction directional mode of the current block according to an embodiment of the present invention.

Figures 23 and 24 are diagrams showing a template used to derive the intra prediction mode of the current block according to an embodiment of the present invention.

Figures 25 to 28 are diagrams showing a method of generating prediction samples (pixels) based on a plurality of reference pixel lines according to an embodiment of the present invention.

Figure 29 shows a method of predicting a sample using a plurality of reference pixel lines according to an embodiment of the present invention.

Figure 30 shows a method of determining a reference pixel line based on a template according to an embodiment of the present invention.

Figure 31 shows a method of setting a template for testing a reference pixel line adjacent to the current block according to an embodiment of the present invention.

Figure 32 is a structural diagram showing a method of determining an optimal reference pixel line using a plurality of reference pixel lines based on a template according to an embodiment of the present invention.

Figure 33 shows a method for generating prediction samples using planar mode according to an embodiment of the present invention.

Figure 34 shows a method for deriving a multiple transform set and an LFNST set for a vertical plane mode or a horizontal plane mode according to an embodiment of the present invention.

Figure 35 shows a mapping table according to an embodiment of the present invention.

Figure 36 shows a transform type set table according to an embodiment of the present invention.

Figure 37 shows a transform type combination table according to an embodiment of the present invention.

Figure 38 shows a threshold table for the IDT transform type according to an embodiment of the present invention.

Figure 39 shows a method by which a video signal processing device derives a primary or secondary transform matrix according to an embodiment of the present invention.

Figure 40 shows a DC prediction method on a vertical or horizontal line basis according to an embodiment of the present invention.

Figure 41 shows a plane prediction method in units of vertical or horizontal lines according to an embodiment of the present invention.

Figure 42 shows a prediction method in sub-block units using planar mode according to an embodiment of the present invention.

Figure 43 shows an intra prediction method based on bidirectional prediction according to an embodiment of the present invention.

Figure 44 shows the types of transform kernels that can be used in video coding according to an embodiment of the present invention.

Figures 45 and 46 show part of a sequence parameter set according to an embodiment of the present invention.

Figure 47 shows part of the general_constraint_info() syntax structure according to an embodiment of the present invention.

Figure 48 shows a method of constructing an MPM list using multiple prediction modes according to an embodiment of the present invention.

Figure 49 shows the location of a reference pixel used when a prediction block is generated using a multi-prediction mode according to an embodiment of the present invention.

Figure 50 shows a transform set table for LFNST and NSPT transforms according to an embodiment of the present invention.

Figure 51 shows a method of deriving a transform set for a linear mode in prediction using a linear mode according to an embodiment of the present invention.

The terms used in this specification are general terms that are currently widely used as much as possible while considering the function in the present invention, but this may vary depending on the intention of a person skilled in the art, custom, or the emergence of new technology. In addition, in certain cases, there are terms arbitrarily selected by the applicant, and in this case, the meaning will be described in the description of the relevant invention. Therefore, we would like to clarify that the terms used in this specification should be interpreted based on the actual meaning of the term and the overall content of this specification, not just the name of the term.

In this specification, ‘A and/or B’ may be interpreted as meaning ‘including at least one of A or B.’

Some terms in this specification may be interpreted as follows. Coding may be interpreted as encoding or decoding depending on the case. In this specification, a device that encodes a video signal to generate a video signal bitstream is referred to as an encoding device or encoder, and a device that decodes a video signal bitstream to restore a video signal is referred to as a decoding device or decoder. Additionally, in this specification, a video signal processing device is used as a term that includes both an encoder and a decoder. Information is a term that includes values, parameters, coefficients, elements, etc., and its meaning may be interpreted differently depending on the case, so the present invention is not limited thereto. 'Unit' is used to refer to a basic unit of image processing or a specific location of a picture, and refers to an image area containing at least one of a luminance (luma) component and a chrominance (chroma) component. Additionally, 'block' refers to an image area containing a specific component among the luminance component and the chrominance components (i.e., Cb and Cr). However, depending on the embodiment, terms such as 'unit', 'block', 'partition', 'signal', and 'area' may be used interchangeably. Additionally, in this specification, 'current block' refers to a block currently scheduled to be encoded, and 'reference block' refers to a block for which encoding or decoding has already been completed and which is used as a reference for the current block. Additionally, in this specification, terms such as 'luma', 'luminance', and 'Y' may be used interchangeably. In addition, in this specification, terms such as 'chroma', 'color difference', and 'Cb or Cr' may be used interchangeably, and since the color difference component is divided into two types, Cb and Cr, each color difference component may also be used separately.
Additionally, in this specification, a unit may be used as a concept that includes all of coding units, prediction units, and transform units. A picture refers to a field or frame, and depending on the embodiment, the above terms may be used interchangeably. Specifically, when the captured image is an interlaced image, one frame is divided into an odd (or top) field and an even (or bottom) field, and each field may be configured as one picture unit and encoded or decoded. If the captured image is a progressive image, one frame can be configured as a picture and encoded or decoded. Additionally, in this specification, terms such as 'error signal', 'residual signal', and 'difference signal' may be used interchangeably. Additionally, in this specification, terms such as 'intra prediction mode', 'intra prediction directional mode', 'intra-screen prediction mode', and 'intra-screen prediction directional mode' may be used interchangeably. Additionally, in this specification, terms such as 'motion' and 'movement' may be used interchangeably. In addition, in this specification, the positional terms 'left', 'upper left', 'upper', 'upper right', 'right', 'lower right', 'bottom', and 'lower left' may be used interchangeably with their respective synonyms (for example, 'top' for 'upper', 'top right' for 'upper right', and 'bottom left' for 'lower left'). Additionally, element and member can be used interchangeably. POC (Picture Order Count) represents temporal location information of a picture (or frame), can be the playback order displayed on the screen, and each picture can have a unique POC.

Figure 1 is a schematic block diagram of a video signal encoding device 100 according to an embodiment of the present invention. Referring to Figure 1, the encoding device 100 of the present invention includes a transform unit 110, a quantization unit 115, an inverse quantization unit 120, an inverse transform unit 125, a filtering unit 130, a prediction unit 150, and an entropy coding unit 160.

The transform unit 110 obtains a transform coefficient value by transforming the residual signal, which is the difference between the input video signal and the prediction signal generated by the prediction unit 150. For example, Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), or Wavelet Transform may be used. Discrete cosine transform and discrete sine transform perform transformation by dividing the input picture signal into blocks. In transformation, coding efficiency may vary depending on the distribution and characteristics of values within the transform area. The transform kernel used for transformation of the residual block may be a transform kernel with the separable characteristic of vertical transformation and horizontal transformation. In this case, transformation of the residual block can be performed separately as vertical transformation and horizontal transformation. For example, the encoder can perform vertical transformation by applying a transform kernel in the vertical direction of the residual block. Additionally, the encoder can perform horizontal transformation by applying a transform kernel in the horizontal direction of the residual block. In this disclosure, a transform kernel may be used as a term to refer to a set of parameters used for transforming a residual signal, such as a transform matrix, transform array, transform function, or transform. For example, the transform kernel may be any one of a plurality of available kernels. Additionally, transform kernels based on different transform types may be used for each of vertical transformation and horizontal transformation.
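As a non-limiting illustration of the separable transformation described above, the following minimal sketch applies a vertical kernel to the columns of a residual block and then a horizontal kernel to its rows. The function names, the choice of an orthonormal DCT-II kernel for both directions, and the 4x4 block are assumptions made for the example and are not taken from this specification.

```python
import math

def dct2_matrix(n):
    """Orthonormal DCT-II kernel; row k is the k-th frequency basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def separable_transform(residual, vertical_kernel, horizontal_kernel):
    """Vertical pass over the columns, then horizontal pass over the rows."""
    after_vertical = matmul(vertical_kernel, residual)
    return matmul(after_vertical, transpose(horizontal_kernel))

# A flat residual block: all of its energy should land in the top-left coefficient.
residual = [[4, 4, 4, 4] for _ in range(4)]
kernel = dct2_matrix(4)
coeffs = separable_transform(residual, kernel, kernel)
```

Because the two passes take independent kernels, a different transform type could be passed for each direction, which is the situation the last sentence of the paragraph above describes.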

Larger transform coefficients tend to be distributed toward the top left of the block, and coefficients closer to '0' tend to be distributed toward the bottom right of the block. As the size of the current block increases, more coefficients in the lower right area are likely to be '0'. In order to reduce the transform complexity of large blocks, only the upper left area may be kept and the remaining areas may be set to '0'.
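The zeroing of the area outside the upper left region can be sketched as follows. This is only an illustration; the function name and the sizes of the retained region are hypothetical and are not specified in this description.

```python
def zero_out_high_frequencies(coeffs, keep_width, keep_height):
    """Return a copy that keeps only the top-left keep_width x keep_height region."""
    return [
        [value if (x < keep_width and y < keep_height) else 0
         for x, value in enumerate(row)]
        for y, row in enumerate(coeffs)
    ]

coeffs = [
    [9, 5, 2, 1],
    [6, 3, 1, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
]
kept = zero_out_high_frequencies(coeffs, keep_width=2, keep_height=2)
```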

Additionally, error signals may exist only in some areas of the coding block. In this case, the transform process may be performed only for those areas. As an example, in a block of size 2Nx2N, an error signal may exist only in the first 2NxN block; the transform process is then performed only on the first 2NxN block, while the second 2NxN block is not transformed and may not be encoded or decoded. Here N can be any positive integer.

The encoder may perform additional transformations before the transform coefficients are quantized. The above-described transformation method may be referred to as a primary transform, and additional transformation may be referred to as a secondary transform. Secondary transformation may be optional for each residual block. According to one embodiment, the encoder may improve coding efficiency by performing secondary transformation on a region where it is difficult to concentrate energy in the low-frequency region only through primary transformation. For example, secondary transformation may be additionally performed on a block whose residual values appear large in directions other than the horizontal or vertical direction of the residual block. Unlike primary transformation, secondary transformation may not be performed separately into vertical transformation and horizontal transformation. This secondary transform may be referred to as Low Frequency Non-Separable Transform (LFNST).
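The difference between a separable and a non-separable transformation can be illustrated with the following minimal sketch, in which a non-separable transformation flattens the block into a vector and multiplies it by a single large kernel matrix. All names and the 2x2 kernel are assumptions for the example. The large kernel here is deliberately built as a Kronecker product so that the result can be checked against the separable form; an actual non-separable kernel would generally not factor in this way.

```python
import math

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def non_separable_transform(block, kernel):
    """Flatten the block row by row and multiply by one large kernel matrix."""
    vec = [v for row in block for v in row]
    out = [sum(k * v for k, v in zip(kernel_row, vec)) for kernel_row in kernel]
    width = len(block[0])
    return [out[i:i + width] for i in range(0, len(out), width)]

def separable_transform(block, a):
    """Reference form A * X * A^T: a vertical pass followed by a horizontal pass."""
    n = len(a)
    ax = [[sum(a[i][k] * block[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(ax[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 1 / math.sqrt(2)
A = [[s, s], [s, -s]]          # small illustrative orthonormal kernel
block = [[5, 1], [2, 8]]

via_large_kernel = non_separable_transform(block, kron(A, A))
via_two_passes = separable_transform(block, A)
```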

The quantization unit 115 quantizes the transform coefficient value output from the transform unit 110.

In order to increase coding efficiency, rather than coding the picture signal as is, a method is used in which the picture is predicted from the already coded area through the prediction unit 150, and a reconstructed picture is obtained by adding the residual value between the original picture and the predicted picture to the predicted picture. To prevent mismatches between the encoder and decoder, information available in the decoder must be used when performing prediction in the encoder. For this purpose, the encoder performs a process of restoring the current encoded block. The inverse quantization unit 120 inversely quantizes the transform coefficient value, and the inverse transform unit 125 restores the residual value using the inverse quantized transform coefficient value. Meanwhile, the filtering unit 130 performs a filtering operation to improve the quality of the reconstructed picture and improve coding efficiency. For example, deblocking filters, sample adaptive offset (SAO), and adaptive loop filters may be included. The filtered picture is output or stored in a decoded picture buffer (DPB, 156) to be used as a reference picture.
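The reason the encoder restores the block itself can be seen in the following minimal sketch. It assumes a uniform scalar quantizer with a hypothetical step size and omits the transform stage for brevity; none of the names are taken from this specification. The encoder and the decoder arrive at the same reconstructed samples, which differ from the original samples, so prediction must be based on the reconstruction.

```python
import math

def quantize(residual, step):
    """Uniform scalar quantization with rounding to the nearest level."""
    return [math.floor(r / step + 0.5) for r in residual]

def dequantize(levels, step):
    return [level * step for level in levels]

def encoder_reconstruct(original, prediction, step):
    """Encoder side: compute levels and also the reconstruction the decoder will see."""
    residual = [o - p for o, p in zip(original, prediction)]
    levels = quantize(residual, step)
    recon = [p + r for p, r in zip(prediction, dequantize(levels, step))]
    return levels, recon

def decoder_reconstruct(levels, prediction, step):
    """Decoder side: only the levels and the prediction are available."""
    return [p + r for p, r in zip(prediction, dequantize(levels, step))]

original = [100, 104, 97, 90]
prediction = [98, 98, 98, 98]
levels, encoder_recon = encoder_reconstruct(original, prediction, step=4)
decoder_recon = decoder_reconstruct(levels, prediction, step=4)
```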

A deblocking filter is a filter for removing block distortion created at the boundaries between blocks in a restored picture. The encoder can determine whether to apply a deblocking filter to an edge based on the distribution of pixels included in several columns or rows around an arbitrary edge within the block. When a deblocking filter is applied to a block, the encoder can apply a long filter, a strong filter, or a weak filter depending on the deblocking filtering strength. Additionally, horizontal filtering and vertical filtering can be processed in parallel. Sample adaptive offset (SAO) can be used to correct, on a pixel basis, the offset from the original image for a residual block to which a deblocking filter has been applied. In order to correct the offset for a specific picture, the encoder can use a method (Band Offset) of dividing the pixels included in the image into a certain number of areas, determining the area on which to perform offset correction, and applying the offset to that area. Alternatively, the encoder can use a method (Edge Offset) of applying an offset in consideration of the edge information of each pixel. Adaptive Loop Filter (ALF) is a method of dividing the pixels included in an image into predetermined groups, determining one filter to be applied to each group, and performing filtering differently for each group. Information on whether to apply ALF may be signaled in units of coding units, and the shape and filter coefficients of the ALF filter to be applied may vary for each block. Additionally, an ALF filter of the same type (fixed type) may be applied regardless of the characteristics of the target block.
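A band-offset style correction can be sketched as follows. The sketch assumes that the areas are equal-width intensity bands, and the number of bands, the bit depth, and the offset values are hypothetical parameters chosen only for illustration.

```python
def band_offset(pixels, offsets_by_band, bit_depth=8, num_bands=32):
    """Add a per-band offset to each pixel and clip to the valid sample range."""
    band_width = (1 << bit_depth) // num_bands
    max_value = (1 << bit_depth) - 1
    corrected = []
    for p in pixels:
        band = p // band_width                     # which intensity band the pixel is in
        value = p + offsets_by_band.get(band, 0)   # bands without an offset are unchanged
        corrected.append(min(max(value, 0), max_value))
    return corrected

# Offsets are given only for the bands selected for correction.
result = band_offset([10, 70, 75, 200, 255], {8: 2, 9: -1, 31: 3})
```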

The prediction unit 150 includes an intra prediction unit 152 and an inter prediction unit 154. The intra prediction unit 152 performs intra prediction within the current picture, and the inter prediction unit 154 performs inter prediction using the reference picture stored in the decoded picture buffer 156. The intra prediction unit 152 performs intra prediction from the reconstructed areas in the current picture and transmits intra encoding information to the entropy coding unit 160. Intra encoding information may include at least one of an intra prediction mode, a Most Probable Mode (MPM) flag, an MPM index, and information about a reference sample. The inter prediction unit 154 may in turn include a motion estimation unit 154a and a motion compensation unit 154b. The motion estimation unit 154a refers to a specific region of the reconstructed reference picture, finds the part most similar to the current region, and obtains a motion vector value, which is the distance between the regions. Motion information (reference direction indication information (L0 prediction, L1 prediction, bidirectional prediction), reference picture index, motion vector information, etc.) about the reference area obtained by the motion estimation unit 154a is transmitted to the entropy coding unit 160 so that it can be included in the bitstream. Using the motion information transmitted from the motion estimation unit 154a, the motion compensation unit 154b performs inter motion compensation to generate a prediction block for the current block. The inter prediction unit 154 transmits inter encoding information including motion information about the reference region to the entropy coding unit 160.
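The search for the most similar region performed in motion estimation can be sketched as follows. The sketch assumes an exhaustive search over integer positions with the sum of absolute differences (SAD) as the similarity measure; the description above does not prescribe either choice, and all names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def crop(picture, x, y, width, height):
    return [row[x:x + width] for row in picture[y:y + height]]

def full_search(current_block, reference, block_x, block_y, search_range):
    """Return (cost, dx, dy) of the best match within +/- search_range samples."""
    height, width = len(current_block), len(current_block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = block_x + dx, block_y + dy
            if x < 0 or y < 0 or x + width > len(reference[0]) or y + height > len(reference):
                continue  # candidate falls outside the reference picture
            cost = sad(current_block, crop(reference, x, y, width, height))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Reference picture with a distinct value at every position.
reference = [[10 * y + x for x in range(6)] for y in range(6)]
# The current block sits at (2, 2) but its content matches the reference at (3, 2).
current_block = crop(reference, 3, 2, 2, 2)
best = full_search(current_block, reference, block_x=2, block_y=2, search_range=2)
```

The displacement (dx, dy) of the best match plays the role of the motion vector value mentioned above.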

According to an additional embodiment, the prediction unit 150 may include an intra block copy (IBC) prediction unit (not shown). The IBC prediction unit performs IBC prediction from the reconstructed samples in the current picture and transmits IBC encoding information to the entropy coding unit 160. The IBC prediction unit refers to a specific region in the current picture and obtains a block vector value indicating a reference region used for prediction of the current region. The IBC prediction unit may perform IBC prediction using the obtained block vector value. IBC encoding information may include at least one of reference area size information and block vector information (index information for block vector prediction of the current block within the motion candidate list, block vector difference information).

When the above picture prediction is performed, the transform unit 110 obtains a transform coefficient value by transforming the residual value between the original picture and the predicted picture. At this time, transformation may be performed on a specific block basis within the picture, and the size of the specific block may vary within a preset range. The quantization unit 115 quantizes the transform coefficient value generated by the transform unit 110 and transmits the quantized transform coefficient to the entropy coding unit 160.

The quantized transform coefficients in the form of a two-dimensional array can be rearranged into a one-dimensional array for entropy coding. The scanning method for the quantized transform coefficients may be determined depending on the size of the transform block and the intra-screen prediction mode. As an example, diagonal, vertical, and horizontal scans may be applied. This scan information can be signaled in block units or can be derived according to predefined rules.
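The rearrangement of a two-dimensional coefficient array into a one-dimensional array can be sketched as follows for the three scan types mentioned above. The traversal direction within each diagonal (from lower left to upper right, starting at the top-left coefficient) is an assumption made for the example, as are the function names.

```python
def diagonal_scan_order(width, height):
    """Visit anti-diagonals from the top-left; each one from lower left to upper right."""
    order = []
    for d in range(width + height - 1):
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                order.append((x, y))
    return order

def horizontal_scan_order(width, height):
    return [(x, y) for y in range(height) for x in range(width)]

def vertical_scan_order(width, height):
    return [(x, y) for x in range(width) for y in range(height)]

def scan(coeffs, order):
    """Read the 2D coefficients in the given (x, y) order into a 1D list."""
    return [coeffs[y][x] for x, y in order]

coeffs = [
    [9, 3, 0],
    [4, 1, 0],
    [2, 0, 0],
]
diagonal = scan(coeffs, diagonal_scan_order(3, 3))
horizontal = scan(coeffs, horizontal_scan_order(3, 3))
vertical = scan(coeffs, vertical_scan_order(3, 3))
```

In this example the diagonal scan groups the nonzero coefficients at the front of the list, which leaves a single run of trailing zeros.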

The entropy coding unit 160 generates a video signal bitstream by entropy coding information representing quantized transform coefficients, intra encoding information, and inter encoding information. The entropy coding unit 160 may use a variable length coding (VLC) method or an arithmetic coding method. The variable length coding (VLC) method converts input symbols into consecutive codewords, and the length of the codewords may be variable. For example, frequently occurring symbols are expressed as short codewords, and infrequently occurring symbols are expressed as long codewords. As a variable length coding method, Context-based Adaptive Variable Length Coding (CAVLC) can be used. Arithmetic coding converts consecutive data symbols into a single fractional number using the probability distribution of each data symbol. Arithmetic coding can obtain the optimal fractional number of bits needed to express each symbol. As arithmetic coding, context-based adaptive binary arithmetic coding (CABAC) can be used.
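How arithmetic coding maps a whole sequence of symbols to a single number can be sketched as follows. This is a simplified illustration with a fixed, hypothetical probability table and exact fractions; it is not the CABAC procedure, which uses adaptive binary models and finite-precision arithmetic.

```python
from fractions import Fraction

def narrow_interval(symbols, probabilities):
    """Return the final [low, high) interval after processing all symbols."""
    # Lower bound of each symbol's sub-interval within [0, 1).
    cumulative, running = {}, Fraction(0)
    for symbol, p in probabilities.items():
        cumulative[symbol] = running
        running += p
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low += width * cumulative[s]   # move to the start of the symbol's sub-interval
        width *= probabilities[s]      # shrink in proportion to the symbol's probability
    return low, low + width

probabilities = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
low, high = narrow_interval("aab", probabilities)
```

Any number inside the final interval identifies the sequence, and a likely symbol shrinks the interval less than an unlikely one, so it costs fewer bits.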

CABAC is a method of binary arithmetic encoding using multiple context models created based on probabilities obtained through experiments. First, if the symbols are not in binary form, the encoder binarizes each symbol using exp-Golomb, etc. Each binarized 0 or 1 can be described as a bin.

The CABAC initialization process is divided into context initialization and arithmetic coding initialization. Context initialization is a process of initializing the probability of occurrence of each symbol, and is determined depending on the type of symbol, the quantization parameter (QP), and the slice type (whether I, P, or B). The context model holding this initialization information can use probability-based values obtained through experiments. The context model provides the probability of occurrence of the LPS (Least Probable Symbol) or MPS (Most Probable Symbol) for the symbol currently being coded and information (valMPS) about which bin value, 0 or 1, corresponds to the MPS. One of several context models is selected through a context index (ctxIdx), and the context index can be derived from information on the current block to be encoded or information on surrounding blocks. Initialization for binary arithmetic coding is performed based on the probability model selected from the context models. In binary arithmetic coding, the probability interval is divided using the probabilities of occurrence of 0 and 1, and coding proceeds by making the sub-interval corresponding to the bin being processed the entire probability interval for the next bin to be processed. Location information within the probability interval in which the last bin was processed is output. However, since the probability interval cannot be divided indefinitely, when it shrinks below a certain size, a renormalization process is performed to widen the probability interval and the corresponding location information is output. Additionally, after each bin is processed, a probability update process may be performed in which the probability for the next bin to be processed is newly set using information on the processed bin.
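The binarization step mentioned in connection with CABAC can be illustrated with an order-0 exp-Golomb code, sketched below. The function name is hypothetical, and the order of the code is an assumption, since the description only names exp-Golomb as one possible binarization.

```python
def exp_golomb_bins(value):
    """Order-0 exp-Golomb code of a non-negative integer, as a string of bins."""
    code = bin(value + 1)[2:]             # binary representation of value + 1
    return "0" * (len(code) - 1) + code   # zero prefix tells how many bins follow

bins = [exp_golomb_bins(v) for v in range(5)]
```

Each character of the resulting strings corresponds to one bin that would then be passed to the binary arithmetic coder.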

The generated bitstream is encapsulated with the NAL (Network Abstraction Layer) unit as its basic unit. NAL units are divided into VCL (Video Coding Layer) NAL units containing video data and non-VCL NAL units containing parameter information for decoding video data. There are various types of VCL and non-VCL NAL units. A NAL unit consists of NAL header information and the data, an RBSP (Raw Byte Sequence Payload), and the NAL header information includes summary information about the RBSP. The RBSP of a VCL NAL unit includes an integer number of encoded coding tree units. In order to decode a bitstream in a video decoder, the bitstream must first be separated into NAL units, and then each separated NAL unit must be decoded. Meanwhile, the information required for decoding the video signal bitstream may be transmitted in a picture parameter set (PPS), sequence parameter set (SPS), video parameter set (VPS), etc.
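The separation of a bitstream into NAL units can be sketched as follows. The sketch assumes a byte stream in which each NAL unit is preceded by a 0x000001 start code, which is not stated in this description, and it ignores details such as emulation prevention bytes and header parsing. The function name and the example bytes are hypothetical.

```python
def split_nal_units(byte_stream):
    """Split a byte stream on 0x000001 start codes and return the NAL unit bytes."""
    start_code = b"\x00\x00\x01"
    units = []
    pos = byte_stream.find(start_code)
    while pos != -1:
        begin = pos + len(start_code)
        nxt = byte_stream.find(start_code, begin)
        end = len(byte_stream) if nxt == -1 else nxt
        # Zero bytes before the next start code are padding, not part of the unit.
        units.append(byte_stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units

stream = b"\x00\x00\x00\x01\x40\x01\xaa" + b"\x00\x00\x01\x42\x01\xbb\xcc"
units = split_nal_units(stream)
```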

Meanwhile, the block diagram of FIG. 1 shows the encoding device 100 according to an embodiment of the present invention, and the separately displayed blocks show elements of the encoding device 100 logically distinguished. Accordingly, the elements of the above-described encoding device 100 may be mounted as one chip or as a plurality of chips depending on the design of the device. According to one embodiment, the operation of each element of the above-described encoding device 100 may be performed by a processor (not shown).

Figure 2 is a schematic block diagram of a video signal decoding device 200 according to an embodiment of the present invention. Referring to FIG. 2, the decoding device 200 of the present invention includes an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 225, a filtering unit 230, and a prediction unit 250.

The entropy decoding unit 210 entropy decodes the video signal bitstream and extracts transform coefficient information, intra encoding information, and inter encoding information for each region. For example, the entropy decoder 210 may obtain a binarization code for transform coefficient information of a specific area from a video signal bitstream. Additionally, the entropy decoding unit 210 inversely binarizes the binarization code to obtain a quantized transform coefficient. The inverse quantization unit 220 inversely quantizes the quantized transform coefficient, and the inverse transform unit 225 restores the residual value using the inverse quantized transform coefficient. The video signal processing device 200 restores the original pixel value by summing the residual value obtained from the inverse transform unit 225 with the predicted value obtained from the prediction unit 250.

Meanwhile, the filtering unit 230 improves image quality by performing filtering on the picture. This may include a deblocking filter to reduce block distortion and/or an adaptive loop filter to remove distortion of the entire picture. The filtered picture is output or stored in the decoded picture buffer (DPB, 256) to be used as a reference picture for the next picture.

The prediction unit 250 includes an intra prediction unit 252 and an inter prediction unit 254. The prediction unit 250 generates a prediction picture using the coding type decoded through the entropy decoding unit 210, transform coefficients for each region, intra/inter coding information, etc. To restore the current block on which decoding is performed, the decoded area of the current picture including the current block or of other pictures can be used. A picture (or tile/slice) that uses only the current picture for reconstruction, that is, one that performs only intra prediction or intra BC prediction, is called an intra picture or I picture (or tile/slice), and a picture (or tile/slice) that can perform all of intra prediction, inter prediction, and intra BC prediction is called an inter picture (or tile/slice). Among inter pictures (or tiles/slices), a picture (or tile/slice) that uses at most one motion vector and reference picture index to predict the sample values of each block is called a predictive picture or P picture (or tile/slice), and a picture (or tile/slice) that uses at most two motion vectors and reference picture indices is called a bi-predictive picture or B picture (or tile/slice). In other words, a P picture (or tile/slice) uses at most one set of motion information to predict each block, and a B picture (or tile/slice) uses at most two sets of motion information to predict each block. Here, a motion information set includes one or more motion vectors and one reference picture index.

The intra prediction unit 252 generates a prediction block using intra encoding information and reconstructed samples in the current picture. As described above, intra encoding information may include at least one of an intra prediction mode, a Most Probable Mode (MPM) flag, and an MPM index. The intra prediction unit 252 predicts sample values of the current block using reconstructed samples located to the left and/or above the current block as reference samples. In this disclosure, reconstructed samples, reference samples, and samples of the current block may represent pixels. Additionally, sample values may represent pixel values.

According to one embodiment, the reference samples may be samples included in neighboring blocks of the current block. For example, the reference samples may be samples adjacent to the left boundary and/or samples adjacent to the upper boundary of the current block. In addition, the reference samples may be samples of neighboring blocks of the current block that are located on a line within a preset distance from the left boundary of the current block and/or on a line within a preset distance from the upper boundary of the current block. Here, the neighboring blocks of the current block may include at least one of the Left (L) block, Above (A) block, Below Left (BL) block, Above Right (AR) block, or Above Left (AL) block adjacent to the current block.

The inter prediction unit 254 generates a prediction block using the reference picture stored in the decoded picture buffer 256 and inter encoding information. Inter encoding information may include a set of motion information (reference picture index, motion vector information, etc.) of the current block with respect to the reference block. Inter prediction may include L0 prediction, L1 prediction, and bi-prediction. L0 prediction refers to prediction using one reference picture included in the L0 picture list, and L1 prediction refers to prediction using one reference picture included in the L1 picture list. Each of these may require one set of motion information (e.g., a motion vector and a reference picture index). In the bi-prediction method, up to two reference regions can be used, and these two reference regions may exist in the same reference picture or in different pictures. That is, in the bi-prediction method, up to two sets of motion information (e.g., a motion vector and a reference picture index) can be used, and the two motion vectors may correspond to the same reference picture index or to different reference picture indices. Here, the reference pictures are pictures located temporally before or after the current picture, and may be pictures that have already been reconstructed. According to one embodiment, the two reference regions used in the bi-prediction method may be regions selected from the L0 picture list and the L1 picture list, respectively.

The inter prediction unit 254 may obtain a reference block of the current block using a motion vector and a reference picture index. The reference block exists in a reference picture corresponding to a reference picture index. Additionally, the sample value of the block specified by the motion vector or its interpolated value may be used as a predictor of the current block. For motion prediction with sub-pel unit pixel accuracy, for example, an 8-tap interpolation filter can be used for the luminance signal and a 4-tap interpolation filter can be used for the chrominance signal. However, the interpolation filter for motion prediction in subpel units is not limited to this. In this way, the inter prediction unit 254 performs motion compensation to predict the texture of the current unit from the previously restored picture. At this time, the inter prediction unit can use a motion information set.

According to an additional embodiment, the prediction unit 250 may include an IBC prediction unit (not shown). The IBC prediction unit can reconstruct the current region by referring to a specific region containing reconstructed samples in the current picture. The IBC prediction unit may perform IBC prediction using the IBC encoding information obtained from the entropy decoding unit 210. IBC encoding information may include block vector information.

The predicted value output from the intra prediction unit 252 or the inter prediction unit 254 and the residual value output from the inverse transform unit 225 are added to generate a restored video picture. That is, the video signal decoding apparatus 200 restores the current block using the prediction block generated by the prediction unit 250 and the residual obtained from the inverse transform unit 225.
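
The reconstruction step described above can be sketched as a sample-wise sum of the prediction block and the residual block. The clipping to the valid sample range and the default bit depth of 8 are assumptions added for illustration; the function name `reconstruct_block` is hypothetical.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add residual samples to prediction samples, clipping to [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1  # assumed valid sample range
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


pred = [[100, 120], [250, 3]]    # output of the prediction unit
resid = [[-4, 5], [10, -7]]      # output of the inverse transform unit
recon = reconstruct_block(pred, resid)
print(recon)
```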

Meanwhile, the block diagram of FIG. 2 shows a decoding device 200 according to an embodiment of the present invention, and the separately displayed blocks show elements of the decoding device 200 logically distinguished. Accordingly, the elements of the above-described decoding device 200 may be mounted as one chip or as a plurality of chips depending on the design of the device. According to one embodiment, the operation of each element of the above-described decoding device 200 may be performed by a processor (not shown).

Meanwhile, the technology proposed in this specification is applicable to both encoder and decoder methods and devices, and parts described as signaling and parsing may be described for convenience of explanation. In general, signaling can be described as encoding each syntax from an encoder's perspective, and parsing can be described as interpreting each syntax from a decoder's perspective. That is, each syntax can be signaled by being included in the bitstream from the encoder, and the decoder can parse the syntax and use it in the restoration process. At this time, the sequence of bits for each syntax arranged according to the prescribed hierarchical structure can be referred to as a bitstream.

One picture may be divided into sub-pictures, slices, tiles, etc. and encoded. A subpicture may include one or more slices or tiles. If one picture is divided into multiple slices or tiles and encoded, it can be displayed on the screen only when all slices or tiles in the picture have been decoded. On the other hand, when one picture is encoded with several subpictures, only arbitrary subpictures can be decoded and displayed on the screen. A slice may contain multiple tiles or subpictures. Alternatively, a tile may include multiple subpictures or slices. Subpictures, slices, and tiles can be encoded or decoded independently of each other, which is effective in improving parallel processing and processing speed. However, there is a disadvantage in that the bit amount increases because encoded information of other adjacent subpictures, other slices, and other tiles cannot be used. Subpictures, slices, and tiles can be divided into multiple Coding Tree Units (CTUs) and encoded.

Figure 3 shows an embodiment in which a Coding Tree Unit (CTU) is divided into Coding Units (CUs) within a picture. In the process of coding a video signal, a picture can be divided into a sequence of coding tree units (CTUs). A coding tree unit may be composed of a luma coding tree block (CTB), two chroma coding tree blocks, and its encoded syntax information. One coding tree unit may consist of one coding unit, or one coding tree unit may be divided into multiple coding units. One coding unit may be composed of a luminance coding block (CB), two chrominance coding blocks, and its encoded syntax information. One coding block can be divided into several sub-coding blocks. One coding unit may consist of one transform unit (TU), or one coding unit may be divided into several transform units. One transformation unit may be composed of a luminance transformation block (Transform Block, TB), two chrominance transformation blocks, and its encoded syntax information. A coding tree unit may be divided into a plurality of coding units. A coding tree unit may be a leaf node without being split. In this case, the coding tree unit itself may be a coding unit.

A coding unit refers to a basic unit for processing a picture in the video signal processing process described above, that is, intra/inter prediction, transformation, quantization, and/or entropy coding. The size and shape of a coding unit within one picture may not be constant. The coding unit may have a square or rectangular shape. A rectangular coding unit (or rectangular block) includes a vertical coding unit (or vertical block) and a horizontal coding unit (or horizontal block). In this specification, a vertical block is a block whose height is greater than its width, and a horizontal block is a block whose width is greater than its height. Additionally, in this specification, a non-square block may refer to a rectangular block, but the present invention is not limited thereto.

Referring to Figure 3, the coding tree unit is first divided into a quad tree (Quad Tree, QT) structure. That is, in a quad tree structure, one node with a size of 2NX2N can be divided into four nodes with a size of NXN. In this specification, a quad tree may also be referred to as a quaternary tree. Quad-tree partitioning can be performed recursively, and not all nodes need to be partitioned to the same depth.

Meanwhile, the leaf nodes of the aforementioned quad tree can be further divided into a multi-type tree (MTT) structure. According to an embodiment of the present invention, in a multi-type tree structure, one node may be divided into a binary or ternary tree structure with horizontal or vertical division. That is, there are four division structures in the multi-type tree structure: vertical binary division, horizontal binary division, vertical ternary division, and horizontal ternary division. According to an embodiment of the present invention, the width and height of the nodes in each tree structure may both have values that are powers of 2. For example, in a Binary Tree (BT) structure, a node of size 2NX2N may be divided into two NX2N nodes by vertical binary division and into two 2NXN nodes by horizontal binary division. Additionally, in the Ternary Tree (TT) structure, a node of size 2NX2N may be divided into nodes of (N/2)X2N, NX2N, and (N/2)X2N by vertical ternary division, and into nodes of 2NX(N/2), 2NXN, and 2NX(N/2) by horizontal ternary division. This multi-type tree partitioning can be performed recursively.
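
The child node sizes produced by the quad tree, binary tree, and ternary tree divisions described above can be sketched as follows. The mode labels ("QT", "BT_VER", and so on) and the function name `split_sizes` are hypothetical names chosen for this illustration; sizes are given as (width, height).

```python
def split_sizes(width, height, mode):
    """Return the (width, height) of each child node for one split of a node."""
    if mode == "QT":       # four square quarters
        return [(width // 2, height // 2)] * 4
    if mode == "BT_VER":   # two halves side by side
        return [(width // 2, height)] * 2
    if mode == "BT_HOR":   # two halves stacked
        return [(width, height // 2)] * 2
    if mode == "TT_VER":   # 1/4, 1/2, 1/4 of the width
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    if mode == "TT_HOR":   # 1/4, 1/2, 1/4 of the height
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    raise ValueError(f"unknown split mode: {mode}")


# A 2NX2N node with N = 16.
for mode in ("QT", "BT_VER", "BT_HOR", "TT_VER", "TT_HOR"):
    print(mode, split_sizes(32, 32, mode))
```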

Leaf nodes of a multi-type tree can be coding units. If the coding unit is not larger than the maximum transformation length, the coding unit can be used as a unit of prediction and/or transformation without further division. As an example, if the width or height of the current coding unit is greater than the maximum transform length, the current coding unit may be split into a plurality of transform units without explicit signaling regarding splitting. Meanwhile, in the above-described quad tree and multi-type tree, at least one of the following parameters may be defined in advance or transmitted through an RBSP of a higher level set such as PPS, SPS, VPS, etc.: 1) CTU size: the root node size of the quad tree; 2) minimum QT size (MinQtSize): the minimum allowed QT leaf node size; 3) maximum BT size (MaxBtSize): the maximum allowed BT root node size; 4) maximum TT size (MaxTtSize): the maximum allowed TT root node size; 5) maximum MTT depth (MaxMttDepth): the maximum allowed depth of MTT splitting from a QT leaf node; 6) minimum BT size (MinBtSize): the minimum allowed BT leaf node size; 7) minimum TT size (MinTtSize): the minimum allowed TT leaf node size.
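
The splitting of a coding unit into transform units without explicit signaling can be sketched as below. The sketch assumes that the coding unit is tiled uniformly with transform units no larger than the maximum transform length and that all sizes are powers of 2; the function name and the (x, y, width, height) tuple layout are hypothetical.

```python
def implicit_transform_units(cu_w, cu_h, max_tf_len):
    """Tile a coding unit with transform units of at most max_tf_len per side."""
    tu_w = min(cu_w, max_tf_len)
    tu_h = min(cu_h, max_tf_len)
    # Each entry is (x, y, width, height) relative to the top-left of the coding unit.
    return [
        (x, y, tu_w, tu_h)
        for y in range(0, cu_h, tu_h)
        for x in range(0, cu_w, tu_w)
    ]


print(implicit_transform_units(128, 64, 64))  # wider than the maximum: two units
print(implicit_transform_units(32, 32, 64))   # within the maximum: one unit
```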

Figure 4 shows one embodiment of a method for signaling splitting of quad trees and multi-type trees. Preset flags can be used to signal division of the above-described quad tree and multi-type tree. Referring to Figure 4, at least one of the following flags can be used: a flag 'split_cu_flag' indicating whether to split a node, a flag 'split_qt_flag' indicating whether to split a quad tree node, a flag 'mtt_split_cu_vertical_flag' indicating the splitting direction of a multi-type tree node, or a flag 'mtt_split_cu_binary_flag' indicating the split shape of a multi-type tree node.

According to an embodiment of the present invention, 'split_cu_flag', a flag indicating whether to split the current node, may be signaled first. If the value of 'split_cu_flag' is 0, it indicates that the current node is not split, and the current node becomes a coding unit. If the current node is a coding tree unit, the coding tree unit includes one undivided coding unit. If the current node is a quad tree node 'QT node', the current node is a leaf node 'QT leaf node' of the quad tree and becomes a coding unit. If the current node is a multi-type tree node 'MTT node', the current node is a leaf node 'MTT leaf node' of the multi-type tree and becomes a coding unit.

If the value of 'split_cu_flag' is 1, the current node can be divided into nodes of a quad tree or multi-type tree according to the value of 'split_qt_flag'. The coding tree unit is the root node of the quad tree and can be first divided into a quad tree structure. In the quad tree structure, 'split_qt_flag' is signaled for each node 'QT node'. If the value of 'split_qt_flag' is 1, the node is split into 4 square nodes, and if the value of 'split_qt_flag' is 0, the node becomes a leaf node 'QT leaf node' of the quad tree and is divided into multi-type tree nodes. According to an embodiment of the present invention, quad tree division may be limited depending on the type of the current node. Quad tree splitting may be allowed if the current node is a coding tree unit (root node of the quad tree) or a quad tree node, and quad tree splitting may not be allowed if the current node is a multi-type tree node. Each quad tree leaf node 'QT leaf node' can be further divided into a multi-type tree structure. As described above, if 'split_qt_flag' is 0, the current node can be split into multi-type tree nodes. To indicate the split direction and split shape, 'mtt_split_cu_vertical_flag' and 'mtt_split_cu_binary_flag' may be signaled. If the value of 'mtt_split_cu_vertical_flag' is 1, vertical splitting of the node 'MTT node' is indicated, and if the value of 'mtt_split_cu_vertical_flag' is 0, horizontal splitting of the node 'MTT node' is indicated. Additionally, if the value of 'mtt_split_cu_binary_flag' is 1, the node 'MTT node' is divided into two rectangular nodes, and if the value of 'mtt_split_cu_binary_flag' is 0, the node 'MTT node' is divided into three rectangular nodes.
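
The flag semantics described above can be summarized in a short sketch that maps the four flag values to a split decision. The returned labels and the function name `split_mode` are hypothetical; only the flag meanings are taken from the description.

```python
def split_mode(split_cu_flag, split_qt_flag=0,
               mtt_split_cu_vertical_flag=0, mtt_split_cu_binary_flag=0):
    """Map the split flags of one node to a (hypothetical) split label."""
    if split_cu_flag == 0:
        return "NO_SPLIT"          # the node becomes a coding unit
    if split_qt_flag == 1:
        return "QT"                # four square child nodes
    direction = "VER" if mtt_split_cu_vertical_flag == 1 else "HOR"
    shape = "BT" if mtt_split_cu_binary_flag == 1 else "TT"
    return f"{shape}_{direction}"  # multi-type tree split


print(split_mode(0))            # NO_SPLIT
print(split_mode(1, 1))         # QT
print(split_mode(1, 0, 1, 0))   # TT_VER
print(split_mode(1, 0, 0, 1))   # BT_HOR
```

A real parser would additionally skip flags that are not allowed for the current node type (for example, 'split_qt_flag' inside a multi-type tree node); that conditional parsing is omitted here.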

In the tree division structure, the luminance block and the chrominance block can be divided into the same form. That is, the chrominance block can be divided by referring to the division type of the luminance block. If the current chrominance block is smaller than a certain size, the chrominance block may not be divided even if the luminance block is divided.

In the tree division structure, the luminance block and the chrominance block may have different forms. In this case, division information for the luminance block and division information for the chrominance block may each be signaled. Additionally, not only the division information but also the encoding information of the luminance block and the chrominance block may be different. For example, at least one of the intra coding mode, the encoding information for motion information, and the like may differ between the luminance block and the chrominance block.

A node to be divided into the smallest unit can be processed as one coding block. When the current block is a coding block, the coding block may be divided into several sub-blocks (sub-coding blocks), and the prediction information of each sub-block may be the same or different. As an example embodiment, when the coding unit is an intra mode, the intra prediction mode of each subblock may be the same or different from each other. Additionally, when the coding unit is in inter mode, the motion information of each sub-block may be the same or different. Additionally, each sub-block may be encoded or decoded independently from each other. Each sub-block can be distinguished through a sub-block index (sbIdx). Additionally, when a coding unit is divided into sub-blocks, it may be divided horizontally or vertically or diagonally. In intra mode, the mode in which the current coding unit is divided into 2 or 4 sub-blocks horizontally or vertically is called ISP (Intra Sub Partitions). In inter mode, the mode in which the current coding block is divided diagonally is called GPM (Geometric partitioning mode). In GPM mode, the position and direction of the diagonal line are derived using a predetermined angle table, and the index information of the angle table is signaled.

Picture prediction (motion compensation) for coding is performed on coding units that are no longer divided (i.e., leaf nodes of coding tree units). The basic unit that performs such prediction is hereinafter referred to as a prediction unit or prediction block.

Hereinafter, the term unit used in this specification may be used as a replacement for the prediction unit, which is a basic unit for performing prediction. However, the present invention is not limited to this, and can be understood more broadly as a concept including the coding unit.

Figures 5 and 6 show the intra prediction method according to an embodiment of the present invention in more detail. As described above, the intra prediction unit predicts sample values of the current block using reconstructed samples located to the left and/or above the current block as reference samples.

First, Figure 5 shows an example of reference samples used for prediction of the current block in intra prediction mode. According to one embodiment, the reference samples may be samples adjacent to the left boundary and/or samples adjacent to the upper boundary of the current block. As shown in Figure 5, when the size of the current block is WXH and samples of a single reference line adjacent to the current block are used for intra prediction, the reference samples can be set using a maximum of 2W+2H+1 neighboring samples located to the left of and/or above the current block.
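
The count of 2W+2H+1 reference samples can be checked with a small sketch. The layout used here (one corner sample, 2W samples in the row above, and 2H samples in the column to the left, with coordinates relative to the top-left sample of the current block) is an assumption consistent with that count, not a layout quoted from Figure 5.

```python
def reference_sample_positions(w, h):
    """List (x, y) positions of a single adjacent reference line for a WxH block."""
    corner = [(-1, -1)]                         # above-left sample
    above = [(x, -1) for x in range(2 * w)]     # above and above-right samples
    left = [(-1, y) for y in range(2 * h)]      # left and below-left samples
    return corner + above + left


positions = reference_sample_positions(8, 4)
print(len(positions))  # 2*8 + 2*4 + 1
```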

Meanwhile, pixels of multiple reference lines may be used for intra prediction of the current block. Multiple reference lines may be composed of n lines located within a preset range from the current block. According to one embodiment, when pixels of multiple reference lines are used for intra prediction, separate index information indicating lines to be set as reference pixels may be signaled, and may be called a reference line index.

Additionally, when at least some samples to be used as reference samples have not yet been reconstructed, the intra prediction unit may obtain reference samples by performing a reference sample padding process. Additionally, the intra prediction unit may perform a reference sample filtering process to reduce the error of intra prediction. That is, filtered reference samples can be obtained by performing filtering on neighboring samples and/or reference samples obtained through the reference sample padding process. The intra prediction unit predicts samples of the current block using the unfiltered or filtered reference samples obtained in this way. In this disclosure, neighboring samples may include samples on at least one reference line. For example, neighboring samples may include adjacent samples on a line adjacent to the boundary of the current block.

Next, Figure 6 shows an example of prediction modes used for intra prediction. For intra prediction, intra prediction mode information indicating the intra prediction direction may be signaled. Intra prediction mode information indicates one of a plurality of intra prediction modes constituting an intra prediction mode set. If the current block is an intra prediction block, the decoder receives intra prediction mode information of the current block from the bitstream. The intra prediction unit of the decoder performs intra prediction on the current block based on the extracted intra prediction mode information.

According to an embodiment of the present invention, the intra prediction mode set may include all intra prediction modes used for intra prediction (e.g., a total of 67 intra prediction modes). More specifically, the intra prediction mode set may include a planar mode, a DC mode, and multiple (e.g., 65) angle modes (i.e., directional modes). Each intra prediction mode may be indicated through a preset index (i.e., intra prediction mode index). For example, as shown in FIG. 6, intra prediction mode index 0 indicates planar mode, and intra prediction mode index 1 indicates DC mode. Additionally, intra prediction mode indices 2 to 66 may respectively indicate different angle modes. The angle modes each indicate different angles within a preset angle range. For example, the angle mode may indicate an angle within an angle range between 45 degrees and -135 degrees clockwise (i.e., a first angle range). The angle mode can be defined based on the 12 o'clock direction. Here, intra prediction mode index 2 indicates horizontal diagonal (HDIA) mode, intra prediction mode index 18 indicates horizontal (HOR) mode, intra prediction mode index 34 indicates diagonal (DIA) mode, intra prediction mode index 50 indicates vertical (VER) mode, and intra prediction mode index 66 indicates vertical diagonal (VDIA) mode.
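
The index assignment described above can be sketched as a simple lookup. The function name `describe_intra_mode` and the generic label format "ANGULAR_n" are hypothetical; the named indices follow the description of the 67-mode example.

```python
NAMED_ANGLE_MODES = {2: "HDIA", 18: "HOR", 34: "DIA", 50: "VER", 66: "VDIA"}


def describe_intra_mode(index):
    """Return a label for an intra prediction mode index in the 67-mode example."""
    if index == 0:
        return "PLANAR"
    if index == 1:
        return "DC"
    if 2 <= index <= 66:
        # Angle modes without a dedicated name get a generic label.
        return NAMED_ANGLE_MODES.get(index, f"ANGULAR_{index}")
    raise ValueError(f"index {index} is outside the 67-mode set")


print([describe_intra_mode(i) for i in (0, 1, 2, 18, 33, 34, 50, 66)])
```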

Meanwhile, the preset angle range may be set differently depending on the shape of the current block. For example, if the current block is a rectangular block, a wide-angle mode indicating an angle exceeding 45 degrees or less than -135 degrees clockwise may be additionally used. If the current block is a horizontal block, the angle mode may indicate an angle within an angle range between (45+offset1) degrees and (-135+offset1) degrees clockwise (i.e., a second angle range). In this case, angle modes 67 to 76 outside the first angle range may be additionally used. Additionally, if the current block is a vertical block, the angle mode may indicate an angle within an angle range between (45-offset2) degrees and (-135-offset2) degrees clockwise (i.e., a third angle range). In this case, angle modes -10 to -1 outside the first angle range may be additionally used. According to an embodiment of the present invention, the values of offset1 and offset2 may be determined differently depending on the ratio between the width and height of the rectangular block. Additionally, offset1 and offset2 can be positive numbers.

According to a further embodiment of the present invention, the plurality of angle modes constituting the intra prediction mode set may include a basic angle mode and an extended angle mode. At this time, the extended angle mode may be determined based on the basic angle mode.

According to one embodiment, the basic angle mode may be a mode corresponding to an angle used in intra prediction of the existing HEVC (High Efficiency Video Coding) standard, and the extended angle mode may be a mode corresponding to an angle newly added in intra prediction of the next-generation video codec standard. More specifically, the basic angle mode may be an angle mode corresponding to one of the intra prediction modes {2, 4, 6, …, 66}, and the extended angle mode may be an angle mode corresponding to one of the intra prediction modes {3, 5, 7, …, 65}. That is, the extended angle mode may be an angle mode between basic angle modes within the first angle range. Accordingly, the angle indicated by the extended angle mode can be determined based on the angle indicated by the basic angle mode.

According to another embodiment, the basic angle mode may be a mode corresponding to an angle within a preset first angle range, and the extended angle mode may be a wide-angle mode outside the first angle range. That is, the basic angle mode may be an angle mode corresponding to one of the intra prediction modes {2, 3, 4, …, 66}, and the extended angle mode may be an angle mode corresponding to one of the intra prediction modes {-14, -13, -12, …, -1} and {67, 68, …, 80}. The angle indicated by the extended angle mode may be determined as the angle opposite to the angle indicated by the corresponding basic angle mode. Accordingly, the angle indicated by the extended angle mode can be determined based on the angle indicated by the basic angle mode. Meanwhile, the number of extended angle modes is not limited to this, and additional extended angles may be defined depending on the size and/or shape of the current block. Meanwhile, the total number of intra prediction modes included in the intra prediction mode set may vary depending on the configuration of the basic angle mode and extended angle mode described above.

In the above embodiment, the spacing between extended angle modes may be set based on the spacing between the corresponding basic angle modes. For example, the spacing between the extended angle modes {3, 5, 7, …, 65} can be determined based on the spacing between the corresponding basic angle modes {2, 4, 6, …, 66}. Additionally, the spacing between the extended angle modes {-14, -13, …, -1} can be determined based on the spacing between the corresponding opposite basic angle modes {53, 54, …, 66}, and the spacing between the extended angle modes {67, 68, …, 80} can be determined based on the spacing between the corresponding opposite basic angle modes {2, 3, 4, …, 15}. The angular spacing between the extended angle modes may be set to be equal to the angular spacing between the corresponding basic angle modes. Additionally, the number of extended angle modes in the intra prediction mode set may be set to be less than the number of basic angle modes.

According to an embodiment of the present invention, the extended angle mode may be signaled based on the basic angle mode. For example, a wide-angle mode (i.e., extended angle mode) may replace at least one angle mode (i.e., basic angle mode) within the first angle range. The basic angle mode that is replaced may be an angle mode corresponding to the opposite side of the wide-angle mode. That is, the basic angle mode that is replaced is an angle mode that corresponds to an angle in the opposite direction of the angle indicated by the wide-angle mode, or to an angle that differs from the angle in the opposite direction by a preset offset index. According to an embodiment of the present invention, the preset offset index is 1. The intra prediction mode index corresponding to the replaced basic angle mode may be remapped to the wide-angle mode to signal the corresponding wide-angle mode. For example, the wide-angle modes {-14, -13, …, -1} may be signaled by the intra prediction mode indices {52, 53, …, 66}, respectively, and the wide-angle modes {67, 68, …, 80} may be signaled by the intra prediction mode indices {2, 3, …, 15}, respectively. In this way, because the intra prediction mode index for a basic angle mode signals the extended angle mode, the same set of intra prediction mode indices can be used for signaling the intra prediction mode even if the configuration of angle modes used for intra prediction differs from block to block. Accordingly, signaling overhead due to changes in intra prediction mode configuration can be minimized.
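
The remapping of a signaled index to a wide-angle mode can be sketched as follows. The sketch assumes a one-to-one mapping with offsets +65 and -67, which reproduce the endpoints 2 to 67, 15 to 80, and 66 to -1 given above; the offsets, the parameter `num_replaced` (how many basic angle modes are replaced), the block shape labels, and the function name are assumptions for illustration only.

```python
def remap_to_wide_angle(signaled_index, block_shape, num_replaced):
    """Map a signaled intra mode index to the angle mode actually used."""
    if block_shape == "horizontal" and 2 <= signaled_index < 2 + num_replaced:
        return signaled_index + 65   # e.g. 2 -> 67, 15 -> 80
    if block_shape == "vertical" and 66 - num_replaced < signaled_index <= 66:
        return signaled_index - 67   # e.g. 66 -> -1, 53 -> -14
    return signaled_index            # basic angle mode, planar, or DC


print(remap_to_wide_angle(2, "horizontal", 14))
print(remap_to_wide_angle(66, "vertical", 14))
print(remap_to_wide_angle(34, "square", 14))
```

Because the remapping depends only on the block shape, which the decoder already knows, no additional syntax is needed to distinguish a wide-angle mode from the basic angle mode it replaces.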

Meanwhile, whether to use the extended angle mode may be determined based on at least one of the shape and size of the current block. According to one embodiment, if the size of the current block is larger than the preset size, the extended angle mode may be used for intra prediction of the current block, otherwise, only the basic angle mode may be used for intra prediction of the current block. According to another embodiment, if the current block is a non-square block, the extended angle mode may be used for intra prediction of the current block, and if the current block is a square block, only the basic angle mode may be used for intra prediction of the current block.

The intra prediction unit determines reference samples and/or interpolated reference samples to be used for intra prediction of the current block, based on intra prediction mode information of the current block. When the intra prediction mode index indicates a specific angle mode, a reference sample or an interpolated reference sample located at the specific angle from the current sample of the current block is used for prediction of the current sample. Accordingly, different sets of reference samples and/or interpolated reference samples may be used for intra prediction depending on the intra prediction mode. After intra prediction of the current block is performed using reference samples and intra prediction mode information, the decoder restores the sample values of the current block by adding the residual signal of the current block obtained from the inverse transform unit to the intra prediction value of the current block.

Motion information used for inter prediction may include reference direction indication information (inter_pred_idc), reference picture indices (ref_idx_l0, ref_idx_l1), and motion vectors (mvL0, mvL1). Reference picture list utilization information (predFlagL0, predFlagL1) may be set according to the reference direction indication information. For example, in the case of unidirectional prediction using an L0 reference picture, predFlagL0=1 and predFlagL1=0 may be set. In the case of unidirectional prediction using an L1 reference picture, predFlagL0=0 and predFlagL1=1 may be set. In the case of bidirectional prediction using both L0 and L1 reference pictures, predFlagL0=1 and predFlagL1=1 may be set.
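
The derivation of predFlagL0 and predFlagL1 from the reference direction indication information can be sketched as a table lookup. The symbolic labels "PRED_L0", "PRED_L1", and "PRED_BI" are hypothetical stand-ins for the values of inter_pred_idc, whose numeric coding is not given here.

```python
# (predFlagL0, predFlagL1) for each assumed reference direction label.
PRED_FLAGS = {
    "PRED_L0": (1, 0),   # unidirectional prediction from an L0 reference picture
    "PRED_L1": (0, 1),   # unidirectional prediction from an L1 reference picture
    "PRED_BI": (1, 1),   # bidirectional prediction from both lists
}


def pred_flags(inter_pred_idc):
    """Return (predFlagL0, predFlagL1) for a reference direction label."""
    return PRED_FLAGS[inter_pred_idc]


for label in PRED_FLAGS:
    print(label, pred_flags(label))
```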

When the current block is a coding unit, the coding unit may be divided into several sub-blocks, and the prediction information of each sub-block may be the same or different. As an example embodiment, when the coding unit is an intra mode, the intra prediction mode of each subblock may be the same or different from each other. Additionally, when the coding unit is in inter mode, the motion information of each sub-block may be the same or different. Additionally, each sub-block may be encoded or decoded independently from each other. Each sub-block can be distinguished through a sub-block index (sbIdx).

The motion vector of the current block is likely to be similar to the motion vector of neighboring blocks. Therefore, the motion vector of the neighboring block can be used as a motion vector predictor (mvp), and the motion vector of the current block can be derived using the motion vector of the neighboring block. Additionally, in order to increase the accuracy of the motion vector, the motion vector difference (mvd) between the optimal motion vector of the current block found in the original image by the encoder and the motion prediction value may be signaled.
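
The relationship between the predictor (mvp), the signaled difference (mvd), and the final motion vector can be sketched as below. The function names are hypothetical, and motion vectors are represented as simple (x, y) integer tuples for illustration.

```python
def motion_vector_difference(mv, mvp):
    """Encoder side: the difference between the chosen vector and its predictor."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def reconstruct_mv(mvp, mvd):
    """Decoder side: add the signaled difference back onto the predictor."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

For example, with a predictor of (12, -4) taken from a neighboring block and an optimal vector of (13, -7), only the small difference (1, -3) needs to be signaled.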

The motion vector may have various resolutions, and the resolution of the motion vector may vary on a block basis. Motion vector resolution can be expressed in integer-pixel units, half-pixel units, 1/4-pixel units, 1/16-pixel units, 4-integer-pixel units, etc. Since images such as screen content consist of simple graphics such as text, there is no need to apply an interpolation filter, so integer-pixel units and 4-integer-pixel units can be selectively applied on a block basis. Blocks encoded in affine mode, which can express rotation and scaling, undergo significant changes in shape, so integer-pixel units, 1/4-pixel units, and 1/16-pixel units can be selectively applied on a block basis. Information on whether to selectively apply motion vector resolution on a block basis is signaled with amvr_flag. If it is applied, the motion vector resolution to apply to the current block is signaled with amvr_precision_idx.

For blocks to which bidirectional prediction is applied, when a weighted average is applied, the weights of the two prediction blocks may be the same or different, and information about the weights is signaled through bcw_idx.

To increase the accuracy of motion prediction values, the Merge method or the advanced motion vector prediction (AMVP) method can be selectively used on a block basis. The Merge method configures the motion information of the current block to be the same as the motion information of neighboring blocks adjacent to the current block. The Merge method has the advantage of increasing the coding efficiency of motion information by spatially propagating motion information without change in a homogeneous motion region. On the other hand, the AMVP method predicts motion information in the L0 and L1 prediction directions separately and signals the optimal motion information in order to express accurate motion information. The decoder derives motion information for the current block through the AMVP or Merge method and then uses the reference block in the reference picture located at the position indicated by the derived motion information as a prediction block for the current block.

A method of deriving motion information in Merge or AMVP may be a method in which a motion candidate list is constructed using motion prediction values derived from neighboring blocks of the current block, and then index information for the optimal motion candidate is signaled. In the case of AMVP, since motion candidate lists are derived for each of L0 and L1, the optimal motion candidate indices (mvp_l0_flag, mvp_l1_flag) for each of L0 and L1 are signaled. In the case of Merge, since one motion candidate list is derived, one merge index (merge_idx) is signaled. The motion candidate lists derived from one coding unit may vary, and a motion candidate index or merge index may be signaled for each motion candidate list. At this time, a mode in which a block encoded in Merge mode has no information about a residual block can be called Merge Skip mode.

In this specification, motion candidate and motion information candidate may have the same meaning. Additionally, the motion candidate list and the motion information candidate list in this specification may have the same meaning.

SMVD (Symmetric MVD) is a method of reducing the amount of bits of transmitted motion information by ensuring that the MVD (Motion Vector Difference) values in the L0 and L1 directions are symmetrical in the case of bi-directional prediction. MVD information in the L1 direction, which is symmetrical to the L0 direction, is not transmitted, and reference picture information in the L0 and L1 directions is also not transmitted and can be derived during the decoding process.
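
A minimal sketch of the symmetric derivation, assuming that "symmetrical" means the L1 difference is the component-wise negation of the L0 difference. The function names are hypothetical and vectors are (x, y) integer tuples.

```python
def smvd_derive_l1(mvd_l0):
    """Derive the untransmitted L1 difference as the mirror of the L0 difference."""
    return (-mvd_l0[0], -mvd_l0[1])


def smvd_motion_vectors(mvp_l0, mvp_l1, mvd_l0):
    """Reconstruct both motion vectors from two predictors and one signaled MVD."""
    mvd_l1 = smvd_derive_l1(mvd_l0)
    mv_l0 = (mvp_l0[0] + mvd_l0[0], mvp_l0[1] + mvd_l0[1])
    mv_l1 = (mvp_l1[0] + mvd_l1[0], mvp_l1[1] + mvd_l1[1])
    return mv_l0, mv_l1
```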

OBMC (Overlapped Block Motion Compensation) is a method that, when the motion information of adjacent blocks differs, generates prediction blocks for the current block using the motion information of neighboring blocks and then creates the final prediction block for the current block by performing a weighted average of the prediction blocks. This has the effect of reducing the blocking artifacts that occur at the block boundaries of motion-compensated images.

In general, merge motion candidates have low motion accuracy. To increase the accuracy of these merge motion candidates, the MMVD (Merge mode with MVD) method can be used. The MMVD method corrects motion information using one candidate selected from among several motion difference value candidates. Information about the correction value of motion information obtained through the MMVD method (for example, an index indicating the candidate selected from the motion difference value candidates) may be included in the bitstream and transmitted to the decoder. Compared to including the existing motion information difference value in the bitstream, bits can be saved by including information on the correction value of the motion information in the bitstream.

The TM (Template Matching) method is a method of compensating motion information by constructing a template using surrounding pixels of the current block and finding a matching area with the highest similarity to the template. Template matching (TM) is a method of performing motion prediction in a decoder without including motion information in the bitstream in order to reduce the size of the encoded bitstream. At this time, since the decoder does not have the original image, it can roughly derive motion information for the current block using already restored neighboring blocks.

The DMVR (Decoder-side Motion Vector Refinement) method corrects motion information through the correlation of already restored reference images in order to find more accurate motion information. It uses the bidirectional motion information of the current block to compare the two reference pictures, and uses the point with the best match between reference blocks within a certain area of the reference pictures as the new bidirectional motion. When such DMVR is performed, the encoder may correct the motion information by performing DMVR in units of one block, then divide the block into sub-blocks and perform DMVR in units of each sub-block to correct the motion information of the sub-blocks again. This can be called MP-DMVR (Multi-pass DMVR).

The LIC (Local Illumination Compensation) method is a method of compensating for luminance changes between blocks. It derives a linear model using surrounding pixels adjacent to the current block, and then compensates for the luminance information of the current block through the linear model.

Existing video coding methods perform motion compensation considering only translational movements in the horizontal and vertical directions, so coding efficiency deteriorates when coding videos that include movements such as enlargement, reduction, and rotation that are commonly encountered in reality. To express such enlargement, reduction, and rotation movements, an affine model-based motion prediction technology that uses a 4 (rotation) or 6 (enlargement, reduction, rotation) parameter model can be applied.

BDOF (Bi-Directional Optical Flow) is used to correct the prediction block by estimating the amount of pixel change, based on optical flow, from the reference blocks of a block coded with bidirectional motion. The motion of the current block can be corrected using motion information derived from the BDOF of VVC.

PROF (Prediction refinement with optical flow) is a technology to improve the accuracy of sub-block-level affine motion prediction to be similar to the accuracy of pixel-level motion prediction. PROF, similar to BDOF, is a technology that obtains the final prediction signal by calculating correction values on a pixel basis for affine motion compensated pixel values in sub-block units based on optical flow.

When generating a prediction block for the current block, the CIIP (Combined Inter-/Intra-picture Prediction) method creates the final prediction block by performing a weighted average of the prediction block generated by the intra-picture prediction method and the prediction block generated by the inter-picture prediction method.

The IBC (Intra Block Copy) method finds the part most similar to the current block in an already reconstructed area of the current picture and uses the corresponding reference block as a prediction block for the current block. At this time, information related to the block vector, which is the distance between the current block and the reference block, may be included in the bitstream. The decoder can calculate or set the block vector for the current block by parsing the information related to the block vector contained in the bitstream.

The BCW (Bi-prediction with CU-level Weights) method does not generate a prediction block by simply averaging two prediction blocks that have been motion-compensated from different reference pictures. Instead, it performs a weighted average of the two motion-compensated prediction blocks, applying weights adaptively on a block-by-block basis.
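
The block-level weighted average can be sketched as below. This is an illustration under stated assumptions: the weights are integers that sum to a fixed total of 8, the result is rounded to the nearest integer, and the function name bcw_blend is hypothetical. The mapping from bcw_idx to a concrete weight is not shown because it is not given here.

```python
def bcw_blend(pred0, pred1, w1, total=8):
    """Blend two equally sized prediction blocks with weights (total - w1) and w1.

    pred0 and pred1 are lists of rows of integer sample values.
    """
    w0 = total - w1
    half = total // 2  # rounding offset
    return [
        [(w0 * a + w1 * b + half) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(pred0, pred1)
    ]
```

With w1 equal to half of the total, the function reduces to the plain average; any other value shifts the result toward one of the two reference pictures.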

In the Intra TMP (Template Matching Prediction) method, a video signal processing device constructs a reference template using pixel values of neighboring blocks adjacent to the current block, finds the part most similar to the constructed reference template in the already restored area within the current picture, and then uses the reference block (the part found in the already restored area) as a prediction block for the current block.

The MHP (Multi-hypothesis prediction) method performs weighted prediction using multiple prediction signals by transmitting motion information in addition to the unidirectional or bidirectional motion information in inter prediction.

CCLM (Cross-component linear model) is a method of constructing a linear model using the high correlation between a luminance signal and the chrominance signal located at the same position as the luminance signal, and then predicting the chrominance signal through the linear model. After constructing a template using restored blocks among neighboring blocks adjacent to the current block, parameters for the linear model are derived through the template. Next, depending on the video format, the restored current luminance block is selectively down-sampled to fit the size of the chrominance block. Finally, the chrominance block of the current block is predicted using the down-sampled luminance block and the corresponding linear model. At this time, the method of using two or more linear models is called MMLM (Multi-model Linear Model).
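
The template-based parameter derivation and the subsequent chrominance prediction can be sketched as follows. The derivation rule is not specified here, so this example assumes an ordinary least-squares fit of chroma = a * luma + b over the template samples; the function names are hypothetical and down-sampling is assumed to have been done already.

```python
def fit_linear_model(luma_template, chroma_template):
    """Fit chroma = a * luma + b by least squares over co-located template samples."""
    n = len(luma_template)
    sx = sum(luma_template)
    sy = sum(chroma_template)
    sxx = sum(x * x for x in luma_template)
    sxy = sum(x * y for x, y in zip(luma_template, chroma_template))
    denom = n * sxx - sx * sx
    if denom == 0:
        # Flat luma template: fall back to predicting the mean chroma value.
        return 0.0, sy / n
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


def predict_chroma(luma_block, a, b):
    """Apply the linear model to every (down-sampled) restored luma sample."""
    return [[a * luma + b for luma in row] for row in luma_block]
```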

In independent scalar quantization, the restored coefficient t'_k for an input coefficient t_k depends only on the associated quantization index q_k. That is, the quantization index for any restored coefficient has a different value from the quantization indexes for other restored coefficients. At this time, t'_k may be a value including the quantization error at t_k and may be different or the same depending on the quantization parameter. Here, t'_k may be named a restored transform coefficient or a dequantized transform coefficient, and the quantization index may be named a quantized transform coefficient.

In Uniform Reconstruction Quantizers (URQ), the reconstructed coefficients have the characteristic of being arranged at equal intervals. At this time, the distance between two adjacent restored values can be referred to as the quantization step size. The restored values may include 0, and the entire set of available restored values may be uniquely defined depending on the quantization step size. The quantization step size may vary depending on the quantization parameter.
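
A uniform reconstruction quantizer can be sketched as below. The function names are hypothetical. The qp_to_step mapping, in which the step size doubles every 6 quantization parameter values, is an assumption borrowed from HEVC/VVC-style codecs and is not stated in this specification.

```python
def qp_to_step(qp):
    """Assumed mapping: the step size doubles for every increase of 6 in QP."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(t, step):
    """Map a transform coefficient to the index of the nearest reconstruction level."""
    sign = -1 if t < 0 else 1
    return sign * int(abs(t) / step + 0.5)


def dequantize(q, step):
    """Restored values lie on an equally spaced grid that includes 0."""
    return q * step
```

Because every restored value is an integer multiple of the step size, two adjacent restored values always differ by exactly one step, and the restoration error of the rounding quantizer above is at most half a step.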

In the existing method, the set of allowable restored transform coefficients decreases due to quantization, and the number of elements of this set may be finite. Because of this, there is a limit to minimizing the average error between the original image and the restored image. Vector quantization can be used as a method to minimize this average error.

A simple form of vector quantization used in video encoding is sign data hiding. This is a method in which the encoder does not encode the sign of one non-zero coefficient, and the decoder determines the sign of the corresponding coefficient depending on whether the sum of the absolute values of all coefficients is even or odd. To this end, at least one coefficient may be increased or decreased by '1' in the encoder, and the coefficient to be adjusted is selected so as to be optimal in terms of rate-distortion cost. As an example embodiment, a coefficient having a value close to the boundary of the quantization interval may be selected.
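
The parity rule can be sketched as follows. Two assumptions are made for illustration: an even sum means a positive hidden sign and an odd sum a negative one, and the coefficient to adjust is passed in by the caller rather than chosen by a rate-distortion search. The function names are hypothetical, and the coefficient at hidden_idx must be non-zero.

```python
def infer_hidden_sign(abs_levels):
    """Decoder side: derive the hidden sign from the parity of the absolute sum.

    Assumed convention: even sum -> +1, odd sum -> -1.
    """
    return 1 if sum(abs_levels) % 2 == 0 else -1


def hide_sign(levels, hidden_idx, adjust_idx):
    """Encoder side: make the parity agree with the sign of levels[hidden_idx].

    adjust_idx stands in for the coefficient a real encoder would pick by
    rate-distortion cost; its magnitude is raised by 1 when the parity is wrong.
    """
    levels = list(levels)
    wanted = 1 if levels[hidden_idx] > 0 else -1
    if infer_hidden_sign([abs(v) for v in levels]) != wanted:
        levels[adjust_idx] += 1 if levels[adjust_idx] >= 0 else -1
    return levels
```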

Another vector quantization method is Trellis-Coded Quantization, which in video coding is used as an optimal path search technique to obtain optimized quantization values in dependent quantization. On a block basis, quantization candidates for all coefficients within a block are placed in a trellis graph, and the optimal trellis path between the optimized quantization candidates is searched, taking the rate-distortion cost into account. Specifically, dependent quantization applied to video encoding may be designed such that the set of acceptable restored transform coefficients for a transform coefficient depends on the value of the transform coefficient that precedes the current transform coefficient in reconstruction order. At this time, by selectively using multiple quantizers according to the transform coefficient, the average error between the original image and the restored image is minimized, thereby increasing coding efficiency.

Among intra prediction coding technologies, the MIP (Matrix Intra Prediction) method is a matrix-based intra prediction method. Unlike prediction methods that have directionality from pixels of neighboring blocks adjacent to the current block, this method obtains a prediction signal by applying a predefined matrix and offset values to the left and top pixels of neighboring blocks.

In order to derive the intra prediction mode of the current block, a template, which is an arbitrary restored area adjacent to the current block, may be used: the intra prediction mode derived for the template through the surrounding pixels of the template can be used to restore the current block. First, the decoder generates a prediction template for the template using surrounding pixels (references) adjacent to the template, and can use the intra prediction mode that generates the prediction template most similar to the already restored template to restore the current block. This method can be called TIMD (Template intra mode derivation).

In general, an encoder can determine a prediction mode for generating a prediction block and generate a bitstream containing information about the determined prediction mode. The decoder can set the intra prediction mode by parsing the received bitstream. At this time, the bit amount of information about the prediction mode may be about 10% of the total bitstream size. In order to reduce the bit amount of information about the prediction mode, the encoder may not include information about the intra prediction mode in the bitstream. Accordingly, the decoder can derive (determine) an intra prediction mode for restoration of the current block using the characteristics of the surrounding blocks, and can restore the current block using the derived intra prediction mode. At this time, in order to derive the intra prediction mode, the decoder may use a method of applying a Sobel filter horizontally and vertically to each surrounding pixel adjacent to the current block to infer directionality information, and then mapping the directionality information to an intra prediction mode. The method by which the decoder derives an intra prediction mode using neighboring blocks can be described as DIMD (Decoder side intra mode derivation).

Surrounding blocks may be blocks in a spatial location or blocks in a temporal location. The surrounding blocks that are spatially adjacent to the current block may be at least one of the Left (A1) block, the Left Below (A0) block, the Above (B1) block, the Above Right (B0) block, or the Above Left (B2) block. The neighboring block temporally adjacent to the current block may be a block containing the upper left pixel position of the bottom right (BR) block of the current block in the corresponding picture (Collocated picture). If a neighboring block temporally adjacent to the current block is encoded in intra mode or exists in an unusable position, a block containing the center (Ctr) pixel position of the horizontal and vertical dimensions of the current block in the picture corresponding to the current picture (Collocated picture) can be used as a temporal neighboring block. Motion candidate information derived from the corresponding picture may be referred to as TMVP (Temporal Motion Vector Predictor). Only one TMVP may be derived from one block, or, after dividing one block into several sub-blocks, a TMVP candidate may be derived for each sub-block. The TMVP derivation method on a sub-block basis may be referred to as sbTMVP (sub-block Temporal Motion Vector Predictor).

Whether the methods described herein will be applied may be determined based on at least one of slice type information (e.g., whether it is an I slice, a P slice, or a B slice), whether it is a tile, whether it is a subpicture, the size of the current block, the depth of the coding unit, whether the current block is a luminance block or a chrominance block, whether it is a reference frame or a non-reference frame, reference order, and temporal layer according to the hierarchy. Information used to determine whether the methods described in this specification will be applied may be information previously agreed upon between the decoder and the encoder. Additionally, this information may be determined according to profile and level. This information can be expressed as variable values, and the bitstream can include information about the variable values. That is, the decoder can determine whether the above-described methods are applied by parsing information about variable values included in the bitstream. For example, whether the above-described methods will be applied may be determined based on the horizontal or vertical length of the coding unit. If the horizontal or vertical length is 32 or more (e.g., 32, 64, 128, etc.), the above-described methods can be applied. Alternatively, the above-described methods can be applied when the horizontal or vertical length is less than 32 (e.g., 2, 4, 8, 16). Alternatively, the above-described methods can be applied when the horizontal or vertical length is 4 or 8.

Referring to FIG. 8, the decoder can derive a prediction block using surrounding samples (blocks, pixels). At this time, the neighboring sample may be a neighboring block (pixel) of the current block. Specifically, the decoder can determine intra prediction modes and weight information for restoration of the current block through a histogram for directionality information (angle information) using surrounding samples as input.

Figure 9(a) shows the case where all neighboring blocks of the current block are available to derive directional information, Figure 9(b) shows the case where the upper boundary of the current block is a sub-picture, slice, tile, or CTU boundary, and Figure 9(c) shows the case where the left boundary of the current block is a sub-picture, slice, tile, or CTU boundary. Meanwhile, if the neighboring block and the current block do not belong to the same sub-picture, slice, tile, or CTU, the neighboring block may not be used to derive directional information. The gray dots in FIG. 9 indicate the positions of pixels actually used to derive directional information, and the dotted lines indicate sub-picture, slice, tile, and CTU boundaries. Additionally, referring to FIGS. 9(d) to 9(f), in order to derive directional information, pixels located at the boundary may be padded by one pixel outside the boundary. Through this padding, it may be possible to derive more accurate directional information.

To derive directional information about a pixel at a specific location, the 3x3 Sobel filter of Equation 1 may be applied in the horizontal and vertical directions, respectively. A in Equation 1 may denote the pixel values of a 3x3 window of restored neighboring blocks of the current block. The directional information (θ) can then be determined using Equation 2. To reduce the computational complexity of deriving directional information, the decoder can derive the directional information (θ) by calculating only Gy/Gx from Equation 1, without evaluating the atan function in Equation 2.
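
The gradient computation can be sketched as below. Equations 1 and 2 are not reproduced in this text, so the sketch assumes the commonly used 3x3 Sobel kernels and an arctangent of Gy over Gx; the kernel signs, the orientation convention, and the function names are assumptions.

```python
import math

# Commonly used 3x3 Sobel kernels (sign convention assumed).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradients(window):
    """Return (Gx, Gy) for a 3x3 window of restored neighboring pixel values."""
    gx = sum(SOBEL_X[r][c] * window[r][c] for r in range(3) for c in range(3))
    gy = sum(SOBEL_Y[r][c] * window[r][c] for r in range(3) for c in range(3))
    return gx, gy


def gradient_angle(gx, gy):
    """Full-precision direction in radians; the text notes this can be skipped."""
    return math.atan2(gy, gx)
```

A window containing a vertical edge produces a purely horizontal gradient, a window containing a horizontal edge produces a purely vertical gradient, and a flat window produces no gradient at all.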

Referring to FIG. 9, directionality information can be calculated for every gray dot shown in FIG. 9, and the directionality information can be mapped to an angle of an intra prediction mode. The intra prediction mode set may include a planar mode, a DC mode, and a plurality of (e.g., 65) angular modes (i.e., directional modes). There may be 67 intra prediction modes, whereas the directionality information (angle, θ) calculated through Equation 2 may be a real-valued number. Therefore, a process of mapping directional information to a specific intra prediction directional mode is necessary. The directional mode described herein may be the same as the angular mode.

Referring to FIG. 10, the intra prediction directional modes can be divided into four sections based on 0 degrees (index 18), 45 degrees (index 34), 90 degrees (index 50), and 135 degrees (index 66) (see FIG. 6). That is, the range used for determining the intra prediction directional mode can be divided into four sections from section 0 to section 3. Section 0 can be from -45 degrees to 0 degrees, section 1 from 0 degrees to 45 degrees, section 2 from 45 degrees to 90 degrees, and section 3 from 90 degrees to 135 degrees. At this time, each section may include 16 intra prediction directional modes. One of the four sections can be determined by comparing the signs and magnitudes of Gx and Gy calculated through Equation 1. For example, if Gx and Gy are positive numbers and the absolute value of Gx is greater than the absolute value of Gy, section 1 may be selected. The intra prediction directional mode within each section can be determined through the directionality information (θ) calculated from Equation 2. Specifically, the decoder expands the value by multiplying the directionality information (θ) by 2^16. The decoder can then compare the expanded value with the numbers in a predefined table to find the table value closest to the expanded value and determine the intra prediction directional mode based on that closest value. At this time, the number of predefined table values can be 17. Specifically, the values of the predefined table may be {0, 2048, 4096, 6144, 8192, 12288, 16384, 20480, 24576, 28672, 32768, 36864, 40960, 47104, 53248, 59392, 65536}. At this time, the difference between predefined table values may be set differently depending on the difference between the angles of the intra prediction directional modes.
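
The nearest-value lookup in the 17-entry table can be sketched as follows. The sketch assumes that the quantity being scaled by 2^16 is the ratio of the smaller gradient magnitude to the larger one, so that it falls in the table range of 0 to 65536, and that Gx and Gy are integers. How the returned table position is turned into a mode index within the selected section is not shown, and the function name is hypothetical.

```python
ANGLE_TABLE = [0, 2048, 4096, 6144, 8192, 12288, 16384, 20480, 24576,
               28672, 32768, 36864, 40960, 47104, 53248, 59392, 65536]


def nearest_table_index(gx, gy):
    """Return the position of the table value closest to the scaled gradient ratio.

    Returns None for a flat area where both gradients are zero.
    """
    big = max(abs(gx), abs(gy))
    small = min(abs(gx), abs(gy))
    if big == 0:
        return None
    scaled = (small << 16) // big  # ratio in [0, 1] expanded by 2^16
    return min(range(len(ANGLE_TABLE)),
               key=lambda i: abs(ANGLE_TABLE[i] - scaled))
```

A purely horizontal or vertical gradient lands on the first table entry, and equal gradient magnitudes land on the last one, matching the two ends of a 45-degree section.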

Meanwhile, if atan calculation is not performed to reduce computational complexity and the directional angle is obtained using only Gy/Gx, the difference between predefined table values may be inconsistent with the distance between angles of the intra prediction directional mode. atan has the characteristic that the slope gradually decreases as the input value increases. Therefore, the table defined above must also be set with values taking into account not only the difference between the angles of the intra prediction directional mode but also the nonlinear characteristics of atan. For example, the difference between the defined table values can be set to gradually decrease. Conversely, the difference between the defined table values can be set to gradually increase.

If the horizontal and vertical lengths of the current block are different, the available intra prediction directional modes may vary. That is, if the horizontal and vertical lengths of the current block are different, the section for deriving the intra prediction directional mode may vary. In other words, the section for deriving the intra prediction directional mode can be changed based on the horizontal and vertical lengths of the current block (for example, the ratio of the horizontal length to the vertical length). For example, if the width of the current block is longer than the height, intra prediction modes may be remapped to modes 67 to 80, and the intra prediction modes 2 to 15 in the opposite direction may be excluded. For example, if the horizontal length of the current block is n (integer) times longer than the vertical length (e.g., 2 times), the intra prediction modes {3, 4, 5, 6, 7, 8} can be reset (mapped) to {67, 68, 69, 70, 71, 72}, respectively. Additionally, if the horizontal length of the current block is longer than the vertical length, the intra prediction mode may be reset to a value obtained by adding '65' to the intra prediction mode. Meanwhile, if the horizontal length of the current block is shorter than the vertical length, the intra prediction mode may be reset to a value obtained by subtracting '67' from the intra prediction mode.

A histogram can be used to derive an intra prediction directional mode for reconstruction of the current block. As a result of obtaining directionality information about neighboring blocks, if there are more blocks without directionality than blocks with directionality, the prediction mode for blocks without directionality may have the highest cumulative value on the histogram. However, since a directional mode must be derived to restore the current block, the prediction mode for blocks without directionality can be excluded even if its cumulative value on the histogram is the highest. In other words, a flat area with no gradient or directionality between neighboring pixels may not be used to derive an intra prediction directional mode. For example, the prediction mode for a block without directionality may be a planar mode or a DC mode. If the left neighboring block is in planar mode or DC mode, the left neighboring block may not be used to derive directional information, and directional information may be derived using only the upper neighboring block. If the surrounding blocks of the current block contain a mixture of flat areas and areas with directionality, the decoder can generate a histogram using the G value calculated as in Equation 3 to emphasize directionality. At this time, the histogram may not be a frequency count in which '1' is added for each occurrence of an intra prediction directional mode, but may be a cumulative value in which the calculated G value is added for each occurrence of an intra prediction directional mode.
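
The amplitude-weighted histogram can be sketched as below. Equation 3 is not reproduced in this text, so the sketch assumes G = |Gx| + |Gy|. It also assumes that mode indices 0 and 1 denote the planar and DC modes and that indices 2 to 66 are the directional modes; the function names are hypothetical.

```python
def accumulate_histogram(samples, num_modes=67):
    """Accumulate G per mode, where samples is an iterable of (mode, gx, gy)."""
    hist = [0] * num_modes
    for mode, gx, gy in samples:
        hist[mode] += abs(gx) + abs(gy)  # assumed form of the G value
    return hist


def top_two_modes(hist):
    """Pick the two directional modes with the largest accumulated G.

    Modes 0 and 1 (assumed planar and DC) are skipped even if they rank highest.
    """
    angular = sorted(range(2, len(hist)), key=lambda m: hist[m], reverse=True)
    return angular[0], angular[1]
```

Adding G instead of 1 lets a few strong edges outweigh many weak ones, which is how the histogram emphasizes directionality when flat and directional areas are mixed.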

The X-axis of FIG. 11 represents the intra prediction directional mode, and the Y-axis represents the cumulative value of G values. The decoder may select the intra prediction directional mode with the largest cumulative G value among the intra prediction directional modes. In other words, the decoder can select the intra prediction directional mode for the current block based on the accumulated value. Referring to FIG. 11, modeA with the largest cumulative value and modeB with the second largest cumulative value may be selected as the intra prediction directional modes. In order to generate a prediction block for the current block, the decoder can generate a final prediction block by performing a weighted average of the prediction block generated in modeA, the prediction block generated in modeB, and the prediction block generated in planar mode. At this time, the weight of each prediction block can be determined using the accumulated values of modeA and modeB. For example, the weight for the prediction block generated in planar mode may be set to 1/3 of the total weight. The weight for the prediction block generated by modeA can be set to a weight corresponding to the ratio of the cumulative value of modeA to the sum of the cumulative values of modeA and modeB. The weight for the prediction block generated by modeB can be determined by subtracting the modeA weight and 1/3 of the total weight from the total weight. In order to calculate the weights more accurately, the decoder can expand the range of the weight by multiplying the weight for the prediction block generated by modeA by an arbitrary value. The weight for the prediction block generated in modeB and the weight for the prediction block generated in planar mode can also be expanded in the same way.
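
One way to realize the three-way weighting is sketched below. Several points are assumptions made for illustration: the total weight is expanded to 64, the share of modeA is taken as its cumulative value relative to the sum of both cumulative values and is applied to the weight left over after the planar share, and results are rounded to the nearest integer. The function names are hypothetical.

```python
def dimd_weights(acc_a, acc_b, total=64):
    """Return integer weights (w_a, w_b, w_planar) that sum to total."""
    w_planar = total // 3            # about one third of the total weight
    remaining = total - w_planar
    acc_sum = acc_a + acc_b
    w_a = (remaining * acc_a + acc_sum // 2) // acc_sum  # rounded share of modeA
    w_b = remaining - w_a            # total minus modeA weight minus planar weight
    return w_a, w_b, w_planar


def blend_predictions(pred_a, pred_b, pred_planar, weights, total=64):
    """Weighted average of three equally sized prediction blocks."""
    w_a, w_b, w_p = weights
    return [
        [(w_a * a + w_b * b + w_p * p + total // 2) // total
         for a, b, p in zip(row_a, row_b, row_p)]
        for row_a, row_b, row_p in zip(pred_a, pred_b, pred_planar)
    ]
```

Expanding the total to a power of two keeps the weights in integer arithmetic, so the final division can be implemented as a shift.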

Specifically, FIG. 12 shows a signaling method used to store syntax elements regarding whether to apply DIMD mode in a bitstream and transmit them to the decoder. Referring to FIG. 12, the syntax element (cu_dimd_flag) indicating whether the DIMD mode is used to generate a prediction block for the current block may be parsed depending on whether the encoding mode of the current block is intra mode and whether the DIMD mode set in the SPS is activated. That is, cu_dimd_flag can be parsed if the syntax element (sps_dimd_enabled_flag) indicates that DIMD mode is activated (e.g., if the value of sps_dimd_enabled_flag is 1), the encoding mode of the current block is not SKIP mode, the current block is a luminance block, and the current block is not in inter encoding mode. At this time, if cu_dimd_flag is '1', it can indicate that the current block is decoded in DIMD mode, and if cu_dimd_flag is '0', it can indicate that the current block is not decoded in DIMD mode. Meanwhile, if cu_dimd_flag is not parsed, the value of cu_dimd_flag may be set to '0'. sps_dimd_enabled_flag can be controlled by syntax elements included in the profile, tier and level syntax. For example, sps_dimd_enabled_flag can be controlled by gci_no_dimd_constraint_flag, a syntax element included in the general_constraints_info() syntax, which can be defined to behave as follows. If the value of gci_no_dimd_constraint_flag is 1, the sps_dimd_enabled_flag value for all pictures in OlsInScope can be 0. If the value of gci_no_dimd_constraint_flag is 0, there may be no separate constraint (gci_no_dimd_constraint_flag equal to 1 specifies that sps_dimd_enabled_flag for all pictures in OlsInScope shall be equal to 0. gci_no_dimd_constraint_flag equal to 0 does not impose such a constraint.).
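
The parsing condition and the inferred default can be sketched as follows. The function name and the read_bit callable, which stands in for an entropy decoder, are hypothetical and exist only for this illustration.

```python
def read_cu_dimd_flag(read_bit, sps_dimd_enabled_flag, is_skip, is_luma, is_inter):
    """Parse cu_dimd_flag only when all conditions hold; otherwise infer 0."""
    if sps_dimd_enabled_flag == 1 and not is_skip and is_luma and not is_inter:
        return read_bit()  # flag is present in the bitstream
    return 0               # flag is absent, so its value is set to 0
```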

When the current block is decoded in DIMD mode, additional information (syntax elements) related to the encoding mode may not be parsed. Referring to FIG. 13, when the value of cu_dimd_flag is 1, additional information related to the encoding mode of the current block (e.g., intra_mip_flag, intra_subpartitions_mode_flag, intra_luma_mpm_flag, intra_luma_not_planar_flag, intra_luma_mpm_idx, intra_luma_mpm_remainder, etc.) may not be parsed. Afterwards, if a residual signal exists, whether a transform coefficient for the residual signal exists and syntax elements related to the residual signal may be parsed.

Specifically, FIG. 14 is a structural diagram showing a process for more effectively deriving an intra prediction mode to improve encoding performance and generating a prediction sample using the derived intra prediction mode.

Referring to FIG. 14, in the 'prediction mode generator' process, the decoder can derive an intra prediction directional mode for the current block using samples of neighboring blocks adjacent to the current block. At this time, the decoder can derive at least one intra prediction directional mode and derive a weight for each mode. For example, to derive the intra prediction directional mode, the decoder can use a histogram-based method in which it derives directionality information from neighboring block samples through an arbitrary filter and determines frequently occurring directionality information as the intra prediction directional mode. As another method of deriving the intra prediction directional mode, the decoder may generate intra prediction pixels for the left pixels adjacent to the current block using only the upper pixels adjacent to the current block, and determine the intra prediction mode with the least distortion as the intra prediction directional mode for the current block.

Referring to FIG. 14, in the 'intra prediction' process, prediction blocks for the current block can be generated using the intra prediction directional modes and weights derived in the 'prediction mode generator' process. The number of prediction blocks can be determined according to the number of intra prediction directional modes derived in the prediction mode generator process. For example, if the number of derived intra prediction directional modes is 2, there may be 2 prediction blocks for the current block. PDPC (Position dependent intra prediction combination) filtering using a method described later may be applied to the prediction blocks generated in the 'intra prediction' process.

Position dependent intra prediction combination (PDPC) filtering may be applied to each of the prediction blocks generated in the 'intra prediction' process. If PDPC filtering is applied to each prediction block, complexity may increase in terms of the decoder, so if a prediction block is predicted in DIMD mode, PDPC filtering may not be applied to the corresponding prediction block. Additionally, PDPC filtering can be applied only to either modeA, which has the largest cumulative value, or modeB, which has the second largest cumulative value. For example, PDPC filtering can be applied only for modeA. Additionally, whether PDPC filtering is applied may be determined depending on the weight of each directional mode. For example, it may be determined whether PDPC will be applied to all or part of modeA and modeB based on the difference between the weight for modeA and the weight for modeB. For example, if the difference between the weight for modeA and the weight for modeB is less than a certain value, PDPC filtering can be applied to both modeA and modeB. Additionally, it can be determined whether PDPC filtering is applied to modeA and modeB by comparing each of the weight for modeA and the weight for modeB with a specific value. If the weight is greater than a certain value, PDPC filtering can be applied to the directional mode of the weight. For example, if the weight for modeA is greater than a certain value and the weight for modeB is less than a certain value, PDPC filtering may be applied to modeA, and PDPC filtering may not be applied to modeB. Additionally, regardless of the directional mode, a preset form of PDPC filtering can be applied only to the final prediction block to which the weighted average has been applied through the weighted prediction process (see FIG. 14). Additionally, in the 'weighted prediction' process, PDPC filtering can be applied using modeA to the final prediction block to which the weighted average is applied. 
In the 'weighted prediction' process, PDPC filtering can be applied using modeB to the final prediction block to which the weighted average is applied.

Referring to FIG. 14, in the 'other prediction' process, the decoder can additionally generate a prediction block for the current block. For example, the decoder may generate an intra prediction block using at least one of planar mode, DC mode, and Matrix Intra Prediction (MIP). Whether the 'other prediction' process is performed can be determined using at least one of the following (which may correspond to Additional information A, B, and C in FIG. 14): the intra prediction directional modes and the weight information for each mode derived in the 'prediction mode generator' process, quantization parameter information of the current block, the horizontal or vertical length of the current block, information about whether the current block is luminance or chrominance, the intra prediction modes around the current block, and information about the presence or absence of transform coefficients around the current block. A method of determining whether the 'other prediction' process is performed is described below.

In the 'other prediction' process, information about which mode (e.g., planar, DC, MIP mode, etc.) the decoder will use may be pre-arranged or signaled through the SPS. For example, the decoder can determine the mode based on a syntax element (sps_dimd_default_mode) indicating which mode to use. The decoder can decide which mode to use among planar mode, DC mode, and MIP mode depending on the value of sps_dimd_default_mode. For example, a sps_dimd_default_mode value of '0' can indicate that planar mode is used, a value of '1' can indicate that DC mode is used, and a value other than 0 and 1 can indicate that MIP mode is used. Additionally, if the current block is a luminance block and transform coefficients of neighboring blocks exist, the decoder may generate a prediction block using at least one of planar mode, DC mode, and MIP mode. If the current block is a chrominance block and no transform coefficients of the neighboring blocks exist, a prediction block can be generated using at least one of planar mode, DC mode, and MIP mode. Additionally, if the weights of the intra prediction directional modes derived in the 'prediction mode generator' process are similar to each other (for example, if the difference between the weights of the directional modes is smaller than a certain threshold), the 'other prediction' process may not be performed. Alternatively, if the weights of the intra prediction directional modes derived in the 'prediction mode generator' process are similar to each other, the decoder can generate a prediction block using at least one of planar mode, DC mode, and MIP mode (i.e., the 'other prediction' process is performed). 
If the difference between the weights of the intra prediction directional modes derived in the 'prediction mode generator' process is large (for example, if the difference between the weights of the directional modes is larger than a certain threshold), there is a lot of variation between pixels in the surrounding blocks. Therefore, the decoder can generate a prediction block using at least one of planar mode, DC mode, and MIP mode. Additionally, if the horizontal and vertical lengths of the current block are different, the decoder can generate a prediction block using at least one of planar mode, DC mode, and MIP mode. Conversely, when the horizontal and vertical lengths of the current block are the same, the decoder can generate a prediction block using at least one of planar mode, DC mode, and MIP mode.
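The mapping from sps_dimd_default_mode to the mode used in the 'other prediction' process can be illustrated with a short sketch. The function name and the string labels are hypothetical; only the value-to-mode mapping (0, 1, other) comes from the description above.

```python
def default_mode_from_sps(sps_dimd_default_mode: int) -> str:
    """Map sps_dimd_default_mode to the mode used by 'other prediction'."""
    if sps_dimd_default_mode == 0:
        return "PLANAR"
    if sps_dimd_default_mode == 1:
        return "DC"
    # Any value other than 0 and 1 indicates MIP mode.
    return "MIP"
```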

In the 'weighted prediction' process, the decoder can generate one prediction sample by taking a weighted average of the multiple intra prediction blocks generated in the 'intra prediction' and 'other prediction' processes. The weight for each intra prediction block may be determined based on at least one of the intra prediction directional modes and weight information derived in the 'prediction mode generator' process, the quantization parameter information of the current block, the horizontal or vertical length of the current block, information about whether the current block is luminance or chrominance, the intra prediction modes around the current block, and information about the presence or absence of transform coefficients around the current block.
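As an illustration of the 'weighted prediction' step, the following sketch blends several prediction blocks sample by sample. It assumes integer weights and rounded integer division by the weight sum; the description does not fix the normalization or rounding, so both are assumptions, as is the function name.

```python
def weighted_prediction(blocks, weights):
    """Blend equally sized prediction blocks (lists of rows) with integer weights."""
    total = sum(weights)  # assumed normalization: divide by the sum of the weights
    height, width = len(blocks[0]), len(blocks[0][0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = sum(w * blk[y][x] for blk, w in zip(blocks, weights))
            # Rounded integer division (an assumption; the text does not specify rounding).
            out[y][x] = (acc + total // 2) // total
    return out
```

For example, blending a block of 100s with a block of 200s using weights 3 and 1 gives samples of 125.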

FIG. 15 shows the 'prediction mode generator' process of FIG. 14 in more detail. Referring to FIG. 15, the 'prediction mode generator' process of FIG. 14 can derive intra prediction directionality through histogram analysis. Specifically, in the 'Histogram analysis' process of FIG. 15, the decoder can derive intra prediction directionality by analyzing the histogram obtained using surrounding samples adjacent to the current block. At this time, the decoder can derive the intra prediction directional mode and weight for the current block using at least one of the horizontal length and vertical length of the current block, quantization parameter information, available intra prediction directional mode information among neighboring blocks of the current block, information on the presence or absence of residual signals in neighboring blocks of the current block, and information about whether the current block is a luminance block or a chrominance block. A method for deriving the intra prediction directional mode and weight for the current block is described below.

The intra prediction directional mode can be set based on frequency. The decoder can obtain a histogram of the intra prediction directional modes for the neighboring blocks, analyze the histogram, and select the most frequently occurring intra prediction directional mode and the second most frequently occurring mode as the prediction directional modes. Additionally, the intra prediction directional mode may be set based on an accumulated value (e.g., the G value in FIG. 11). By analyzing a histogram whose entries are the cumulative values obtained by adding the G value for each intra prediction directional mode, the decoder can select the intra prediction directional mode with the highest weight and the mode with the second highest weight as the prediction directional modes. Additionally, the decoder can select the intra prediction directional mode based on the distance between the intra prediction directional modes of neighboring blocks and on the cumulative value of the G value. The distance between directional modes may mean the difference between their indices. For example, the distance between the directional mode of index 66 and the directional mode of index 2 may be 64. Alternatively, since 66 is the last index of the directional modes, the distance between the directional mode of index 66 and the directional mode of index 2 may be 2. The decoder can obtain a histogram in which the G value is accumulated for each of the intra prediction directional modes of the neighboring blocks, analyze the histogram, and first select the intra prediction directional mode with the highest cumulative value. Next, among the modes corresponding to the remaining cumulative values (e.g., the modes with the second, third, and fourth highest cumulative values), the decoder can use the mode whose distance from the mode with the highest cumulative value is smallest (the closest mode). 
Meanwhile, the decoder may first select the intra prediction directional mode with the highest cumulative value and then, among the modes corresponding to the remaining cumulative values (e.g., the modes with the second, third, and fourth highest cumulative values), use the mode whose distance from the mode with the highest cumulative value is largest (the farthest mode). The cumulative value for each intra prediction directional mode can be used when determining weights for the intra prediction directional modes finally determined in the 'Histogram analysis' process.
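The two-step selection described above (first the mode with the highest cumulative value, then the closest or farthest mode among the next few candidates) can be sketched as follows. The histogram is modeled as a dict from mode index to cumulative value, the distance is the plain index difference, and the function name, the num_candidates parameter, and the prefer switch are hypothetical.

```python
def pick_modes(hist, num_candidates=4, prefer="closest"):
    """Pick modeA and modeB from a {mode_index: cumulative_value} histogram.

    num_candidates counts the top-ranked modes considered, including modeA,
    so it must be at least 2 and the histogram needs at least two entries.
    """
    # Rank modes by cumulative value, highest first.
    ranked = sorted(hist, key=lambda m: hist[m], reverse=True)
    mode_a = ranked[0]
    # Remaining candidates: second, third, fourth highest cumulative values, etc.
    candidates = ranked[1:num_candidates]

    def distance(mode):
        # Distance between directional modes as the index difference.
        return abs(mode - mode_a)

    if prefer == "closest":
        mode_b = min(candidates, key=distance)
    else:
        mode_b = max(candidates, key=distance)
    return mode_a, mode_b
```

With the histogram {18: 50, 50: 40, 20: 30, 34: 10}, modeA is 18; the closest variant picks 20 as modeB and the farthest variant picks 50.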

There may be two or more intra prediction directional modes for the current block derived by the decoder in the 'Histogram analysis' process of FIG. 15. If two or more intra prediction directional modes are derived from 'Histogram analysis', the distances between the intra prediction directional modes may be similar to or different from each other. Additionally, the cumulative values of the intra prediction directional modes may be similar to or different from each other. Therefore, in order to derive the optimal prediction sample for the current block, the optimal combination must be selected among various mode combinations. In addition, the decoder can combine not only the intra prediction directional modes derived in the 'prediction mode generator' process of FIG. 14 but also the encoding modes derived in 'other prediction' to derive the optimal prediction sample for the current block. Information about this combination may be included in the bitstream. The mode combination described herein may mean using one of modeA, modeB, planar mode, DC mode, and MIP mode, or combining some or all of them.

Next, referring to FIG. 15, in the 'Prediction mode analysis' process, the decoder can select the optimal combination for deriving prediction samples for the current block using the intra prediction modes determined in the 'Histogram analysis' process and the weight information corresponding to them. Specifically, using the derived intra prediction modes and the corresponding weight information, the decoder can determine whether to use a weighted average to generate a prediction sample for the current block, which intra prediction modes to use, and how to set the weights for those intra prediction modes. In addition, in the 'Prediction mode analysis' process, the decoder can select the optimal combination for deriving a prediction sample for the current block using at least one of the intra prediction modes determined in the 'Histogram analysis' process, the weight information corresponding to the determined intra prediction modes, and the intra prediction modes of neighboring blocks. In this case as well, the decoder can determine whether to use a weighted average to generate a prediction sample for the current block, which intra prediction modes to use, and how to set the weights for those intra prediction modes. At this time, the optimal combination information on the prediction modes for generating a prediction sample for the current block can be derived using at least one of the horizontal or vertical length of the current block, quantization parameter information, available intra prediction mode information among neighboring blocks of the current block, information on the presence or absence of residual signals in neighboring blocks of the current block, and information about whether the current block is a luminance block or a chrominance block. 
The combination information may include intra prediction directional mode information and weights for the intra prediction directional modes. For example, if the weight of the mode with the second highest weight among the two derived intra prediction directional modes is '0' or falls within an arbitrary value, the decoder may not apply a weighted average when generating a prediction block for the current block and may instead generate the prediction block using only the one intra prediction directional mode with the highest weight. At this time, the arbitrary value is an integer greater than 1 and may be 10. Additionally, if at least one of the two derived intra prediction modes is DC mode, planar mode, or MIP mode (i.e., not a directional mode), the decoder may not apply a weighted average when generating a prediction block for the current block and may instead generate the prediction block using only the one intra prediction directional mode with the highest weight. Alternatively, if at least one of the two derived intra prediction modes is DC mode, planar mode, or MIP mode (i.e., not a directional mode), the decoder may apply a weighted average when generating the prediction block for the current block.

Referring to FIG. 16, if the current block is encoded with DIMD (i.e., the value of cu_dimd_flag is 1), the decoder can additionally parse a syntax element (cu_dimd_mode) for DIMD combination information (information about the modes combined to obtain the prediction sample, i.e., mode combination information). At this time, the way cu_dimd_mode is parsed may differ depending on the number of prediction mode combinations. For example, if the number of combinations is '2', the decoder can parse only one bin. At this time, if the value of cu_dimd_mode is '0', the decoder can generate a prediction sample using modeA and modeB. If the value of cu_dimd_mode is '1', the decoder can generate prediction samples using modeA, modeB, and planar mode, using modeA and planar mode, or using modeB and planar mode. If the number of combinations is '4', the decoder can parse two bins. At this time, if the value of cu_dimd_mode is '0', the decoder can generate a prediction sample using modeA and modeB. If the value of cu_dimd_mode is '1', the decoder can generate prediction samples using modeA, modeB, and planar mode. If the value of cu_dimd_mode is '2', the decoder can generate prediction samples using modeA and planar mode. If the value of cu_dimd_mode is '3', the decoder can generate prediction samples using modeB and planar mode.
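The cu_dimd_mode semantics above can be summarized in a small lookup sketch. The table and function names are hypothetical, the bins are assumed to form a fixed-length code read most significant bit first (the description only gives the number of bins), and for the two-combination case the value '1' is mapped to the first of the three alternatives listed in the text.

```python
# Hypothetical lookup tables; the mode sets follow the description of FIG. 16.
DIMD_COMBOS = {
    2: {0: ("modeA", "modeB"),
        1: ("modeA", "modeB", "planar")},  # one of the alternatives named in the text
    4: {0: ("modeA", "modeB"),
        1: ("modeA", "modeB", "planar"),
        2: ("modeA", "planar"),
        3: ("modeB", "planar")},
}

def parse_cu_dimd_mode(read_bit, num_combinations):
    """Read cu_dimd_mode and return (value, modes to combine).

    read_bit is a hypothetical callback returning the next decoded bin.
    """
    num_bins = 1 if num_combinations == 2 else 2
    value = 0
    for _ in range(num_bins):
        # Assumed fixed-length binarization, most significant bin first.
        value = (value << 1) | read_bit()
    return value, DIMD_COMBOS[num_combinations][value]
```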

If syntax elements for DIMD combination information are included in the bitstream, there may be a problem that the bit amount increases. To solve this problem, syntax elements for DIMD combination information are not included in the bitstream, and the decoder can derive combination information through information on the current block and neighboring blocks. As described above, the decoder can derive optimal combination information to generate a prediction sample for the current block.

Specifically, FIG. 17 shows weight information corresponding to the intra prediction directional mode for each block derived in the 'Histogram analysis' step described with reference to FIG. 15. Referring to FIG. 17, the weights (WeightX) can be labeled alphabetically in descending order of size. For example, the highest weight can be expressed as 'WeightA' and the second highest weight as 'WeightB'. There may be as many pieces of weight information as the number (X) of derived intra prediction directional modes. Additionally, the intra prediction mode corresponding to WeightA may be modeA. Depending on the characteristics of neighboring blocks adjacent to the current block, the characteristics of the intra prediction directional modes and the corresponding weight information may differ. Referring to FIG. 17, Case 1 represents a case where the intra prediction directional modes modeA and modeB are similar to each other and the corresponding weight information is also similar. Case 2 represents a case where modeA and modeB differ significantly and the corresponding weight information also differs significantly. Case 3 represents a case where modeA and modeB are similar, but the corresponding weight information differs significantly. Case 4 represents a case where modeA and modeB differ significantly, but the corresponding weight information is similar.

Specifically, Figure 18 shows a method of determining optimal DIMD combination information through the difference between intra prediction directional modes (modeA, modeB) derived in the 'prediction mode generator' process and the corresponding weights (WeightA, WeightB).

Referring to FIG. 18, i) if the absolute value of the difference between modeA and modeB is less than an arbitrary threshold (Tmode1, e.g., 10) and WeightA is less than an arbitrary threshold (Tweight1, e.g., 0.7), the optimal DIMD combination for the current block may be a combination of modeA and modeB, and the decoder may generate a prediction sample by combining modeA and modeB. ii) If i) does not apply and WeightA is above an arbitrary threshold (Tweight2, e.g., 0.85), the optimal DIMD combination for the current block may be to use only modeA, and the decoder may generate the prediction sample using only modeA. iii) If ii) does not apply and the absolute value of the difference between modeA and modeB is greater than an arbitrary threshold (Tmode2, e.g., 15), the optimal DIMD combination for the current block may be a combination of modeA and some or all of the encoding modes (e.g., planar mode, DC mode, MIP mode) derived in the 'other prediction' process, and the decoder may generate prediction samples by combining modeA with some or all of those encoding modes. iv) If iii) does not apply, the optimal DIMD combination for the current block may be a combination of modeA, modeB, and some or all of the encoding modes (e.g., planar mode, DC mode, MIP mode) derived in the 'other prediction' process, and the decoder may generate prediction samples by combining modeA, modeB, and some or all of those encoding modes.
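The four-way decision of FIG. 18 can be written as a short cascade. This is a sketch under stated assumptions: the function name and the "other" label (standing for some or all of the modes from the 'other prediction' process) are hypothetical, the default thresholds are the example values from the text, and "above" is treated as a strict comparison because the description does not say whether the boundary is inclusive.

```python
def dimd_combination(mode_a, mode_b, weight_a,
                     t_mode1=10, t_weight1=0.7, t_weight2=0.85, t_mode2=15):
    """Return the tuple of prediction sources to combine for the current block."""
    diff = abs(mode_a - mode_b)
    # i) similar modes and a moderate WeightA: blend modeA and modeB
    if diff < t_mode1 and weight_a < t_weight1:
        return ("modeA", "modeB")
    # ii) dominant WeightA: use modeA alone
    if weight_a > t_weight2:
        return ("modeA",)
    # iii) modes far apart: modeA plus the 'other prediction' modes
    if diff > t_mode2:
        return ("modeA", "other")
    # iv) otherwise: modeA, modeB, and the 'other prediction' modes
    return ("modeA", "modeB", "other")
```

Because each branch is reached only when the earlier ones fail, the order of the checks carries the "if i) does not apply" conditions from the text.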

Below, a method of determining optimal DIMD combination information through the difference between the intra prediction directionality modes (modeA, modeB) derived by the decoder and the corresponding weights (WeightA, WeightB) will be described.

By comparing the weights of modeA and modeB with the sum of all weights in the histogram (see FIG. 11), the decoder can obtain DIMD combination information. Specifically, DIMD combination information can be obtained by comparing the weights of modeA and modeB with the total (e.g., the sum) of the weights of the direction information for neighboring blocks of the current block, including the weights of modeA and modeB.

For example, when there is one derived intra prediction directional mode (modeA or modeB), if the ratio of the weight of that mode to the sum of all weights is greater than a specific ratio, that mode can be selected. Meanwhile, if the ratio of the weight of the derived intra prediction directional mode to the sum of all weights is equal to or smaller than the specific ratio, the DIMD combination information may be a combination of at least one of the derived intra prediction directional mode, planar mode, DC mode, and MIP mode. At this time, the specific ratio may be 1/2, 2/3, 3/4, 3/8, etc.

As another example, when there are two derived intra prediction directional modes (modeA and modeB), if the ratio of the sum of the weights of the two modes to the sum of all weights is greater than a specific ratio, the two intra prediction directional modes can be selected. Meanwhile, if the ratio of the sum of the weights of the two derived intra prediction directional modes to the sum of all weights is equal to or smaller than the specific ratio, at least one of the derived intra prediction directional modes, planar mode, DC mode, and MIP mode may be selected as the DIMD combination information. For example, modeA, modeB, and planar mode can be selected. Alternatively, modeA and modeB may be selected. At this time, the specific ratio may be 1/2, 2/3, 3/4, 3/8, etc.
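The ratio test for the two-mode case can be sketched as below. The function name is hypothetical, the default ratio of 2/3 is one of the example values from the text, and the fallback to modeA, modeB, and planar mode is just one of the alternatives the description names. Fractions are used so that the comparison with ratios such as 2/3 is exact.

```python
from fractions import Fraction

def combination_by_ratio(weight_a, weight_b, total_weight, ratio=Fraction(2, 3)):
    """Choose the DIMD combination from the share of the histogram held by modeA and modeB.

    total_weight is the sum of all histogram weights (including modeA and modeB)
    and must be a positive integer.
    """
    share = Fraction(weight_a + weight_b, total_weight)
    if share > ratio:
        # The two derived directional modes dominate: use them alone.
        return ("modeA", "modeB")
    # Share equal to or below the ratio: add a non-directional mode (planar here).
    return ("modeA", "modeB", "planar")
```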

Specifically, FIG. 19 shows the 'Intra prediction' and 'weighted prediction' processes of FIG. 14. Referring to FIG. 19, when there are a plurality of intra prediction directional modes derived by the decoder, the decoder can obtain a prediction sample by performing weighted prediction using the weight information of each of the plurality of intra prediction directional modes. The weight information may be reset based on at least one of the horizontal length of the current block, the vertical length of the current block, quantization parameter information, and information about whether the current block is luminance or chrominance (Additional information).

FIGS. 20 and 21 show pixel values of neighboring blocks used when deriving an intra prediction directional mode in the 'histogram analysis' process of FIG. 14. When deriving an intra prediction directional mode, filtering calculations are required for all surrounding pixels located to the left of and above the current block. At this time, the surrounding pixels may be pixels on a line adjacent to or spaced apart from the boundary of the current block. That is, in order to derive the intra prediction directional mode, filtering calculations must be performed on all neighboring pixels on lines adjacent to or spaced apart from the left and upper boundaries of the current block, so delay may occur due to computational complexity. To solve this problem, the decoder can separate the filtering calculations for neighboring pixels located to the left of and above the current block and derive the intra prediction directional modes for the left and upper neighboring pixels in parallel. In addition, the decoder may derive directionality information by performing filtering calculations only on surrounding pixels at arbitrarily determined locations.

Figures 20(a) and (c) show surrounding pixels located to the left of the current block used in filtering calculations to derive an intra prediction directional mode. Figures 20(b) and 20(d) show surrounding pixels located above the current block used in filtering calculations to derive an intra prediction directional mode.

The intra-directional information that can be mapped may vary depending on the location of the reference pixel that the decoder uses to obtain the histogram described in FIG. 11.

For example, the horizontal and vertical lengths of the current block may be the same. At this time, the intra prediction directional modes that can be mapped when deriving directionality information for the surrounding pixels located to the left of the current block in FIG. 20(a) may be limited to indices -14 to 34. In FIG. 20(b), the intra prediction directional modes that can be mapped when deriving directionality information for neighboring pixels located above the current block may be limited to indices 34 to 80.

Meanwhile, there may be cases where the horizontal and vertical lengths of the current block are different, and the positions of surrounding pixels used when deriving directionality information may vary depending on the horizontal and vertical lengths of the current block. For example, if the vertical length of the current block is longer than the horizontal length, the decoder may derive directionality information using only the surrounding pixels located on the left side of the current block, without using the surrounding pixels located on the upper side. Using only the neighboring pixels located on the left, rather than also using those located on the upper side, has the effect of reducing computational complexity. If the horizontal length of the current block is longer than the vertical length, the decoder can derive directionality information by applying a greater weight to neighboring pixels located above the current block than to neighboring pixels located to its left. The weights can be specific pre-arranged values. For example, if the horizontal length of the current block is greater than the vertical length, a weight of 1 may be used for neighboring pixels located to the left of the current block, and a weight of 2 may be used for neighboring pixels located above. This is because, when the current block is longer horizontally than vertically, it is more effective to derive the intra prediction directional mode using directionality information about the surrounding pixels located above the current block than about those located to its left.
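The shape-dependent treatment of left and upper neighbors can be expressed as a pair of weights applied to their contributions. In this sketch a weight of 0 stands for "not used"; the function name is hypothetical, and the equal weights for square blocks are an assumption, since the paragraph above only covers non-square blocks.

```python
def neighbor_weights(width, height):
    """Return (left_weight, above_weight) for the directionality analysis."""
    if height > width:
        # Taller than wide: only the left neighbors are used.
        return (1, 0)
    if width > height:
        # Wider than tall: upper neighbors count twice as much as left neighbors.
        return (1, 2)
    # Square block: equal weights (assumption, not stated in the text).
    return (1, 1)
```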

The decoder can perform filtering calculations only on the pixels, among those located around the current block, that correspond to a specific number. At this time, the specific number may be a multiple of N, and N may be 2, 3, 4, etc. Information about N may be included in picture header information. Referring to FIG. 21, starting from the position moved from the upper left corner of the current block by (-2, -2) on the x and y axes, the decoder can derive directionality information by performing filtering calculations only at positions corresponding to multiples of '2'. Additionally, the decoder may derive directionality information if the current block is a luminance block, and may not derive directionality information if the current block is a chrominance block. The decoder can apply the directionality information found for the luminance block to the chrominance block. Meanwhile, directionality information for the luminance block and directionality information for the chrominance block can each be obtained separately. The chrominance block may not use the directionality information found for the luminance block, but may instead use information obtained using at least one of planar mode, DC mode, horizontal mode, vertical mode, and MIP mode.
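The subsampled filtering positions can be enumerated as in the following sketch. The starting offset (-2, -2) and the step of 2 come from the FIG. 21 example; the extent of the row and column (up to the block width and height) and the function name are assumptions made for illustration, since the figure itself is not reproduced here.

```python
def sampled_positions(width, height, n=2, origin=(-2, -2)):
    """List the positions, relative to the top-left sample of the block, that get filtered."""
    ox, oy = origin
    # Positions along the row above the block, every n-th sample starting at the origin.
    above = [(x, oy) for x in range(ox, width, n)]
    # Positions along the column left of the block, skipping the shared corner.
    left = [(ox, y) for y in range(oy + n, height, n)]
    return above, left
```

For a 4x4 block this yields three positions above the block and two to its left, instead of filtering every neighboring sample.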

The intra prediction directional mode of the current block is likely to be similar to the intra prediction directional modes of neighboring blocks. Therefore, to encode the intra prediction directional mode of the current block, an MPM (Most Probable Mode) list is constructed using the intra prediction directional modes of neighboring blocks, and information about whether the intra prediction directional mode of the current block exists in the MPM list and information about its position in the list may be included in the bitstream. That is, information about the intra prediction directional mode of the current block itself may not be separately included in the bitstream. Because the intra prediction directional mode of the current block is determined from the information about whether it exists in the MPM list and where, the amount of information (i.e., the number of bits) needed to derive the intra prediction directional mode of the current block may vary depending on whether the MPM list is constructed effectively.

The method of deriving an intra prediction directional mode using the directionality characteristics of surrounding pixels of the current block can also be used in the process of constructing an MPM list. The decoder can add the intra prediction directional mode for the current block, derived using the directionality characteristics of surrounding pixels of the current block, to the MPM list and use it to encode the intra prediction directional mode for the current block. This can be used when neighboring blocks of the current block are not encoded in intra prediction mode or are encoded in a mode that has no intra prediction directional mode, such as MIP (Matrix intra prediction) mode.

Neighboring blocks adjacent to the current block may include blocks without an intra prediction directional mode and blocks with one. If the neighboring block located to the left of the current block has no intra prediction directional mode, the decoder can calculate the directionality characteristics using only the surrounding pixels located above the current block and derive the intra prediction directional mode of the current block. Alternatively, if the neighboring block located above the current block has an intra prediction directional mode and the neighboring block located to the left of the current block does not, the decoder can include in the MPM list both the intra prediction directional mode of the neighboring block located above and an intra prediction directional mode derived from the directionality characteristics of the surrounding pixels located on the left.

Referring to FIG. 22, DIMD mode may be preferentially included in the MPM list. There may be multiple intra prediction directional modes of the current block derived through the DIMD mode. Therefore, the decoder can obtain prediction samples through multiple prediction. Among the neighboring blocks of the current block, the intra prediction directionality mode of a block with an intra prediction directionality mode may be added to the MPM list. If there is an empty spot in the MPM list, a mode modified by +1 or -1 in modeA can be added to the list, and DC mode, horizontal mode, vertical mode, and MIP mode can be added. Instead of DIMD mode, TIMD (Template based Intra Mode Derivation) mode may be preferentially included in the MPM list. Additionally, both DIMD and TIMD modes can be included in the MPM list. Additionally, at least one of the two intra directional prediction modes derived using the DIMD mode and the two intra directional prediction modes derived using the TIMD mode may be included in the MPM list. Additionally, if the intra-prediction directional mode is used in a first region of one of the GPM blocks divided into two regions, the MPM list can be used to derive the intra-prediction directional mode for the first region. The MPM list may include at least one of two intra prediction directionality modes derived using DIMD mode and two intra prediction directionality modes derived using TIMD mode.
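The filling order described above can be sketched as follows. This is a minimal illustration, not the method of FIG. 22 itself: the function name `build_mpm_list`, the list size of 6, and the mode numbering (DC = 1, horizontal = 18, vertical = 50, angular modes 2 to 66) are assumptions, and MIP mode is omitted for brevity.

```python
# Assumed mode numbering for illustration only (not stated in this description).
DC, HOR, VER = 1, 18, 50
FIRST_ANGULAR, LAST_ANGULAR = 2, 66


def build_mpm_list(dimd_modes, neighbor_modes, size=6):
    """Fill an MPM list: DIMD-derived modes first, then neighbor modes,
    then +1/-1 variants of angular entries, then default modes."""
    mpm = []

    def push(mode):
        # Skip missing modes (None), duplicates, and anything past the list size.
        if mode is not None and mode not in mpm and len(mpm) < size:
            mpm.append(mode)

    for m in dimd_modes:            # derived modes get the highest priority
        push(m)
    for m in neighbor_modes:        # None marks a neighbor without a directional mode
        push(m)
    for m in list(mpm):             # fill empty spots with +1 / -1 variants
        if m >= FIRST_ANGULAR:
            for cand in (m + 1, m - 1):
                if FIRST_ANGULAR <= cand <= LAST_ANGULAR:
                    push(cand)
    for m in (DC, HOR, VER):        # remaining spots get default modes
        push(m)
    return mpm


print(build_mpm_list([34], [None, 50]))  # [34, 50, 35, 33, 51, 49]
```

When no derived or neighbor mode is available, the same routine falls back to the default modes only.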

If the MPM list includes DIMD mode, information about whether the current block is encoded in DIMD mode can be derived through a syntax element (mpm_idx). Therefore, additional information related to DIMD may not need to be signaled. At this time, if the current block is encoded in DIMD mode, the reference line index may be 0 (mrl_ref_idx is 0). Additionally, when DIMD mode is used, mrl_ref_idx is not parsed and the value of mrl_ref_idx may be inferred as 0. Additionally, the MPM list may include intra prediction directional modes derived using DIMD mode. If the intra prediction directional mode derived from DIMD is selected, mrl_ref_idx may be reset. For example, the value of mrl_ref_idx can be reset to any of 0, 1, 2, .... Based on the value obtained by parsing mrl_ref_idx, the decoder can decide whether to add the intra prediction directional mode derived using the DIMD mode to the MPM list, or decide the priority of that mode within the MPM list. For example, if the value of mrl_ref_idx is not 0, the decoder may not include the intra prediction directional mode derived using the DIMD mode in the MPM list. Alternatively, only when the value of mrl_ref_idx is 0, the decoder may include an intra prediction directional mode derived using the DIMD mode in the MPM list. Alternatively, only when the reference pixel line determined from the value of mrl_ref_idx is 0, 1, 2, or 3, the decoder may include an intra prediction directional mode derived using the DIMD mode in the MPM list. Alternatively, if the reference pixel line determined from the value of mrl_ref_idx is greater than 3, the decoder may not include the intra prediction directional mode derived using the DIMD mode in the MPM list. Alternatively, if the value of mrl_ref_idx is not 0, the decoder may include an intra prediction directional mode derived using the DIMD mode in the MPM list.
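One of the alternatives above (the derived mode is added only when mrl_ref_idx is 0) can be sketched as follows. The function name `mpm_with_dimd`, the list size, and the choice to place the derived mode first are assumptions made for illustration; the other alternatives would only change the condition.

```python
def mpm_with_dimd(base_mpm, dimd_mode, mrl_ref_idx, size=6):
    """Add the DIMD-derived mode to the MPM list only for reference line 0."""
    if mrl_ref_idx != 0:
        # Non-zero reference line: the derived mode is left out of the list.
        return list(base_mpm)[:size]
    # Reference line 0: the derived mode is placed first, without duplicates.
    merged = [dimd_mode] + [m for m in base_mpm if m != dimd_mode]
    return merged[:size]


print(mpm_with_dimd([50, 18, 1], 34, mrl_ref_idx=0))  # [34, 50, 18, 1]
print(mpm_with_dimd([50, 18, 1], 34, mrl_ref_idx=2))  # [50, 18, 1]
```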

The intra prediction directional mode derived from the DIMD mode can be used to reorder intra prediction mode candidates in the MPM list. The decoder can construct an MPM list starting from neighboring blocks of the current block and then derive an intra prediction directional mode using DIMD mode. The decoder can reorder intra prediction mode candidates in the MPM list using the derived intra prediction directional mode. At this time, the decoder can rearrange the MPM list using one or more of the following: the derived intra prediction directional mode, the horizontal or vertical length of the current block, quantization parameter information, the intra prediction mode information available among neighboring blocks of the current block, information about whether residual signals of neighboring blocks of the current block are present, and information about whether the current block is a luminance block or a chrominance block.

The decoder can reorder the MPM list using the difference between the intra prediction mode candidates in the MPM list and the derived intra prediction directional mode. For example, the decoder may calculate the difference between the derived intra-prediction directional mode and each of the intra-prediction mode candidates in the MPM list and sort the MPM list in order of the smallest difference (including 0). The intra prediction mode candidate with the smallest difference in the MPM list may be set to have the smallest index value in the MPM list. Additionally, the derived intra prediction directional mode may be set to have the highest priority within the MPM list and may be set to have the smallest index value. And, after the derived intra-prediction directionality mode, the decoder can calculate the difference between the derived intra-prediction directionality mode and each of the intra-prediction mode candidates in the MPM list and sort the MPM list in order of the smallest difference (including 0). Additionally, when two MPM lists are used, the first MPM list may be organized in the order of intra prediction mode candidates similar to the derived intra prediction directional mode. The difference between the derived intra-prediction directional mode and the intra-prediction mode candidates in the MPM list may be organized in descending order. The second MPM list can be constructed using candidates that do not have high similarity to the derived intra prediction directional mode. For example, the second MPM list may be organized in the order of the largest difference between the derived intra-prediction directional mode and the intra-prediction mode candidates in the MPM list. If the size of the MPM list is fixed, there may be unfilled empty space in the MPM list. At this time, a new prediction candidate derived using one or more of the candidates already included in the MPM list or frequently occurring candidates may be added to the empty space. 
For example, a new prediction candidate may be a candidate whose mode number (index) is obtained by adding an arbitrary amount to, or subtracting it from, the mode number of an already included candidate. The arbitrary amount may be a natural number such as 1, 2, 3, ..., and information about the arbitrary amount may be included in the picture header information. Additionally, when two MPM lists are used, the first MPM list may be composed of prediction modes obtained by referring to the prediction modes of neighboring blocks of the current block, and the second MPM list may be composed of prediction modes derived by DIMD. At this time, if the number of prediction modes included in an MPM list is smaller than the predefined number of prediction modes that the MPM list can hold, prediction modes derived by applying an offset to the prediction modes already included in the MPM list may be added.
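The reordering by difference and the filling of empty spaces with offset candidates can be sketched as follows. The function names, the angular range of 2 to 66, and the use of the plain difference between mode numbers as the similarity measure are assumptions made for illustration.

```python
def reorder_mpm(mpm, derived_mode, derived_first=False):
    """Sort MPM candidates by their distance to the derived mode (smallest first)."""
    ordered = sorted(mpm, key=lambda m: abs(m - derived_mode))  # stable sort
    if derived_first:
        # Optionally give the derived mode itself the smallest index.
        ordered = [derived_mode] + [m for m in ordered if m != derived_mode]
    return ordered


def fill_with_offsets(mpm, size, offset=1, lo=2, hi=66):
    """Fill empty spots with candidates at +offset / -offset from existing ones."""
    out = list(mpm)
    for m in mpm:
        for cand in (m + offset, m - offset):
            if len(out) < size and lo <= cand <= hi and cand not in out:
                out.append(cand)
    return out


print(reorder_mpm([66, 1, 34, 21, 16], derived_mode=18))  # [16, 21, 34, 1, 66]
print(fill_with_offsets([16, 21], size=5))                # [16, 21, 17, 15, 22]
```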

The intra prediction directional mode derived from the DIMD mode can be used to reassemble the intra prediction mode candidates in the MPM list. The decoder may construct an MPM list based on the prediction modes of neighboring blocks of the current block and then derive the intra prediction directional mode through the DIMD mode. Additionally, the decoder can recombine the intra prediction mode candidates in the MPM list using the derived intra prediction directional mode and reconstruct them into multiple prediction candidates. At this time, the decoder can reassemble the MPM list using one or more of the following: the derived intra prediction directional mode, the horizontal or vertical length of the current block, quantization parameter information, the intra prediction mode information available among neighboring blocks of the current block, information about whether residual signals of neighboring blocks of the current block are present, and information about whether the current block is a luminance block or a chrominance block. Below, the MPM list recombination method will be described.

Using the difference between the derived intra prediction directional mode and the intra prediction mode candidates in the MPM list, the decoder can reassemble the MPM list. For example, the decoder may select the candidates whose difference from the derived intra prediction directional mode is less than or equal to a certain value, and include in the MPM list multi-prediction candidates constructed by combining the derived intra prediction directional mode with each selected candidate (an existing intra prediction mode in the MPM list). At this time, the decoder may include these candidates in the MPM list in order from smallest to largest difference. Next, the decoder can sequentially insert the candidates whose difference is greater than the certain value into the MPM list. The certain value may be a natural number such as 1, 2, 3, .... For example, assume that the index of the derived intra prediction mode is '18', the indices of the candidates in the MPM list are '16', '21', '34', '1', and '66', and the certain value is 5. At this time, indices '16' and '21', whose difference from the derived intra prediction mode is within 5, can be changed to multi-prediction candidates, and the candidates in the MPM list become the prediction modes with indices '16, 18', '21, 18', '34', '1', and '66'. That is, '16, 18' and '21, 18' are multi-prediction candidates. For example, when the decoder selects the multi-prediction candidate '16, 18' in the MPM list, the decoder can create the final prediction block by performing a weighted average of the prediction sample generated by the prediction mode of index 16 and the prediction sample generated by the prediction mode of index 18. At this time, if the number of candidates in the MPM list is limited to 5, the MPM list can be '16, 18', '21, 18', '16', '21', and '34'. Additionally, when two MPM lists are used, the first MPM list may be composed of candidates that are recombined using candidates similar to the derived intra prediction directional mode.
The second MPM list may be composed of candidates carried over from the first MPM list and candidates that do not have high similarity to the derived intra prediction directional mode. Accordingly, the first MPM list may be composed of multi-prediction candidates, and the second MPM list may be composed of single prediction candidates. Alternatively, the first MPM list may be composed of both single prediction candidates and multi-prediction candidates, and the second MPM list may be composed of only single prediction candidates. For example, assume that the derived intra prediction mode is index '18', the prediction mode candidates in the first MPM list are indices '16', '21', '34', '1', and '66', the prediction mode candidates in the second MPM list are indices '50', '2', '8', '30', and '40', and the certain value is 5. At this time, indices '16' and '21', whose difference from the derived intra prediction mode index 18 is less than 5, may be changed to multi-prediction candidates. At this time, the first MPM list may consist of indices '16, 18', '21, 18', '16', '18', and '34', and the second MPM list may consist of indices '1', '66', '50', '2', '8', '30', and '40'.
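The single-list recombination example above (derived mode 18, candidates 16, 21, 34, 1, 66, and a certain value of 5) can be reproduced with the following sketch. The function name `recombine_mpm` and its parameters are hypothetical; a multi-prediction candidate is represented as a tuple of two mode indices, and an entry equal to the derived mode is simply kept as a single candidate.

```python
def recombine_mpm(mpm, derived_mode, threshold, keep_single=False, size=None):
    """Turn candidates close to the derived mode into (candidate, derived) pairs."""
    def diff(m):
        return abs(m - derived_mode)

    # Candidates within the threshold, ordered from smallest to largest difference.
    near = sorted((m for m in mpm if 0 < diff(m) <= threshold), key=diff)
    same = [m for m in mpm if m == derived_mode]
    far = [m for m in mpm if diff(m) > threshold]

    out = [(m, derived_mode) for m in near]   # multi-prediction candidates first
    out += same
    if keep_single:
        out += near                           # optionally keep the single modes too
    out += far                                # remaining candidates in original order
    return out if size is None else out[:size]


mpm = [16, 21, 34, 1, 66]
print(recombine_mpm(mpm, 18, 5))
# [(16, 18), (21, 18), 34, 1, 66]
print(recombine_mpm(mpm, 18, 5, keep_single=True, size=5))
# [(16, 18), (21, 18), 16, 21, 34]
```

The second call corresponds to the case in which the list is limited to 5 candidates and the single modes 16 and 21 are retained after their multi-prediction counterparts.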

The intra prediction directional mode can be encoded based on whether it is in the MPM list and, if so, where it is located. If the intra prediction directional mode does not exist in the MPM list, the intra prediction directional mode may be encoded based on the total number of intra prediction directional modes minus the total number of prediction modes in the MPM list. Specifically, there are a total of 67 intra prediction directional modes; excluding the 5 prediction modes in the MPM list and the planar mode, encoding can be performed on the remaining 61 modes. At this time, the 61 intra prediction directional modes can be encoded using fixed-length coding, which requires a total of 6 bins.
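The arithmetic behind the 6 bins can be checked with a short sketch. The function name `non_mpm_bins` and its parameters are hypothetical; the bin count is the smallest number of bits able to distinguish the remaining modes.

```python
def non_mpm_bins(total_modes=67, mpm_size=5, excluded=1):
    """Return (remaining modes, bins needed for fixed-length coding)."""
    remaining = total_modes - mpm_size - excluded   # 67 - 5 - 1 = 61
    bins = (remaining - 1).bit_length()             # ceil(log2(remaining)) for remaining >= 1
    return remaining, bins


print(non_mpm_bins())  # (61, 6)
```

Since 2^5 = 32 is smaller than 61 and 2^6 = 64 is not, five bins are insufficient and six are enough.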

Referring to FIG. 23, the decoder may use a template, which is an arbitrary reconstructed area (pixel(s)) adjacent to the current block, to derive the intra prediction mode of the current block. First, the decoder can generate a prediction template for the template using surrounding pixels (references) adjacent to the template. Then, the decoder can use the intra prediction mode of the prediction template that is most similar to the already restored template to restore the current block. The method of deriving the intra prediction mode of the current block using the above-described template may be referred to as TIMD (Template based Intra Mode Derivation). At this time, the intra prediction modes evaluated for the template may be the modes with indices 0 to 67, or may be limited to the intra prediction modes within the MPM list derived from the neighboring blocks of the current block. Alternatively, the intra prediction modes evaluated may be the intra prediction modes in the MPM list derived from neighboring blocks of the current block together with modes that differ from those intra prediction modes by an arbitrary number. The arbitrary number can be 1, 2, 3, .... Alternatively, only the directional modes may be applicable as intra prediction modes for the template, and the non-directional modes (planar mode, DC mode) may not be applicable.

Hereinafter, a method for deriving an intra prediction directional mode using the TIMD mode will be described.

i) The decoder can set the size of the template. The horizontal or vertical size (length) of the template may be 4, and if the horizontal or vertical size (length) of the current block is 8 or less, the horizontal or vertical size (length) of the template may be set to 2.
ii) The decoder can set the type of template. The type of template can be divided into a type that uses only the left samples, a type that uses only the upper samples, and a type that uses all of the left, upper, and upper-left samples. The decoder can determine the type of template depending on whether the surrounding block is valid or whether the surrounding block can be used to derive an intra prediction directional mode. Meanwhile, if neighboring blocks cannot be used to derive the intra prediction directional mode, the TIMD mode may be set to the planar mode, and weighted averaging may not be performed.
iii) The decoder can construct a template for the current block.
iv) The decoder may derive an intra prediction directional mode for the neighboring blocks located to the left, top, top left, top right, and bottom left of the current block to determine whether the current block has directionality.
v) If none of the neighboring blocks of the current block has directionality (e.g., they use a non-directional mode such as DC mode, planar mode, or MIP mode), the decoder can select the one intra prediction directional mode with the minimum cost and not perform TIMD mode. At this time, weighted averaging using multiple prediction blocks may not be performed.
vi) If there are one or more blocks with directionality among the neighboring blocks of the current block, the process described below can be performed. This process can be performed based on the intra prediction directional modes that exist in the MPM list, because complexity may increase if all 67 intra prediction directional modes are checked.
a. The decoder can construct an MPM list.
b. Next, the decoder can modify the MPM list by adding DC mode, horizontal mode, and vertical mode to the MPM list if they do not exist in the MPM list.
c. The decoder can compare costs by evaluating all intra prediction directional modes in the modified list. The decoder can select a first mode with the lowest cost and a second mode with the second lowest cost.
d. To increase accuracy, the decoder may additionally evaluate the intra prediction directional modes whose indices are 1 less or 1 greater than the index of the first mode and the index of the second mode. After this additional evaluation, the decoder may again select a third mode with the lowest cost and a fourth mode with the second lowest cost. Meanwhile, the first mode and the third mode may be the same, and the second mode and the fourth mode may be the same.
e. The decoder may decide whether to perform weighted averaging based on the costs of the third mode and the fourth mode. If the difference between the cost of the third mode and the cost of the fourth mode is less than a specific value, the decoder can perform weighted averaging, and the weights of the third mode and the fourth mode can be determined based on their costs. If the difference between the cost of the third mode and the cost of the fourth mode is greater than the specific value, the decoder may generate a prediction block using only the third mode without performing weighted averaging. At this time, the specific value may be a pre-arranged value.
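Steps i) and e. above can be sketched as follows, assuming the template costs of the evaluated modes are already available as a dictionary. The function names, the `max_diff` threshold parameter, and the inverse-cost weighting rule (the lower-cost mode receives the larger weight) are assumptions made for illustration; the description only states that the weights are determined from the costs.

```python
def template_size(block_len):
    """Step i): template length 4, or 2 when the block length is 8 or less."""
    return 2 if block_len <= 8 else 4


def timd_select(costs, max_diff):
    """Step e.: pick the two cheapest modes and decide whether to blend them.

    costs: dict mapping mode index -> template cost (e.g. a SATD value).
    Returns a list of (mode, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(costs, key=costs.get)
    best = ranked[0]
    if len(ranked) == 1:
        return [(best, 1.0)]
    second = ranked[1]
    c1, c2 = costs[best], costs[second]
    if c2 - c1 >= max_diff:
        return [(best, 1.0)]              # costs too far apart: single prediction
    total = c1 + c2
    if total == 0:
        return [(best, 0.5), (second, 0.5)]
    # Assumed weighting: each mode is weighted by the other mode's share of the cost.
    return [(best, c2 / total), (second, c1 / total)]


print(template_size(8), template_size(16))                    # 2 4
print(timd_select({50: 100, 18: 300, 1: 900}, max_diff=250))  # [(50, 0.75), (18, 0.25)]
print(timd_select({50: 100, 18: 300}, max_diff=100))          # [(50, 1.0)]
```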

The size of the template may vary depending on the horizontal or vertical length of the current block. For example, as shown in FIG. 23(a), an upper template that is longer than the horizontal length of the current block may be configured. At this time, the vertical length of the upper template may be a predetermined length. Likewise, a left template that is longer than the vertical length of the current block can be configured. At this time, the horizontal length of the left template may be a pre-arranged length. The pre-arranged length can be 1, 2, 3, ....

If the current block is located at a CTU boundary (when any of the upper, lower, left, and right boundaries of the current block is included in the boundary of the CTU), the reference pixels for deriving/predicting the template used for TIMD mode may be changed. Referring to FIG. 24, when the upper boundary of the current block is included in the boundary of the CTU, only one reference line located above the current block may be available for configuring the template. This is to minimize line buffer memory usage. Therefore, the decoder can perform TIMD mode by configuring only the left template of the current block, without configuring the upper template of the current block. At this time, the reference pixels for predicting the left template may be the upper reference pixels (above reference) and the left reference pixels (left reference) of the current block. At this time, the height of the left template may be the same as the height of the current block, as shown in FIG. 24(a). In addition, as shown in FIG. 24(b), the decoder can check whether the neighboring block to the left of the current block is a block for which restoration has already been completed, and if so, the decoder can configure the height of the left template to be larger than the height of the current block.

In general, the accuracy of prediction samples for the current block can increase as the decoder refers to more adjacent pixels of the current block. On the other hand, if many surrounding pixels are referenced, the required memory increases. Additionally, if there are blocks that have not yet been restored among the surrounding blocks adjacent to the current block, the corresponding area cannot be used as a template. To reduce memory usage and handle unrestored areas effectively, the length of the upper template can be set equal to the horizontal length of the current block, and the length of the left template can be set equal to the vertical length of the current block, as shown in FIG. 23(b).

The decoder may use an intra prediction mode derived using a template to obtain a prediction sample for the current block. The decoder can generate prediction samples using neighboring pixels adjacent to the current block, and can adaptively select which neighboring pixels to use to generate the prediction samples. Additionally, the decoder may use multiple reference lines to generate prediction samples, and in this case, index information of multiple reference lines may be included in the bitstream.

For entropy coding, a context for the index of multiple reference lines for TIMD mode can be newly defined. The increase in context types may be related to memory and context switching complexity. Therefore, the context used to code and decode the index of a multiple reference line used in TIMD mode may be a reused context for the index of an existing multiple reference line.

Transformation of the residual signal of the current block can be performed in two steps. The primary transformation may adaptively apply transformations such as DCT-II, DST-VII, DCT-VIII, DCT5, DST4, DST1, and the identity transformation (IDT) in the horizontal and vertical directions, respectively. A secondary transformation may be additionally applied to the transformation coefficients for which the primary transformation has been completed, and the secondary transformation may be calculated as a matrix product between the primary-transformed coefficients and a predefined matrix. The secondary transformation can be described as a low frequency non-separable transform (LFNST). The transformation matrix set for the secondary transformation may vary depending on the intra prediction mode of the current block. Coefficient information of the transformation matrix used for the secondary transformation may be included in the bitstream.

When a secondary transform is applied to the current block to which DIMD mode or TIMD mode is applied, the transform set for the secondary transform may be determined based on the intra prediction mode derived in DIMD mode or TIMD mode. Coefficient information of the transformation matrix used for the secondary transform may be included in the bitstream. The decoder can set the matrix coefficient information of the secondary transform for DIMD mode or TIMD mode by parsing the coefficient information included in the bitstream. At this time, one of the two intra prediction modes derived from the TIMD mode can be used to select the primary or secondary transform set. By comparing the costs of the two intra prediction directional modes, the intra prediction directional mode with the smaller cost can be used to select the primary or secondary transform set. Additionally, one of the two intra prediction directional modes derived from DIMD can be used to select the primary or secondary transform set. By comparing the weights of the two intra prediction modes, the intra prediction directional mode with the higher weight can be used to select the primary or secondary transform set.
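The choice of the representative mode used to select a transform set can be sketched as follows. The function name `mode_for_transform_set` and the input format (a list of pairs of mode index and cost or weight) are hypothetical; the mapping from the selected mode to an actual transform set is not shown because it is not specified here.

```python
def mode_for_transform_set(candidates, derived_by):
    """Pick the mode that drives transform set selection.

    candidates: list of (mode, value) pairs, where value is the template cost
    for TIMD or the blending weight for DIMD.
    """
    if derived_by == "TIMD":
        return min(candidates, key=lambda c: c[1])[0]   # smallest cost wins
    if derived_by == "DIMD":
        return max(candidates, key=lambda c: c[1])[0]   # highest weight wins
    raise ValueError("derived_by must be 'TIMD' or 'DIMD'")


print(mode_for_transform_set([(50, 120), (18, 95)], "TIMD"))  # 18
print(mode_for_transform_set([(50, 40), (18, 24)], "DIMD"))   # 50
```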

TIMD mode has high complexity because it predicts the template of the current block and uses the intra prediction mode derived from the template to generate the prediction block of the current block. Therefore, when the decoder generates a prediction template for the template area, the existing reference sample filtering process may not be performed. Additionally, if ISP mode or CIIP mode is applied to the current block, TIMD mode may not be applied. Conversely, ISP mode or CIIP mode may not be applied to the current block to which TIMD mode is applied, and syntax related to ISP or CIIP may not be parsed. At this time, the value of the syntax related to ISP or CIIP that is not parsed may be inferred as a pre-designated value.

Template prediction can be performed separately for a left template area and an upper template area adjacent to the current block, and an intra prediction mode can be derived for each template. Additionally, two or more intra prediction modes can be derived for each template, so there can be four or more intra prediction modes for the current block. If there are two or more intra prediction modes, prediction samples for the current block can be generated using all of the derived intra prediction modes, and the decoder can generate the final prediction block for the current block by performing a weighted average of the generated prediction samples. At this time, at least three of the two or more intra prediction modes derived from template prediction, the planar mode, the DC mode, and the MIP mode can be used to generate the prediction sample. For example, when the decoder generates (obtains) a prediction sample for the current block, the decoder can generate the final prediction sample by performing a weighted average of the prediction samples generated using the intra prediction modes derived from template prediction and the prediction sample generated using the planar mode.

Even when CIIP mode is applied, prediction samples can be generated using the methods described above. CIIP mode is a method that uses both intra prediction and inter prediction when generating prediction samples (blocks) for the current block. The prediction sample for the current block may be generated as a weighted average between intra prediction samples and inter prediction samples.

When intra prediction samples are generated by applying CIIP mode, DIMD mode or TIMD mode can be used. At this time, when DIMD mode is used, intra prediction samples can be generated based on DIMD combination information. For example, the decoder may generate a first prediction sample using the intra prediction mode with the highest weight and generate a second prediction sample using the intra prediction mode with the second highest weight. And, the decoder may generate a final intra prediction block by performing a weighted average of the first prediction sample and the second prediction sample. At this time, the decoder may generate a final intra prediction block by performing a weighted average of a total of 3 prediction samples, including a sample predicted in planar mode, a first prediction sample, and a second prediction sample, among the neighboring blocks of the current block. When TIMD mode is used, intra prediction samples can be generated based on TIMD combination information. For example, the decoder may generate two prediction samples using each of the two intra prediction modes. Then, the decoder can generate the final intra prediction sample by performing a weighted average of the two prediction samples. At this time, the decoder can generate the final intra prediction sample by performing a weighted average of the two prediction samples and the sample predicted in planar mode.

The accuracy of intra prediction samples may vary depending on location. That is, pixels within the prediction sample that are located far from the neighboring pixels used for prediction may contain more residual signal than pixels located closer to them. Accordingly, the decoder can divide prediction samples along the vertical, horizontal, or diagonal direction depending on the direction of the intra prediction mode, and set weights differently depending on the distance from the neighboring pixels used for prediction. This can be applied to intra prediction blocks created using CIIP mode or to intra prediction blocks created using more than one intra prediction mode: for each pixel within the prediction block, the weight can be set differently depending on the distance between the position of the reference pixel and the position of the pixel within the prediction block. As an example, when the intra prediction mode of the current block is a mode having a vertical direction or a direction similar to vertical, a higher weight can be set for pixel positions of the prediction block closer to the top reference pixels, and a lower weight can be set for pixel positions farther from them.

If the current block is encoded in CIIP mode, the decoder can generate the final prediction block by weighting the intra prediction sample and the inter prediction sample. The pixel-level weight in the inter-prediction sample can be set considering the pixel-level weight in the intra-prediction sample. For example, the pixel weight of the inter prediction sample may be the sum of all weights minus the pixel weight of the intra prediction sample. At this time, the sum of all weights may be the sum of the weights of the intra prediction samples and the weights of the inter prediction samples on a pixel basis.
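The per-pixel combination of intra and inter prediction samples can be sketched as follows. The function name `ciip_blend`, the weight sum of 4, the rounding offset, and the example weight map (higher intra weight in the top row, as for a vertical-like mode) are assumptions made for illustration.

```python
def ciip_blend(intra, inter, intra_weights, total=4):
    """Blend intra and inter samples pixel by pixel.

    The inter weight at each pixel is the weight sum minus the intra weight.
    """
    out = []
    for row_a, row_b, row_w in zip(intra, inter, intra_weights):
        out.append([
            (w * a + (total - w) * b + total // 2) // total   # rounded average
            for a, b, w in zip(row_a, row_b, row_w)
        ])
    return out


intra = [[100, 100], [100, 100]]
inter = [[60, 60], [60, 60]]
weights = [[3, 3],    # top row: close to the upper reference pixels
           [1, 1]]    # bottom row: farther away, so intra counts less
print(ciip_blend(intra, inter, weights))  # [[90, 90], [70, 70]]
```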

When two or more intra prediction modes are used to generate prediction samples, the decoder may generate prediction samples based on each intra prediction mode and perform a weighted average on the generated prediction samples to generate the final prediction sample. When generating prediction samples for each intra prediction mode, pixel-level weights according to the intra prediction mode may be applied.

The pixel-level weights can be set based on at least one of the intra prediction mode, the horizontal and vertical lengths of the current block, quantization parameters, information about whether the current block is a luminance or chrominance block, whether the surrounding blocks are intra-coded, and information about the presence or absence of residual transform coefficients in the surrounding blocks.

Referring to FIG. 25(a), the video signal processing device can generate a prediction sample 2502 within the current block based on a first reference pixel line (reference line 1) adjacent to the current block 2501 and a second reference pixel line (reference line 2) adjacent to the upper side of the first reference pixel line. The prediction sample 2502 in FIG. 25(a) is only a sample at a position according to an embodiment of the present invention, and the pixel position is not limited to this. In this specification, generating by a video signal processing device may have the same meaning as acquiring by the video signal processing device. FIG. 25(b) shows FIG. 25(a) in more detail. For example, the video signal processing device can generate a first prediction pixel 2503 from six reference pixels of the first reference pixel line using a smoothing filter, a cubic filter, or a Gaussian filter depending on the intra prediction mode. Likewise, the video signal processing device can generate a second prediction pixel 2504 from six reference pixels of the second reference pixel line using a smoothing filter, a cubic filter, or a Gaussian filter depending on the intra prediction mode. The video signal processing device may generate the third prediction pixel 2505 by performing a weighted average on the generated first prediction pixel 2503 and second prediction pixel 2504 using an arbitrarily determined weight. At this time, the six reference pixels of the second reference pixel line may be reference pixels at positions shifted one pixel to the right of the corresponding pixels of the first reference pixel line, in consideration of the intra prediction mode of the current block, the position of the pixel to be generated, the position of the reference pixel line, and the like.
The video signal processing device may generate a prediction sample 2502 within the current block based on the third prediction pixel 2505. Alternatively, the video signal processing device may generate the prediction sample 2502 within the current block based on the third prediction pixel 2505 and the distance between the third prediction pixel 2505 and the prediction sample 2502 within the current block. The weight used by the video signal processing device to generate the third prediction pixel 2505 may be an integer of 0 or more. For example, the weight of the first prediction pixel 2503 may be 3, and the weight of the second prediction pixel 2504 may be 1. At this time, the positions of the reference pixels used to generate a prediction sample (in FIG. 25, the positions of the six reference pixels of the first reference pixel line and the positions of the six reference pixels of the second reference pixel line) may vary based on at least one of the intra prediction mode of the current block, the position of the generated pixel (e.g., in FIG. 25, the position of the first prediction pixel 2503, the position of the second prediction pixel 2504, and the position of the third prediction pixel 2505), and the position of the reference pixel line (e.g., the positions of the first reference pixel line and the second reference pixel line in FIG. 25).
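The weighted average that produces the third prediction pixel can be sketched with the example weights 3 and 1 given above. The function name `blend_reference_predictions` and the rounding offset are assumptions; the six-tap filtering that yields the first and second prediction pixels is not shown because its coefficients are not specified here.

```python
def blend_reference_predictions(p1, p2, w1=3, w2=1):
    """Weighted average of the prediction pixels from reference lines 1 and 2."""
    total = w1 + w2
    if total <= 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (w1 * p1 + w2 * p2 + total // 2) // total   # integer result with rounding


print(blend_reference_predictions(100, 60))  # 90
```

With weights 3 and 1, the pixel predicted from the nearer reference line contributes three quarters of the result.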

Referring to FIG. 26(a), the video signal processing device may generate a prediction sample 2601 within the current block using reference pixels that are located at the same positions in the vertical direction on the two reference pixel lines. That is, each of the six pixels of the first reference pixel line (Reference line 1) and each of the six pixels of the second reference pixel line (Reference line 2) used to generate the prediction sample within the current block may have the same vertical position. The position of each pixel can be determined regardless of the position of the reference pixel line. Referring to FIG. 26(b), the video signal processing device may generate a first prediction pixel 2602 using six pixels of the first reference line. The video signal processing device may obtain a second prediction pixel 2603 using six pixels 2607 of the second reference line. At this time, when samples of the second reference line at the same vertical positions as those of the first reference line are used, the right pixel 2606 of the second reference line may not be available. Accordingly, the video signal processing device may construct the right pixel 2606 of the second reference line by copying (padding) the pixel 2605, and may then use the constructed right pixel 2606 to generate the second prediction pixel 2603. The video signal processing device may obtain the third prediction pixel 2604 using the first prediction pixel 2602 and the second prediction pixel 2603. The video signal processing device may generate a prediction sample 2601 within the current block using the third prediction pixel 2604. FIG. 26 differs from FIG. 25 only in the pixels of the first and second reference pixel lines used to generate the prediction sample within the current block, and the method of generating the prediction sample within the current block may be the same as that of FIG. 25.
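The padding step described with reference to FIG. 26(b) can be sketched as follows. The function name and the index convention are hypothetical; the sketch simply repeats the nearest available pixel whenever the six-pixel window extends past the end of the reference line, which is one possible reading of "copying (padding)".

```python
def take_reference_pixels(line, start, count=6):
    """Return `count` pixels of `line` beginning at index `start`.

    Indices that fall outside the line are clamped to the nearest available
    pixel, so a missing right-hand pixel is filled with a copy of its neighbor.
    """
    if not line:
        raise ValueError("reference line must not be empty")
    last = len(line) - 1
    return [line[min(max(i, 0), last)] for i in range(start, start + count)]


# The window needs index 7, which does not exist, so pixel 70 is copied.
print(take_reference_pixels([10, 20, 30, 40, 50, 60, 70], 2))
# [30, 40, 50, 60, 70, 70]
```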

When constructing an MPM list, the video signal processing device may include a DIMD mode that derives an intra prediction directional mode from a reconstructed neighboring block adjacent to the current block in the MPM list. At this time, when there are two intra prediction directionality modes derived from the DIMD mode, the video signal processing device can include both of the two intra prediction directionality modes in the MPM list.

When the video signal processing device generates a prediction sample for the current block using a plurality of reference pixel lines (the methods of FIGS. 25 and 26), the DIMD mode included in the MPM list may be derived without using the pixels adjacent to the current block. That is, the video signal processing device may derive an intra prediction directional mode by performing the DIMD method at the pixel positions indicated by the reference pixel line. The derived intra prediction directional mode may then be included in the MPM list.

Additionally, the video signal processing device may derive an intra prediction directional mode using DIMD for each reference pixel line and then add the derived intra prediction directional mode to the MPM list. The reference pixel line may be a line positioned 1, 3, 5, 7, or 12 pixels away from the upper left position of the current block. Alternatively, the reference pixel line may be a reference pixel line located above and adjacent to the current block. At this time, the intra prediction directional mode derived by the video signal processing device using DIMD for each reference pixel line may be added to the MPM list. If the derived intra prediction directional modes overlap, the overlapping intra prediction directional modes may be excluded from the MPM list. Reference pixel lines may be indexed herein. For example, the reference pixel line adjacent to the current block may be referred to as reference pixel line 0, and the reference pixel lines that are 1 pixel, 2 pixels, ..., n pixels away from the current block may be indexed and referred to as reference pixel line 1, reference pixel line 2, ..., reference pixel line n, respectively.
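The addition of DIMD-derived modes to the MPM list, with overlapping modes excluded, can be sketched as follows. The function name, the list-of-lists input layout, and the optional size limit are assumptions made for illustration; the DIMD derivation itself is not shown.

```python
def add_dimd_modes_to_mpm(mpm_list, dimd_modes_per_line, max_size=None):
    """Append DIMD-derived modes to an MPM list, skipping duplicates.

    mpm_list: existing MPM candidates (intra prediction mode indices).
    dimd_modes_per_line: one list of derived modes per reference pixel line
        (DIMD may yield up to two modes per line).
    max_size: optional cap on the list length (hypothetical parameter).
    """
    result = list(mpm_list)
    for modes in dimd_modes_per_line:
        for mode in modes:
            if mode in result:
                continue  # overlapping mode is excluded
            if max_size is not None and len(result) >= max_size:
                return result
            result.append(mode)
    return result


# Modes derived on two reference pixel lines; 50 and 18 each appear twice.
print(add_dimd_modes_to_mpm([0, 50], [[50, 18], [18, 34]]))  # [0, 50, 18, 34]
```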

For example, when the reference pixel lines for the current block are 1 and 3, the video signal processing device may perform DIMD using reference pixel lines 0, 1, and 2 and derive an intra prediction directional mode. Additionally, when the reference pixel lines for the current block are 5 and 7, the video signal processing device may perform DIMD using reference pixel lines 5, 6, and 7 and derive an intra prediction directional mode. Additionally, when the reference pixel line for the current block is 12, the video signal processing device may perform DIMD using reference pixel lines 11, 12, and 13 and derive an intra prediction directional mode.

If the current block is encoded in DIMD mode, the encoder can generate and signal a bitstream including information related to the reference pixel line that should be used to derive the intra prediction directional mode. And the encoder can generate an intra prediction block using the reference pixel line on which DIMD was performed and the intra prediction directional mode derived from DIMD. When the current block is decoded in DIMD mode, the decoder may parse information related to the reference pixel line and then derive an intra prediction directional mode using the reference pixel line corresponding to the information related to the reference pixel line. And the decoder can generate an intra prediction block using reference pixel line information and an intra prediction directional mode derived from DIMD.

Referring to FIG. 27, a video signal processing device may generate a virtual new reference pixel line using a plurality of reference pixel lines and generate a prediction sample within the current block based on the new reference pixel line.

Circles in the same horizontal row in FIG. 27 may represent pixels located on the same reference pixel line. FIG. 27(a) shows a case in which the position of the reference pixel used when generating a prediction sample varies for each reference pixel line, depending on at least one of the intra prediction mode of the current block, the pixel position to be generated, and the position of the reference pixel line. The video signal processing device may generate samples 2701-a to 2701-d at the '1' position for each reference pixel line using a smoothing filter, a cubic filter, or a Gaussian filter depending on the intra prediction mode, and may create a new virtual reference pixel line 2702 using the plurality of samples at the '1' position. Meanwhile, in order to reduce complexity, the samples at the '1' position may not be generated, and the virtual new reference pixel line 2702 may be generated using the integer reference pixels closest to the '1' position. The video signal processing device may generate prediction samples within the current block using the virtual new reference pixel line. Referring to FIG. 27(b), the video signal processing device may use four reference pixels at the same position in the vertical direction to create a virtual new reference pixel line 2705. That is, each of the six reference pixels of the first reference pixel line and each of the six reference pixels of the second reference pixel line used to generate the virtual new reference pixel line 2705 may be pixels at the same position in the vertical direction. The video signal processing device may acquire the first prediction pixel 2703 using at least one of the virtual new reference pixel line and its position, the intra prediction mode of the current block, and the pixel position to be generated. The video signal processing device may generate a prediction sample 2704 of the current block using the first prediction pixel 2703.
The number of reference pixel lines used to generate a virtual new reference pixel line may be two or more. For example, 2 to 5 reference pixel lines may be used.
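The lower-complexity variant described with reference to FIG. 27, in which the virtual new reference pixel line is built from the nearest integer reference pixels, can be sketched as follows. The text above does not state how the pixels of the individual lines are combined, so the equal-weight rounded average used here is an assumption, as are the function name and the per-line horizontal offsets that stand in for the direction of the intra prediction mode.

```python
def build_virtual_reference_line(lines, offsets, length):
    """Build a virtual reference pixel line from two or more reference lines.

    lines: reference pixel lines given as lists of integer samples.
    offsets: per-line horizontal offset of the nearest integer pixel
        (hypothetical stand-in for the intra prediction direction).
    length: number of samples to produce.
    Combines the selected pixels with an equal-weight rounded average
    (an assumption; the combination rule is not specified above).
    """
    if len(lines) < 2 or len(lines) != len(offsets):
        raise ValueError("need two or more lines and one offset per line")
    n = len(lines)
    virtual = []
    for x in range(length):
        total = 0
        for line, off in zip(lines, offsets):
            idx = min(max(x + off, 0), len(line) - 1)  # clamp to the line
            total += line[idx]
        virtual.append((total + n // 2) // n)
    return virtual


print(build_virtual_reference_line([[10, 20, 30, 40], [20, 30, 40, 50]], [0, 1], 3))
# [20, 30, 40]
```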

Referring to FIG. 28, the video signal processing device may use four reference pixels close to the location of the new reference pixel 2801 to be created to generate a virtual new reference pixel line. At this time, the location of the new reference pixel may be the same as the reference pixel line in the vertical direction. Additionally, the location of the new reference pixel may be adjacent to the direction of the intra prediction mode of the current block.

Referring to FIG. 29, the video signal processing device may receive a plurality of reference pixel lines and perform intra prediction to generate a prediction block for the current block. Different prediction blocks may be generated depending on which reference pixel line is used, and the video signal processing device may generate a final prediction block by performing a weighted average according to the weight input for each prediction block. At this time, the weight may be a preset value. For example, the weight of the sample predicted by the main reference pixel line may be 3, and the weight of the sample predicted by the sub-reference pixel line may be 1. Alternatively, the weight may be determined based on at least one of the size of the current block, the horizontal or vertical size of the current block, the intra prediction mode of the current block, quantization parameter information, and the distance (or difference) between the main reference pixel line and the sub-reference pixel line. Additionally, the reference pixel line may be determined based on at least one of the size of the current block, the horizontal or vertical size of the current block, the intra prediction mode of the current block, quantization parameter information, and MRL information. For example, the main reference pixel line may be the reference pixel line adjacent to the current block, and the sub-reference pixel line may be the reference pixel line indicated by the MRL. As another example, the main reference pixel line may be the reference pixel line indicated by the MRL, and the sub-reference pixel line may be a reference pixel line located a certain offset away from the reference pixel line indicated by the MRL, where the offset may be an integer from -N to +N and N may be an integer greater than 0.
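The block-level weighted average of FIG. 29 can be sketched as follows for any number of prediction blocks. The function name and the rounding rule are assumptions; the 3:1 weights in the example follow the main and sub-reference pixel line example given above.

```python
def blend_prediction_blocks(blocks, weights):
    """Weighted average of equally sized prediction blocks (lists of rows).

    blocks: one prediction block per reference pixel line.
    weights: one non-negative integer weight per block, e.g. [3, 1].
    Rounds to nearest by adding half of the weight sum before dividing.
    """
    if not blocks or len(blocks) != len(weights):
        raise ValueError("need one weight per prediction block")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    height, width = len(blocks[0]), len(blocks[0][0])
    return [
        [
            (sum(w * b[y][x] for b, w in zip(blocks, weights)) + total // 2) // total
            for x in range(width)
        ]
        for y in range(height)
    ]


main_pred = [[100, 104], [108, 112]]  # predicted from the main reference line
sub_pred = [[120, 100], [100, 120]]   # predicted from the sub-reference line
print(blend_prediction_blocks([main_pred, sub_pred], [3, 1]))
# [[105, 103], [106, 114]]
```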

The intra prediction mode used in the method of generating a prediction sample within the current block described above may be the same for each reference pixel line. Conversely, the intra prediction mode used in the method of generating the prediction sample within the current block described above may be different for each reference pixel line. That is, the signaled intra prediction mode may be used for the main reference pixel line, and an intra prediction mode obtained by adding or subtracting an arbitrary value (corresponding to an index) to or from (the index of) the intra prediction mode used for the main reference pixel line may be used for the sub-reference pixel line. At this time, the arbitrary value may be an integer of 1 or more. Additionally, the video signal processing device may determine whether to add or subtract the arbitrary value depending on the value of the intra prediction mode used for the main reference pixel line. For example, the video signal processing device may increase the angle of the intra prediction mode by the arbitrary value if the angle is negative, and may decrease it by the arbitrary value if the angle is positive.

Hereinafter, a method for determining the optimal reference pixel line for the current block (for restoration of the current block) based on the template will be described.

For convenience of explanation in this specification, a method of determining the optimal reference pixel line for the current block (for restoration of the current block) may be described as TMRL (Template-based multiple reference line intra prediction).

Referring to FIG. 30, the video signal processing device may configure a reference template using the reference pixel line adjacent to the current block. The video signal processing device may generate prediction samples for the position of the reference template using reference pixel lines 1, 2, 3, ... (Reference lines 1, 2, 3, ...). The video signal processing device may calculate the cost between the generated prediction samples and the samples of the reference template. At this time, the cost may be calculated through methods such as SAD (Sum of Absolute Differences) or MRSAD (Mean-Removed SAD). The reference pixel line corresponding to the minimum cost may be the optimal reference pixel line. Additionally, the encoder may sort the reference pixel lines in ascending order of the calculated cost to construct a list of reference pixel lines, and then generate and signal a bitstream containing information about the index of the optimal reference pixel line. The decoder may construct the list of reference pixel lines through the above-described method, parse the index of the optimal reference pixel line included in the bitstream, and generate a prediction sample using the reference pixel line indicated by the index.
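The template cost calculation and the ordering of reference pixel lines can be sketched as follows. SAD and MRSAD are implemented according to their usual definitions; the function names, the dictionary input layout, and the tie-break by line index are assumptions, and the prediction of the template samples from each reference pixel line is taken as given.

```python
def sad(pred, ref):
    """Sum of absolute differences between prediction and template samples."""
    return sum(abs(p - r) for p, r in zip(pred, ref))


def mrsad(pred, ref):
    """Mean-removed SAD: the mean difference is subtracted before summing."""
    mean_diff = (sum(pred) - sum(ref)) / len(ref)
    return sum(abs(p - r - mean_diff) for p, r in zip(pred, ref))


def rank_reference_lines(template, predictions_by_line, cost_fn=sad):
    """Return reference line indices sorted by ascending template cost.

    predictions_by_line: maps a reference line index to the template samples
    predicted from that line. Ties are broken by line index (assumption).
    """
    return sorted(
        predictions_by_line,
        key=lambda idx: (cost_fn(predictions_by_line[idx], template), idx),
    )


template = [10, 20, 30, 40]
predictions = {1: [12, 22, 32, 42], 2: [10, 21, 30, 40], 3: [0, 0, 0, 0]}
print(rank_reference_lines(template, predictions))         # [2, 1, 3]
print(rank_reference_lines(template, predictions, mrsad))  # [1, 2, 3]
```

In the example, line 1 differs from the template only by a constant offset of 2, so it ranks first under MRSAD but second under SAD. An encoder and a decoder that build the list in the same way only need to exchange the position of the chosen line in the list.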

In order to reduce complexity in determining the optimal reference pixel line, the optimal reference pixel line may be determined only for the intra prediction modes included in the MPM list rather than for all intra prediction modes. That is, the encoder may construct a list of reference pixel lines by combining the intra prediction modes included in the MPM list with a plurality of reference pixel lines, and may calculate the cost between the prediction samples generated using each candidate in the list and the reference template. Additionally, the encoder may sort the list in ascending order of cost and then reconstruct the list using only a few low-cost combinations. The encoder may generate and signal a bitstream containing information about the index of the optimal combination information among the combination information (an arbitrary intra prediction mode and arbitrary reference pixel lines) in the reconstructed list. The decoder may construct the same list of reference pixel lines through the above-described method, then parse the index of the optimal combination information included in the bitstream and generate a prediction sample using the optimal combination information indicated by the index.
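The construction of the reduced candidate list from combinations of MPM modes and reference pixel lines can be sketched as follows. The function name, the `predict_template` callback, the use of SAD as the cost, and the default of three retained candidates are all assumptions made for illustration.

```python
from itertools import product


def build_candidate_list(mpm_modes, line_indices, template, predict_template, keep=3):
    """Return the `keep` lowest-cost (mode, line) combinations.

    predict_template: hypothetical callback taking (mode, line) and returning
        the predicted template samples for that combination.
    The cost is SAD against the reference template; candidates are sorted in
    ascending order of cost, with ties broken by (mode, line).
    """
    scored = []
    for mode, line in product(mpm_modes, line_indices):
        pred = predict_template(mode, line)
        cost = sum(abs(p - r) for p, r in zip(pred, template))
        scored.append((cost, mode, line))
    scored.sort()
    return [(mode, line) for _, mode, line in scored[:keep]]


# Toy predictor used only to exercise the sketch; it is not a real intra predictor.
def toy_predictor(mode, line):
    return [mode + line, mode + line]


candidates = build_candidate_list([0, 50], [1, 2, 3], [52, 52], toy_predictor)
print(candidates)  # [(50, 2), (50, 1), (50, 3)]
```

Because the decoder can rebuild the same list from already reconstructed samples, only the index into `candidates` would need to be signaled.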

When prediction samples are generated using multiple reference pixel lines, different weights may be applied to the prediction samples generated through each reference pixel line. At this time, a template-based method may be used to derive the optimal weights. That is, the encoder may construct a list of reference pixel lines by combining the intra prediction modes included in the MPM list, a plurality of reference pixel lines, and weights for the samples predicted by each reference pixel line (e.g., one of 3:1 and 2:2), and may calculate the cost between the prediction samples generated using each candidate in the list and the reference template. Then, the video signal processing device may sort the list in ascending order of cost and reconstruct the list using only a few low-cost combinations. The encoder may generate and signal a bitstream containing information about the index of the optimal combination information among the combination information in the reconstructed list (an arbitrary intra prediction mode, arbitrary reference pixel lines, and weights for the samples predicted by each reference pixel line (e.g., one of 3:1 and 2:2)). The decoder may construct the same list of reference pixel lines through the above-described method, then parse the index of the optimal combination information included in the bitstream and generate a prediction sample using the optimal combination information indicated by the index.

In the method of determining a reference pixel line based on a template described with reference to FIG. 30, the reference pixel line adjacent to the current block is used to configure the template. Accordingly, the reference pixel line determined based on the template can be selected only from reference pixel lines that are not adjacent to the current block. Below, a method in which the reference pixel line adjacent to the current block can also be used to determine the reference pixel line based on the template will be described.

Once the reference template is configured as shown in FIG. 31, the reference pixel line adjacent to the current block can also be used to determine the reference pixel line based on the template. Referring to FIG. 31(a), a reference template may be constructed that includes only reference pixels adjacent to the left side of the current block. The encoder may generate a prediction sample for the reference template using Reference pixel line 0 on the upper side of the current block and/or Reference pixel line 1 on the left side of the current block, and may calculate the cost between the reference template and the prediction sample. The encoder may also use reference pixel lines 1, 2, 3, ... (Reference lines 1, 2, 3, ...) to generate prediction samples for the reference template and calculate the cost between the reference template and the prediction samples. FIG. 31(b) shows a case where the reference template includes only reference pixels adjacent to the top of the current block. Likewise, the encoder may generate a prediction sample for the reference template using Reference pixel line 0 on the left of the current block and/or Reference pixel line 1 on the top of the current block, and may calculate the cost between the reference template and the prediction sample. The encoder may also use reference pixel lines 1, 2, 3, ... (Reference lines 1, 2, 3, ...) to generate prediction samples for the reference template and calculate the cost between the reference template and the prediction samples.

The encoder may sort the reference pixel lines in ascending order of the calculated cost to construct a list of reference pixel lines, and then generate and signal a bitstream containing information about the index of the optimal reference pixel line. The decoder may construct the list of reference pixel lines through the above-described method and then generate a prediction sample using the optimal reference pixel line determined by parsing the information about the index of the optimal reference pixel line included in the bitstream.

Referring to FIG. 32, the encoder may receive a plurality of reference pixel lines and perform intra prediction to generate prediction blocks for the template. Different prediction blocks may be generated depending on which reference pixel line is used. The encoder may perform a weighted average according to various weight information input for each of the prediction blocks to generate a final prediction block for the template. A plurality of prediction blocks may be generated depending on which reference pixel lines and which weights are used. After calculating the cost between each prediction block and the reference template, the encoder may sort the prediction blocks in ascending order of cost and construct a separate list using only a predetermined number of the top candidates. At this time, the predetermined number may be an integer of 2 or more, for example 10. The encoder may generate a prediction block for the current block using the combination information that was used to generate each prediction block in the separate list. In addition, the encoder may select the optimal candidate from the list in terms of picture quality and bit amount, and then generate and signal a bitstream containing information about the index of the optimal candidate. The decoder may construct the same separate list through the above-described method, and may generate a prediction sample using the optimal combination information indicated by the index of the optimal candidate, which is determined by parsing the information about the index of the optimal candidate included in the bitstream.

When the current block is encoded in an intra prediction mode, the intra prediction mode may be any one of angular mode, planar mode, DC mode, and MIP mode. Prediction according to the angular mode may be prediction performed according to 65 angles, and prediction according to the MIP mode may be prediction performed based on a predefined matrix. The angular mode can be effective for blocks in which features such as edges exist. However, if the current block has smooth characteristics, a block predicted using the angular mode may contain discontinuous edges at the boundaries between blocks or visible contours inside the block. This may reduce coding efficiency. Additionally, the DC mode may have the disadvantage of generating visible edges at the boundaries between blocks at low bit rates. The planar mode can generate prediction blocks without discontinuities, mitigating the edge problems caused by the angular mode and the DC mode.

Referring to FIG. 33, according to the planar mode, the video signal processing device may generate a linearly predicted value in the vertical direction and a linearly predicted value in the horizontal direction to generate a prediction sample within the current block. The video signal processing device may generate a prediction sample (value) within the current block by performing a weighted average of the linearly predicted value in the vertical direction and the linearly predicted value in the horizontal direction.

The linearly predicted value in the vertical direction (predV(x, y)) can be generated based on Equation 4, and the linearly predicted value in the horizontal direction (predH(x, y)) can be generated based on Equation 5. A new prediction value (pred(x, y)) can then be generated based on Equation 6. In Equations 4 to 6, W may be the horizontal size (width) of the current block, and H may be the vertical size (height) of the current block. rec(x, y) may mean the pixel value at the (x, y) coordinates. The predicted values (predV(x, y), predH(x, y), pred(x, y)) may mean the predicted pixel values at the (x, y) coordinates.

The video signal processing device may use only linear prediction in the vertical direction when performing prediction for the current block according to the planar mode. Alternatively, the video signal processing device may use only linear prediction in the horizontal direction when performing prediction for the current block according to the planar mode. Therefore, the planar mode can be divided into three modes. That is, in addition to the conventional planar mode, which weights the prediction blocks generated using linear prediction in the vertical and horizontal directions, the planar mode can be divided into a vertical plane mode that uses only linear prediction in the vertical direction and a horizontal plane mode that uses only linear prediction in the horizontal direction. The encoder may generate and signal a bitstream containing information about which of the three planar modes is used for the current block. The decoder may generate a prediction block for the current block based on the planar mode determined by parsing the information, included in the bitstream, about which prediction mode was used.
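The three planar variants can be illustrated with the following sketch. It uses the widely known integer planar formulation for the conventional mode and may therefore differ in detail from Equations 4 to 6; the normalization of the vertical-only and horizontal-only variants, the function name, the mode strings, and the array layout are assumptions.

```python
def planar_predict(top, left, width, height, mode="conventional"):
    """Planar prediction sketch for power-of-two block sizes.

    top[x]  = rec(x, -1) for x in 0..width   (top[width] is the top-right pixel)
    left[y] = rec(-1, y) for y in 0..height  (left[height] is the bottom-left pixel)
    mode: "conventional", "vertical" (vertical linear prediction only) or
          "horizontal" (horizontal linear prediction only).
    """
    log2w, log2h = width.bit_length() - 1, height.bit_length() - 1
    if (1 << log2w) != width or (1 << log2h) != height:
        raise ValueError("width and height must be powers of two")
    if mode not in ("conventional", "vertical", "horizontal"):
        raise ValueError("unknown planar mode")
    pred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Vertical interpolation between the top pixel and the bottom-left pixel.
            v = (height - 1 - y) * top[x] + (y + 1) * left[height]
            # Horizontal interpolation between the left pixel and the top-right pixel.
            h = (width - 1 - x) * left[y] + (x + 1) * top[width]
            if mode == "vertical":
                row.append((v + height // 2) >> log2h)
            elif mode == "horizontal":
                row.append((h + width // 2) >> log2w)
            else:
                row.append(((v << log2w) + (h << log2h) + width * height)
                           >> (log2w + log2h + 1))
        pred.append(row)
    return pred


# Vertical-only prediction: each column ramps from the top row toward the bottom-left pixel.
block = planar_predict([0] * 5, [0, 0, 0, 0, 80], 4, 4, mode="vertical")
print([row[0] for row in block])  # [20, 40, 60, 80]
```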

An explicit method of signaling, in which information about which planar mode was used is included in the bitstream, may have the problem of increasing the bit amount. In order to save bits, the decoder may implicitly derive which planar mode was used, using at least one of the size of the current block, the horizontal or vertical size of the current block, the ratio of the horizontal and vertical sizes of the current block, the number of pixels in the current block, whether the current block is a luminance signal or a chrominance signal, the intra prediction directional mode information of neighboring blocks adjacent to the current block, the MPM list information for the current block, and the intra prediction directional mode information derived from DIMD or TIMD.

The vertical plane mode and the horizontal plane mode can be more effective when the shape of the current block is rectangular rather than square. Accordingly, the vertical plane mode and the horizontal plane mode may be applied (activated) only when the horizontal and vertical sizes of the current block are different from each other. In other words, the encoder may compare the horizontal and vertical sizes of the current block and, only if the sizes are different, generate and signal a bitstream containing planar mode selection information indicating whether the vertical plane mode, the horizontal plane mode, or the conventional planar mode is applied. The decoder may compare the horizontal and vertical sizes of the current block and parse the planar mode selection information only if the sizes are different. Meanwhile, when the horizontal and vertical sizes of the current block are the same, the conventional planar mode may be applied.

For convenience of explanation in this specification, information about which of the three planar modes is used for the current block may be referred to as planar mode selection information.

Planar mode selection information may be signaled for each coding unit. However, if the planar mode selection information is signaled for all coding units, the bit amount may increase, so the encoder may generate and signal a bitstream including the planar mode selection information only when a specific condition is satisfied. If the specific condition is satisfied, the decoder may determine the intra prediction directional mode for the current block by parsing the planar mode selection information and generate a prediction block for the current block using the determined intra prediction directional mode. At this time, the specific conditions may be conditions related to the horizontal and vertical sizes of the current block, the ratio of the horizontal and vertical sizes of the current block, whether the intra prediction directional mode of the current block is a specific mode (e.g., planar mode, DC mode, vertical direction mode, horizontal direction mode), whether the coding mode of the current block is the DIMD, TIMD, IntraTMP, IBC, ISP, or MIP coding mode, and the index information of the reference pixel line used when generating the prediction block for the current block. Whether to encode or decode the planar mode selection information may be determined depending on whether at least one of the specific conditions is satisfied. Specifically, the specific conditions may be: 1) the intra prediction directional mode of the current block is the conventional planar mode (i.e., the index of the intra prediction directional mode is 0); 2) the horizontal and vertical sizes of the current block are each equal to or smaller than the maximum transform block size, and the product of the horizontal and vertical sizes of the current block is larger than the square of the minimum transform block size; and 3) the horizontal and vertical sizes of the current block are different.
At this time, the size of the minimum transform block may be an integer such as 4 or 8, and the size of the maximum transform block may be an integer such as 64, 128, or 256. When at least one of the above-mentioned specific conditions 1) to 3) is satisfied, the encoder may generate and signal a bitstream including the planar mode selection information, and the decoder may parse the planar mode selection information and determine the intra prediction directional mode of the current block.
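Conditions 1) to 3) can be checked as in the following sketch. The function names and the default transform block sizes (4 and 64, taken from the example values above) are assumptions. The text states that signaling may occur when at least one condition is satisfied, so `any` is the default; the `combine` parameter is a hypothetical hook for stricter variants such as requiring all conditions.

```python
def planar_selection_conditions(intra_mode, width, height, min_tb=4, max_tb=64):
    """Evaluate conditions 1) to 3) and return them as a tuple of booleans."""
    cond1 = intra_mode == 0  # conventional planar mode has index 0
    cond2 = (width <= max_tb and height <= max_tb
             and width * height > min_tb * min_tb)
    cond3 = width != height  # non-square block
    return cond1, cond2, cond3


def should_signal_planar_selection(intra_mode, width, height,
                                   min_tb=4, max_tb=64, combine=any):
    """Decide whether planar mode selection information is signaled or parsed."""
    return combine(planar_selection_conditions(intra_mode, width, height,
                                               min_tb, max_tb))


print(planar_selection_conditions(0, 16, 8))                  # (True, True, True)
print(should_signal_planar_selection(50, 4, 4))               # False
print(should_signal_planar_selection(50, 8, 8))               # True
print(should_signal_planar_selection(50, 8, 8, combine=all))  # False
```

The encoder and the decoder would evaluate the same function, so that the decoder parses the information exactly when the encoder wrote it.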

If the current block is a chrominance component block, the vertical plane mode and the horizontal plane mode may not be applied, and only the conventional planar mode may be applied. That is, when the current block is a chrominance component block, the video signal processing device may generate a prediction block using the conventional planar mode. Accordingly, if the current block is a chrominance component block, the video signal processing device may not signal or parse the planar mode selection information. Alternatively, even if the current block is a chrominance component block, the vertical plane mode or the horizontal plane mode may be applied in the same way as for the luminance component block. For example, when the vertical plane mode is applied to the current block, the decoder may use the vertical plane mode to generate prediction blocks for both the luminance block and the chrominance block of the current block.

If the horizontal size of the current block is larger than the vertical size, the number of pixels adjacent to the top of the current block is greater than the number of pixels adjacent to the left, so the decoder may generate a prediction block for the current block using the vertical plane mode. If the vertical size of the current block is larger than the horizontal size, the number of pixels adjacent to the left of the current block is greater than the number of pixels adjacent to the top, so the decoder may generate a prediction block for the current block using the horizontal plane mode. If the horizontal and vertical sizes of the current block are the same, the decoder may generate a prediction block for the current block using the conventional planar mode. Since the planar mode is implicitly determined according to the horizontal and vertical sizes of the current block, the encoder does not need to generate a bitstream including planar mode selection information. The decoder may generate a prediction block of the current block according to the planar mode determined by the horizontal and vertical sizes of the current block.
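The implicit, shape-based selection described above can be sketched as follows. The function name and the returned mode labels are hypothetical.

```python
def implicit_planar_mode(width, height):
    """Derive the planar variant from the block shape without signaling.

    Wider blocks have more top neighbors, so the vertical plane mode is used;
    taller blocks have more left neighbors, so the horizontal plane mode is used.
    """
    if width > height:
        return "vertical"
    if height > width:
        return "horizontal"
    return "conventional"


print(implicit_planar_mode(16, 4))  # vertical
print(implicit_planar_mode(4, 16))  # horizontal
print(implicit_planar_mode(8, 8))   # conventional
```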

The vertical plane mode and the horizontal plane mode may not be applied in ISP mode. If the current block is encoded in ISP mode and in planar mode, the encoder may not signal planar mode selection information (that is, the bitstream may not include planar mode selection information). If the current block is encoded in ISP mode and in planar mode, the decoder may not parse the planar mode selection information. Conversely, the vertical plane mode and the horizontal plane mode may be applied in ISP mode. In that case, if the current block is encoded in ISP mode and in planar mode, the encoder may generate and signal a bitstream including planar mode selection information, and the decoder may generate a prediction block for the current block using the planar mode determined by parsing the planar mode selection information.

The video signal processing device may replace the horizontal direction mode (angular mode 18 in FIG. 6) among the conventional intra prediction directional modes with the horizontal plane mode. Additionally, the video signal processing device may replace the vertical direction mode (angular mode 50 in FIG. 6) among the conventional intra prediction directional modes with the vertical plane mode. In other words, the video signal processing device may use the horizontal plane mode and the vertical plane mode instead of the horizontal direction mode (angular mode 18 in FIG. 6) and the vertical direction mode (angular mode 50 in FIG. 6) among the conventional intra prediction directional modes. At this time, the video signal processing device may replace the horizontal direction mode (angular mode 18 in FIG. 6) and the vertical direction mode (angular mode 50 in FIG. 6) with the horizontal plane mode and the vertical plane mode only when the horizontal and vertical sizes of the current block are different.

The vertical planar mode and the horizontal planar mode can be effective for small blocks. Accordingly, the video signal processing device may apply the vertical planar mode and the horizontal planar mode only when the size of the current block is smaller than a predetermined size. The predetermined size may be 16 or 32 in width or height. That is, the video signal processing device may apply the vertical planar mode and the horizontal planar mode to the current block when the horizontal or vertical size of the current block is less than or equal to 32. Alternatively, the video signal processing device may not apply the vertical planar mode and the horizontal planar mode to the current block when either the horizontal or vertical size of the current block is greater than 32. For example, if either the horizontal or vertical size of the current block is greater than 32, the encoder may not include planar mode selection information in the bitstream. If the encoding mode of the current block is planar mode and either the horizontal or vertical size of the current block is greater than 32, the decoder may not parse the planar mode selection information and may predict the current block using the conventional planar mode.
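The size restriction described above can be sketched as follows. This is only an illustration, assuming the threshold of 32 mentioned in the text and the variant in which the modes are disabled when either dimension exceeds the threshold; the function names and the `max_size` parameter are hypothetical and do not come from the specification.

```python
def directional_planar_allowed(width: int, height: int, max_size: int = 32) -> bool:
    """Return True if the vertical/horizontal planar modes may be used for a block."""
    # Variant from the text: the modes are not applied when either
    # dimension of the block is greater than max_size.
    return width <= max_size and height <= max_size


def planar_selection_info_present(is_planar: bool, width: int, height: int) -> bool:
    """Return True if planar mode selection information is signaled/parsed."""
    # The selection information only exists for planar blocks that pass the size
    # check; otherwise the decoder falls back to the conventional planar mode.
    return is_planar and directional_planar_allowed(width, height)
```

With these assumptions, a 64x16 planar block carries no selection information and is predicted with the conventional planar mode.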

DIMD and TIMD modes generate a prediction block as a weighted average of blocks predicted with several intra prediction modes. A block predicted using planar mode may be one of the blocks used in DIMD and TIMD modes. When the video signal processing device generates a prediction block using the planar mode for DIMD and TIMD modes, it may use any one of the vertical planar mode, the horizontal planar mode, and the conventional planar mode. Planar mode selection information may be signaled in the bitstream. The decoder may parse the planar mode selection information and determine which of the three planar modes to use. For example, if the planar mode selection information indicates the vertical planar mode, the prediction block used in DIMD and TIMD modes may be a block predicted using the vertical planar mode.

MIP mode can be effective for complex regions. To improve the coding performance of MIP mode, a multi-prediction based MIP mode may be used. The multi-prediction based MIP mode generates the final prediction block for the current block as a weighted average of the prediction block generated by the MIP method and the prediction block generated based on an intra prediction directional mode. The encoder may generate and signal a bitstream that includes both encoding information for the MIP mode and encoding information for the intra prediction directional mode. The decoder may parse both the encoding information for the MIP mode and the encoding information for the intra prediction directional mode, generate a prediction block to which the MIP mode is applied and a prediction block to which the intra prediction directional mode is applied, and then generate the final prediction block for the current block as a weighted average of the two prediction blocks.
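The weighted averaging of the two prediction blocks can be illustrated with a small sketch. The specification does not state the weights at this point, so the equal weights used by default, the integer rounding, and the name `blend_predictions` are assumptions made only for illustration.

```python
def blend_predictions(pred_a, pred_b, w_a=1, w_b=1):
    """Weighted average of two equally sized prediction blocks (lists of rows)."""
    total = w_a + w_b
    # Add total // 2 before the integer division so the result is rounded to nearest.
    return [
        [(w_a * a + w_b * b + total // 2) // total for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(pred_a, pred_b)
    ]


# Hypothetical 2x2 sample values for the MIP block and the directional-mode block.
mip_block = [[100, 104], [108, 112]]
intra_block = [[96, 100], [100, 104]]
final_block = blend_predictions(mip_block, intra_block)  # [[98, 102], [104, 108]]
```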

The multi-prediction based MIP mode may be applied adaptively on a block basis. The encoder may generate and signal a bitstream containing information about whether the multi-prediction based MIP mode is used. The decoder may parse this information to determine whether the multi-prediction based MIP mode or the single-prediction based MIP mode is used when generating a prediction block for the current block. The single-prediction based MIP mode generates a prediction block using only the MIP mode.

In order to reduce complexity and signal the encoding information, if the intra prediction mode of the current block is MIP, the encoder may additionally include in the bitstream, and signal, information about whether the multi-prediction based MIP mode is used. If the multi-prediction based MIP mode is used in the current block, the encoder may additionally include in the bitstream, and signal, information about the intra prediction directional mode. If the intra prediction mode of the current block is MIP, the decoder may additionally parse the information about whether the multi-prediction based MIP mode is used, and if the multi-prediction based MIP mode is used, the decoder may additionally parse the information about the intra prediction directional mode.
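The conditional parsing order described above can be sketched as follows. The `symbols` iterator stands in for already entropy-decoded syntax values, and the `MipSyntax` structure and its field names are hypothetical; the MIP mode's own parameters are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class MipSyntax:
    is_mip: bool
    multi_pred: bool = False
    intra_dir_mode: Optional[int] = None


def parse_mip_syntax(symbols: Iterator[int]) -> MipSyntax:
    """Read syntax values in the conditional order described in the text."""
    is_mip = bool(next(symbols))
    if not is_mip:
        # The multi-prediction flag is only present for MIP blocks.
        return MipSyntax(is_mip=False)
    multi_pred = bool(next(symbols))
    if not multi_pred:
        # Single-prediction MIP: no directional mode is parsed.
        return MipSyntax(is_mip=True)
    # Multi-prediction MIP: the directional mode information follows.
    return MipSyntax(is_mip=True, multi_pred=True, intra_dir_mode=next(symbols))
```

Because each element is read only when the preceding one requires it, a non-MIP block spends no bits on the multi-prediction information.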

If the intra prediction directional mode is additionally signaled, the number of bits may increase and compression efficiency may decrease. Therefore, if the multi-prediction based MIP mode is used in the current block, only predetermined intra prediction directional modes may be used. The predetermined intra prediction directional mode may be one of the modes in the MPM list, and the encoder may include in the bitstream, and signal, index information indicating which mode in the MPM list is used as the intra prediction directional mode in the multi-prediction based MIP mode. If the multi-prediction based MIP mode is used in the current block, the decoder may parse the index information and determine, from the MPM list, the intra prediction directional mode to be used in the multi-prediction based MIP mode.

When the multi-prediction based MIP mode is used in the current block, the vertical planar mode, the horizontal planar mode, and the conventional planar mode may be used to generate the prediction block based on the intra prediction directional mode. That is, the encoder may generate and signal a bitstream including planar mode selection information. If the multi-prediction based MIP mode is used in the current block, the decoder may parse the planar mode selection information to determine the intra prediction directional mode to be used in the multi-prediction based MIP mode.

If the multi-prediction based MIP mode is used in the current block, the video signal processing device may determine which of the three planar modes to use based on the intra prediction directional mode of the DIMD derived from neighboring pixels of the current block. For example, when the multi-prediction based MIP mode is used in the current block and the DIMD (or TIMD) mode of the current block is less than 34, the video signal processing device may apply the horizontal planar mode to generate the prediction block based on the intra prediction directional mode. Alternatively, if the DIMD (or TIMD) mode of the current block is equal to or greater than 34, the vertical planar mode may be applied to the current block to generate the prediction block based on the intra prediction directional mode.
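The example rule above reduces to a single comparison against the diagonal mode. The sketch below assumes that rule as stated; the function name and the string labels are hypothetical.

```python
DIAGONAL_MODE = 34  # angle mode 34 in FIG. 6


def planar_variant_from_dimd(dimd_mode: int) -> str:
    """Pick the planar variant from the DIMD (or TIMD) derived mode."""
    # Modes below the diagonal select the horizontal planar mode;
    # the diagonal and everything above it select the vertical planar mode.
    return "horizontal_planar" if dimd_mode < DIAGONAL_MODE else "vertical_planar"
```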

If a planar mode is used in the current block, the video signal processing device may generate a prediction block for each of the vertical planar mode and the horizontal planar mode, and generate the final prediction block by applying a weight to each prediction block. The weights for the prediction blocks may be the same, or may differ depending on the intra prediction directional mode of the DIMD derived from the neighboring pixels of the current block. If the intra prediction directional mode of the DIMD derived from the neighboring pixels of the current block is greater than or equal to a predetermined value, the largest weight may be applied to the block predicted using the vertical planar mode. For example, the weight for the block predicted using the vertical planar mode may be 3, and the weight for the block predicted using the horizontal planar mode may be 1. If the intra prediction directional mode of the DIMD derived from the neighboring pixels of the current block is smaller than the predetermined value, the largest weight may be applied to the block predicted using the horizontal planar mode. For example, the weight for the block predicted using the horizontal planar mode may be 3, and the weight for the block predicted using the vertical planar mode may be 1. The predetermined value is an integer and may be, for example, 34 (angle mode 34 in FIG. 6). The DIMD information derived from the neighboring pixels of the current block may include a first intra prediction directional mode, a second intra prediction directional mode, and information on whether weighted prediction is performed. If the DIMD information of the neighboring pixels of the current block indicates that weighted prediction is not performed, the video signal processing device may generate the prediction block by applying the same weight to each block.
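The weight selection in the example above (3 and 1, with a threshold of 34) can be sketched as follows. The rounding in the blend and all names are assumptions for illustration; only the weights and the threshold are taken from the example in the text.

```python
def planar_blend_weights(dimd_mode: int, weighted: bool, threshold: int = 34):
    """Return (w_vertical, w_horizontal) for blending the two planar predictions."""
    if not weighted:
        # DIMD information says weighted prediction is not performed: equal weights.
        return (1, 1)
    return (3, 1) if dimd_mode >= threshold else (1, 3)


def blend_planar(pred_ver, pred_hor, w_ver, w_hor):
    """Weighted average of the vertical-planar and horizontal-planar blocks."""
    total = w_ver + w_hor
    return [
        [(w_ver * v + w_hor * h + total // 2) // total for v, h in zip(rv, rh)]
        for rv, rh in zip(pred_ver, pred_hor)
    ]


# Hypothetical one-row blocks; a DIMD mode of 50 favors the vertical planar block.
w_ver, w_hor = planar_blend_weights(50, weighted=True)
result = blend_planar([[100, 120]], [[80, 100]], w_ver, w_hor)  # [[95, 115]]
```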

When the CIIP mode is used to generate a prediction block of the current block, the prediction block of the current block may be generated as a weighted average of the prediction block generated using an intra prediction mode (intra prediction block) and the prediction block generated using an inter prediction mode (inter prediction block). When the intra prediction directional mode used to generate the intra prediction block in CIIP mode is a planar mode, the encoder may generate and signal a bitstream including planar mode selection information. If CIIP mode is used and planar mode is used, the decoder may parse the planar mode selection information to determine the intra prediction mode used to generate the intra prediction block.

If the current block is predicted using an intra prediction mode, the continuity of pixel values may be broken at the boundary between the current block and neighboring blocks. To resolve this discontinuity, PDPC (position dependent intra prediction combination) filtering may be applied to the generated prediction block. When the vertical planar mode is used to generate the prediction block of the current block, PDPC filtering based on the planar mode may not be performed, and PDPC filtering based on the horizontal angle mode (e.g., angle mode 18 in FIG. 6) may be performed instead. Alternatively, when the vertical planar mode is used to generate the prediction block of the current block, PDPC filtering based on the planar mode may not be performed, and PDPC filtering based on the vertical angle mode (e.g., angle mode 50 in FIG. 6) may be performed instead. In addition, when the horizontal planar mode is used to generate the prediction block of the current block, PDPC filtering based on the planar mode may not be performed, and PDPC filtering based on the vertical angle mode (e.g., angle mode 50 in FIG. 6) may be performed instead. Alternatively, when the horizontal planar mode is used to generate the prediction block of the current block, PDPC filtering based on the planar mode may not be performed, and PDPC filtering based on the horizontal angle mode (e.g., angle mode 18 in FIG. 6) may be performed instead. In this specification, the statement that the current block is predicted may have the same meaning as the statement that the prediction block of the current block is generated.

Referring to FIG. 33 and Equation 4, when the vertical planar mode is used to generate a prediction block of the current block, the prediction block may be generated based on the pixel value of rec(-1, H), which is at a fixed position, and the pixel value of rec(x, -1), which varies with the x-axis coordinate. Likewise, referring to FIG. 33 and Equation 5, when the horizontal planar mode is used to generate a prediction block of the current block, the prediction block may be generated based on the pixel value of rec(W, -1), which is at a fixed position, and the pixel value of rec(-1, y), which varies with the y-axis coordinate. In other words, the prediction block can be generated so that the continuity of pixel values is maintained at the boundary with neighboring blocks, so if the prediction block of the current block is generated using the vertical planar mode or the horizontal planar mode, PDPC filtering may not be performed.

PDPC filtering for the planar mode described above can be applied even when the current block is encoded in CIIP mode.

When the video signal processing device constructs the MPM list for the current block, if a neighboring block adjacent to the current block is predicted in the vertical planar mode or the horizontal planar mode, the video signal processing device may set the intra prediction directional mode of the neighboring block to planar mode and include it in the MPM list. Alternatively, if a neighboring block adjacent to the current block is predicted in the vertical planar mode or the horizontal planar mode, the video signal processing device may set the intra prediction directional mode of the neighboring block to DC mode and include it in the MPM list. Alternatively, if a neighboring block adjacent to the current block is predicted in the horizontal planar mode, the video signal processing device may set the intra prediction directional mode of the neighboring block to the horizontal angle mode (angle mode 18 in FIG. 6) and include it in the MPM list. Alternatively, if a neighboring block adjacent to the current block is predicted in the vertical planar mode, the video signal processing device may set the intra prediction directional mode of the neighboring block to the vertical angle mode (angle mode 50 in FIG. 6) and include it in the MPM list. Conversely, since the horizontal planar mode may have vertical characteristics, if a neighboring block adjacent to the current block is predicted in the horizontal planar mode, the video signal processing device may set the intra prediction directional mode of the neighboring block to the vertical angle mode (angle mode 50 in FIG. 6) and include it in the MPM list. Likewise, if a neighboring block adjacent to the current block is predicted in the vertical planar mode, the video signal processing device may set the intra prediction directional mode of the neighboring block to the horizontal angle mode (angle mode 18 in FIG. 6) and include it in the MPM list.
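The alternative mappings listed above for a neighboring block can be summarized in a short sketch. The `policy` names and the string labels for the two new planar modes are hypothetical; the mode numbers 0, 1, 18, and 50 follow FIG. 6.

```python
from typing import Union

PLANAR, DC, HOR, VER = 0, 1, 18, 50
VER_PLANAR, HOR_PLANAR = "ver_planar", "hor_planar"  # hypothetical labels


def mpm_candidate_mode(neighbor_mode: Union[int, str], policy: str = "planar") -> int:
    """Map a neighbor's mode to the mode inserted into the MPM list."""
    if neighbor_mode not in (VER_PLANAR, HOR_PLANAR):
        return neighbor_mode  # conventional modes are used as they are
    if policy == "planar":
        return PLANAR
    if policy == "dc":
        return DC
    if policy == "same_direction":
        # vertical planar -> vertical angle mode, horizontal planar -> horizontal
        return VER if neighbor_mode == VER_PLANAR else HOR
    if policy == "cross_direction":
        # horizontal planar treated as having vertical characteristics, and vice versa
        return HOR if neighbor_mode == VER_PLANAR else VER
    raise ValueError(f"unknown policy: {policy}")
```

Only one of these policies would be active in a given codec configuration; the parameter merely makes the alternatives easy to compare.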

The vertical planar mode and the horizontal planar mode proposed in this specification may not be signaled as separate dedicated modes, but may instead be included and signaled as one of the intra prediction directional modes. Referring to FIG. 6, there are a total of 67 intra prediction directional modes, of which 0 (planar mode) and 1 (DC mode) are non-directional modes, and 2 to 66 are directional modes (angle modes). The intra prediction directional modes may be extended to include the newly defined vertical planar mode and horizontal planar mode. For example, mode 2 may be set to the vertical planar mode and mode 3 may be set to the horizontal planar mode. That is, modes 0 (planar mode), 1 (DC mode), 2 (vertical planar mode), and 3 (horizontal planar mode) are non-directional modes, and modes 4 to 68 may be set to be the same as the existing directional modes 2 to 66.
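The renumbering in the example above shifts the existing angular modes by two. A minimal sketch of the mapping from the extended numbering back to the numbering of FIG. 6 follows; the function name and the returned labels are hypothetical.

```python
def extended_to_legacy(mode: int):
    """Map an extended mode index (0..68) to a (kind, legacy_index) pair."""
    if mode in (0, 1):
        return ("legacy", mode)            # planar and DC keep their indices
    if mode == 2:
        return ("vertical_planar", None)   # newly defined non-directional mode
    if mode == 3:
        return ("horizontal_planar", None)
    if 4 <= mode <= 68:
        return ("legacy", mode - 2)        # angular modes 2..66 shifted by two
    raise ValueError(f"mode out of range: {mode}")
```

Under this numbering, the horizontal and vertical angle modes (18 and 50 in FIG. 6) become modes 20 and 52.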

When the vertical planar mode is used to generate the prediction block of the current block, the characteristics of the error (residual) signal (block) may be similar to those of the error signal in the vertical angle mode (angle mode 50 in FIG. 6). The multiple transform set (MTS) applied to the error signal may vary based on the intra prediction directional mode of the current block. That is, when the prediction block of the current block is generated using the vertical planar mode, the encoder may perform a primary transform process based on multiple transform sets on the error signal using the set of transform matrices of the vertical angle mode (angle mode 50 in FIG. 6) rather than the set of transform matrices of the planar mode. Alternatively, the encoder may perform the primary transform using a predetermined transform set regardless of the multiple transform sets. Additionally, the encoder may perform a secondary transform through LFNST on the transform coefficients obtained by the primary transform using the multiple transform sets or the predetermined transform set. The transform matrix used when performing LFNST may vary depending on the intra prediction directional mode. That is, when the vertical planar mode is used to generate the prediction block of the current block, the encoder may perform the transform process on the error signal using the set of secondary LFNST transform matrices for the vertical angle mode (angle mode 50 in FIG. 6) rather than the set of secondary LFNST transform matrices for the planar mode. When the horizontal planar mode is used to generate the prediction block of the current block, the encoder may derive the primary or secondary transform matrix (or set of matrices, or set of kernels) using the horizontal angle mode (angle mode 18 in FIG. 6) rather than the planar mode. When the vertical planar mode is used to generate the prediction block of the current block, the decoder may perform the transform process on the error signal using the set of primary LFNST transform matrices for the vertical angle mode (angle mode 50 in FIG. 6) rather than the set of primary LFNST transform matrices for the planar mode. Additionally, when the vertical planar mode is used to generate the prediction block of the current block, the decoder may perform a secondary transform process based on multiple transform sets on the error signal using the set of transform matrices of the vertical angle mode (angle mode 50 in FIG. 6) rather than the set of transform matrices of the planar mode. In this specification, the encoder may perform the secondary transform after the primary transform in the encoding process, and these correspond to the secondary transform and the primary transform in the decoding process of the decoder, respectively. In other words, the primary transform performed by the encoder corresponds to the secondary transform performed by the decoder (the inverse of the primary transform performed by the encoder), and the secondary transform performed by the encoder corresponds to the primary transform performed by the decoder (the inverse of the secondary transform performed by the encoder).

Referring to FIG. 33 and Equation 4, when the vertical planar mode is used to generate the prediction block of the current block, the prediction block may be generated based on the pixel value of rec(-1, H), which is at a fixed position, and the pixel value of rec(x, -1), which varies with the x-axis coordinate. That is, when the vertical planar mode is used, the pixel value changes along the x-axis, so the characteristics of the error signal may be similar to those of the horizontal angle mode (angle mode 18 in FIG. 6). The multiple transform set applied to the error signal may vary based on the intra prediction directional mode of the current block. That is, when the vertical planar mode is used to generate the prediction block of the current block, the encoder may perform a primary transform process based on multiple transform sets on the error signal using the set of transform matrices of the horizontal angle mode (angle mode 18 in FIG. 6) rather than the set of transform matrices of the planar mode. Alternatively, the encoder may perform the primary transform using a predetermined transform set regardless of the multiple transform sets. Additionally, the encoder may perform a secondary transform through LFNST on the transform coefficients obtained by the primary transform using the multiple transform sets or the predetermined transform set. The transform matrix used when performing LFNST may vary depending on the intra prediction directional mode. That is, when the prediction block of the current block is generated using the vertical planar mode, the encoder may perform the transform process on the error signal using the set of secondary LFNST transform matrices for the horizontal angle mode (angle mode 18 in FIG. 6) rather than the set of secondary LFNST transform matrices for the planar mode. When the horizontal planar mode is used to generate the prediction block of the current block, the encoder may derive the primary or secondary transform matrix (or set of matrices, or set of kernels) using the vertical angle mode (angle mode 50 in FIG. 6) rather than the planar mode. When the vertical planar mode is used to generate the prediction block of the current block, the decoder may perform the transform process on the error signal using the set of primary LFNST transform matrices for the horizontal angle mode (angle mode 18 in FIG. 6) rather than the set of primary LFNST transform matrices for the planar mode. In addition, when the vertical planar mode is used to generate the prediction block of the current block, the decoder may perform a secondary transform process based on multiple transform sets on the error signal using the set of transform matrices of the horizontal angle mode (angle mode 18 in FIG. 6) rather than the set of transform matrices of the planar mode. When the horizontal planar mode is used to generate the prediction block of the current block, the decoder may perform the transform process on the error signal using the set of primary LFNST transform matrices for the vertical angle mode (angle mode 50 in FIG. 6) rather than the set of primary LFNST transform matrices for the planar mode. Additionally, when the horizontal planar mode is used to generate the prediction block of the current block, the decoder may perform a secondary transform process based on multiple transform sets on the error signal using the set of transform matrices of the vertical angle mode (angle mode 50 in FIG. 6) rather than the set of transform matrices of the planar mode.

A method by which the video signal processing device selects the multiple transform set available for the current block when the vertical planar mode, the horizontal planar mode, or the conventional planar mode is used to generate a prediction block of the current block is described below.

1) First, the video signal processing device may derive the values nSzIdxW and nSzIdxH from the size of the current block in order to map the horizontal and vertical sizes of the current block into one variable. nSzIdxW may be the smaller of 3 and the value obtained by taking the base-2 logarithm of the width of the current block, discarding the fractional part, and subtracting 2. nSzIdxH may likewise be the smaller of 3 and the value obtained by taking the base-2 logarithm of the height of the current block, discarding the fractional part, and subtracting 2.

2) Next, the video signal processing device may derive the intra directional mode (predMode) of the current block. In TIMD mode, the intra prediction modes may be extended from the existing 67 modes to 131 modes; in that case, the mode may be mapped back to the precision of the existing 67 modes.

3) Next, the video signal processing device can derive the ucMode, nMdIdx, and isTrTransposed values.

A. If the current block is encoded in MIP mode, ucMode can be set to '0', nMdIdx can be set to '35', and isTrTransposed can be set to a value derived from MIP.

B. If the current block is not encoded in MIP mode, ucMode may be set to the intra directional mode (predMode) of the current block. predMode may mean the index value of the intra directional mode. predMode may be determined through the extended angle mode according to the width-to-height ratio of the current block. The video signal processing device may clip predMode to a value between 2 and 66. If the prediction block of the current block is generated using the vertical planar mode, the video signal processing device may reset predMode to the horizontal prediction mode (angle mode 18 in FIG. 6). If the prediction block of the current block is generated using the horizontal planar mode, the video signal processing device may reset predMode to the vertical prediction mode (angle mode 50 in FIG. 6). If predMode is greater than angle mode 34, which is the diagonal mode, isTrTransposed may be set to 1, and if predMode is less than or equal to 34, isTrTransposed may be set to 0. If predMode is greater than 34, the video signal processing device may reset predMode to the value obtained by subtracting predMode from 67 (the total number of intra prediction modes) plus 1, that is, 67 + 1 - predMode. For example, if predMode is 35, the video signal processing device may reset angle mode 35 to angle mode 33 (67 + 1 - 35). If predMode is 66, the video signal processing device may reset angle mode 66 to angle mode 2 (67 + 1 - 66). Folding the modes symmetrically about angle mode 34, the diagonal mode, in this way has the effect of reducing the size of the transform mapping table in FIG. 35 by about half.

4) The video signal processing device can derive the nSzIdx value through nSzIdxW, nSzIdxH, and isTrTransposed values. If the isTrTransposed value is '1', the value obtained by multiplying nSzIdxH by 4 and adding nSzIdxW may be set as nSzIdx. If the isTrTransposed value is '0', the value obtained by multiplying nSzIdxW by 4 and adding nSzIdxH may be set as nSzIdx.

5) The video signal processing device may use nSzIdx, which is the size information of the current block, and nMdIdx, which is the intra directional mode information of the current block, to derive nTrSet, which is an index of the set of available transform types, according to the predefined table in FIG. 35. FIG. 35 defines the index of the transform type set according to the intra prediction directional mode (0 to 34 and MIP) of the current block and the size index (0 to 15) of the current block. Referring to FIG. 35, nTrSet may take one of 80 values. If the size of the current block is 4x8 and the intra prediction directional mode of the current block is 13, nTrSet may be '7'.

6) The video signal processing device may parse mts_idx included in the bitstream and derive the transform type set corresponding to nTrSet from the table in FIG. 36. The transform types in the vertical and horizontal directions are set differently depending on whether predMode is greater than angle mode 34, which is the diagonal mode. In FIG. 36, the numbers 0 to 79 in the gray-shaded vertical column may correspond to nTrSet, and the numbers 0 to 3 in the gray-shaded horizontal row may correspond to mts_idx. Referring to FIG. 36, if nTrSet is 7 and the value of mts_idx is 3, 22 may be selected from (2, 17, 18, 22). Then, DST1 and DCT5, which correspond to index 22 of the transform type combination table in FIG. 37, are selected, and the vertical transform type of the current block may be set to DST1 and the horizontal transform type to DCT5. In FIG. 37, the numbers 0 to 24 in the gray-shaded horizontal row are the indices selected through FIG. 36, and the numbers 0 and 1 in the gray-shaded vertical column may indicate the vertical transform type and the horizontal transform type, respectively. If the intra prediction directional mode of the current block is greater than 34, which is the diagonal mode, the vertical and horizontal transform types are exchanged.
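Steps 1), 3) B, and 4) of the procedure above can be sketched as follows. The sketch assumes that each per-dimension size index is floor(log2(size)) - 2 capped at 3, which is what a combined size index of 0 to 15 implies, that the block size is at least 4, and that the non-directional modes 0 and 1 pass through the folding unchanged. The tables of FIG. 35 to FIG. 37 are not reproduced, and all function names are hypothetical.

```python
def size_index(length: int) -> int:
    """Per-dimension size index: floor(log2(length)) - 2, capped at 3."""
    if length < 4:
        raise ValueError("block dimensions below 4 are not handled in this sketch")
    # For a positive integer, bit_length() - 1 equals floor(log2(length)).
    return min((length.bit_length() - 1) - 2, 3)


def fold_mode(pred_mode: int):
    """Fold modes above the diagonal (34) onto 2..34; return (mode, is_transposed)."""
    if pred_mode < 2:
        return pred_mode, False  # assumption: planar/DC are left untouched
    pred_mode = max(2, min(66, pred_mode))  # clip to the angular range 2..66
    if pred_mode > 34:
        return 67 + 1 - pred_mode, True
    return pred_mode, False


def combined_size_index(width: int, height: int, transposed: bool) -> int:
    """Combine the two per-dimension indices into nSzIdx (0..15)."""
    w, h = size_index(width), size_index(height)
    return h * 4 + w if transposed else w * 4 + h
```

For a 4x8 block with mode 13, this gives a folded mode of 13 with no transposition and a combined size index of 1, which would then be used to look up nTrSet in the table of FIG. 35.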

If the mts_idx value is '3' and both the horizontal and vertical sizes of the current block are 16 or less, the vertical or horizontal transformation type can be reset to the IDT transformation type through a process described later.

If the absolute difference between the index of the intra prediction directional mode of the current block and 18, which is the index of the horizontal direction mode, is less than a threshold, the vertical transform type may be reset to the IDT transform type. If the absolute difference between the index of the intra prediction directional mode of the current block and 50, which is the index of the vertical direction mode, is less than the threshold, the horizontal transform type may be reset to the IDT transform type. The threshold is an integer and may be determined based on the horizontal or vertical size of the current block. For example, the threshold may be determined through the table in FIG. 38. The table in FIG. 38(a) shows a case where the threshold is set differently each time the horizontal or vertical size differs by 4, and the table in FIG. 38(b) shows a case where the threshold is set differently each time the horizontal or vertical size differs by 2. If the current block size is 16x16, the vertical transform type is not reset to the IDT transform type, and the existing transform type may be maintained as is.
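The IDT reset described above can be sketched as follows. The threshold comes from the table in FIG. 38, which is not reproduced here, so it is passed in as a parameter; the function name and the string labels for transform types are hypothetical.

```python
HOR_IDX, VER_IDX = 18, 50  # horizontal and vertical angle modes in FIG. 6


def apply_idt_reset(pred_mode, mts_idx, width, height, ver_type, hor_type, threshold):
    """Return (vertical_type, horizontal_type) after the optional IDT reset."""
    # The reset is only considered when mts_idx is 3 and both sides are <= 16.
    if mts_idx != 3 or width > 16 or height > 16:
        return ver_type, hor_type
    if abs(pred_mode - HOR_IDX) < threshold:
        ver_type = "IDT"  # near-horizontal prediction: vertical transform -> IDT
    if abs(pred_mode - VER_IDX) < threshold:
        hor_type = "IDT"  # near-vertical prediction: horizontal transform -> IDT
    return ver_type, hor_type
```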

Specifically, FIG. 39 shows a flowchart of a method by which the video signal processing device derives a primary or secondary transform matrix (or set of matrices, or set of kernels) when the current block is predicted using the vertical planar mode, the horizontal planar mode, or the conventional planar mode.

Referring to FIG. 39, the video signal processing device may receive an intra prediction directional mode and check whether the prediction mode of the current block is a planar mode. If the prediction mode is not a planar mode, the video signal processing device may derive a primary or secondary transform matrix (or set of matrices, or set of kernels) based on the input intra prediction directional mode. If the prediction mode is a planar mode, the video signal processing device may check whether the prediction mode is the vertical planar mode. If the prediction mode is the vertical planar mode, the video signal processing device may derive a primary or secondary transform matrix (or set of matrices, or set of kernels) based on the horizontal angle mode (angle mode 18 in FIG. 6). If the prediction mode is not the vertical planar mode, the video signal processing device may check whether the prediction mode is the horizontal planar mode. If the prediction mode is the horizontal planar mode, the video signal processing device may derive a primary or secondary transform matrix (or set of matrices, or set of kernels) based on the vertical angle mode (angle mode 50 in FIG. 6). If the prediction mode is not the horizontal planar mode, the video signal processing device may derive a primary or secondary transform matrix (or set of matrices, or set of kernels) based on the conventional planar mode.
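The decision flow of FIG. 39 as described above amounts to choosing the mode index that drives the transform derivation. A minimal sketch follows; the function name and the `planar_variant` labels are hypothetical, while the mode numbers follow FIG. 6.

```python
PLANAR, HOR, VER = 0, 18, 50


def transform_derivation_mode(intra_mode: int, planar_variant: str = "conventional") -> int:
    """Return the mode index used to derive the primary/secondary transform set."""
    if intra_mode != PLANAR:
        return intra_mode   # non-planar: use the input mode directly
    if planar_variant == "vertical":
        return HOR          # vertical planar -> horizontal angle mode (18)
    if planar_variant == "horizontal":
        return VER          # horizontal planar -> vertical angle mode (50)
    return PLANAR           # conventional planar mode
```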

When any one of the vertical planar mode, the horizontal planar mode, and the conventional planar mode is used to generate the prediction block of the current block, the video signal processing device may derive a primary or secondary transform matrix (or set of matrices, or set of kernels) using the intra prediction directional mode derived by the DIMD method. In other words, when the prediction block of the current block is generated using any one of the vertical planar mode, the horizontal planar mode, and the conventional planar mode, the video signal processing device may perform a primary transform process based on multiple transform sets using the set of transform matrices of the intra prediction directional mode derived by the DIMD method. In addition, in the same case, the video signal processing device may perform a secondary transform process based on LFNST using the set of transform matrices of the intra prediction directional mode derived by the DIMD method. The DIMD information derived from the neighboring pixels of the current block may include a first intra prediction directional mode, a second intra prediction directional mode, and information on whether weighted prediction is performed. If the DIMD information derived from the neighboring pixels of the current block indicates that weighted prediction is not applied, the encoder and the decoder may derive the primary or secondary transform matrix (or set of matrices, or set of kernels) using the conventional planar prediction mode.

Since planar mode can be effective in smooth regions, the encoder may, to reduce complexity, implicitly apply the DCT2 transform without performing the primary transform process based on a multiple transform set. The decoder may apply the DCT2 transform if the current block is predicted in planar mode. If the current block is predicted in the conventional planar mode, the video signal processing device may implicitly apply the DCT2 transform without performing the primary transform process based on a multiple transform set. If the current block is predicted in the vertical planar mode or the horizontal planar mode, the video signal processing device may apply the primary transform process based on a multiple transform set. Conversely, if the current block is predicted in the vertical planar mode or the horizontal planar mode, the video signal processing device may implicitly apply the DCT2 transform without applying the primary transform process based on a multiple transform set, and if the current block is predicted in the conventional planar mode, the video signal processing device may apply the primary transform process based on a multiple transform set. Additionally, in the above-described methods, if the size of the current block falls within an arbitrary size range, the DST7 transform rather than the DCT2 transform may be applied. Here, the arbitrary size range may be the case where the horizontal or vertical size of the current block is greater than or equal to 4 and less than or equal to 16. For example, if the current block is predicted based on the vertical planar mode or the horizontal planar mode, and the horizontal size of the current block is 32 and the vertical size is 16, the video signal processing device may apply DCT2 for the horizontal transform and DST7 for the vertical transform.
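
The size-dependent choice between DCT2 and DST7 in the example above can be sketched as follows. The function name and the default bounds as parameters are assumptions for illustration; the range of 4 to 16 and the 32x16 example are taken from the text.

```python
def implicit_transform_types(width, height, min_size=4, max_size=16):
    """Pick the implicit transform type per direction from the block size.

    Returns (horizontal_type, vertical_type). The horizontal transform is
    chosen from the width and the vertical transform from the height.
    """
    def pick(size):
        # DST7 inside the size range, DCT2 otherwise.
        return "DST7" if min_size <= size <= max_size else "DCT2"

    return pick(width), pick(height)
```

With a width of 32 and a height of 16, this yields DCT2 for the horizontal direction and DST7 for the vertical direction, matching the example in the text.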

Similar to planar mode, DC mode may also be divided into three prediction modes. Specifically, DC mode may be divided into a mode based on the average of the upper pixels of the current block, a mode based on the average of the left pixels of the current block, and a mode based on the average of the upper and left pixels of the current block. If the horizontal size of the current block is larger than the vertical size, the video signal processing device may generate the prediction block of the current block using the average value of the upper pixels of the current block. If the vertical size of the current block is larger than the horizontal size, the video signal processing device may generate the prediction block of the current block using the average value of the left pixels of the current block. If the horizontal and vertical sizes of the current block are the same, the video signal processing device may generate the prediction block of the current block using the average value of the upper and left pixels. The encoder may generate and signal a bitstream containing information (DC mode selection information) indicating which of the three DC modes was used to predict the current block. The decoder may generate the prediction block of the current block based on the mode determined by parsing the DC mode selection information. If the prediction mode of the current block is DC mode, the encoder may generate and signal a bitstream including information indicating which DC mode was used for encoding. If the prediction mode of the current block is DC mode, the decoder may parse the DC mode selection information and determine the final DC mode for the current block. For example, if the current block is predicted in the DC mode using the average of the left pixels, the encoder may signal information indicating that the current block is in DC mode and may signal, using an additional flag bit, information indicating that the current block is predicted in the DC mode using the average of the left pixels. If the current block is in DC mode, the decoder may determine which mode is applied to the current block by parsing the additional flag bit.
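
The shape-based selection among the three DC modes can be sketched as follows. The function name and the rounded integer mean are assumptions; the text only specifies which reference pixels are averaged for each block shape.

```python
def dc_predict(top, left):
    """Build a DC prediction block from the neighboring reference pixels.

    top:  reconstructed pixels above the block (length = block width)
    left: reconstructed pixels left of the block (length = block height)
    """
    width, height = len(top), len(left)
    if width > height:
        samples = top            # wide block: average of the upper pixels
    elif height > width:
        samples = left           # tall block: average of the left pixels
    else:
        samples = top + left     # square block: average of both
    # Rounded integer mean (the rounding rule is an assumption).
    dc = (sum(samples) + len(samples) // 2) // len(samples)
    return [[dc] * width for _ in range(height)]
```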

The three DC prediction modes described above may not be used in ISP mode. If the current block is encoded in ISP mode and in DC mode, the encoder may not signal the flag bit indicating whether the current block uses the DC mode based on the average of the left pixels or the DC mode based on the average of the upper pixels. If the current block is in ISP mode and in DC mode, the decoder may not parse the flag bit indicating whether the current block uses the DC mode based on the average of the left pixels or the DC mode based on the average of the upper pixels.

Alternatively, the three DC prediction modes may be used in ISP mode. If the current block is encoded in ISP mode and in DC mode, the encoder may signal, in the bitstream, a flag bit indicating which of the three DC prediction modes is the prediction mode of the current block. If the current block is in ISP mode and in DC mode, the decoder may determine the final DC mode for the current block by parsing the flag bit.

Whether to apply the three DC prediction modes may be determined based on the color component of the current block. When the current block is a luminance component block, the three DC prediction modes described above may be used. When the current block is a chrominance component block, only the DC prediction mode that uses the average of the upper and left pixels, among the three DC prediction modes, may be used. Therefore, if the current block is a chrominance component block and DC mode is applied, the encoder may not include in the bitstream a flag bit indicating which of the three DC modes is used. If the current block is a chrominance component block and DC mode is applied, the decoder may not parse the flag bit indicating which of the three DC modes is used, and may set the DC mode for the current block to the DC prediction mode that uses the average of the upper and left pixels.

When ISP mode is applied, the current block may be divided horizontally or vertically. Depending on the horizontal or vertical size of the current block, one block may be divided into 2 or 4 blocks. When the three planar modes or the three DC modes are used in a block encoded in ISP mode, the prediction mode may be determined according to the division type of the block. If the current block is encoded in ISP mode and the division type according to the ISP mode is horizontal division, the horizontal planar mode among the three planar modes may be used. Alternatively, in the same case, the vertical planar mode among the three planar modes may be used. Likewise, if the current block is encoded in ISP mode and the division type according to the ISP mode is horizontal division, the mode that uses the average of the left pixels among the three DC modes may be used. Alternatively, in the same case, the mode that uses the average of the upper pixels among the three DC modes may be used.

The division type of the ISP mode may be determined based on the intra prediction directional mode derived by DIMD or the intra prediction directional mode derived by TIMD. If the DIMD (or TIMD) mode of the current block is less than 34, the ISP division type of the current block may be horizontal division. If the DIMD (or TIMD) mode of the current block is greater than or equal to 34, the ISP division type of the current block may be vertical division. Conversely, if the DIMD (or TIMD) mode of the current block is less than 34, the ISP division type of the current block may be vertical division, and if the DIMD (or TIMD) mode of the current block is greater than or equal to 34, the ISP division type of the current block may be horizontal division. If the division type of the ISP mode is determined based on the intra prediction directional mode derived by DIMD (or TIMD), the division type is determined implicitly, so the encoder may not include information about the ISP division type in the bitstream, and the decoder may not parse information about the ISP division type.
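
The implicit derivation of the ISP division type can be sketched as follows. The function name and the `inverted` parameter are hypothetical; `inverted` selects the converse mapping that the text also allows.

```python
def isp_split_type(derived_mode, threshold=34, inverted=False):
    """Derive the ISP division type from a DIMD- or TIMD-derived mode.

    With inverted=False, modes below the threshold give horizontal
    division and modes at or above it give vertical division.
    With inverted=True, the mapping is reversed.
    """
    horizontal = derived_mode < threshold
    if inverted:
        horizontal = not horizontal
    return "horizontal" if horizontal else "vertical"
```

Because both the encoder and the decoder can evaluate this rule from reconstructed neighbors, no division-type syntax needs to be transmitted in this variant.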

The video signal processing device may generate a prediction block with each of the three DC modes and generate the final prediction block as a weighted average, applying the same or different weights to each prediction block. Here, if the horizontal size of the current block is larger than the vertical size, the largest weight may be applied to the block predicted using the average of the upper pixels. If the horizontal size of the current block is smaller than the vertical size, the largest weight may be applied to the block predicted using the average of the left pixels. Alternatively, if the intra prediction directional mode of the DIMD derived from the neighboring pixels of the current block is greater than or equal to an arbitrary value, the largest weight may be applied to the block predicted using the average of the upper pixels. If the intra prediction directional mode of the DIMD derived from the neighboring pixels of the current block is smaller than the arbitrary value, the largest weight may be applied to the block predicted using the average of the left pixels. Here, the arbitrary value may be an integer and may be 34, which is the index of the diagonal mode (angle mode 34 in FIG. 6). DIMD information derived from neighboring pixels of the current block may include a first intra prediction directional mode, a second intra prediction directional mode, and information on whether to perform weighted prediction. If the DIMD information derived from the neighboring pixels of the current block indicates that weighted prediction is not applied, the largest weight may be applied to the block predicted using the average of all upper and left pixels.
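
The DIMD-driven weighting of the three DC predictions can be sketched as follows. The text only states which prediction receives the largest weight; the concrete weights (2, 1, 1), the rounding, and all names are assumptions made for illustration. Since each DC prediction block is constant, blending the three DC values is equivalent to blending the blocks.

```python
def dc_blend_weights(dimd_mode, dimd_weighted, threshold=34):
    """Return hypothetical weights (w_top, w_left, w_both).

    dimd_weighted: whether the DIMD information indicates weighted
    prediction. The 2:1:1 split is illustrative, not from the text.
    """
    if not dimd_weighted:
        return (1, 1, 2)   # favor the average of upper and left pixels
    if dimd_mode >= threshold:
        return (2, 1, 1)   # favor the average of the upper pixels
    return (1, 2, 1)       # favor the average of the left pixels


def blend_dc(dc_top, dc_left, dc_both, weights):
    """Weighted average of the three DC values with integer rounding."""
    w_top, w_left, w_both = weights
    total = w_top + w_left + w_both
    return (w_top * dc_top + w_left * dc_left + w_both * dc_both + total // 2) // total
```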

DC mode is a prediction method that generates the prediction block of the current block using only one average value, so all pixels in the prediction block have the same value. To increase the prediction efficiency of DC mode in blocks with gradual changes, the video signal processing device may generate the prediction block so that the average value changes depending on the vertical or horizontal position within the block; this is referred to as a gradient-based DC mode. Referring to FIG. 40, when the prediction block of the current block is generated using the DC mode in the vertical direction, the video signal processing device may calculate the average value of the pixels adjacent to the upper side of the current block (e.g., the hatched pixels in FIG. 40) and then generate prediction samples by applying the average value to the pixels of the first horizontal line of the current block (e.g., the pixels marked 1 in FIG. 40). Next, the average value may be recalculated according to the degree of change between reference pixel (A), located at the top of the pixels adjacent to the left of the current block, and reference pixel (B) of the next horizontal line. The video signal processing device may generate prediction samples by applying the recalculated average value to the second horizontal line of the current block (e.g., the pixels marked 2 in FIG. 40). The recalculation of the average value for the second horizontal line may be based on the distance (L) between reference pixel (A) and reference pixel (B), the distance (M) between reference pixel (A) and reference pixel (E), and a value (X) determined from reference pixel (A) and the pixel (E) located at position (-1, -1) relative to the lower left position of the current block. For example, the average value may be recalculated by adding, to the average value, L divided by M multiplied by X. The same is performed for each subsequent horizontal line, so that the prediction samples of the current block are generated. That is, the prediction block of the current block may be generated so that the average value for each horizontal (or vertical) line changes according to the gradient of the reference pixels adjacent to the left (or top) of the current block. The pixel values of the samples along each horizontal (or vertical) line are the same, and the prediction block of the current block may be generated so that at least one pixel value differs in the vertical (or horizontal) direction. Additionally, the line-level DC prediction mode may be divided into a line-level DC prediction mode in the vertical direction and a line-level DC prediction mode in the horizontal direction. If the horizontal size of the current block is larger than the vertical size, the number of pixels adjacent to the upper side of the current block is large, so the prediction block of the current block may be generated using the line-level DC prediction mode in the vertical direction. Alternatively, if the horizontal and vertical sizes of the current block are the same, the prediction block of the current block may be generated using the conventional DC prediction mode.
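
The per-line recalculation, average + (L / M) * X, can be sketched as follows for the vertical direction. Several points are assumptions made here because the text does not fix them: X is taken as the difference between pixel (E) and pixel (A), pixel (E) is taken as the last entry of the left reference column, distances are measured in rows, and the result is rounded to an integer.

```python
def gradient_dc_vertical(top, left):
    """Line-level DC prediction in the vertical direction (sketch).

    top:  reference pixels above the block (length = width)
    left: reference pixels left of the block (length = height);
          left[0] plays the role of pixel (A), left[-1] of pixel (E).
    """
    width, height = len(top), len(left)
    base = sum(top) / width            # average applied to the first line
    m = max(height - 1, 1)             # distance M between (A) and (E)
    x = left[-1] - left[0]             # assumed meaning of X
    pred = []
    for row in range(height):
        l = row                        # distance L from (A) to this line
        row_dc = base + l * x / m      # average + (L / M) * X
        pred.append([round(row_dc)] * width)
    return pred
```

Each row of the result is constant, while the value drifts from row to row following the gradient of the left reference column.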

Referring to FIGS. 41(a) to 41(d), when the current block is predicted using the vertical planar mode, the position of the standard reference pixel used may vary for each horizontal line. Specifically, the first horizontal line of the current block (pixels marked 1 in FIG. 41(a)) may be predicted using the pixels adjacent to the upper side of the current block (e.g., the hatched pixels in FIG. 41) and the A pixel. The second horizontal line of the current block (pixels marked 2 in FIG. 41(a)) may be predicted using the pixels adjacent to the upper side of the current block (e.g., the hatched pixels in FIG. 41) and the B pixel. Additionally, the line-level planar prediction mode may be divided into a vertical planar mode, a horizontal planar mode, and a mode using a weighted average of the prediction blocks in the vertical and horizontal directions. If the horizontal size of the current block is larger than the vertical size, the number of pixels adjacent to the upper side of the current block is large, so the video signal processing device may generate the prediction block of the current block using the vertical planar mode. If the vertical size of the current block is larger than the horizontal size, the number of pixels adjacent to the left of the current block is large, so the video signal processing device may generate the prediction block of the current block using the horizontal planar mode. If the horizontal and vertical sizes of the current block are the same, the video signal processing device may generate the prediction block of the current block using the conventional planar mode.

The video signal processing device may divide the current block into sub-blocks of an arbitrary specified size (MxN) and then perform prediction on a sub-block basis using a planar mode. Referring to FIGS. 42(a) to 42(c), the positions of the standard reference pixels for prediction (1, 2, 3, and 4 in FIG. 42) may vary for each sub-block. Specifically, referring to FIG. 42(a), the current block may be divided into 4 sub-blocks of size MxN. The position of the standard reference pixel may be determined based on the upper left position of the current sub-block and/or the horizontal and vertical sizes of the current sub-block. In FIG. 42(a), the positions of the standard reference pixels may be 1 and 3 for the upper left sub-block, 2 and 3 for the upper right sub-block, 1 and 4 for the lower left sub-block, and 2 and 4 for the lower right sub-block. For example, when the prediction block for the upper right sub-block in FIG. 42(a) is generated using the planar mode, the video signal processing device may generate a predicted value for the pixel marked X using a value generated from the V pixel value and standard reference pixel 3 and a value generated from the H pixel value. FIG. 42(b) shows prediction in the horizontal planar mode on a sub-block basis. For example, when the prediction block for the lower left sub-block in FIG. 42(b) is generated using the horizontal planar mode, the video signal processing device may generate a predicted value for the pixel marked X by a weighted average of the H pixel value and standard reference pixel 1. FIG. 42(c) shows prediction in the vertical planar mode on a sub-block basis. For example, when the prediction block for the upper right sub-block in FIG. 42(c) is generated using the vertical planar mode, the video signal processing device may generate a predicted value for the pixel marked X by a weighted average of the V pixel value and standard reference pixel 3.

The video signal processing device may determine which of the horizontal planar mode, the vertical planar mode, and the conventional planar mode is applied to each sub-block based on at least one of: the horizontal or vertical size of the current block, the ratio of the horizontal and vertical sizes of the current block, whether the current block is a luminance component block or a chrominance component block, quantization parameter information, intra prediction directional mode information of neighboring blocks of the current block, the position of the sub-block, and intra prediction directional mode information derived using DIMD from neighboring pixels of the current block.

If the intra prediction directional mode derived using DIMD from the neighboring pixels of the current block is greater than or equal to an arbitrary value, the video signal processing device may apply the vertical planar mode to generate the prediction block of the sub-block. If the intra prediction directional mode derived using DIMD from the neighboring pixels of the current block is smaller than the arbitrary value, the video signal processing device may apply the horizontal planar mode to generate the prediction block of the sub-block. Here, the arbitrary value may be an integer and may be 34, which is the index of the diagonal mode (angle mode 34 in FIG. 6). DIMD information derived from neighboring pixels of the current block may include a first intra prediction directional mode, a second intra prediction directional mode, and information on whether to perform weighted prediction. If the DIMD information derived from the neighboring pixels of the current block indicates that weighted prediction is not performed, the video signal processing device may generate the prediction block of each sub-block using the conventional planar mode.

The planar mode used to generate the prediction block of each sub-block may be determined based on the position of each sub-block. The video signal processing device may generate the prediction block of the upper left sub-block using the conventional planar mode, generate the prediction block of the upper right sub-block using the vertical planar mode, generate the prediction block of the lower left sub-block using the horizontal planar mode, and generate the prediction block of the lower right sub-block using the conventional planar mode.

The sub-block-based planar mode prediction method may be applied only when the vertical or horizontal planar mode is applied to the current block. In other words, when the conventional planar mode is applied to the current block, the video signal processing device may generate the prediction block of the current block using the conventional planar mode. If the vertical or horizontal planar mode is used for the current block, the video signal processing device may divide the current block into sub-blocks and generate a prediction block for each sub-block using the vertical or horizontal planar mode.

The first horizontal line of the current block (pixels marked 1 in FIG. 40(a)) may be predicted using the A pixel and the pixels adjacent to the upper side of the current block (e.g., the hatched pixels in FIG. 40). Likewise, the second horizontal line of the current block (pixels marked 2 in FIG. 40(a)) may be predicted using the pixels adjacent to the upper side of the current block (e.g., the hatched pixels in FIG. 40) and the B pixel. Additionally, the line-level planar mode may be divided into a vertical planar mode, a horizontal planar mode, and a mode using a weighted average of the prediction blocks in the vertical and horizontal directions. If the horizontal size of the current block is larger than the vertical size, the number of pixels adjacent to the upper side of the current block is large, so the video signal processing device may generate the prediction block of the current block using the vertical planar mode. If the vertical size of the current block is larger than the horizontal size, the number of pixels adjacent to the left of the current block is large, so the video signal processing device may generate the prediction block of the current block using the horizontal planar mode. If the horizontal and vertical sizes of the current block are the same, the video signal processing device may generate the prediction block of the current block using the conventional planar mode.

When the prediction block of the current block is generated using the planar mode, the video signal processing device may generate one prediction block using the vertical planar mode and another using the horizontal planar mode, and generate the final prediction block as a weighted average of the two prediction blocks. Here, the weights may be the same or different. If the intra prediction directional mode of the DIMD derived from the neighboring pixels of the current block is greater than or equal to an arbitrary value, the weight of the block predicted using the vertical planar mode may be the largest. For example, the weight of the block predicted using the vertical planar mode may be 3, and the weight of the block predicted using the horizontal planar mode may be 1. If the intra prediction directional mode of the DIMD derived from the neighboring pixels of the current block is smaller than the arbitrary value, the weight of the block predicted using the horizontal planar mode may be the largest. For example, the weight of the block predicted using the horizontal planar mode may be 3, and the weight of the block predicted using the vertical planar mode may be 1. Here, the arbitrary value may be an integer and may be 34, which is the index of the diagonal mode (angle mode 34 in FIG. 6). DIMD information derived from neighboring pixels of the current block may include a first intra prediction directional mode, a second intra prediction directional mode, and information on whether to perform weighted prediction. If the DIMD information derived from the neighboring pixels of the current block indicates that weighted prediction is not applied, the video signal processing device may apply the same weight to each prediction block. Here, the prediction block generated using the vertical planar mode and the prediction block generated using the horizontal planar mode may each be generated using the sub-block-based planar mode prediction method.
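
The 3:1 weighting described above can be sketched as follows. The weights 3 and 1 and the threshold 34 come from the text; the function name, the flag name, and the integer rounding are assumptions for illustration.

```python
def blend_planar(pred_ver, pred_hor, dimd_mode, dimd_weighted, threshold=34):
    """Blend vertical-planar and horizontal-planar prediction blocks.

    pred_ver, pred_hor: 2-D lists of equal shape.
    dimd_weighted: whether the DIMD information indicates weighted
    prediction; when False, both blocks get the same weight.
    """
    if not dimd_weighted:
        w_ver, w_hor = 1, 1
    elif dimd_mode >= threshold:
        w_ver, w_hor = 3, 1
    else:
        w_ver, w_hor = 1, 3
    total = w_ver + w_hor
    # Per-sample weighted average with rounding (rounding is an assumption).
    return [
        [(w_ver * v + w_hor * h + total // 2) // total for v, h in zip(row_v, row_h)]
        for row_v, row_h in zip(pred_ver, pred_hor)
    ]
```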

Prediction using DC mode has the problem of generating discontinuous edges at the boundaries between blocks at low bit rates. To mitigate this problem, the video signal processing device may correct the prediction block by adding an arbitrary offset value to the prediction block of the current block generated in DC mode. Here, information about the offset value may be included in the bitstream and explicitly signaled, or the offset value may be implicitly derived from neighboring pixel values adjacent to the current block. The decoder may correct the prediction block of the current block based on the offset value obtained by parsing the information about the offset value. Alternatively, the decoder may correct the prediction block of the current block by deriving the offset value from neighboring pixel values adjacent to the current block. Alternatively, the prediction block of the current block may be generated as a weighted average of a block predicted using planar mode and a block predicted using DC mode.

The three planar modes or the three DC modes may be derived based on a template. First, the video signal processing device may configure a reference template including reconstructed neighboring blocks adjacent to the current block (see FIG. 23). The video signal processing device may then configure a prediction template for each of the three planar modes using reference pixels around the reference template. The video signal processing device may calculate the cost between the reference template and each prediction template and then generate the prediction block of the current block using the planar prediction mode that yields the minimum cost.

Alternatively, the video signal processing device may configure a reference template including reconstructed neighboring blocks adjacent to the current block (see FIG. 23), and may configure a prediction template for each of the three planar modes using reference pixels around the reference template. The video signal processing device may calculate the cost between the reference template and each prediction template and then configure a list of planar modes based on the costs. Here, the planar mode with the smallest cost may be located at the beginning of the list, and the list may be arranged in ascending order of cost. The encoder may generate and signal a bitstream including index information indicating, among the planar modes in the list, the optimal planar mode for generating the prediction block of the current block. The decoder may generate the prediction block of the current block using the planar mode determined by parsing the index information.

The above-described template-based planar mode derivation method can be equally applied to the three DC modes.
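
The template-based ranking can be sketched as follows. The text does not name the cost function; the sum of absolute differences (SAD) used here is an assumption, as are the function names and the flattening of each template into a list of samples.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_modes(reference_template, predicted_templates):
    """Order candidate modes by ascending template cost.

    reference_template:  reconstructed template samples
    predicted_templates: dict mapping a mode label to the template
                         samples predicted with that mode
    Returns the mode labels, lowest cost first.
    """
    costs = {mode: sad(reference_template, pred)
             for mode, pred in predicted_templates.items()}
    return sorted(costs, key=costs.get)
```

Taking the first entry of the returned list corresponds to the implicit minimum-cost selection, while the full list corresponds to the variant in which an index into the list is signaled.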

When the video signal processing device encodes the current block in intra prediction mode, pixels on the left or upper side of the current block can be predicted effectively because they are adjacent to the reconstructed neighboring pixels of the current block. However, the prediction efficiency for pixels on the right or lower side of the current block is low because their adjacent pixels have not yet been reconstructed. In other words, the closer a pixel is to the lower right corner of the current block, the greater its distance from the reference pixels (the reconstructed pixels around the current block), and so the error signal increases. To mitigate this problem, as shown in FIG. 43, the video signal processing device may reconstruct in advance only the one pixel located at the lower right of the current block, and then generate the prediction block of the current block based on the reconstructed lower right pixel and the neighboring pixels adjacent to the current block. Here, the pixel located at the lower right of the current block may be predicted from pixels adjacent to the current block and encoded, and information about the difference value, which is the difference between the value of the pixel located at the lower right of the current block and its predicted value, may be included in the bitstream and signaled. The decoder may reconstruct the pixel located at the lower right of the current block using the difference value determined by parsing the information about the difference value. Next, the video signal processing device may generate the prediction block of the current block using the pixels adjacent to the current block and the one pixel located at the lower right of the current block. Specifically, the video signal processing device may first predict the pixels on the right side of the current block (pixels marked 1 in FIG. 43) and the pixels on the lower side of the current block (pixels marked 1 in FIG. 43) using the pixels adjacent to the current block and the one pixel located at the lower right of the current block, and then generate the prediction block of the current block using the pixels adjacent to the current block and the right and lower pixels within the current block.

To increase coding efficiency, instead of coding the above-mentioned residual signal as is, a method of transforming the residual signal, quantizing the obtained transform coefficient values, and coding the quantized transform coefficients may be used. As described above, the transform unit may obtain transform coefficient values by transforming the residual signal. Here, the residual signal of a specific block may be distributed throughout the entire area of the current block. Accordingly, coding efficiency can be improved by concentrating energy in the low-frequency region through a frequency-domain transform of the residual signal.

The encoder may obtain at least one residual block containing the residual signal for the current block. The residual block may be either the current block or one of the blocks divided from the current block. In this specification, a residual block may be described as a residual array or residual matrix containing the residual samples of the current block. Additionally, in this specification, the residual block may represent a block with the same size as the transform unit or transform block.

The encoder may transform the residual block using a transform kernel. The transform kernel used to transform the residual block may be a transform kernel that is separable into a vertical transform and a horizontal transform. In this case, the transform of the residual block may be performed separately as a vertical transform and a horizontal transform. For example, the encoder may perform the vertical transform by applying a transform kernel in the vertical direction of the residual block. Additionally, the encoder may perform the horizontal transform by applying a transform kernel in the horizontal direction of the residual block. In this specification, a transform kernel may be used as a term referring to a set of parameters used for transforming a residual signal, such as a transform matrix, transform array, transform function, or transform. According to one embodiment, the transform kernel may be any one of a plurality of available kernels. Additionally, transform kernels based on different transform types may be used for the vertical transform and the horizontal transform. That is, before the primary transform is performed, the transform method for the vertical and horizontal directions may be derived using at least one of the intra prediction mode of the current block, the encoding mode, the transform method parsed from the bitstream, and the size information of the current block. Additionally, to reduce computational complexity in the transform process for large blocks, a process may be performed in which only the low-frequency region is kept and the high-frequency region is set to '0'. This process is called high-frequency zeroing, and for this zeroing, the transform size used in the actual primary transform may be set. In the high-frequency zeroing process, the low-frequency region may be set to an arbitrarily determined size. For example, the horizontal or vertical size may be a combination of 4, 8, 16, 32, and so on.
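
A separable primary transform followed by high-frequency zeroing can be sketched as follows. This is a floating-point illustration using an orthonormal DCT-II in both directions, not the integer kernels of any codec; all function names and the `keep_w` and `keep_h` parameters (the size of the retained low-frequency region) are assumptions.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th frequency."""
    return [
        [math.sqrt((1 if k == 0 else 2) / n)
         * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
         for i in range(n)]
        for k in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two 2-D lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def primary_transform(residual, keep_w, keep_h):
    """Vertical then horizontal transform, then zero the high frequencies."""
    height, width = len(residual), len(residual[0])
    vertical = dct2_matrix(height)      # applied along columns
    horizontal = dct2_matrix(width)     # applied along rows
    coeffs = matmul(matmul(vertical, residual), transpose(horizontal))
    # Keep only the keep_h x keep_w low-frequency region.
    return [
        [coeffs[r][c] if r < keep_h and c < keep_w else 0.0
         for c in range(width)]
        for r in range(height)
    ]
```

Using a different kernel per direction, as the text permits, would amount to replacing one of the two `dct2_matrix` calls with another basis of the same size.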

The encoder may pass the transform block obtained from the residual block to the quantization unit for quantization. Here, the transform block may include a plurality of transform coefficients. Specifically, a transform block may be composed of a plurality of transform coefficients arranged two-dimensionally. The size of the transform block, like that of the residual block, may be the same as either the current block or a block divided from the current block. The transform coefficients passed to the quantization unit may be expressed as quantized values.

Additionally, the encoder may perform an additional transformation before the transform coefficients are quantized. The above-described transformation method may be referred to as a primary transform, and the additional transformation may be referred to as a secondary transform. The secondary transformation may be optional for each residual block. According to one embodiment, the encoder may improve coding efficiency by performing the secondary transformation on a region where it is difficult to concentrate energy in the low-frequency region through the primary transformation alone. For example, a secondary transformation may be added for a block whose residual values appear large in directions other than the horizontal or vertical direction of the residual block. The residual values of an intra-predicted block may have a higher probability of changing in directions other than the horizontal or vertical direction compared to the residual values of an inter-predicted block. Accordingly, the encoder can additionally perform the secondary transformation on the residual signal of an intra-predicted block. Additionally, the encoder may omit the secondary transformation for the residual signal of an inter-predicted block. High-frequency zeroing, as in the primary transformation, can also be performed in the secondary transformation process.

As another example, whether to perform secondary transformation may be determined depending on the size of the current block or the residual block. Additionally, transformation kernels of different sizes may be used depending on the size of the current block or the residual block. For example, an 8X8 secondary transformation may be applied to a block where the length of the shorter side of width or height is greater than or equal to the first preset length. In addition, a 4X4 secondary transformation may be applied to a block where the length of the shorter side of width or height is greater than or equal to the second preset length and less than the first preset length. At this time, the first preset length may be larger than the second preset length, but the present disclosure is not limited thereto. Additionally, unlike the primary transformation, the secondary transformation may not be performed separately as a vertical transformation and a horizontal transformation. This secondary transform may be referred to as a Low Frequency Non-Separable Transform (LFNST).

Additionally, in the case of video signals in a specific region, high-frequency band energy may not be reduced even if a frequency transformation is performed, due to rapid changes in brightness. Accordingly, compression performance by quantization may deteriorate. Additionally, when transformation is performed on a region where residual values rarely exist, encoding time and decoding time may increase unnecessarily. Accordingly, transformation of the residual signal in a specific region may be omitted. Whether to perform transformation on the residual signal of a specific region may be determined by syntax elements related to the transformation of that region. For example, the syntax element may include transform skip information. The transform skip information may be a transform skip flag. If the transform skip information for a residual block indicates transform skipping, transformation is not performed on that residual block. In this case, the encoder can directly quantize the residual signal of the region for which transformation has not been performed.

The above-described transformation-related syntax elements may be information parsed from a video signal bitstream. The decoder may obtain the transformation-related syntax elements by entropy decoding the video signal bitstream. Additionally, the encoder may generate a video signal bitstream by entropy coding the transformation-related syntax elements.

The decoder can obtain the encoding information necessary for decoding by parsing the transmitted bitstream. At this time, the information related to the transformation process includes index information for the primary and secondary transformation types and the quantized transform coefficients. The inverse transform unit may obtain a residual signal by inversely transforming the dequantized transform coefficients. First, the inverse transform unit can determine whether inverse transformation is performed for a specific region from the transformation-related syntax elements of that region. According to one embodiment, when a transform-related syntax element for a specific transform block indicates transform skipping, the transformation for the corresponding transform block may be omitted. In this case, both the primary inverse transform and the secondary inverse transform can be omitted for the transform block, and the dequantized transform coefficients can be used as the residual signal. For example, the decoder can restore the current block using the dequantized transform coefficients as the residual signal. Alternatively, the secondary inverse transformation may be performed while the primary inverse transformation is omitted, and the output of the secondary inverse transformation may be used as the residual signal. The above-described primary inverse transform is the inverse of the primary transform and may be referred to as an inverse primary transform. The secondary inverse transform is the inverse of the secondary transform and may be referred to as an inverse secondary transform or inverse LFNST. In the present invention, the primary (inverse) transformation may be referred to as the first (inverse) transformation, and the secondary (inverse) transformation may be referred to as the second (inverse) transformation.
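
The decoder-side choices described above (transform skip, secondary inverse only, or both inverse stages) can be sketched as follows. This is a minimal illustration, not the decoder's actual interface: `reconstruct_residual`, `inverse_secondary`, and `inverse_primary` are hypothetical names, and the two toy stages only stand in for real inverse transforms.

```python
def reconstruct_residual(dequantized, transform_skip,
                         inverse_secondary=None, inverse_primary=None):
    """Return the residual signal for one transform block.

    inverse_secondary / inverse_primary are hypothetical callables; passing
    None means that stage is omitted for this block.
    """
    if transform_skip:
        # Transform skip: the dequantized coefficients are used as the residual.
        return dequantized
    coeffs = dequantized
    if inverse_secondary is not None:
        coeffs = inverse_secondary(coeffs)  # e.g. inverse LFNST
    if inverse_primary is not None:
        coeffs = inverse_primary(coeffs)    # e.g. inverse DCT/DST
    return coeffs


# Toy stand-ins for the two inverse stages, for illustration only.
def toy_inverse_secondary(block):
    return [2 * v for v in block]


def toy_inverse_primary(block):
    return [v + 1 for v in block]


skipped = reconstruct_residual([1, 2], True, toy_inverse_secondary, toy_inverse_primary)
both = reconstruct_residual([1, 2], False, toy_inverse_secondary, toy_inverse_primary)
secondary_only = reconstruct_residual([1, 2], False, toy_inverse_secondary, None)
```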

Figure 44 shows the kernel formulas of DCT-II, DCT-V (discrete cosine transform type-V), DCT-VIII (discrete cosine transform type-VIII), DST-I (discrete sine transform type-I), and DST-VII applied to MTS. DCT and DST can be expressed as functions of cosine and sine, respectively. When the basis function of the transformation kernel for the number of samples N is expressed as Ti(j), index i represents the index in the frequency domain, and index j represents the index within the basis function. That is, a smaller i represents a lower-frequency basis function, and a larger i represents a higher-frequency basis function. When expressed as a two-dimensional matrix, the basis function Ti(j) can represent the j-th element of the i-th row, and since the transformation kernels shown in Figure 44 all have separable characteristics, the transformation can be performed on the residual signal in the horizontal and vertical directions, respectively. That is, when the residual signal block is X and the transformation kernel matrix is T, the transformation of the residual signal X can be expressed as TXT'. At this time, T' means the transpose of the transformation kernel matrix T. Since the DCT and DST kernels consist of fractional rather than integer values, it is burdensome to implement them as they are in a hardware encoder and decoder. Therefore, the fractional transform kernel must be approximated by an integer transform kernel through scaling and rounding. The integer precision of the transform kernel can be set to 8 bits or 10 bits, but if the precision is low, coding efficiency may decrease. Depending on the approximation, the orthonormal properties of DCT and DST may not be maintained, but the resulting loss of coding efficiency is not significant, so approximating the transform kernel by an integer form is advantageous in terms of implementing a hardware encoder and decoder.

IDTR (Identity Transform) is a transformation whose output is identical to its input, and is called an identity transformation. In general, an identity transformation uses a transformation matrix with '1' at the positions where the row and column indices are equal. However, here, the identity transformation is used to uniformly scale the value of the input residual signal up or down by an arbitrary fixed value other than '1'.
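
The separable form TXT' and the scale-and-round integer approximation described above can be sketched in Python as follows. The DCT-II formula used here is the standard orthonormal one rather than a copy of Figure 44, the scale factor of 128 is an arbitrary assumption, and all function names are hypothetical. For simplicity the sketch applies the same kernel in both directions to a square block.

```python
import math


def dct2_kernel(n):
    """Orthonormal DCT-II kernel T: row i is the frequency index, column j the sample index."""
    kernel = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        kernel.append([scale * math.cos(math.pi * (2 * j + 1) * i / (2 * n))
                       for j in range(n)])
    return kernel


def to_integer_kernel(kernel, scale):
    """Approximate a fractional kernel by scaling and rounding (scale is an assumed value)."""
    return [[int(round(value * scale)) for value in row] for row in kernel]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def separable_transform(x, t):
    """Compute T X T': T X transforms the columns, then multiplying by T' transforms the rows."""
    return matmul(matmul(t, x), transpose(t))


T = dct2_kernel(4)
flat_block = [[1.0] * 4 for _ in range(4)]
coeffs = separable_transform(flat_block, T)  # energy ends up in coeffs[0][0]
T_int = to_integer_kernel(T, scale=128)
```

For a flat residual block, all of the energy lands in the lowest-frequency coefficient, which is the energy compaction that the primary transformation relies on.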

A bitstream consists of one or more coded video sequences (CVS), and one CVS is encoded independently from other CVSs. Each CVS consists of one or more layers, and each layer can represent a specific image quality, a specific resolution, or a general image, depth information map, or transparency map. Additionally, CLVS (coded layer video sequence) refers to a layer-wise CVS composed of consecutive PUs (in decoding order) within the same layer. For example, a CLVS may exist for a specific image quality layer, and a CLVS may exist for a depth information map.

Figure 45 shows a method of signaling in SPS a syntax element indicating whether the vertical plane mode and the horizontal plane mode are activated.

When sps_directional_planar_enabled_flag in FIG. 45 is '1' (true), it may indicate that the vertical plane mode and the horizontal plane mode are activated. If sps_directional_planar_enabled_flag is '0' (false), it may indicate that the vertical plane mode and the horizontal plane mode are disabled. That is, the vertical plane mode and the horizontal plane mode are not used in any block. If sps_directional_planar_enabled_flag is not parsed (not included in the bitstream), the value of sps_directional_planar_enabled_flag can be inferred to be 0.

sps_directional_planar_enabled_flag equal to 1 specifies that the vertical planar and the horizontal planar modes are enabled for the CLVS. sps_directional_planar_enabled_flag equal to 0 specifies that the vertical planar and the horizontal planar modes are disabled for the CLVS. When sps_directional_planar_enabled_flag is not present, it is inferred to be equal to 0.

In the same manner as the signaling method in the SPS described above, a syntax element indicating whether the vertical plane mode and the horizontal plane mode are activated can also be signaled in the PPS. That is, whether the vertical plane mode and the horizontal plane mode are activated may vary for each picture and/or frame depending on the syntax element signaled in the PPS (e.g., pps_directional_planar_enabled_flag).

Figure 46 shows a method of signaling in the SPS a syntax element indicating whether to activate the method of generating a prediction block using a plurality of reference pixel lines described with reference to Figures 25 to 29. A method of generating a prediction block using a plurality of reference pixel lines may be described as intra fusion.

If sps_intra_fusion_enabled_flag in FIG. 46 is '1' (true), it may indicate that intra fusion is activated. If sps_intra_fusion_enabled_flag is '0' (false), it may indicate that intra fusion is disabled. That is, intra fusion is not performed on any block. If sps_intra_fusion_enabled_flag is not parsed (not included in the bitstream), the value of sps_intra_fusion_enabled_flag can be inferred to be 0.

sps_intra_fusion_enabled_flag equal to 1 specifies that the intra fusion method is enabled for the CLVS. sps_intra_fusion_enabled_flag equal to 0 specifies that the intra fusion method is disabled for the CLVS. When sps_intra_fusion_enabled_flag is not present, it is inferred to be equal to 0.

In the same manner as the signaling method in the SPS described above, a syntax element indicating whether intra fusion is activated can also be signaled in the PPS. That is, whether intra fusion is activated may vary for each picture and/or frame depending on the syntax element signaled in the PPS (e.g., pps_intra_fusion_enabled_flag).

Figure 47 shows constraint flags related to vertical plane mode and horizontal plane mode and intra fusion. The general_constraint_info() syntax of Figure 47 can be called in the profile_tier_level() syntax. profile_tier_level() syntax can be called in sequence parameter set RBSP syntax, video parameter set RBSP syntax, and Decoding capability information RBSP syntax. Individual syntax elements of the general_constraint_info() syntax may have corresponding syntax elements in the sequence parameter set RBSP, and activation/deactivation of the corresponding sequence parameter set RBSP syntax element may be restricted by the definition of the corresponding flag.

The constraint flag related to the vertical plane mode and the horizontal plane mode may be gci_no_directional_planar_constraint_flag.

If the value of gci_no_directional_planar_constraint_flag is equal to 1, the value of sps_directional_planar_enabled_flag for all pictures in OlsInScope is constrained to 0. That is, deactivation of the vertical plane mode and the horizontal plane mode may be forced. If the value of gci_no_directional_planar_constraint_flag is equal to 0, the value of sps_directional_planar_enabled_flag is not constrained.

gci_no_directional_planar_constraint_flag equal to 1 specifies that sps_directional_planar_enabled_flag for all pictures in OlsInScope shall be equal to 0. gci_no_directional_planar_constraint_flag equal to 0 does not impose such a constraint.

The constraint flag related to intra fusion may be gci_no_intra_fusion_constraint_flag.

If the value of gci_no_intra_fusion_constraint_flag is equal to 1, the value of sps_intra_fusion_enabled_flag for all pictures in OlsInScope is constrained to 0. In other words, deactivation of intra fusion may be forced. If the value of gci_no_intra_fusion_constraint_flag is equal to 0, the value of sps_intra_fusion_enabled_flag is not constrained.

gci_no_intra_fusion_constraint_flag equal to 1 specifies that sps_intra_fusion_enabled_flag for all pictures in OlsInScope shall be equal to 0. gci_no_intra_fusion_constraint_flag equal to 0 does not impose such a constraint.
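
The relationship between the constraint flags and the SPS enable flags can be checked with a small Python sketch such as the one below. The flag names are the ones given above, but representing parsed syntax as dictionaries, the helper names, and inferring an absent constraint flag as 0 are assumptions made only for illustration (the text states the inference rule only for the SPS flags).

```python
# Each general constraint flag and the SPS flag it restricts (names from the text above).
CONSTRAINED_SPS_FLAG = {
    "gci_no_directional_planar_constraint_flag": "sps_directional_planar_enabled_flag",
    "gci_no_intra_fusion_constraint_flag": "sps_intra_fusion_enabled_flag",
}


def flag_value(parsed, name):
    """Return a parsed flag, inferring 0 when it is absent (assumed for gci flags)."""
    return parsed.get(name, 0)


def violated_constraints(gci, sps):
    """List the constraint flags that are equal to 1 while their SPS flag is not 0."""
    return [
        gci_name
        for gci_name, sps_name in CONSTRAINED_SPS_FLAG.items()
        if flag_value(gci, gci_name) == 1 and flag_value(sps, sps_name) != 0
    ]


violations = violated_constraints(
    {"gci_no_intra_fusion_constraint_flag": 1},
    {"sps_intra_fusion_enabled_flag": 1},
)
```

An empty result means that the SPS flags respect every active constraint.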

When generating a prediction block for a current block, a video signal processing device may use a multi-prediction mode that performs a weighted average of two or more intra prediction blocks. When DIMD or TIMD mode is used, the two intra prediction directionality modes can be implicitly determined from the surrounding block or MPM list. In blocks other than DIMD or TIMD mode, two intra prediction directional modes can be signaled from the bitstream, but this may increase the bit amount and reduce compression efficiency.
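
The weighted average of two intra prediction blocks mentioned above could look like the following Python sketch. The integer weights, the 6-bit weight precision, and the function name `blend_predictions` are illustrative assumptions; the specification does not fix them here.

```python
def blend_predictions(pred_a, pred_b, weight_a, shift=6):
    """Weighted average of two prediction blocks using integer weights.

    weight_a and (1 << shift) - weight_a are the two weights; both the weight
    values and the 6-bit precision are illustrative assumptions.
    """
    total = 1 << shift
    weight_b = total - weight_a
    rounding = total >> 1
    return [
        [(weight_a * a + weight_b * b + rounding) >> shift for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(pred_a, pred_b)
    ]


# Equal weights (32/64 each) applied to two 1x2 prediction blocks.
blended = blend_predictions([[100, 120]], [[200, 80]], weight_a=32)
```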

To alleviate this increase in bit quantity, when the video signal processing device configures the MPM list, the intra prediction directionality mode to be included in each item of the list may be set to two or more instead of one.

In Figure 48(a), A, B, C, ..., F, G can each represent any of the intra prediction directional modes, MIP, Intra TMP, IBC, DIMD, and TIMD. "N/A" means not used, so the 4th and 5th entries in the MPM list may consist of only one intra prediction directional mode rather than a multi-prediction mode. As shown in Figure 48(b), single prediction modes can be added to the MPM list first, followed by multi-prediction modes. Here too, "N/A" means not used, so the 1st and 2nd entries in the MPM list may consist of only one intra prediction directional mode rather than a multi-prediction mode.

The encoder can signal, adaptively on a block-by-block basis, information in the bitstream about whether to use an MPM list whose entries each consist of only one intra prediction directional mode (single prediction-based MPM list) or a multi-prediction-based MPM list. The decoder parses this information and determines which MPM list is used to generate the prediction block for the current block.

The video signal processing device can determine whether to use the multi-prediction-based MPM list using at least one of the horizontal or vertical size of the current block, the ratio of the horizontal and vertical sizes of the current block, whether the current block is a luminance block or a chrominance block, quantization parameter information, intra prediction directional mode information of neighboring blocks of the current block, and encoding mode information of the current block.

Coding efficiency may vary depending on how the multi-prediction-based MPM list is constructed, and various methods can be applied as follows.

The video signal processing device may first construct a single prediction-based MPM list and then construct a multi-prediction-based MPM list using the single prediction-based MPM list. For example, a video signal processing device can configure a multi-prediction-based MPM list by combining each intra-prediction directional mode in a single prediction-based MPM list. At this time, the video signal processing device may rearrange the single prediction-based MPM list using the template cost-based reordering method used in the TIMD method. The video signal processing device may construct a multi-prediction-based MPM list using only a combination of the first intra-prediction directional mode and non-first intra-prediction directional modes of the reordered single prediction-based MPM list.
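
A minimal Python sketch of this construction is shown below: the single prediction-based MPM list is reordered by ascending template cost, and the first mode is then paired with each remaining mode. The function name, the `template_cost` callable, and the toy cost values are hypothetical; an actual cost would come from the template-based evaluation used in the TIMD method.

```python
def build_multi_mpm_from_single(single_mpm, template_cost):
    """Pair the lowest-cost mode with every other mode of the reordered list."""
    ordered = sorted(single_mpm, key=template_cost)  # stable ascending sort by cost
    if not ordered:
        return []
    best = ordered[0]
    return [(best, other) for other in ordered[1:]]


# Toy costs for illustration only; a real cost would come from template matching.
toy_cost = {0: 40, 50: 10, 18: 25, 34: 30}
multi_mpm = build_multi_mpm_from_single([0, 50, 18, 34], toy_cost.get)
```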

A video signal processing device can construct a multi-prediction-based MPM list using combinations of the intra prediction directional modes derived from DIMD and the modes in the single prediction-based MPM list. For example, the video signal processing device may combine the first intra prediction directionality mode derived from DIMD with the first intra prediction directionality mode in the single prediction-based MPM list and add the combination as the first entry of the multi-prediction-based MPM list. Additionally, the video signal processing device may combine the second intra prediction directionality mode derived from DIMD with the first intra prediction directionality mode in the single prediction-based MPM list and add the combination as the second entry of the multi-prediction-based MPM list. Additionally, the video signal processing device may combine the first intra prediction directionality mode derived from DIMD with the second intra prediction directionality mode in the single prediction-based MPM list and add the combination as the third entry of the multi-prediction-based MPM list. In this way, the video signal processing device can construct a multi-prediction-based MPM list using all possible combinations.
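
The combination order in this example (first DIMD mode with the first MPM entry, second DIMD mode with the first MPM entry, first DIMD mode with the second MPM entry, and so on) can be sketched in Python as follows. The function name and the `max_entries` parameter are hypothetical, and skipping a pair whose two modes are identical or already listed is an assumption that the text does not state.

```python
def build_multi_mpm_from_dimd(dimd_modes, single_mpm, max_entries=None):
    """Combine DIMD-derived modes with single-prediction MPM entries, in list order."""
    entries = []
    for single_mode in single_mpm:      # outer loop: single prediction-based MPM entries
        for dimd_mode in dimd_modes:    # inner loop: first, then second DIMD mode
            pair = (dimd_mode, single_mode)
            if dimd_mode == single_mode or pair in entries:
                continue                # assumption: skip degenerate or repeated pairs
            entries.append(pair)
            if max_entries is not None and len(entries) == max_entries:
                return entries
    return entries


multi_mpm = build_multi_mpm_from_dimd([50, 18], [0, 34])
```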

A video signal processing device can configure a multi-prediction-based MPM list using several predefined multi-prediction modes. Here, the predefined multi-prediction modes may be (18, 0), (50, 0), (34, 0), (2, 0), (66, 0), (18, 1), (50, 1), (34, 1), (2, 1), (66, 1), etc., each written in the form "(X, Y)".

The video signal processing device can reorder the multi-prediction-based MPM list through a TIMD-based reordering method. The video signal processing device can then reconstruct the multi-prediction-based MPM list using only a few of the lowest-cost entries.

When a video signal processing device generates a prediction block using multiple prediction modes, the intra prediction mode used to generate each prediction block may be signaled based on the Primary MPM and Secondary MPM. The Primary MPM may include 6 intra prediction directional modes, and the Secondary MPM may include 16 intra prediction directional modes. The Secondary MPM can be configured using several predefined intra prediction modes. For example, the video signal processing device can construct the Secondary MPM list by adding, in order, each of the prediction modes DC, 50, 18, 46, 54, 14, 22, 42, 58, 10, 26, 38, 62, 6, 30, 34, 66, 2, 48, 52, and 16 (see the intra prediction modes in Figure 6) that does not exist in the Primary MPM, until the Secondary MPM list is fully constructed. When multiple prediction modes are used in the current block, the video signal processing device can determine the intra prediction directional modes used to generate the prediction block of the current block by deriving one intra prediction directionality mode from the Primary MPM and another intra prediction directionality mode from the Secondary MPM. If multiple prediction modes are used in the current block, the encoder can signal, by including it in the bitstream, index information indicating which intra prediction directional mode is used within the Primary MPM and Secondary MPM. If multiple prediction modes are used in the current block, the decoder can parse the index information and then determine the intra prediction directional modes used to generate prediction blocks for the current block from the Primary MPM and Secondary MPM lists.
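
The Secondary MPM construction described above, which walks the predefined mode order and skips modes already present in the Primary MPM, can be sketched in Python as follows. The function name is hypothetical, and numbering DC as mode 1 (with Planar as mode 0) is an assumption about the mode indices of Figure 6.

```python
DC = 1  # assumption: DC is mode 1 and Planar is mode 0 in the numbering of Figure 6

# Predefined order of candidate modes, as listed in the text above.
DEFAULT_ORDER = [DC, 50, 18, 46, 54, 14, 22, 42, 58, 10, 26,
                 38, 62, 6, 30, 34, 66, 2, 48, 52, 16]


def build_secondary_mpm(primary_mpm, size=16):
    """Fill the Secondary MPM with predefined modes that are not in the Primary MPM."""
    secondary = []
    for mode in DEFAULT_ORDER:
        if len(secondary) == size:
            break                      # list is fully constructed
        if mode not in primary_mpm:
            secondary.append(mode)
    return secondary


# Primary MPM of Planar (0), 18, 54, 30, 45, 17.
secondary_mpm = build_secondary_mpm([0, 18, 54, 30, 45, 17])
```

With 21 candidate modes and at most 6 Primary MPM entries, at least 15 candidates remain, so a 16-entry list is filled whenever at least one Primary MPM mode lies outside the predefined order.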

If multiple prediction modes are used in the current block, the video signal processing device can determine the intra prediction directionality modes used to generate the prediction block of the current block using the intra prediction directionality mode derived from DIMD, the Primary MPM, and the Secondary MPM. If multiple prediction modes are used in the current block, the encoder can signal, by including them in the bitstream, information about whether the intra prediction directionality mode derived from DIMD is used and index information indicating which intra prediction directionality mode is used within the Primary MPM and Secondary MPM. If multiple prediction modes are used in the current block, the decoder can parse the information on whether the intra prediction directionality mode derived from DIMD is used and the index information indicating which intra prediction directionality mode is used within the Primary MPM and Secondary MPM, and then determine the intra prediction directional modes used to generate prediction blocks for the current block.

If the multi-prediction mode is used in the current block, the video signal processing device can configure the Secondary MPM list as a multi-prediction-based MPM list using the Primary MPM list. For example, if the Primary MPM modes derived from the neighboring blocks are Planar, 18, 54, 30, 45, 17, modes such as Planar+18, Planar+54, etc. may be added to the Secondary MPM list so that it forms a multi-prediction-based MPM list. If a multi-prediction mode is used in the current block, the encoder can use the Primary MPM list to configure the Secondary MPM list as a multi-prediction-based MPM list, determine from the Secondary MPM list the multi-prediction mode used to generate the prediction block of the current block, and signal it by including index information for the Secondary MPM list in the bitstream. The decoder can parse the index information and determine the multi-prediction mode used to generate the prediction block of the current block from the Secondary MPM list.

Additionally, when a multi-prediction mode is used in the current block, the video signal processing device can configure the Secondary MPM list as a multi-prediction-based MPM list using the intra prediction directional mode derived from DIMD and the Primary MPM list. For example, if the Primary MPM modes derived from the neighboring blocks are Planar, 18, 54, 30, 45, 17, modes such as Planar+18, Planar+54, etc. may be included in the Secondary MPM list so that it forms a multi-prediction-based MPM list. If a multi-prediction mode is used in the current block, the encoder can configure the Secondary MPM list as a multi-prediction-based MPM list using the intra prediction directional mode derived from DIMD and the Primary MPM list. The encoder can determine the multi-prediction mode used to generate the prediction block of the current block from the Secondary MPM list, and can signal it by including index information for the Secondary MPM list in the bitstream. The decoder can parse the index information and determine the multi-prediction mode used to generate the prediction block of the current block from the Secondary MPM list.

When multiple prediction modes are used to generate prediction blocks, the video signal processing device can determine the positions of reference pixels used to generate each prediction block, respectively. That is, the video signal processing device can specify the left reference pixel line and the top reference pixel line used to generate each prediction block among various reference pixel lines (see FIG. 31). After determining the optimal reference pixel line, the encoder can signal by including the position information of the left reference pixel line and the position information of the top reference pixel line in the bitstream. The decoder can parse each position information to determine the positions of the left reference pixel line and the top reference pixel line used to generate each prediction block.

Referring to FIG. 49, when the first intra prediction directional mode among the multiple prediction modes is used to generate a prediction block, the video signal processing device may generate the prediction block using only the left reference pixels of the current block. Additionally, when the second intra prediction directional mode among the multiple prediction modes is used to generate a prediction block, the video signal processing device may generate the prediction block using only the top reference pixels of the current block. For the first intra prediction directional mode, only modes with numbers less than 34 among the intra prediction directional modes of FIG. 6 can be used, and for the second intra prediction directional mode, only modes with numbers equal to or greater than 34 among the intra prediction directional modes of FIG. 6 can be used. Therefore, when a multi-prediction-based MPM list is configured, the first intra prediction directional mode can be restricted to modes with numbers less than 34, and the second intra prediction directional mode can be restricted to modes with numbers equal to or greater than 34.
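
The mode-number restriction described for FIG. 49 can be expressed as a simple filter over candidate mode pairs, as in the Python sketch below. The function names are hypothetical, and the check applies the threshold of 34 literally to the mode numbers without treating any mode specially.

```python
def is_valid_mode_pair(first_mode, second_mode, split=34):
    """First mode must be below 34 (left references), second mode 34 or above (top references)."""
    return first_mode < split <= second_mode


def filter_multi_mpm(candidates):
    """Keep only the candidate pairs that satisfy the restriction."""
    return [pair for pair in candidates if is_valid_mode_pair(*pair)]


valid_pairs = filter_multi_mpm([(18, 50), (50, 18), (2, 34), (34, 66)])
```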

Referring to FIG. 49, the video signal processing device can specify the left reference pixel line and the top reference pixel line used to generate each prediction block, respectively. After determining the optimal reference pixel lines, the encoder can signal the position information of the left reference pixel line and the position information of the top reference pixel line by including them in the bitstream. The decoder can parse each piece of position information to determine the positions of the left reference pixel line and the top reference pixel line used to generate each prediction block. Additionally, the position information of the top reference pixel line may be derived from the position information of the left reference pixel line, and the position information of the top reference pixel line and the position information of the left reference pixel line may be the same. When signaling the position information of the top reference pixel line, the encoder can signal only its difference from the position information of the left reference pixel line by including that difference value in the bitstream. The decoder parses the position information of the left reference pixel line, parses the difference value for the position information of the top reference pixel line, and then determines the position information of the top reference pixel line using the position information of the left reference pixel line and the difference value.
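
The difference-based signaling of the top reference pixel line can be sketched as an encode/decode round trip in Python. The syntax element names `left_ref_line_idx` and `top_ref_line_delta` are hypothetical, and the sketch ignores how the values would be binarized and entropy coded.

```python
def encode_ref_lines(left_idx, top_idx):
    """Signal the left line position and only the difference for the top line."""
    return {
        "left_ref_line_idx": left_idx,              # hypothetical syntax element
        "top_ref_line_delta": top_idx - left_idx,   # hypothetical syntax element
    }


def decode_ref_lines(syntax):
    """Recover both line positions from the left position and the difference."""
    left_idx = syntax["left_ref_line_idx"]
    top_idx = left_idx + syntax["top_ref_line_delta"]
    return left_idx, top_idx


signaled = encode_ref_lines(left_idx=1, top_idx=3)
decoded = decode_ref_lines(signaled)
```

When both lines share the same position, the difference is 0, which corresponds to the case where the top position is taken directly from the left position.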

When a secondary transform is applied to the current block to which a multi-prediction mode is applied, a transform set for the secondary transform may be determined based on the two intra prediction directional modes. At this time, one of the two intra prediction directionality modes can be used to select the primary or secondary transform set, and, for example, the first intra prediction directionality mode can be used. The encoder can signal, by including it in the bitstream, information about which of the two intra prediction directionality modes is used to derive the primary transform or secondary transform set. The decoder parses this information and determines which intra prediction directional mode to use to derive the primary or secondary inverse transform set for the current error block. Alternatively, when multiple prediction modes are applied to the current block, the video signal processing device may determine the intra prediction directionality mode used to derive the primary transform (inverse transform) or secondary transform (inverse transform) set as a predefined mode. Here, the predefined mode may be the Planar mode, the DC mode, etc.

A video signal processing device may not perform PDPC on a prediction block to which the multi-prediction mode is applied. Alternatively, the video signal processing device may perform PDPC on the prediction block to which the multi-prediction mode is applied based on the first intra prediction directional mode among the two intra prediction directional modes. Alternatively, the video signal processing device may perform a first PDPC on the prediction block to which the multi-prediction mode is applied based on the first intra prediction directional mode among the two intra prediction directional modes, and then perform a second PDPC based on the second intra prediction directional mode.

In the MTS transformation, which is the primary transformation described above, a transformation kernel is applied separately in the vertical and horizontal directions of the error block, so it can be said to be a separable transform method. On the other hand, in the LFNST transformation, which is the secondary transformation described above, a transformation kernel is not applied separately in the vertical and horizontal directions but is applied only once, so it can be said to be a non-separable transform method. In addition, the above-described secondary transformation is additionally applied to the primary transform coefficients of a block to which the DCT-2 transformation has been applied, so it can be said to be a two-stage transformation technique. The above-described secondary transformation has high coding efficiency, but has the disadvantage of being complex because a total of three transformation kernels are applied. To reduce this complexity, the NSPT (Non-separable primary transform) method, which applies a transformation using only the secondary transformation method, can be applied. The NSPT transformation method is a non-separable transformation method, and is calculated by applying a transformation kernel only once rather than applying a transformation kernel in each of the vertical and horizontal directions of the error block. In a video signal processing device, the error block of the current block may be transformed or inversely transformed using one of the MTS, DCT2 + LFNST, and NSPT transformation methods.

There may be 35 transformation sets used in the LFNST and NSPT transformations, and the set used may vary depending on the intra prediction mode (see Figure 6). That is, the video signal processing device may refer to the transform set table of FIG. 50 and derive the transform set index of the LFNST and NSPT transforms corresponding to the intra prediction mode (see FIG. 6). Additionally, the LFNST and NSPT transform sets may vary depending on the intra prediction mode, information on whether the current block is a luminance block or a chrominance block, the horizontal and vertical sizes of the current block, and whether the intra prediction directionality mode of the current block is an extended angle mode. There may be an arbitrary number of transformation matrices in each transformation set. Here, the arbitrary number may be an integer of 1 or more, for example 3. The encoder can signal index information about the optimal transformation matrix among the transformation matrices in the transformation set by including it in the bitstream. The decoder, after parsing the index for the optimal transformation matrix, can apply the inverse transformation using the transformation matrix corresponding to that index in the transformation set.

If the prediction block of the current block is generated using the vertical planar mode and NSPT is applied, the encoder may perform the transform on the error signal using the transform set of the vertical angle mode (angle mode 50 in FIG. 6) rather than the transform set of the planar mode. If the prediction block of the current block is generated using the horizontal planar mode and NSPT is applied, the encoder may derive the transform set using the horizontal angle mode (angle mode 18 in FIG. 6) rather than the planar mode. Likewise, if the prediction block of the current block is generated using the vertical planar mode and NSPT is applied, the decoder may perform the transform on the error signal using the transform set of the vertical angle mode (angle mode 50 in FIG. 6) rather than the transform set of the planar mode. If the prediction block of the current block is generated using the horizontal planar mode and NSPT is applied, the decoder may perform the transform on the error signal using the transform set of the horizontal angle mode (angle mode 18 in FIG. 6) rather than the transform set of the planar mode.

Referring to Equation 4 of FIG. 33, when the vertical planar mode is used to generate the prediction block of the current block, the prediction block may be generated based on the pixel value of rec(-1, H), which is at a fixed position, and the pixel value of rec(x, -1), which changes with the x-axis coordinate. That is, when the vertical planar mode is used, pixel values change along the x-axis, so the characteristics of the error signal may be similar to those of the horizontal angle mode (angle mode 18 in FIG. 6). Accordingly, if the prediction block of the current block is generated using the vertical planar mode and NSPT is applied, the encoder may perform the transform on the error signal using the transform set of the horizontal angle mode (angle mode 18 in FIG. 6) rather than the transform set of the planar mode. If the prediction block of the current block is generated using the horizontal planar mode and NSPT is applied, the encoder may derive the transform set using the vertical angle mode (angle mode 50 in FIG. 6) rather than the planar mode. Likewise, if the prediction block of the current block is generated using the vertical planar mode and NSPT is applied, the decoder may perform the transform on the error signal using the transform set of the horizontal angle mode (angle mode 18 in FIG. 6) rather than the transform set of the planar mode. If the prediction block of the current block is generated using the horizontal planar mode and NSPT is applied, the decoder may perform the transform on the error signal using the transform set of the vertical angle mode (angle mode 50 in FIG. 6) rather than the transform set of the planar mode.
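
The two alternative substitutions described above can be sketched as a single lookup. The mode labels, the function name, and the variant names are hypothetical. Returning the prediction mode unchanged when NSPT is not applied, or when the mode is not a directional planar mode, is an assumption made to keep the sketch complete.

```python
HOR_ANGLE_MODE = 18  # horizontal angle mode in FIG. 6
VER_ANGLE_MODE = 50  # vertical angle mode in FIG. 6

# Variant names are hypothetical labels for the two alternatives in the text.
SUBSTITUTION = {
    # vertical planar -> vertical angle, horizontal planar -> horizontal angle
    "same_direction": {"planar_ver": VER_ANGLE_MODE, "planar_hor": HOR_ANGLE_MODE},
    # vertical planar -> horizontal angle, horizontal planar -> vertical angle
    "cross_direction": {"planar_ver": HOR_ANGLE_MODE, "planar_hor": VER_ANGLE_MODE},
}


def transform_set_mode(pred_mode, nspt_applied, variant="same_direction"):
    """Return the mode used to derive the transform set for the error signal."""
    if nspt_applied and pred_mode in SUBSTITUTION[variant]:
        return SUBSTITUTION[variant][pred_mode]
    # Assumption: every other case keeps the prediction mode itself.
    return pred_mode
```

The encoder and the decoder must use the same variant, since the substituted mode is derived on both sides rather than signaled.
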
If the current block is adjacent to at least one boundary among a tile boundary, a slice boundary, and a subpicture boundary, the restored samples of another adjacent tile, another adjacent slice, or another adjacent subpicture cannot be accessed or used. In this case, the encoder may not signal planar mode selection information and may not include it in the bitstream. In the decoder, if the current block is adjacent to at least one boundary among a tile boundary, a slice boundary, and a subpicture boundary, the planar mode selection information may not be parsed, and the mode of the current block may be set to the planar mode (or the DC mode).

If the current block is adjacent to at least one boundary among a tile boundary, a slice boundary, and a subpicture boundary, the encoder may not signal information related to DIMD, TIMD, Intra TMP, TMRL, and intra fusion, and may not include it in the bitstream. In the decoder, if the current block is adjacent to at least one boundary among a tile boundary, a slice boundary, and a subpicture boundary, information related to DIMD, TIMD, Intra TMP, TMRL, and intra fusion may not be parsed, and the mode of the current block may be set to the planar mode (or the DC mode).
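The boundary-dependent parsing described above can be sketched as a conditional read. The function and parameter names are hypothetical, and `read_mode_info` stands in for whatever bitstream parsing routine the decoder uses.

```python
def decode_intra_mode(at_boundary, read_mode_info, fallback="planar"):
    """Skip parsing at a tile, slice, or subpicture boundary and infer the mode."""
    if at_boundary:
        # Nothing was signaled for this block, so nothing is read.
        return fallback
    return read_mode_info()


calls = []


def fake_reader():
    """Hypothetical parser stub that records each read and returns a mode label."""
    calls.append("read")
    return "planar_hor"


mode_at_boundary = decode_intra_mode(True, fake_reader)
mode_inside = decode_intra_mode(False, fake_reader)
```

Passing `fallback="dc"` models the alternative in which the mode is set to the DC mode instead.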

FIG. 51 shows a method of deriving a multiple transform set and an LFNST transform set for the linear mode when linear mode is used in generating a prediction block of the current block described with reference to FIGS. 1 to 50.

Referring to FIG. 51, the video signal processing device may determine the first prediction mode of the current block (S5110). The video signal processing device may generate a prediction block of the current block based on the first prediction mode (S5120). The video signal processing device may generate a residual block of the current block based on a transformation matrix set determined based on the second prediction mode (S5130). The video signal processing device may restore the current block based on the prediction block and the residual block (S5140).
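Steps S5110 to S5140 can be sketched as follows. The callables `predict`, `second_mode_of`, and `inverse_transform` are hypothetical stand-ins for the prediction, transform set derivation, and inverse transform stages, and the reconstruction is shown as a plain sample-wise sum without clipping, which is a simplification.

```python
def reconstruct_block(pred, resid):
    """S5140: add the residual block to the prediction block sample by sample."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]


def decode_block(first_mode, coeffs, predict, second_mode_of, inverse_transform):
    """Run S5120 to S5140 for a block whose first mode was determined in S5110."""
    pred = predict(first_mode)                       # S5120
    second_mode = second_mode_of(first_mode)         # mode that selects the transform set
    resid = inverse_transform(coeffs, second_mode)   # S5130
    return reconstruct_block(pred, resid)            # S5140


# Toy stand-ins (assumptions for illustration only).
seen_modes = []


def toy_inverse_transform(coeffs, mode):
    """Record the mode used for the transform set and pass coefficients through."""
    seen_modes.append(mode)
    return coeffs


restored = decode_block(
    "planar_ver",
    [[1, -1], [0, 2]],
    predict=lambda mode: [[10, 10], [10, 10]],
    second_mode_of=lambda mode: 50,
    inverse_transform=toy_inverse_transform,
)
```

The point of the sketch is that the prediction stage and the transform set selection receive different modes: the first mode drives prediction, while the second mode drives the choice of transform matrices.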

The residual block may be generated based on at least one of a set of transform matrices of a multiple transform set (MTS) and a set of transform matrices of a low frequency non-separable transform (LFNST).

The first prediction mode and the second prediction mode may be different prediction modes.

The first prediction mode may be one of a planar mode based on the horizontal direction (horizontal planar mode) or a planar mode based on the vertical direction (vertical planar mode).

The planar mode based on the horizontal direction may be a prediction mode based on the value of the block at the (-1, y) position and the value of the block at the (W, -1) position. The planar mode based on the vertical direction may be a prediction mode based on the value of the block at the (x, -1) position and the value of the block at the (-1, H) position. Here, the position of the upper-left block of the current block may be (0, 0), and the position of the prediction block may be (x, y).
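A minimal sketch of the two directional planar modes, using the reference positions named above. The interpolation weights and rounding are assumptions modeled on the horizontal and vertical components of conventional planar prediction, so they may differ from the equations of the specification. Function and parameter names are hypothetical.

```python
def horizontal_planar(left, top_right, width):
    """Interpolate each row between rec(-1, y) = left[y] and rec(W, -1) = top_right."""
    return [[((width - 1 - x) * left[y] + (x + 1) * top_right + width // 2) // width
             for x in range(width)]
            for y in range(len(left))]


def vertical_planar(top, bottom_left, height):
    """Interpolate each column between rec(x, -1) = top[x] and rec(-1, H) = bottom_left."""
    return [[((height - 1 - y) * top[x] + (y + 1) * bottom_left + height // 2) // height
             for x in range(len(top))]
            for y in range(height)]


pred_hor = horizontal_planar(left=[10, 10], top_right=50, width=4)
pred_ver = vertical_planar(top=[8, 16], bottom_left=0, height=2)
```

In the horizontal case every row ramps from the left reference toward the single top-right value, and in the vertical case every column ramps from the top reference toward the single bottom-left value.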

i) The first prediction mode may be a planar mode based on the horizontal direction and the second prediction mode may be a vertical angle mode, or ii) the first prediction mode may be a planar mode based on the horizontal direction and the second prediction mode may be a horizontal angle mode.

The first prediction mode may be a linear prediction mode.

The first prediction mode may be indicated by a syntax element included in the bitstream.

The methods described above in this specification may be performed through a processor of a decoder or encoder. Additionally, the encoder can generate a bitstream that is decoded by the methods described above. Additionally, the bitstream generated by the encoder may be stored in a computer-readable non-transitory storage medium (recording medium).

Although this specification is mainly described from the perspective of a decoder, it can be operated equally in an encoder. The term parsing in this specification has been described with a focus on the process of obtaining information from the bitstream, but from the encoder perspective, it can be interpreted as configuring the information in the bitstream. Therefore, the term parsing is not limited to the decoder operation, but can also be interpreted as the act of constructing a bitstream in the encoder. Additionally, this bitstream may be stored and configured in a computer-readable recording medium.

Embodiments of the present invention described above can be implemented through various means. For example, embodiments of the present invention may be implemented by hardware, firmware, software, or a combination thereof.

In the case of hardware implementation, the method according to embodiments of the present invention can be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of implementation by firmware or software, the method according to embodiments of the present invention may be implemented in the form of a module, procedure, or function that performs the functions or operations described above. Software code can be stored in memory and run by a processor. The memory may be located inside or outside the processor, and may exchange data with the processor through various known means.

Some embodiments may also be implemented in the form of a recording medium containing instructions executable by a computer, such as program modules executed by a computer. Computer-readable media can be any available media that can be accessed by a computer and includes both volatile and non-volatile media, removable and non-removable media. Additionally, computer-readable media may include both computer storage media and communication media. Computer storage media includes both volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Communication media typically includes computer readable instructions, data structures or other data of modulated data signals such as program modules, or other transmission mechanisms, and includes any information delivery medium.

The description of the present invention given above is for illustrative purposes, and those skilled in the art will understand that the present invention can be easily modified into other specific forms without changing its technical idea or essential features. Therefore, the embodiments described above are illustrative in all respects and should not be interpreted as limiting. For example, each component described as unitary may be implemented in a distributed manner, and similarly, components described as distributed may also be implemented in a combined form.

The scope of the present invention is indicated by the claims described below rather than by the detailed description above, and all changes or modified forms derived from the meaning and scope of the claims and their equivalent concepts should be construed as being included in the scope of the present invention.

Claims

A video signal decoding device comprising a processor,

wherein the processor:

determines a first prediction mode of a current block,

generates a prediction block of the current block based on the first prediction mode,

generates a residual block of the current block based on a set of transformation matrices determined based on a second prediction mode, and

restores the current block based on the prediction block and the residual block.
According to clause 1,

The residual block is generated based on at least one of a set of transform matrices of a multiple transform set (MTS) and a set of transform matrices of a low frequency non-separable transform (LFNST).
According to clause 1,

The first prediction mode and the second prediction mode are different prediction modes.
According to clause 3,

The first prediction mode is one of a planar mode based on the horizontal direction or a planar mode based on the vertical direction.
According to clause 4,

The plane mode based on the horizontal direction is a prediction mode based on the value of the block at the (-1, y) position and the value of the block at the (W, -1) position,

The plane mode based on the vertical direction is a prediction mode based on the value of the block at the (x, -1) position and the value of the block at the (-1, H) position,

The position of the upper left block of the current block is (0, 0),

A video signal decoding device, wherein the position of the prediction block is (x, y).
According to clause 4,

i) the first prediction mode is a planar mode based on the horizontal direction, and the second prediction mode is a vertical angle mode, or

ii) The first prediction mode is a planar mode based on the horizontal direction, and the second prediction mode is a horizontal direction angle mode.
According to clause 4,

The first prediction mode is a linear prediction mode.
According to clause 4,

The first prediction mode is indicated by a syntax element included in the bitstream.
A video signal encoding device comprising a processor,

wherein the processor:

determines a first prediction mode of a current block,

generates a prediction block of the current block based on the first prediction mode,

generates a residual block of the current block based on a set of transformation matrices determined based on a second prediction mode, and

restores the current block based on the prediction block and the residual block.
According to clause 9,

The residual block is generated based on at least one of a set of transform matrices of a multiple transform set (MTS) and a set of transform matrices of a low frequency non-separable transform (LFNST).
According to clause 9,

The first prediction mode and the second prediction mode are different prediction modes.
According to clause 11,

The first prediction mode is one of a planar mode based on the horizontal direction or a planar mode based on the vertical direction.
According to clause 12,

The plane mode based on the horizontal direction is a prediction mode based on the value of the block at the (-1, y) position and the value of the block at the (W, -1) position,

The plane mode based on the vertical direction is a prediction mode based on the value of the block at the (x, -1) position and the value of the block at the (-1, H) position,

The position of the upper left block of the current block is (0, 0),

A video signal encoding device where the position of the prediction block is (x, y).
According to clause 12,

i) the first prediction mode is a planar mode based on the horizontal direction, and the second prediction mode is a vertical angle mode, or

ii) The first prediction mode is a planar mode based on the horizontal direction, and the second prediction mode is a horizontal direction angle mode.
According to clause 12,

The first prediction mode is a linear prediction mode.
According to clause 12,

The first prediction mode is indicated by a syntax element included in the bitstream.
A computer-readable non-transitory storage medium storing a bitstream, wherein the bitstream is decoded by a decoding method,

The decoding method is,

determining a first prediction mode of the current block;

generating a prediction block of the current block based on the first prediction mode;

generating a residual block of the current block based on a set of transformation matrices determined based on a second prediction mode; and

restoring the current block based on the prediction block and the residual block.
According to clause 17,

The residual block is generated based on at least one of a set of transform matrices of a multiple transform set (MTS) and a set of transform matrices of a low frequency non-separable transform (LFNST).
According to clause 17,

The first prediction mode and the second prediction mode are different prediction modes.
According to clause 19,

The first prediction mode is one of a planar mode based on a horizontal direction, or a planar mode based on a vertical direction.