WO2024071523A1 - Method and device for video coding using improved cross-component linear model prediction

Info

Publication number: WO2024071523A1
Application number: PCT/KR2022/019676
Authority: WO
Inventors: 전병우; 이지환; 김범윤; 허진; 박승욱
Original assignee: 현대자동차주식회사; 기아 주식회사; 성균관대학교 산학협력단
Priority date: 2022-09-26
Filing date: 2022-12-06
Publication date: 2024-04-04

Abstract

Disclosed are a video coding method and device using improved cross-component linear model (CCLM) prediction. The present embodiment provides a video coding method and device in which, in intra prediction of a current chroma block, a first predictor of the current chroma block is generated according to CCLM prediction, a second predictor of the current chroma block is additionally generated on the basis of pre-reconstructed neighboring pixels, and the first predictor and the second predictor are then combined by a weighted sum.

Description

Method and apparatus for video coding using improved cross-component linear model prediction

This disclosure relates to a video coding method and apparatus using improved cross-component linear model prediction.

The content described below simply provides background information related to the present invention and does not constitute prior art.

Because video data is much larger than audio data or still image data, storing or transmitting it without compression requires substantial hardware resources, including memory.

Therefore, typically, when storing or transmitting video data, an encoder is used to compress the video data and store or transmit it, and a decoder receives the compressed video data, decompresses it, and plays it. These video compression technologies include H.264/AVC, HEVC (High Efficiency Video Coding), and VVC (Versatile Video Coding), which improves coding efficiency by about 30% or more compared to HEVC.

However, the size, resolution, and frame rate of the image are gradually increasing, and the amount of data that needs to be encoded is also increasing accordingly, so a new compression technology with better coding efficiency and higher picture quality improvement effect than the existing compression technology is required.

Generally, an image to be encoded is partitioned into coding units (CUs) of various shapes and sizes and then encoded in units of CUs. The tree structure represents information defining the division of these CU units, and can be transmitted from the encoder to the decoder to indicate the division type of the image. When dividing into CUs, the luma image and chroma image can be divided independently. Alternatively, the luma signal and the chroma signal may be divided into CUs of the same structure. At this time, the technology in which the luma signal and the chroma signal have different division structures is called CST (Chroma Separate Tree) technology or dual tree technology. If CST technology is used, the chroma block may have a different partitioning method than the luma block. Additionally, a technology in which luma signals and chroma signals have the same division structure is called single tree technology. When the single tree technique is used, the chroma block has the same partitioning method as the luma block.

Meanwhile, there is a linear relationship between the pixels of the chroma signal and the corresponding pixels of the luma signal. Therefore, Cross-Component Linear Model (CCLM) prediction exists as a conventional technology that can generate an intra predictor of a chroma signal from pixels of the luma signal based on this linear relationship. For intra prediction of the current chroma block, CCLM prediction first determines the luma area corresponding to the current chroma block within the luma image. Afterwards, CCLM prediction derives a linear model between the pixels in the surrounding pixel lines of the current chroma block and the corresponding luma pixels. Finally, CCLM prediction uses the derived linear model to generate a predictor of the current chroma block from the pixel value of the corresponding luma area.
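The three CCLM steps described above can be illustrated with a minimal sketch. It assumes a least-squares fit of the model chroma = alpha * luma + beta over neighboring sample pairs and assumes the luma samples are already down-sampled to the chroma resolution. The function names `derive_linear_model` and `cclm_predict` are hypothetical, and practical codecs typically use integer arithmetic and a simplified parameter derivation rather than this floating-point fit.

```python
def derive_linear_model(neigh_luma, neigh_chroma):
    """Fit chroma ~= alpha * luma + beta over neighboring sample pairs.

    Least squares is an illustrative assumption, not the codec's exact rule.
    """
    n = len(neigh_luma)
    mean_l = sum(neigh_luma) / n
    mean_c = sum(neigh_chroma) / n
    var_l = sum((l - mean_l) ** 2 for l in neigh_luma)
    if var_l == 0:
        # Flat luma neighborhood: fall back to the mean chroma value.
        return 0.0, mean_c
    cov = sum((l - mean_l) * (c - mean_c)
              for l, c in zip(neigh_luma, neigh_chroma))
    alpha = cov / var_l
    beta = mean_c - alpha * mean_l
    return alpha, beta


def cclm_predict(luma_block, alpha, beta, bit_depth=8):
    """Apply the linear model to each (down-sampled) luma sample and clip."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(int(round(alpha * l + beta)), 0), max_val) for l in row]
            for row in luma_block]


# Neighboring pixel lines provide the (luma, chroma) pairs for the model.
alpha, beta = derive_linear_model([10, 20, 30, 40], [25, 45, 65, 85])
predictor = cclm_predict([[10, 50], [100, 200]], alpha, beta)
```

Note that the neighboring chroma pixels enter only through alpha and beta; the predictor itself is computed from luma samples alone.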

In CCLM prediction, as described above, neighboring pixels of the current chroma block are used to derive the linear model, but the reconstructed neighboring pixels are not used when generating the predictor. Therefore, to improve picture quality and coding efficiency when CCLM prediction is used in intra prediction of the current chroma block, a method that additionally uses the reconstructed neighboring pixels needs to be considered.

In order to improve the prediction performance of CCLM (Cross-Component Linear Model) prediction in intra prediction of the current chroma block, the present disclosure aims to provide a video coding method and device that generate a first predictor of the current chroma block according to CCLM prediction, additionally generate a second predictor of the current chroma block based on the reconstructed neighboring pixels, and then combine the first predictor and the second predictor by a weighted sum.

According to an embodiment of the present disclosure, an intra prediction method of a current chroma block performed by an image decoding apparatus includes: decoding a cross-component prediction mode for cross-component prediction of the current chroma block, wherein the cross-component prediction predicts the current chroma block using pixels of the corresponding luma area for the current chroma block and the corresponding luma area; generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode; inferring a representative mode from reconstructed information of a peripheral chroma pixel area, wherein the peripheral chroma pixel area includes pixels surrounding the current chroma block, and the reconstructed information includes the values of pixels in the peripheral chroma pixel area, the positions of the pixels, and the number of pixels; generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode; deriving weights for the first predictor and the second predictor; and generating an intra predictor of the current chroma block by performing a weighted sum of the first predictor and the second predictor using the weights.

According to another embodiment of the present disclosure, an intra prediction method of a current chroma block performed by an image encoding device includes: determining a cross-component prediction mode for cross-component prediction of the current chroma block, wherein the cross-component prediction predicts the current chroma block using pixels of the corresponding luma area for the current chroma block and the corresponding luma area; generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode; inferring a representative mode from reconstructed information of a peripheral chroma pixel area, wherein the peripheral chroma pixel area includes pixels surrounding the current chroma block, and the reconstructed information includes the values of pixels in the peripheral chroma pixel area, the positions of the pixels, and the number of pixels; generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode; deriving weights for the first predictor and the second predictor; and generating an intra predictor of the current chroma block by performing a weighted sum of the first predictor and the second predictor using the weights.

According to another embodiment of the present disclosure, a computer-readable recording medium stores a bitstream generated by an image encoding method, wherein the image encoding method includes: determining a cross-component prediction mode for cross-component prediction of a current chroma block, wherein the cross-component prediction predicts the current chroma block using pixels of the corresponding luma area for the current chroma block and the corresponding luma area; generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode; inferring a representative mode from reconstructed information of a peripheral chroma pixel area, wherein the peripheral chroma pixel area includes pixels surrounding the current chroma block, and the reconstructed information includes the values of pixels in the peripheral chroma pixel area, the positions of the pixels, and the number of pixels; generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode; deriving weights for the first predictor and the second predictor; and generating an intra predictor of the current chroma block by performing a weighted sum of the first predictor and the second predictor using the weights.

As described above, this embodiment provides a video coding method and device that, in intra prediction of the current chroma block, generate a first predictor of the current chroma block according to CCLM prediction, additionally generate a second predictor of the current chroma block based on the reconstructed neighboring pixels, and then combine the first predictor and the second predictor by a weighted sum, which makes it possible to improve the prediction performance of CCLM prediction.
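The weighted combination of the two predictors can be sketched as follows. This is only an illustration under stated assumptions: the fixed-point convention (weights summing to `1 << shift`), the rounding offset, and the name `combine_predictors` are hypothetical, and how the weights themselves are derived is not addressed here.

```python
def combine_predictors(first_pred, second_pred, w1, shift=3):
    """Integer weighted sum of two predictors of equal size.

    Assumes fixed-point weights with w1 + w2 == 1 << shift (an illustrative
    convention, not taken from the disclosure).
    """
    total = 1 << shift
    if not 0 <= w1 <= total:
        raise ValueError("w1 must lie in [0, 1 << shift]")
    w2 = total - w1
    offset = total >> 1  # rounding offset applied before the right shift
    return [[(w1 * a + w2 * b + offset) >> shift
             for a, b in zip(row1, row2)]
            for row1, row2 in zip(first_pred, second_pred)]


cclm_predictor = [[100, 120]]   # first predictor (from CCLM prediction)
intra_predictor = [[60, 80]]    # second predictor (from neighboring pixels)
blended = combine_predictors(cclm_predictor, intra_predictor, w1=4)
```

Setting `w1` to `1 << shift` reproduces the first predictor unchanged, so plain CCLM prediction is a special case of the combination.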

Figure 1 is an example block diagram of a video encoding device that can implement the techniques of the present disclosure.

Figure 2 is a diagram to explain a method of dividing a block using the QTBTTT structure.

Figures 3A and 3B are diagrams showing a plurality of intra prediction modes including wide-angle intra prediction modes.

Figure 4 is an example diagram of neighboring blocks of the current block.

Figure 5 is an example block diagram of a video decoding device that can implement the techniques of the present disclosure.

Figure 6 is an example diagram showing surrounding pixels referenced for CCLM prediction.

Figure 7 is an example diagram showing information that can be used in intra prediction of a chroma channel.

Figure 8 is an exemplary diagram illustrating an intra prediction unit that performs intra prediction of a chroma block according to an embodiment of the present disclosure.

Figure 9 is an exemplary diagram showing a peripheral chroma pixel area according to an embodiment of the present disclosure.

Figures 10 and 11 are exemplary diagrams showing intensity histograms for each directional mode, according to an embodiment of the present disclosure.

Figure 12 is an exemplary diagram showing a peripheral chroma pixel area according to another embodiment of the present disclosure.

Figures 13A and 13B are exemplary diagrams showing intensity histograms for each directional mode according to another embodiment of the present disclosure.

Figure 14 is an exemplary diagram showing a peripheral chroma pixel area according to another embodiment of the present disclosure.

Figure 15 is an exemplary diagram showing a distortion histogram for each directional mode, according to an embodiment of the present disclosure.

Figures 16A and 16B are flowcharts showing an intra prediction method of a current chroma block according to an embodiment of the present disclosure.

Figure 17 is an exemplary diagram showing a luma pixel area within a corresponding luma area, according to an embodiment of the present disclosure.

Figure 18 is an exemplary diagram showing a luma pixel area according to another embodiment of the present disclosure.

Figures 19 to 21 are exemplary diagrams showing the distribution of neighboring blocks and prediction modes of the current chroma block, according to another embodiment of the present disclosure.

Figures 22 and 23 are exemplary diagrams showing the distribution of blocks and prediction modes included in the corresponding luma area, according to another embodiment of the present disclosure.

Figure 24 is an example diagram showing an intensity histogram for each directional mode according to another embodiment of the present disclosure.

Figure 25 is an example diagram showing the distribution of neighboring blocks and prediction modes of the current chroma block, according to another embodiment of the present disclosure.

Hereinafter, embodiments of the present invention will be described in detail with reference to the exemplary drawings. When adding reference numerals to components in each drawing, it should be noted that identical components are given the same reference numerals as much as possible even if they are shown in different drawings. Additionally, in describing the present embodiments, if it is determined that a detailed description of a related known configuration or function may obscure the gist of the present embodiments, the detailed description will be omitted.

Figure 1 is an example block diagram of a video encoding device that can implement the techniques of the present disclosure. Hereinafter, the video encoding device and its sub-components will be described with reference to Figure 1.

The image encoding device may be configured to include a picture division unit 110, a prediction unit 120, a subtractor 130, a transform unit 140, a quantization unit 145, a rearrangement unit 150, an entropy encoding unit 155, an inverse quantization unit 160, an inverse transform unit 165, an adder 170, a loop filter unit 180, and a memory 190.

Each component of the video encoding device may be implemented as hardware or software, or may be implemented as a combination of hardware and software. Additionally, the function of each component may be implemented as software and a microprocessor may be implemented to execute the function of the software corresponding to each component.

One image (video) consists of one or more sequences, each including a plurality of pictures. Each picture is divided into a plurality of regions, and encoding is performed for each region. For example, one picture is divided into one or more tiles and/or slices. Here, one or more tiles can be defined as a tile group. Each tile and/or slice is divided into one or more Coding Tree Units (CTUs), and each CTU is divided into one or more Coding Units (CUs) by a tree structure. Information applied to each CU is encoded as the syntax of the CU, and information commonly applied to the CUs included in one CTU is encoded as the syntax of the CTU. Additionally, information commonly applied to all blocks within one slice is encoded as the syntax of the slice header, and information applied to all blocks constituting one or more pictures is encoded in a picture parameter set (PPS) or picture header. Furthermore, information commonly referenced by multiple pictures is encoded in a sequence parameter set (SPS), and information commonly referenced by one or more SPSs is encoded in a video parameter set (VPS). Additionally, information commonly applied to one tile or tile group may be encoded as the syntax of a tile or tile group header. Syntax included in the SPS, PPS, slice header, tile, or tile group header may be referred to as high level syntax.

The picture division unit 110 determines the size of the CTU (Coding Tree Unit). Information about the size of the CTU (CTU size) is encoded as SPS or PPS syntax and transmitted to the video decoding device.

The picture division unit 110 divides each picture constituting the image into a plurality of CTUs (Coding Tree Units) with a predetermined size, and then recursively divides the CTUs using a tree structure. A leaf node in the tree structure becomes a coding unit (CU), the basic unit of encoding.

The tree structure may be a QuadTree (QT), in which the parent node is divided into four child nodes of the same size; a BinaryTree (BT), in which the parent node is divided into two child nodes; a TernaryTree (TT), in which the parent node is divided into three child nodes in a 1:2:1 ratio; or a structure that mixes two or more of the QT, BT, and TT structures. For example, a QuadTree plus BinaryTree (QTBT) structure may be used, or a QuadTree plus BinaryTree TernaryTree (QTBTTT) structure may be used. Here, BT and TT may be combined and referred to as MTT (Multiple-Type Tree).

As shown in Figure 2, the CTU can first be divided into a QT structure. Quadtree splitting can be repeated until the size of the split block reaches the minimum block size (MinQTSize) of a leaf node allowed in QT. A first flag (QT_split_flag) indicating whether each node of the QT structure is split into four nodes of the lower layer is encoded by the entropy encoding unit 155 and signaled to the video decoding device. If a leaf node of the QT is not larger than the maximum block size (MaxBTSize) of a root node allowed in BT, it may be further divided into either the BT structure or the TT structure. In the BT structure and/or TT structure, there may be multiple division directions. For example, there may be two directions in which the block of the node is divided: horizontal and vertical. As shown in Figure 2, when MTT splitting begins, a second flag (mtt_split_flag) indicating whether the nodes are split and, if they are split, a flag indicating the splitting direction (vertical or horizontal) and/or a flag indicating the splitting type (binary or ternary) are encoded by the entropy encoding unit 155 and signaled to the video decoding device.
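The quadtree stage of this partitioning can be sketched as a short recursive routine. The sketch below covers only QT splitting down to a minimum size and omits the MTT (BT/TT) stage. The callback `should_split` is a hypothetical stand-in for the encoder's per-node decision (the value that QT_split_flag would carry), and the function name is likewise an assumption.

```python
def quadtree_partition(x, y, size, min_qt_size, should_split):
    """Return the leaf blocks (x, y, size) of a recursive quadtree split.

    should_split(x, y, size) is a hypothetical callback playing the role of
    the per-node split decision; splitting stops at min_qt_size.
    """
    if size <= min_qt_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(quadtree_partition(x + dx, y + dy, half,
                                             min_qt_size, should_split))
    return leaves


# Split a 64x64 CTU, always refining only the block at the top-left corner.
leaves = quadtree_partition(0, 0, 64, 16, lambda x, y, s: x == 0 and y == 0)
```

In this example the CTU splits into four 32x32 nodes, the top-left node splits again into four 16x16 leaves, and recursion stops there because the minimum size has been reached.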

Alternatively, prior to encoding the first flag (QT_split_flag) indicating whether each node is split into four nodes of the lower layer, a CU split flag (split_cu_flag) indicating whether the node is split may be encoded. If the CU split flag (split_cu_flag) value indicates that the node is not split, the block of the corresponding node becomes a leaf node in the split tree structure and becomes a CU (coding unit), which is the basic unit of coding. When the CU split flag (split_cu_flag) value indicates splitting, the video encoding device starts encoding from the first flag in the above-described manner.

When QTBT is used as another example of a tree structure, two types of splitting may exist: a type that horizontally splits the block of the node into two blocks of the same size (i.e., symmetric horizontal splitting) and a type that splits it vertically (i.e., symmetric vertical splitting). A split flag (split_flag) indicating whether each node of the BT structure is divided into blocks of a lower layer and split type information indicating the type of division are encoded by the entropy encoding unit 155 and transmitted to the video decoding device. Meanwhile, there may be an additional type that divides the block of the corresponding node into two asymmetric blocks. The asymmetric form may include dividing the block of the corresponding node into two rectangular blocks with a size ratio of 1:3, or may include dividing the block of the corresponding node diagonally.

A CU can have various sizes depending on the QTBT or QTBTTT division from the CTU. Hereinafter, the block corresponding to the CU (i.e., leaf node of QTBTTT) to be encoded or decoded is referred to as the 'current block'. Depending on the adoption of QTBTTT partitioning, the shape of the current block may be rectangular as well as square.

The prediction unit 120 predicts the current block and generates a prediction block. The prediction unit 120 includes an intra prediction unit 122 and an inter prediction unit 124.

In general, each current block in a picture can be coded predictively. Typically, prediction of the current block can be performed using intra prediction techniques (using data from the picture containing the current block) or inter prediction techniques (using data from pictures coded before the picture containing the current block). Inter prediction includes both uni-directional prediction and bi-directional prediction.

The intra prediction unit 122 predicts pixels within the current block using pixels (reference pixels) located around the current block within the current picture including the current block. There are multiple intra prediction modes depending on the prediction direction. For example, as shown in FIG. 3A, the plurality of intra prediction modes may include two non-directional modes including a planar mode and a DC mode and 65 directional modes. The surrounding pixels and calculation formulas to be used are defined differently for each prediction mode.

For efficient directional prediction of a rectangular current block, the directional modes shown by dotted arrows in FIG. 3B (intra prediction modes 67 to 80 and -1 to -14) can additionally be used. These may be referred to as “wide-angle intra prediction modes”. In FIG. 3B, the arrows point to the corresponding reference samples used for prediction and do not indicate the direction of prediction; the prediction direction is opposite to the direction indicated by the arrow. Wide-angle intra prediction modes perform prediction in the opposite direction of a specific directional mode, without transmitting additional bits, when the current block is rectangular. Which wide-angle intra prediction modes are available for the current block may be determined according to the ratio of the width and height of the rectangular current block. For example, wide-angle intra prediction modes with angles smaller than 45 degrees (intra prediction modes 67 to 80) are available when the current block is a rectangle whose height is smaller than its width, and wide-angle intra prediction modes with angles larger than -135 degrees (intra prediction modes -1 to -14) are available when the current block is a rectangle whose height is greater than its width.

The intra prediction unit 122 can determine the intra prediction mode to be used to encode the current block. In some examples, the intra prediction unit 122 may encode the current block using multiple intra prediction modes and select an appropriate intra prediction mode to use from the tested modes. For example, the intra prediction unit 122 may calculate rate-distortion values using rate-distortion analysis for several tested intra prediction modes and select the intra prediction mode that has the best rate-distortion characteristics among the tested modes.
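Mode selection by rate-distortion analysis can be sketched as follows, assuming the commonly used Lagrangian cost J = D + lambda * R. The cost form, the candidate tuples, and the name `select_intra_mode` are illustrative assumptions; the text above does not specify how the rate-distortion values are computed.

```python
def select_intra_mode(candidates, lam):
    """Pick the tested mode with the lowest cost J = D + lam * R.

    candidates: iterable of (mode, distortion, rate_bits) tuples obtained by
    trial-encoding the current block (hypothetical inputs for illustration).
    """
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate_bits in candidates:
        cost = distortion + lam * rate_bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# (mode index, distortion, bits) for three tested modes; values are made up.
tested = [(0, 1000, 10), (1, 900, 40), (50, 700, 60)]
mode, cost = select_intra_mode(tested, lam=5)
```

A larger `lam` penalizes bits more heavily, so the selection shifts toward modes that are cheaper to signal even if their distortion is higher.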

The intra prediction unit 122 selects one intra prediction mode from a plurality of intra prediction modes and predicts the current block using surrounding pixels (reference pixels) and an operation formula determined according to the selected intra prediction mode. Information about the selected intra prediction mode is encoded by the entropy encoding unit 155 and transmitted to the video decoding device.

The inter prediction unit 124 generates a prediction block for the current block using a motion compensation process. The inter prediction unit 124 searches for a block most similar to the current block in a reference picture that has been encoded and decoded before the current picture, and generates a prediction block for the current block using the searched block. Then, a motion vector (MV) corresponding to the displacement between the current block in the current picture and the prediction block in the reference picture is generated. Typically, motion estimation is performed on the luma component, and a motion vector calculated based on the luma component is used for both the luma component and the chroma component. Motion information including information about the reference picture and information about the motion vector used to predict the current block is encoded by the entropy encoding unit 155 and transmitted to the video decoding device.

The inter prediction unit 124 may perform interpolation on a reference picture or reference block to increase prediction accuracy. That is, subsamples between two consecutive integer samples are interpolated by applying filter coefficients to a plurality of consecutive integer samples including the two integer samples. If the search for the block most similar to the current block is performed on the interpolated reference picture, the motion vector can be expressed with fractional-sample precision rather than integer-sample precision. The precision or resolution of the motion vector may be set differently for each target area to be encoded, for example, a slice, tile, CTU, or CU. When such adaptive motion vector resolution (AMVR) is applied, information about the motion vector resolution to be applied to each target area must be signaled for each target area. For example, if the target area is a CU, information about the motion vector resolution applied to each CU is signaled. Information about motion vector resolution may be information indicating the precision of a differential motion vector, which will be described later.
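Subsample interpolation can be illustrated in one dimension. The sketch below computes the half-sample value between two integer samples by applying filter coefficients to the surrounding integer samples. The 4-tap coefficients (-1, 5, 5, -1) with a shift of 3 are an assumption chosen for illustration only and are not the filter of any particular standard; edge samples are repeated as padding, and clipping to the sample range is omitted.

```python
def interpolate_half_sample(samples, i, taps=(-1, 5, 5, -1), shift=3):
    """Half-sample value between samples[i] and samples[i + 1].

    taps are hypothetical illustrative coefficients whose sum equals
    1 << shift; positions outside the row reuse the nearest edge sample.
    """
    n = len(samples)
    first = i - (len(taps) // 2 - 1)  # leftmost integer sample under the filter
    acc = 0
    for k, coeff in enumerate(taps):
        idx = min(max(first + k, 0), n - 1)
        acc += coeff * samples[idx]
    return (acc + (1 << (shift - 1))) >> shift  # round, then normalize


row = [10, 20, 30, 40]
half = interpolate_half_sample(row, 1)  # position halfway between 20 and 30
```

Evaluating such a filter at every fractional position yields the interpolated reference picture on which the finer motion search is performed.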

Meanwhile, the inter prediction unit 124 may perform inter prediction using bi-directional prediction. In the case of bi-directional prediction, two reference pictures and two motion vectors indicating the positions of the blocks most similar to the current block within each reference picture are used. The inter prediction unit 124 selects a first reference picture and a second reference picture from reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1), respectively, and searches for a block similar to the current block within each reference picture to generate a first reference block and a second reference block. Then, the first reference block and the second reference block are averaged or weighted-averaged to generate a prediction block for the current block. Then, motion information including information about the two reference pictures used to predict the current block and information about the two motion vectors is transmitted to the entropy encoding unit 155. Here, reference picture list 0 may be composed of pictures before the current picture in display order among the reconstructed pictures, and reference picture list 1 may be composed of pictures after the current picture in display order among the reconstructed pictures. However, this is not necessarily the case: reconstructed pictures after the current picture in display order may additionally be included in reference picture list 0, and conversely, reconstructed pictures before the current picture may additionally be included in reference picture list 1.

Various methods can be used to minimize the amount of bits required to encode motion information.

For example, if the reference picture and motion vector of the current block are the same as the reference picture and motion vector of the neighboring block, the motion information of the current block can be transmitted to the video decoding device by encoding information that can identify the neighboring block. This method is called ‘merge mode’.

In the merge mode, the inter prediction unit 124 selects a predetermined number of merge candidate blocks (hereinafter referred to as 'merge candidates') from neighboring blocks of the current block.

As shown in FIG. 4, all or part of the left block (A0), bottom left block (A1), top block (B0), top right block (B1), and upper left block (A2) adjacent to the current block in the current picture can be used as the neighboring blocks for deriving merge candidates. Additionally, a block located within a reference picture (which may be the same as or different from the reference picture used to predict the current block) rather than the current picture where the current block is located may be used as a merge candidate. For example, a block co-located with the current block within the reference picture or blocks adjacent to the co-located block may be additionally used as merge candidates. If the number of merge candidates selected by the method described above is less than the preset number, the zero vector is added to the merge candidates.

The inter prediction unit 124 uses these neighboring blocks to construct a merge list including a predetermined number of merge candidates. A merge candidate to be used as motion information of the current block is selected from among the merge candidates included in the merge list, and merge index information is generated to identify the selected candidate. The generated merge index information is encoded by the entropy encoding unit 155 and transmitted to the video decoding device.
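Merge list construction with zero-vector padding can be sketched as follows. The motion information is modeled as a hypothetical (mv_x, mv_y, ref_idx) tuple, unavailable neighbors are passed as `None`, and duplicate candidates are skipped; the duplicate check, the scan order, and the use of reference index 0 for the zero vector are assumptions made for illustration.

```python
ZERO_CANDIDATE = (0, 0, 0)  # zero motion vector with reference index 0 (assumed)


def build_merge_list(neighbor_motion, max_candidates):
    """Collect merge candidates from neighbors, padding with zero vectors.

    neighbor_motion: list of (mv_x, mv_y, ref_idx) tuples, or None where a
    neighboring block is unavailable or not inter-coded.
    """
    merge_list = []
    for cand in neighbor_motion:
        if cand is None or cand in merge_list:
            continue  # skip unavailable or duplicate motion information
        merge_list.append(cand)
        if len(merge_list) == max_candidates:
            return merge_list
    while len(merge_list) < max_candidates:
        merge_list.append(ZERO_CANDIDATE)
    return merge_list


neighbors = [(1, 2, 0), None, (1, 2, 0), (3, -1, 1)]
merge_list = build_merge_list(neighbors, max_candidates=4)
merge_index = merge_list.index((3, -1, 1))  # index signaled instead of the motion
```

Because the decoder can rebuild the same list from already decoded blocks, only `merge_index` needs to be transmitted to identify the motion information.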

Merge skip mode is a special case of merge mode. When all transform coefficients to be entropy-encoded after quantization are close to zero, only neighboring block selection information is transmitted, without transmitting residual signals. By using merge skip mode, relatively high coding efficiency can be achieved for low-motion images, still images, screen content images, and the like.

Hereinafter, merge mode and merge skip mode are collectively referred to as merge/skip mode.

Another method for encoding motion information is AMVP (Advanced Motion Vector Prediction) mode.

In AMVP mode, the inter prediction unit 124 uses neighboring blocks of the current block to derive predicted motion vector candidates for the motion vector of the current block. As the neighboring blocks used to derive the predicted motion vector candidates, all or part of the left block (A0), bottom left block (A1), top block (B0), top right block (B1), and upper left block (A2) adjacent to the current block in the current picture shown in FIG. 4 can be used. Additionally, a block located within a reference picture (which may be the same as or different from the reference picture used to predict the current block) rather than the current picture where the current block is located may be used as a neighboring block for deriving the predicted motion vector candidates. For example, a collocated block located at the same position as the current block within the reference picture or blocks adjacent to the collocated block may be used. If the number of motion vector candidates derived by the method described above is less than the preset number, the zero vector is added to the motion vector candidates.

The inter prediction unit 124 derives predicted motion vector candidates using the motion vectors of the neighboring blocks, and determines a predicted motion vector for the motion vector of the current block using the predicted motion vector candidates. Then, the predicted motion vector is subtracted from the motion vector of the current block to calculate the differential motion vector.

The predicted motion vector can be obtained by applying a predefined function (e.g., median or average value calculation) to the predicted motion vector candidates. In this case, the video decoding device also knows the predefined function. In addition, since the neighboring blocks used to derive the predicted motion vector candidates are blocks for which encoding and decoding have already been completed, the video decoding device also already knows the motion vectors of the neighboring blocks. Therefore, the video encoding device does not need to encode information to identify the predicted motion vector candidate, and in this case only information about the differential motion vector and information about the reference picture used to predict the current block are encoded.
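The relationship between the motion vector, the predicted motion vector, and the differential motion vector can be sketched as follows, assuming a component-wise median as the predefined function. The function names are hypothetical; for an even number of candidates this sketch simply takes the upper-middle value.

```python
def median_mv_predictor(candidates):
    """Component-wise median of the predicted motion vector candidates."""
    xs = sorted(mv[0] for mv in candidates)
    ys = sorted(mv[1] for mv in candidates)
    mid = len(candidates) // 2
    return xs[mid], ys[mid]


def motion_vector_difference(mv, mvp):
    """Encoder side: differential motion vector = motion vector - predictor."""
    return mv[0] - mvp[0], mv[1] - mvp[1]


def reconstruct_motion_vector(mvd, mvp):
    """Decoder side: motion vector = differential motion vector + predictor."""
    return mvd[0] + mvp[0], mvd[1] + mvp[1]


candidates = [(4, -2), (6, 0), (5, 3)]  # motion vectors of neighboring blocks
mvp = median_mv_predictor(candidates)
mvd = motion_vector_difference((7, 1), mvp)
```

Since the decoder derives the same `mvp` from the same already decoded neighbors, transmitting `mvd` alone is enough to recover the motion vector.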

Meanwhile, the predicted motion vector may be determined by selecting one of the predicted motion vector candidates. In this case, information for identifying the selected prediction motion vector candidate is additionally encoded, along with information about the differential motion vector and information about the reference picture used to predict the current block.

The subtractor 130 generates a residual block by subtracting the prediction block generated by the intra prediction unit 122 or the inter prediction unit 124 from the current block.

The transform unit 140 transforms the residual signals in the residual block, which have pixel values in the spatial domain, into transform coefficients in the frequency domain. The transform unit 140 may transform the residual signals by using the entire residual block as a transform unit, or may divide the residual block into a plurality of subblocks and perform the transform using each subblock as a transform unit. Alternatively, the residual block may be divided into two subblocks, a transform region and a non-transform region, and the residual signals may be transformed using only the transform-region subblock as a transform unit. Here, the transform-region subblock may be one of two rectangular blocks with a size ratio of 1:1 along the horizontal axis (or vertical axis). In this case, a flag indicating that only the subblock has been transformed (cu_sbt_flag), directional (vertical/horizontal) information (cu_sbt_horizontal_flag), and/or position information (cu_sbt_pos_flag) are encoded by the entropy encoding unit 155 and signaled to the video decoding device. In addition, the transform-region subblock may have a size ratio of 1:3 along the horizontal axis (or vertical axis), in which case a flag (cu_sbt_quad_flag) that distinguishes this division is additionally encoded by the entropy encoding unit 155 and signaled to the video decoding device.

Meanwhile, the transform unit 140 can perform the transform on the residual block separately in the horizontal and vertical directions. Various types of transform functions or transform matrices can be used for the transform. For example, a pair of transform functions for the horizontal transform and the vertical transform can be defined as an MTS (Multiple Transform Set). The transform unit 140 may select the transform function pair with the best transform efficiency from the MTS and transform the residual block in the horizontal and vertical directions, respectively. Information (mts_idx) about the transform function pair selected from the MTS is encoded by the entropy encoding unit 155 and signaled to the video decoding device.

The quantization unit 145 quantizes the transform coefficients output from the transform unit 140 using a quantization parameter, and outputs the quantized transform coefficients to the entropy encoding unit 155. For certain blocks or frames, the quantization unit 145 may directly quantize the associated residual block without a transform. The quantization unit 145 may apply different quantization coefficients (scaling values) depending on the positions of the transform coefficients within the transform block. The quantization matrix applied to the transform coefficients arranged in two dimensions may be encoded and signaled to the video decoding device.
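As a rough illustration of quantization with a quantization parameter and position-dependent scaling values, the following Python sketch uses floating-point arithmetic. The relation between the quantization parameter and the step size (a doubling every 6 QP steps) is the commonly cited one for HEVC/VVC-style codecs and is an assumption here, as are the function names and the plain multiplicative scaling matrix; actual codecs use integer arithmetic with shifts.

```python
from typing import List, Optional

Matrix = List[List[float]]


def q_step(qp: int) -> float:
    """Assumed step size: doubles every 6 QP steps, equal to 1 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs: Matrix, qp: int, scaling: Optional[Matrix] = None) -> List[List[int]]:
    """Quantize transform coefficients; scaling[y][x] enlarges the step per position."""
    step = q_step(qp)
    levels = []
    for y, row in enumerate(coeffs):
        out_row = []
        for x, c in enumerate(row):
            s = scaling[y][x] if scaling is not None else 1.0
            magnitude = int(round(abs(c) / (step * s)))
            out_row.append(magnitude if c >= 0 else -magnitude)
        levels.append(out_row)
    return levels


def dequantize(levels: List[List[int]], qp: int, scaling: Optional[Matrix] = None) -> Matrix:
    """Inverse quantization: scale the levels back by the same step and scaling values."""
    step = q_step(qp)
    return [
        [lvl * step * (scaling[y][x] if scaling is not None else 1.0)
         for x, lvl in enumerate(row)]
        for y, row in enumerate(levels)
    ]
```

The decoder-side inverse quantization described later in this document corresponds to `dequantize` in this sketch, using the same parameter and scaling values that the encoder signaled.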

The rearrangement unit 150 may rearrange coefficient values for the quantized residual values.

The rearrangement unit 150 can change a two-dimensional coefficient array into a one-dimensional coefficient sequence using coefficient scanning. For example, the rearrangement unit 150 can output a one-dimensional coefficient sequence by scanning from the DC coefficient to the coefficients in the high-frequency region using a zig-zag scan or a diagonal scan. Depending on the size of the transform unit and the intra prediction mode, a vertical scan that scans the two-dimensional coefficient array in the column direction or a horizontal scan that scans it in the row direction may be used instead of the zig-zag scan. That is, the scan method to be used among the zig-zag scan, diagonal scan, vertical scan, and horizontal scan may be determined depending on the size of the transform unit and the intra prediction mode.
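The conversion between a two-dimensional coefficient array and a one-dimensional sequence can be sketched as follows. This is an illustrative diagonal scan only: the traversal direction within each diagonal and the function names are assumptions, and the exact scan order in a real codec is fixed by the applicable standard.

```python
from typing import List, Tuple


def diagonal_scan_order(width: int, height: int) -> List[Tuple[int, int]]:
    """Return (x, y) positions diagonal by diagonal, starting at the DC position (0, 0)."""
    order = []
    for d in range(width + height - 1):
        for y in range(height):
            x = d - y
            if 0 <= x < width:
                order.append((x, y))
    return order


def scan(block: List[List[int]]) -> List[int]:
    """Encoder side: flatten a 2D coefficient array (block[y][x]) into a 1D sequence."""
    height, width = len(block), len(block[0])
    return [block[y][x] for x, y in diagonal_scan_order(width, height)]


def inverse_scan(sequence: List[int], width: int, height: int) -> List[List[int]]:
    """Decoder side: place the 1D sequence back into a 2D array using the same order."""
    block = [[0] * width for _ in range(height)]
    for value, (x, y) in zip(sequence, diagonal_scan_order(width, height)):
        block[y][x] = value
    return block
```

Since low-frequency coefficients come first, the trailing part of the sequence tends to contain mostly zeros after quantization, which is what makes the one-dimensional form convenient for entropy coding.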

The entropy encoding unit 155 generates a bitstream by encoding the sequence of one-dimensional quantized transform coefficients output from the rearrangement unit 150, using various encoding methods such as CABAC (Context-based Adaptive Binary Arithmetic Coding) and Exponential Golomb.

In addition, the entropy encoding unit 155 encodes information related to block splitting, such as the CTU size, CU split flag, QT split flag, MTT split type, and MTT split direction, so that the video decoding device can split blocks in the same way as the video encoding device. The entropy encoding unit 155 also encodes information about the prediction type, which indicates whether the current block is encoded by intra prediction or inter prediction, and, according to the prediction type, encodes intra prediction information (i.e., information about the intra prediction mode) or inter prediction information (the coding mode of the motion information (merge mode or AMVP mode), the merge index in the case of merge mode, and information on the reference picture index and the differential motion vector in the case of AMVP mode). Additionally, the entropy encoding unit 155 encodes information related to quantization, that is, information about the quantization parameter and information about the quantization matrix.

The inverse quantization unit 160 inversely quantizes the quantized transform coefficients output from the quantization unit 145 to generate transform coefficients. The inverse transform unit 165 restores the residual block by converting the transform coefficients output from the inverse quantization unit 160 from the frequency domain to the spatial domain.

The addition unit 170 restores the current block by adding the restored residual block and the prediction block generated by the prediction unit 120. Pixels in the restored current block are used as reference pixels when intra-predicting the next block.

The loop filter unit 180 performs filtering on the restored pixels to reduce blocking artifacts, ringing artifacts, blurring artifacts, and the like that occur due to block-based prediction and transform/quantization. The loop filter unit 180 is an in-loop filter and may include all or part of a deblocking filter 182, a Sample Adaptive Offset (SAO) filter 184, and an Adaptive Loop Filter (ALF) 186.

The deblocking filter 182 filters the boundaries between restored blocks to remove blocking artifacts caused by block-level encoding/decoding, and the SAO filter 184 and the ALF 186 perform additional filtering on the deblocking-filtered image. The SAO filter 184 and the ALF 186 are filters used to compensate for the difference between the restored pixel and the original pixel caused by lossy coding. The SAO filter 184 improves not only subjective image quality but also coding efficiency by applying an offset in units of CTUs. In comparison, the ALF 186 performs filtering on a block basis, distinguishing the edges and the degree of change of each block and applying different filters to compensate for distortion. Information about the filter coefficients to be used in the ALF may be encoded and signaled to the video decoding device.

The restored block filtered through the deblocking filter 182, SAO filter 184, and ALF 186 is stored in the memory 190. When all blocks in one picture are reconstructed, the reconstructed picture can be used as a reference picture for inter prediction of blocks in the picture to be encoded later.

Figure 5 is an example block diagram of a video decoding device that can implement the techniques of the present disclosure. Hereinafter, the video decoding device and its sub-configurations will be described with reference to FIG. 5.

The video decoding device includes an entropy decoding unit 510, a rearrangement unit 515, an inverse quantization unit 520, an inverse transform unit 530, a prediction unit 540, an adder 550, a loop filter unit 560, and a memory 570.

Like the video encoding device of FIG. 1, each component of the video decoding device may be implemented as hardware or software, or may be implemented as a combination of hardware and software. Additionally, the function of each component may be implemented as software and a microprocessor may be implemented to execute the function of the software corresponding to each component.

The entropy decoding unit 510 decodes the bitstream generated by the video encoding device, extracts information related to block division to determine the current block to be decoded, and extracts the prediction information, the information on the residual signals, and other information needed to restore the current block.

The entropy decoder 510 extracts information about the CTU size from a Sequence Parameter Set (SPS) or Picture Parameter Set (PPS), determines the size of the CTU, and divides the picture into CTUs of the determined size. Then, the CTU is determined as the highest layer of the tree structure, that is, the root node, and the CTU is divided using the tree structure by extracting the division information for the CTU.

For example, when dividing a CTU using the QTBTTT structure, the first flag (QT_split_flag) related to QT division is extracted first, and each node is divided into four nodes of the lower layer. Then, for a node corresponding to a leaf node of the QT, the second flag (MTT_split_flag) related to MTT division and the split direction (vertical/horizontal) and/or split type (binary/ternary) information are extracted, and the leaf node is divided into an MTT structure. Accordingly, each node below the leaf node of the QT is recursively divided into a BT or TT structure.

As another example, when dividing a CTU using the QTBTTT structure, a CU split flag (split_cu_flag) indicating whether the CU is split may be extracted first, and the first flag (QT_split_flag) may be extracted if the corresponding block is split. During the division process, each node may undergo zero or more recursive QT divisions followed by zero or more recursive MTT divisions. For example, MTT division may occur immediately in the CTU, or conversely, only multiple QT divisions may occur.

As another example, when dividing a CTU using the QTBT structure, the first flag (QT_split_flag) related to the division of the QT is extracted and each node is divided into four nodes of the lower layer. And, for the node corresponding to the leaf node of QT, a split flag (split_flag) indicating whether to further split into BT and split direction information are extracted.
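The recursive use of split flags can be sketched for the QT stage alone (the BT/TT stage is omitted). In this Python illustration the flag source is a plain iterator standing in for the entropy decoder; the function name and the rule that no flag is read once a minimum block size is reached are assumptions of the sketch.

```python
from typing import Iterator, List, Tuple

Leaf = Tuple[int, int, int]  # (x, y, size) of a square QT leaf node


def parse_qt(flags: Iterator[int], x: int, y: int, size: int, min_size: int) -> List[Leaf]:
    """Recursively split a square block according to QT split flags read in order.

    A flag of 1 splits the node into four half-size nodes of the lower layer.
    Assumption: at the minimum size no flag is read and the node is a leaf.
    """
    if size > min_size and next(flags) == 1:
        half = size // 2
        leaves: List[Leaf] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(parse_qt(flags, x + dx, y + dy, half, min_size))
        return leaves
    return [(x, y, size)]
```

For example, the flag sequence 1, 0, 1, 0, 0 applied to a 16×16 root with a minimum size of 4 splits the root once and then splits only its second child, giving three 8×8 leaves and four 4×4 leaves.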

Meanwhile, when the entropy decoding unit 510 determines the current block to be decoded using division of the tree structure, it extracts information about the prediction type indicating whether the current block is intra-predicted or inter-predicted. When prediction type information indicates intra prediction, the entropy decoder 510 extracts syntax elements for intra prediction information (intra prediction mode) of the current block. When prediction type information indicates inter prediction, the entropy decoder 510 extracts syntax elements for inter prediction information, that is, information indicating a motion vector and a reference picture to which the motion vector refers.

Additionally, the entropy decoding unit 510 extracts quantization-related information and, as information about the residual signal, information about the quantized transform coefficients of the current block.

The rearrangement unit 515 changes the sequence of one-dimensional quantized transform coefficients entropy-decoded by the entropy decoding unit 510 back into a two-dimensional coefficient array (i.e., a block), in the reverse order of the coefficient scanning performed by the video encoding device.

The inverse quantization unit 520 inversely quantizes the quantized transform coefficients using a quantization parameter. The inverse quantization unit 520 may apply different quantization coefficients (scaling values) to the quantized transform coefficients arranged in two dimensions. The inverse quantization unit 520 may perform inverse quantization by applying the matrix of quantization coefficients (scaling values) signaled from the video encoding device to the two-dimensional array of quantized transform coefficients.

The inverse transform unit 530 inversely transforms the inverse quantized transform coefficients from the frequency domain to the spatial domain to restore the residual signals, thereby generating a residual block for the current block.

In addition, when the inverse transform unit 530 inversely transforms only a partial region (subblock) of the transform block, it extracts the flag (cu_sbt_flag) indicating that only a subblock of the transform block has been transformed, the directionality (vertical/horizontal) information of the subblock (cu_sbt_horizontal_flag), and/or the position information of the subblock (cu_sbt_pos_flag). It then restores the residual signals by inversely transforming the transform coefficients of that subblock from the frequency domain to the spatial domain, and fills the region that has not been inversely transformed with the value "0" as the residual signal, thereby generating the final residual block for the current block.

In addition, when MTS is applied, the inverse transform unit 530 determines the transform function or transform matrix to be applied in the horizontal and vertical directions, respectively, using the MTS information (mts_idx) signaled from the video encoding device, and uses the determined transform functions to perform the inverse transform on the transform coefficients in the transform block in the horizontal and vertical directions.

The prediction unit 540 may include an intra prediction unit 542 and an inter prediction unit 544. The intra prediction unit 542 is activated when the prediction type of the current block is intra prediction, and the inter prediction unit 544 is activated when the prediction type of the current block is inter prediction.

The intra prediction unit 542 determines the intra prediction mode of the current block among a plurality of intra prediction modes from the syntax elements for the intra prediction mode extracted by the entropy decoding unit 510, and predicts the current block using reference pixels around the current block according to the intra prediction mode.

The inter prediction unit 544 uses the syntax elements for the inter prediction mode extracted by the entropy decoding unit 510 to determine the motion vector of the current block and the reference picture to which the motion vector refers, and predicts the current block using the motion vector and the reference picture.

The adder 550 restores the current block by adding the residual block output from the inverse transform unit and the prediction block output from the inter prediction unit or intra prediction unit. Pixels in the restored current block are used as reference pixels when intra-predicting a block to be decoded later.
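The addition performed by the adder 550 can be written as a short Python sketch. Clipping the sum to the valid sample range for a given bit depth is an assumption added here for completeness, and the function name is hypothetical.

```python
from typing import List


def reconstruct_block(pred: List[List[int]], resid: List[List[int]],
                      bit_depth: int = 8) -> List[List[int]]:
    """Add the residual block to the prediction block, sample by sample.

    Assumption: the result is clipped to [0, 2**bit_depth - 1].
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]
```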

The loop filter unit 560 may include a deblocking filter 562, an SAO filter 564, and an ALF 566 as in-loop filters. The deblocking filter 562 performs deblocking filtering on the boundaries between restored blocks to remove blocking artifacts that occur due to block-level decoding. The SAO filter 564 and the ALF 566 perform additional filtering on the reconstructed block after deblocking filtering to compensate for the difference between the reconstructed pixel and the original pixel caused by lossy coding. The filter coefficients of the ALF are determined using information about the filter coefficients decoded from the bitstream.

The restored block filtered through the deblocking filter 562, SAO filter 564, and ALF 566 is stored in the memory 570. When all blocks in one picture are reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in a picture to be decoded later.

This embodiment relates to encoding and decoding of images (videos) as described above. More specifically, a video coding method and device are provided in which, in intra prediction of the current chroma block, a first predictor of the current chroma block is generated according to CCLM prediction, a second predictor of the current chroma block is additionally generated based on previously reconstructed neighboring pixels, and the first predictor and the second predictor are weightedly combined.

The following embodiments may be performed by the intra prediction unit 122 in a video encoding device. Additionally, it may be performed by the intra prediction unit 542 in a video decoding device.

When predicting the current block, the video encoding device may generate signaling information related to this embodiment in terms of rate-distortion optimization. The video encoding device can encode the signaling information using the entropy encoding unit 155 and then transmit it to the video decoding device. The video decoding device can decode the signaling information related to prediction of the current block from the bitstream using the entropy decoding unit 510.

In the following description, the term 'target block' to be encoded/decoded may be used in the same sense as the current block or coding unit (CU) described above, or may mean a partial region of a coding unit.

Hereinafter, the target block includes a luma block including a luma component and a chroma block including a chroma component. The chroma block of the target block is expressed as the target chroma block or the current chroma block. The luma block of the target block is expressed as the target luma block or the current luma block.

Additionally, the aspect ratio of a block is defined as the horizontal length of the block divided by the vertical length.

I. CCLM (Cross-component Linear Model) prediction

In the VVC technology, the intra prediction mode of the luma block has fine-grained directional modes (i.e., 2 to 66) in addition to the non-directional modes (i.e., Planar and DC), as illustrated in FIG. 3A. Additionally, as shown in the example of FIG. 3B, the intra prediction mode of the luma block has directional modes (-14 to -1 and 67 to 80) according to wide-angle intra prediction.

Meanwhile, depending on the prediction direction used by the luma block, the chroma block can also use intra prediction in these fine-grained directional modes to a limited extent. However, in intra prediction of a chroma block, the various directional modes other than the horizontal and vertical directions that the luma block can use are not always available. To use these directional modes, the prediction mode of the current chroma block must be set to DM mode. By setting it to DM mode in this way, the current chroma block can use a directional mode of the luma block other than horizontal and vertical.

When encoding a chroma block, the intra prediction modes that are most frequently used, or that are used to maintain image quality, include the Planar, DC, Vertical, Horizontal, and DM modes. In DM mode, the intra prediction mode of the luma block spatially corresponding to the current chroma block is used as the intra prediction mode of the chroma block.

The video encoding device can signal to the video decoding device whether the intra prediction mode of the chroma block is DM mode. At this time, there may be several ways to transmit the DM mode to the video decoding device. For example, the video encoding device can indicate whether it is in DM mode by setting intra_chroma_pred_mode, which is information for indicating the intra prediction mode of a chroma block, to a specific value and then transmitting it to the video decoding device.

When the chroma block is encoded in intra prediction mode, the intra prediction unit 542 of the video decoding device may set the intra prediction mode of the chroma block, IntraPredModeC, according to Table 1.

Hereinafter, in order to distinguish intra_chroma_pred_mode and IntraPredModeC, which are information related to the intra prediction mode of a chroma block, they are expressed as a chroma intra prediction mode indicator and a chroma intra prediction mode, respectively.

Here, lumaIntraPredMode is the intra prediction mode of the luma block corresponding to the current chroma block (hereinafter referred to as 'luma intra prediction mode'). lumaIntraPredMode represents one of the prediction modes illustrated in FIG. 3A. For example, in Table 1, lumaIntraPredMode = 0 indicates Planar prediction mode, and lumaIntraPredMode = 1 indicates DC prediction mode. lumaIntraPredMode of 18, 50, and 66 indicates the directional modes referred to as horizontal, vertical, and VDIA, respectively. Meanwhile, when intra_chroma_pred_mode = 0, 1, 2, and 3, planar, vertical, horizontal, and DC prediction modes are indicated, respectively. The case of intra_chroma_pred_mode = 4 is the DM mode, and the IntraPredModeC value, which is the chroma intra prediction mode, is set equal to the lumaIntraPredMode value.
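The mapping from the chroma intra prediction mode indicator to the chroma intra prediction mode can be sketched in Python as follows. Table 1 is not reproduced in this text, so the sketch rests on an assumption: when the explicitly indicated mode coincides with lumaIntraPredMode, mode 66 (VDIA) is substituted, which is how this kind of table is understood to behave in VVC. The function and constant names are illustrative.

```python
PLANAR, DC, HORIZONTAL, VERTICAL, VDIA = 0, 1, 18, 50, 66

# intra_chroma_pred_mode 0..3 -> planar, vertical, horizontal, DC (as described in the text)
_EXPLICIT_MODES = {0: PLANAR, 1: VERTICAL, 2: HORIZONTAL, 3: DC}


def derive_intra_pred_mode_c(intra_chroma_pred_mode: int, luma_intra_pred_mode: int) -> int:
    """Return IntraPredModeC for a non-CCLM chroma block."""
    if intra_chroma_pred_mode == 4:
        # DM mode: reuse the intra prediction mode of the corresponding luma block.
        return luma_intra_pred_mode
    mode = _EXPLICIT_MODES[intra_chroma_pred_mode]
    # Assumption: if the explicit mode equals the luma mode (already reachable
    # through DM mode), VDIA (66) is used instead to avoid a redundant entry.
    return VDIA if mode == luma_intra_pred_mode else mode
```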

Hereinafter, this embodiment will be described focusing on the parsing of encoded information by the video decoding device, with the video encoding device mentioned where necessary for convenience of explanation. Most of the embodiments described below can nevertheless be applied equally or similarly to video encoding devices. Meanwhile, the video encoding device determines encoding information in terms of rate-distortion optimization. The video encoding device then encodes this information to generate a bitstream and signals it to the video decoding device. Additionally, the video encoding device can obtain encoding information from a higher level and proceed with the subsequent encoding process.

When performing prediction in a video encoding/decoding device, a method of generating a prediction block of the current block from a color component different from the color component of the target block currently being encoded or decoded is defined as cross-component prediction. In VVC technology, cross-component prediction is performed using the linear relationship between chroma pixels and the corresponding luma pixels to intra-predict the current chroma block, which is called CCLM (Cross-component Linear Model) prediction. Below, CCLM prediction is described.

First, the process of parsing the intra prediction mode of the current chroma block performed by the video decoding device is shown in Table 2.

The video decoding device parses cclm_mode_flag, which indicates whether the CCLM prediction mode is used. If cclm_mode_flag is 1 and the CCLM mode is used, the video decoding device parses cclm_mode_idx, the index of the CCLM mode. Depending on the value of cclm_mode_idx, the CCLM mode may indicate one of three modes. On the other hand, when cclm_mode_flag is 0 and the CCLM mode is not used, the video decoding device parses intra_chroma_pred_mode, which indicates the intra prediction mode, as described above.
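The parsing order described above can be sketched in Python. Table 2 is not reproduced in this text, so this is only an outline of the control flow: `read_syntax` is a hypothetical callable standing in for the entropy decoder, and the assignment of cclm_mode_idx values 0, 1, and 2 to CCLM_LT, CCLM_L, and CCLM_T is an assumption.

```python
from typing import Callable, Tuple, Union

# Assumed index-to-mode assignment (the three CCLM modes named in this document).
CCLM_MODES = ("CCLM_LT", "CCLM_L", "CCLM_T")


def parse_chroma_intra_mode(read_syntax: Callable[[str], int]) -> Tuple[str, Union[str, int]]:
    """Parse the chroma intra prediction mode syntax in the order described in the text.

    read_syntax(name) is a hypothetical helper returning the decoded value
    of the named syntax element.
    """
    if read_syntax("cclm_mode_flag") == 1:
        cclm_mode_idx = read_syntax("cclm_mode_idx")
        return ("CCLM", CCLM_MODES[cclm_mode_idx])
    # CCLM not used: fall back to the chroma intra prediction mode indicator.
    return ("INTRA", read_syntax("intra_chroma_pred_mode"))
```

In use, `read_syntax` would wrap the bitstream reader; for a quick check it can simply look values up in a dictionary keyed by syntax element name.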

When the CCLM mode is applied for intra prediction of the current chroma block, the video decoding device determines the area in the luma image corresponding to the current chroma block (hereinafter, the 'corresponding luma area'). To derive the linear model, the left reference pixels and top reference pixels of the corresponding luma area, and the left reference pixels and top reference pixels of the target chroma block, may be used. Hereinafter, the left reference pixels and the top reference pixels are collectively referred to as reference pixels, surrounding pixels, or adjacent pixels. Additionally, reference pixels of the chroma component are referred to as chroma reference pixels, and reference pixels of the luma component are referred to as luma reference pixels. Meanwhile, in the example of FIG. 6, the size of the chroma block, that is, the number of pixels, is expressed as N×N (where N is a natural number).

In CCLM prediction, a linear model is derived between the reference pixels of the luma area and the reference pixels of the chroma block, and the linear model is then applied to the restored pixels of the corresponding luma area to generate a prediction block, which is the predictor of the target chroma block. For example, as illustrated in FIG. 6, four pairs of pixels, each combining a pixel in the surrounding pixel line of the current chroma block with a pixel in the corresponding luma area, can be used to derive the linear model. The video decoding device may derive α and β representing the linear model from the four pairs of pixels, as shown in Equation 1.

Here, for the corresponding luma pixels among the four pairs of pixels, X_a and X_b represent the average of the two minimum values and the average of the two maximum values, respectively. Likewise, for the chroma pixels, Y_a and Y_b represent the average of the two minimum values and the average of the two maximum values, respectively. The video decoding device can then generate the predictor pred_C(i,j) of the current chroma block from the pixel value rec'_L(i,j) of the corresponding luma area using the linear model, as shown in Equation 2.
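Equations 1 and 2 are not reproduced in this text. A form consistent with the definitions above, and with CCLM as generally described for VVC, is α = (Y_b − Y_a)/(X_b − X_a), β = Y_a − α·X_a, and pred_C(i,j) = α·rec'_L(i,j) + β. The Python sketch below follows that assumed form in floating point. It also assumes that the four pairs are ordered by luma value and that the chroma averages are taken over the chroma samples paired with the two smallest and two largest luma samples; real codecs use integer arithmetic, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def derive_cclm_params(luma_refs: Sequence[int], chroma_refs: Sequence[int]) -> Tuple[float, float]:
    """Derive (alpha, beta) from four (luma, chroma) reference pixel pairs."""
    pairs = sorted(zip(luma_refs, chroma_refs))  # order the pairs by luma value
    x_a = (pairs[0][0] + pairs[1][0]) / 2  # average of the two smallest luma values
    x_b = (pairs[2][0] + pairs[3][0]) / 2  # average of the two largest luma values
    y_a = (pairs[0][1] + pairs[1][1]) / 2  # chroma average paired with x_a (assumption)
    y_b = (pairs[2][1] + pairs[3][1]) / 2  # chroma average paired with x_b (assumption)
    if x_b == x_a:
        return 0.0, y_a  # flat luma neighborhood: fall back to a constant predictor
    alpha = (y_b - y_a) / (x_b - x_a)
    beta = y_a - alpha * x_a
    return alpha, beta


def predict_chroma(rec_luma: List[List[int]], alpha: float, beta: float,
                   bit_depth: int = 8) -> List[List[int]]:
    """Apply the linear model to the (downsampled) reconstructed luma samples."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(int(round(alpha * v + beta)), 0), max_val) for v in row]
        for row in rec_luma
    ]
```

For instance, luma references 10, 20, 30, 40 paired with chroma references 15, 25, 35, 45 give X_a = 15, X_b = 35, Y_a = 20, Y_b = 40, hence α = 1 and β = 5.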

Before applying the linear model, the image decoding device checks whether the size of the corresponding luma area is the same as the size of the current chroma block. If the sizes between the two are different depending on the subsampling method of the chroma channel, the video decoding device can adjust the size of the corresponding luma area to be the same as the size of the current chroma block by applying downsampling to the corresponding luma area.
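The size adjustment can be illustrated for the case where the chroma channel is subsampled by two in both directions (4:2:0). The Python sketch below simply averages each 2×2 group of luma samples; this is a stand-in chosen for brevity, not the downsampling filter that a codec standard specifies, and it assumes even block dimensions.

```python
from typing import List


def downsample_luma_420(luma: List[List[int]]) -> List[List[int]]:
    """Halve the luma area in both directions by rounding 2x2 averages.

    Illustrative only: assumes 4:2:0 subsampling and even width and height.
    """
    height, width = len(luma), len(luma[0])
    return [
        [
            (luma[2 * y][2 * x] + luma[2 * y][2 * x + 1]
             + luma[2 * y + 1][2 * x] + luma[2 * y + 1][2 * x + 1] + 2) >> 2
            for x in range(width // 2)
        ]
        for y in range(height // 2)
    ]
```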

Meanwhile, as described above, the CCLM mode is divided into three modes: CCLM_LT, CCLM_L, and CCLM_T, depending on the positions of surrounding pixels used in the derivation process of the linear model. As illustrated in FIG. 6, the CCLM_LT mode uses two pixels in each direction among the surrounding pixels adjacent to the left and top of the current chroma block. CCLM_L uses 4 pixels from surrounding pixels adjacent to the left of the current chroma block. Lastly, CCLM_T uses four pixels from among the surrounding pixels adjacent to the top of the current chroma block.

II. Intra prediction of current chroma block according to this embodiment

For intra prediction of the chroma channel, the video decoding device may use a method of generating a predictor using information (①) of the corresponding luma area, or a method of generating a predictor using information (②) of the same channel. In VVC technology, there are various techniques for each method, and these techniques are divided into prediction modes. Additionally, the predictor generation method can be specified by indicating the prediction mode. Hereinafter, setting the predictor generation method is described as setting the prediction mode. Hereinafter, generating a predictor using information (①) of the corresponding luma area is expressed as 'cross component prediction', and the method is expressed as 'cross component prediction mode' or 'cross component prediction method'. In addition, generating a predictor using information (②) of the same channel is expressed as 'same-channel prediction', and the method is expressed as 'same-channel prediction mode' or 'same-channel prediction method'.

For example, among the intra prediction methods of the chroma channel in VVC technology, the cross component prediction methods using information (①) of the corresponding luma area include the CCLM mode described above. Other cross component prediction methods include a method of deriving multiple linear models between the corresponding luma area and the current chroma block and predicting with them, a method of deriving a linear model using a gradient value (i.e., a change value) based on the pixel values instead of the luma pixel value at the corresponding position and predicting with it, and a method of predicting by many-to-one matching, which, when predicting one pixel value of the current chroma block, uses the luma pixel at the corresponding position together with its surrounding pixel values.

Meanwhile, among intra prediction methods for chroma channels, co-channel prediction methods that use information (②) of the same channel include planar, DC, and directional modes. Co-channel prediction methods also include technologies such as ISP (Intra Sub Partition), MIP (Matrix-weighted Intra Prediction), and MRL (Multiple Reference Line). In addition, a method of predicting by inferring a directional or non-directional mode from several reference lines around the current block, and a method of calculating a weight based on the distance between a pixel in the corresponding luma area and the pixels around the block and then using this weight to predict by weighting the pixels in the current chroma block and the surrounding chroma pixels, can also be co-channel prediction methods.

Meanwhile, when generating a predictor of a chroma block using information in the luma area corresponding to the chroma block, such as in CCLM prediction, there is a problem that information on surrounding pixels of the current chroma block is not used during the predictor generation process. This is because in the conventional technology, a predictor is generated using only one of the information of the corresponding luma area (①) and the information of the same channel (②). In addition, depending on the chroma channel subsampling method, there is a possibility that the information (①) of the corresponding luma area may be less important than the information (②) of the same channel, such as surrounding pixels in the current channel. Accordingly, discontinuity may occur between the predictor generated in CCLM mode and adjacent surrounding pixels. This problem of existing technology can be solved by considering surrounding pixel information of the current channel when predicting according to CCLM mode. This means that in addition to using the information in ①, the prediction is performed using the information in ②. Alternatively, this problem of the existing technology can be solved by additionally using luma area information when making predictions using information on surrounding pixels within the same channel (for example, when performing directional or non-directional intra prediction). This means that in addition to using the information in ②, the prediction is performed using the information in ①.

To address the problem of using only one of the information of the corresponding luma area and the information of the same channel as described above, the intra prediction unit 542 in the video decoding device according to this embodiment generates a predictor of the current chroma block by weightedly combining a first predictor based on the CCLM mode and a second predictor additionally generated based on an intra prediction mode. Here, the CCLM mode uses information (①) of the corresponding luma area, and the intra prediction mode uses information (②) of the same channel. The intra prediction unit 542 according to this embodiment includes all or part of an input unit 802, a first predictor generator 804, a second predictor generator 806, and a weighted summer 808. Meanwhile, the intra prediction unit 122 in the video encoding device may also include the same components.

The input unit 802 according to this embodiment acquires a CCLM mode for CCLM prediction of the current chroma block. Alternatively, the input unit 802 may obtain a cross-component prediction mode for cross-component prediction of the current chroma block.

The first predictor generator 804 performs CCLM prediction based on CCLM mode to generate a first predictor of the current chroma block. Alternatively, the first predictor generator 804 may generate the first predictor of the current chroma block by performing cross-component prediction based on the cross-component prediction mode.

The second predictor generator 806 generates a second predictor of the current chroma block based on an intra prediction mode using neighboring pixels. That is, the second predictor generator 806 generates the second predictor based on the same-channel prediction mode using the same-channel information.

The weighted summer 808 generates an intra predictor of the current chroma block by combining the first predictor and the second predictor as a weighted sum.

At this time, the video decoding device may combine the first predictor and the second predictor as a weighted sum, as shown in Equation 3.
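The body of Equation 3 is not reproduced in this text. A form consistent with the symbol definitions given for it, and with the later statement that the remaining weight is 1 − w_CCLM(i,j), would be the following (a reconstruction under that assumption, not the published formula):

```latex
\mathrm{pred}_{C}(i,j) = w_{CCLM}(i,j)\,\mathrm{pred}_{CCLM}(i,j)
  + \bigl(1 - w_{CCLM}(i,j)\bigr)\,\mathrm{pred}_{intra}(i,j) \tag{3}
```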

Here, (i,j) represents the position of the pixel, and pred_C(i,j) represents the intra predictor of the current chroma block. pred_CCLM(i,j) represents the first predictor, pred_intra(i,j) represents the second predictor, and w_CCLM(i,j) represents the weight. As described above, pred_CCLM(i,j) represents a predictor based on CCLM prediction, but may more generally represent a predictor based on cross-component prediction.

Hereinafter, the second predictor and 'additional predictor' are used interchangeably. If there are multiple (e.g., n) additional predictors, a further pred_intra term is added to Equation 3 for each of them, and the weight 1 − w_CCLM(i,j) may be divided and distributed among the additional predictors.
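The weighted combination with one or more additional predictors can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name combine_predictors, the floating-point arithmetic, the row/column reading of (i,j), and the parameter extra_shares (how 1 − w_CCLM(i,j) is split among the additional predictors) are all assumptions made for the example.

```python
from typing import List, Sequence

Block = List[List[float]]  # block[i][j]; (i, j) is taken as (row, column) here


def combine_predictors(pred_cclm: Block,
                       extra_preds: Sequence[Block],
                       w_cclm: Block,
                       extra_shares: Sequence[float]) -> Block:
    """pred_C = w_CCLM * pred_CCLM + (1 - w_CCLM) * sum_k share_k * pred_k.

    extra_shares (hypothetical name) says how the remaining weight
    1 - w_CCLM(i, j) is divided among the additional predictors.
    """
    if len(extra_preds) != len(extra_shares):
        raise ValueError("one share is required per additional predictor")
    if abs(sum(extra_shares) - 1.0) > 1e-9:
        raise ValueError("shares of the additional predictors must sum to 1")

    height, width = len(pred_cclm), len(pred_cclm[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            wc = w_cclm[i][j]
            # Mix of the additional (same-channel) predictors at this pixel.
            mix = sum(s * p[i][j] for s, p in zip(extra_shares, extra_preds))
            out[i][j] = wc * pred_cclm[i][j] + (1.0 - wc) * mix
    return out
```

With a single additional predictor and extra_shares = [1.0], this reduces to the two-term weighted sum described for Equation 3.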

Meanwhile, in Equation 3 the weight is expressed in terms of w_CCLM, but depending on the embodiment it may instead be expressed in terms of w_intra, as in Equation 4.
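The body of Equation 4 is likewise not reproduced in this text. Assuming it mirrors the two-term weighted sum of Equation 3 with the roles of the weights exchanged, it would read as follows (a reconstruction, not the published formula):

```latex
\mathrm{pred}_{C}(i,j) = w_{intra}(i,j)\,\mathrm{pred}_{intra}(i,j)
  + \bigl(1 - w_{intra}(i,j)\bigr)\,\mathrm{pred}_{CCLM}(i,j) \tag{4}
```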

The predictor according to the same-channel prediction mode and the predictor according to the CCLM mode can be combined as a weighted sum as shown in Equation 4. In Equation 4, the predictor according to the same-channel prediction mode may be called the first predictor, and the predictor according to the CCLM mode may be called the second predictor. At this time, similar to the illustration in FIG. 8, the intra prediction mode for generating the first predictor using the same-channel information may be parsed, and the CCLM mode for generating the second predictor using the information in the corresponding luma area may be inferred. Therefore, it should be understood that, depending on the implementation, the terms first predictor and second predictor cover both the case of Equation 3 and the case of Equation 4.

Hereinafter, for convenience, unless otherwise specified, the predictor according to the CCLM mode is referred to as the first predictor, the predictor according to the intra prediction mode using surrounding pixel information is referred to as the second predictor, and the weight expressed in terms of w_CCLM as in Equation 3 is used. Additionally, as illustrated in FIG. 8, the CCLM prediction mode for generating the first predictor may be parsed and the intra prediction mode for generating the second predictor may be inferred.

Hereinafter, the term 'adjacent' refers to the case where two objects are spatially in contact, and the term 'surrounding', which includes the meaning of 'adjacent', refers to the case where one object lies within a certain distance of another object. Where channel information needs to be indicated, it is specified in context. The temporal sense of 'surrounding' is not separately discussed, but the realization examples below can also be applied at corresponding positions in other frames.

The video decoding device can independently infer an intra prediction mode using neighboring pixels, or use the prediction mode transmitted on the bitstream by the video encoding device. Additionally, the video decoding device may independently infer a method of weighted combining the first predictor and the second predictor, or use a method transmitted on a bitstream by the video encoding device. Methods for inferring/transmitting the intra prediction mode and methods for inferring/transmitting weights can be combined in various ways. For example, the intra prediction mode can be inferred by a video decoding device, and the weighted combining method can be transmitted through a bitstream. Conversely, the intra prediction mode is transmitted through a bitstream and the weighted combining method can be inferred by the video decoding device. Below, preferred embodiments of these various combinations are described.

<Realization Example 1> Inferring the method of generating a predictor to be weighted

This implementation infers a method of generating a second predictor (pred_intra) using neighboring pixels of the same channel according to Equation 3. According to this implementation, the video decoding device can set a preset prediction mode as the prediction mode of the second predictor without explicitly receiving a signal about the prediction mode of the second predictor from the video encoding device. Alternatively, the video decoding device may infer at least one prediction mode of the second predictor based on at least one of the following: the width/height/area/aspect ratio/prediction mode/position/number/distance to the current chroma block of the chroma blocks surrounding the current chroma block; the value/position/number/distance to the current chroma block of the surrounding chroma pixels; the width/height/area/aspect ratio/prediction mode/position/number of the blocks included in the corresponding luma area and of its surrounding blocks; and the value/position/number of the luma pixels in and around the corresponding luma area. Here, blocks included in the corresponding luma area are defined as blocks of which all or part lies within the corresponding luma area.

Meanwhile, when this implementation follows Equation 4, as described above, the same-channel prediction mode for generating the intra predictor (pred_intra) is parsed, and the method of generating the predictor (pred_CCLM) using information in the corresponding luma area may be inferred. According to this implementation, the video decoding device can set a preset prediction mode as the prediction mode of pred_CCLM without explicitly receiving a signal about the prediction mode of pred_CCLM from the video encoding device. Alternatively, the video decoding device may infer at least one prediction mode of pred_CCLM based on at least one of the following: the width/height/area/aspect ratio/prediction mode/position/number/distance to the current chroma block of the chroma blocks surrounding the current chroma block; the value/position/number/distance to the current chroma block of the surrounding chroma pixels; the width/height/area/aspect ratio/prediction mode/position/number of the blocks included in the corresponding luma area and of its surrounding blocks; and the value/position/number of the luma pixels in and around the corresponding luma area.

<Realization Example 1-1> Setting a predefined prediction mode as the prediction mode of the second predictor

In this implementation, according to Equation 3, the video decoding device sets a predefined prediction mode as the prediction mode of the second predictor (pred_intra). At this time, the available prediction mode may be a mode that generates a predictor based on surrounding pixels, such as the 67 intra prediction modes (IPM) illustrated in FIG. 3A or the matrix-weighted intra prediction (MIP) mode. For example, when using one second predictor (i.e., when n is 1), the prediction mode of the second predictor may be Planar mode. Therefore, by applying Equation 3, the video decoding device can generate a predictor of the current chroma block as shown in Equation 5.

As another example, when two additional predictors are used (i.e., when n is 2), the prediction modes of the additional predictors may be Planar mode and DC mode, respectively. Accordingly, the video decoding device can generate a predictor of the current chroma block as shown in Equation 6.

Here, the weights of the additional predictors satisfy w₁ + w₂ = 1. In this implementation, the preset prediction mode is called the 'representative mode', which will be explained later.
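The bodies of Equations 5 and 6 are not reproduced in this text. Assuming they specialize Equation 3 to Planar mode (n = 1) and to Planar plus DC modes (n = 2), with w₁ and w₂ dividing the remaining weight 1 − w_CCLM(i,j), they would take the following form. The symbols pred_Planar and pred_DC are names introduced here for the predictors of those two modes; this is a reconstruction, not the published formulas.

```latex
\mathrm{pred}_{C}(i,j) = w_{CCLM}(i,j)\,\mathrm{pred}_{CCLM}(i,j)
  + \bigl(1 - w_{CCLM}(i,j)\bigr)\,\mathrm{pred}_{Planar}(i,j) \tag{5}

\mathrm{pred}_{C}(i,j) = w_{CCLM}(i,j)\,\mathrm{pred}_{CCLM}(i,j)
  + \bigl(1 - w_{CCLM}(i,j)\bigr)
    \bigl( w_{1}\,\mathrm{pred}_{Planar}(i,j) + w_{2}\,\mathrm{pred}_{DC}(i,j) \bigr),
  \qquad w_{1} + w_{2} = 1 \tag{6}
```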

Meanwhile, when this implementation follows Equation 4, the prediction mode of the second predictor (pred _CCLM ) using information of the corresponding luma area may be preset. This mode may be at least one of the cross-component prediction modes described above. At this time, the same-channel prediction mode for generating a predictor using the same-channel information can be parsed.

<Realization Example 1-2> Using reconstructed chroma information around the current chroma block

In this implementation, the video decoding device sets, as the prediction mode of the second predictor, a prediction mode (hereinafter referred to as 'representative mode') derived from reconstructed chroma information around the current chroma block, that is, from information such as the value/position/number/distance to the current chroma block of the pixels surrounding the current chroma block. At this time, the number of representative modes derived by the video decoding device depends on the number of additional predictors that are weighted.

Meanwhile, the representative mode setting method according to this implementation can be applied when the second predictor is pred _CCLM or pred _intra . Hereinafter, the derivation of the representative mode for the case where the second predictor is pred _intra is described.

The video decoding device can use one of the following methods as a method for deriving the representative mode.

As a first method, the prediction mode with the highest intensity among prediction modes derived from the values of surrounding pixels of the current chroma block using an edge detection filter may be set as the representative mode. For this purpose, the video decoding device can borrow the method used to derive the prediction mode in DIMD (Decoder-side Intra Mode Derivation) technology as follows. First, as in the example of FIG. 9, a 'surrounding chroma pixel area' is set that includes pixels surrounding the current chroma block. Depending on the embodiment, the surrounding chroma pixel area may be set in various ways other than the example in FIG. 9.

The video decoding device applies an edge detection filter such as a Sobel filter, Prewitt filter, or Roberts cross filter to the determined surrounding chroma pixel area, as shown in the example of FIG. 9. The video decoding device calculates the gradient for each pixel in the area and maps it to a directional mode of intra prediction. The video decoding device may derive the intensity (I) for the corresponding directional mode based on the magnitude of the gradient and then generate an intensity histogram by accumulating the intensities for each directional mode, as shown in the example of FIG. 10. In the example of FIG. 10, I_IPM represents the intensity of each directional mode. At this time, the video decoding device may use mode 19, which is the directional mode with the highest intensity in the histogram illustrated in FIG. 10, as the representative mode.

When inferring multiple representative modes, the directional modes with the next highest intensities after mode 19 in the histogram can be used as representative modes in descending order of intensity.

If the intensities of several directional modes are the same, a representative mode can be derived by specifying priorities. At this time, the priority may follow a predetermined order such as {horizontal mode, vertical mode, mode 66, ...}, ascending order of mode index, or descending order of mode index. For example, when priority is given in descending order, in the example of FIG. 11 the index of mode 22 is greater than that of mode 21, so mode 22 has the higher priority. Therefore, the video decoding device can derive mode 22 as the representative mode.

In addition to the above example, when the intensities are the same, the representative mode can be inferred according to predefined rules.
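The selection of one or more representative modes from an intensity histogram, including the tie-breaking priorities mentioned above, can be sketched as follows. The function name pick_representative_modes and its parameters are hypothetical, and the histogram is assumed to be a mapping from directional mode index to accumulated intensity.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def pick_representative_modes(intensity: Dict[int, float],
                              num_modes: int = 1,
                              priority: Optional[Sequence[int]] = None,
                              descending_index: bool = True) -> List[int]:
    """Return the num_modes modes with the highest intensity.

    Ties are broken first by an optional explicit priority list
    (e.g. [18, 50, 66] for horizontal, vertical, mode 66) and then by
    mode index, in descending or ascending order.
    """
    def tie_key(mode: int) -> Tuple[int, int]:
        if priority is not None and mode in priority:
            return (0, list(priority).index(mode))  # listed modes win ties
        return (1, -mode if descending_index else mode)

    # Highest intensity first; tie_key decides among equal intensities.
    ranked = sorted(intensity, key=lambda m: (-intensity[m], tie_key(m)))
    return ranked[:num_modes]
```

For instance, with equal intensities for modes 21 and 22 and descending index priority, mode 22 is ranked ahead of mode 21, matching the FIG. 11 example.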

As another example, in order to derive a plurality of representative modes, the video decoding device may divide the surrounding pixels into a surrounding chroma pixel area on the left of the current chroma block and one on the top of it (see the accompanying drawings), calculate a histogram for each area, and use the directional mode with the highest intensity in each histogram as a representative mode.

For example, when the intensity histograms derived from the top and left surrounding pixel areas are as in the examples of FIGS. 13A and 13B, the video decoding device may use mode 19, derived from the histogram illustrated in FIG. 13A, and mode 22, derived from the histogram illustrated in FIG. 13B, as the representative modes.

Meanwhile, as described above, since the prediction mode that can be inferred using the edge detection filter is limited to the directional prediction mode, this implementation may be limited to the case where the second predictor is pred _intra according to Equation 3.

As a second method, the distortion of each prediction mode can be compared using information on surrounding pixels of the current chroma block, and then the prediction mode with the smallest distortion can be set as the representative mode. To this end, the video decoding device can borrow the method used to derive the prediction mode from template-based intra prediction mode derivation (TIMD) technology as follows. First, the surrounding chroma pixel area of the current chroma block is set as shown in the example in FIG. 14. The image decoding device generates a predictor of the surrounding chroma pixel area by performing prediction for each prediction mode candidate using the surrounding pixels on the left and top of the corresponding area. In the example of FIG. 14, the surrounding pixels referred to when generating the predictor are limited to neighboring pixels adjacent to the set peripheral chroma pixel area, but in addition, neighboring pixels slightly distant from the set area may be used. Additionally, the surrounding chroma pixel area may be set in various ways depending on the embodiment.

Meanwhile, when the second predictor is pred _intra , prediction mode candidates may be prediction modes that generate a predictor based on surrounding pixels, such as IPM (Intra Prediction Mode) or MIP mode. Additionally, when the second predictor is pred _CCLM , the prediction mode candidates may be one of the cross-component prediction modes, which may be a prediction mode that generates a predictor based on information in the corresponding luma area.

Afterwards, the video decoding device calculates the distortion (D) between the predictor of the surrounding chroma pixel area according to each prediction mode candidate and the reconstructed pixel values of the area. At this time, various image similarity measures such as Mean Square Error (MSE), Sum of Absolute Differences (SAD), and Sum of Absolute Transformed Differences (SATD) may be used. The video decoding device uses the prediction mode with the smallest distortion among the prediction mode candidates as the representative mode. For example, when a distortion (D_IPM) histogram is generated for the prediction mode candidates as shown in the example of FIG. 15, mode 50 has the smallest distortion. Therefore, the video decoding device can use mode 50 as the representative mode. When multiple representative modes are derived, or when some prediction mode candidates have the same distortion, the rules of the first method described above apply equally, with the selection criterion replaced by ascending order of distortion.

In order to derive a representative mode from the values of surrounding pixels of the current chroma block, the representative mode can be derived using various methods in addition to the two examples described above.

Hereinafter, using the illustrations of FIGS. 16A and 16B, an intra prediction method of the current chroma block using the weighted sum of the first predictor and the second predictor will be described.

First, when the first predictor is pred _CCLM as shown in Equation 3, the intra prediction method of the current chroma block is described according to the example of FIG. 16A.

The video decoding device parses cclm_mode_idx (S1600). By parsing the index cclm_mode_idx, the video decoding device obtains the CCLM mode to apply to the current chroma block. Alternatively, the video decoding device can parse the index and obtain a cross-component prediction mode to apply to the current chroma block. Meanwhile, cclm_mode_idx or the cross-component prediction mode can be determined by the video encoding device in terms of optimizing encoding efficiency.

The video decoding device generates the first predictor (pred_CCLM) by calling the existing getCclmPred() function with the parsed CCLM mode as input (S1602). Alternatively, the video decoding device may generate the first predictor using the parsed cross-component prediction mode as input.

The video decoding device performs the getExtraIntraMode() function to infer the representative mode (S1604). Hereinafter, the getExtraIntraMode() function is called 'representative mode derivation function' or 'derivation function' for short.

The video decoding device generates the second predictor (pred_intra) by calling the existing getIntraPred() function with the representative mode as input (S1606).

The video decoding device generates the predictor (pred_C) of the current chroma block as a weighted sum of the first predictor and the second predictor (S1608).
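The flow of steps S1600 to S1608 can be sketched as follows. The function names getCclmPred(), getExtraIntraMode(), and getIntraPred() come from the description, but their signatures are not given there; the callables, argument lists, and the single scalar weight w_cclm used below are assumptions made only to show the order of operations.

```python
from typing import Callable, List, Tuple

Block = List[List[float]]


def predict_chroma_block_16a(cclm_mode_idx: int,
                             block: Tuple[int, int, int, int],
                             get_cclm_pred: Callable[..., Block],
                             get_extra_intra_mode: Callable[..., int],
                             get_intra_pred: Callable[..., Block],
                             w_cclm: float = 0.5) -> Block:
    """Sketch of FIG. 16A; cclm_mode_idx is assumed already parsed (S1600)."""
    x0, y0, width, height = block
    # S1602: first predictor from the parsed CCLM mode.
    pred_cclm = get_cclm_pred(cclm_mode_idx, x0, y0, width, height)
    # S1604: infer the representative mode without further signaling.
    rep_mode = get_extra_intra_mode(x0, y0, width, height)
    # S1606: second predictor from the representative mode.
    pred_intra = get_intra_pred(rep_mode, x0, y0, width, height)
    # S1608: weighted sum of the two predictors (one weight for all pixels).
    return [[w_cclm * a + (1.0 - w_cclm) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_cclm, pred_intra)]
```

The flow of FIG. 16B would follow the same pattern with the roles of the parsed and inferred modes exchanged.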

Next, when the first predictor is pred _intra as shown in Equation 4, the intra prediction method of the current chroma block is described according to the example of FIG. 16B.

The video decoding device parses intra_chroma_pred_mode (S1620). By parsing the index intra_chroma_pred_mode, the video decoding device obtains the intra prediction mode to apply to the current chroma block. Meanwhile, intra_chroma_pred_mode may be determined by the video encoding device in terms of optimizing encoding efficiency.

The video decoding device generates the first predictor (pred_intra) by calling the existing getIntraPred() function with the parsed intra prediction mode as input (S1622).

The video decoding device performs the getExtraIntraMode() function to infer the representative mode (S1624).

The video decoding device generates the second predictor (pred_CCLM) by calling the existing getCclmPred() function with the representative mode as input (S1626).

The video decoding device generates the predictor (pred_C) of the current chroma block as a weighted sum of the first predictor and the second predictor (S1628).

At this time, the derivation function getExtraIntraMode(), which infers the representative mode according to the various representative mode derivation methods of Realization Example 1-2, can be implemented in various ways. Hereinafter, the operation of getExtraIntraMode() is described based on the example of FIG. 16A, but the same description applies to the example of FIG. 16B. The upper-left pixel coordinates (x0, y0), width, and height of the current chroma block can be provided as the basic inputs to getExtraIntraMode().

Hereinafter, the operation of the derivation function getExtraIntraMode(x0, y0, width, height) according to the first method of Realization Example 1-2 will be described.

The derivation function sets the surrounding chroma pixel area from which to derive the representative mode based on the coordinates (x0,y0), width, and height of the upper left pixel of the current block.

The derivation function creates an intensity histogram as follows.

That is, the derivation function generates an intensity histogram in the form of a list with a length equal to the number of preset directional mode candidates and initializes it to 0. The derivation function positions the filter within the surrounding chroma pixel area according to the size of the preset filter and derives the gradient and intensity of the pixels that the filter overlaps. The derivation function maps the gradient to a directional mode index. The derivation function uses the directional mode index as the position index in the histogram list and accumulates the derived intensity at that position. The derivation function moves the center position of the filter and repeats the above operation until every point that can be a center position of the filter within the surrounding chroma pixel area has been visited.

The derivation function derives the position index with the largest intensity value from the intensity histogram, and then outputs the directional mode index corresponding to the position index as the representative mode.
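A minimal sketch of this first method follows, assuming a 3x3 Sobel filter, an L-shaped surrounding area t pixels thick above and to the left of the block, and intensity measured as |gx| + |gy|. The angle-to-mode mapping in gradient_to_mode is a simplified uniform quantization chosen for illustration (horizontal edge to mode 18, vertical edge to mode 50), not the mapping of any particular codec, and the names recon, t, and get_extra_intra_mode_dimd are hypothetical.

```python
import math
from typing import List, Optional

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_to_mode(gx: float, gy: float) -> int:
    """Map a gradient to a directional mode index in 2..66 (assumed mapping)."""
    # The edge runs perpendicular to the gradient; image y grows downward.
    psi = math.degrees(math.atan2(gx, -gy))
    while psi > 135.0:
        psi -= 180.0
    while psi <= -45.0:
        psi += 180.0
    return 18 + round(psi * 16.0 / 45.0)  # 16 modes per 45 degrees


def get_extra_intra_mode_dimd(recon: List[List[int]], x0: int, y0: int,
                              width: int, height: int,
                              t: int = 3) -> Optional[int]:
    pic_h, pic_w = len(recon), len(recon[0])

    def in_area(x: int, y: int) -> bool:
        if not (0 <= x < pic_w and 0 <= y < pic_h):
            return False
        above = x0 - t <= x < x0 + width and y0 - t <= y < y0
        left = x0 - t <= x < x0 and y0 - t <= y < y0 + height
        return above or left

    hist = [0.0] * 67  # one bin per mode index, initialized to 0
    offsets = (-1, 0, 1)
    for cy in range(y0 - t + 1, y0 + height - 1):
        for cx in range(x0 - t + 1, x0 + width - 1):
            # Use a center only if the whole 3x3 window is inside the area.
            if not all(in_area(cx + dx, cy + dy)
                       for dy in offsets for dx in offsets):
                continue
            gx = sum(SOBEL_X[dy + 1][dx + 1] * recon[cy + dy][cx + dx]
                     for dy in offsets for dx in offsets)
            gy = sum(SOBEL_Y[dy + 1][dx + 1] * recon[cy + dy][cx + dx]
                     for dy in offsets for dx in offsets)
            if gx == 0 and gy == 0:
                continue  # flat window: no direction to vote for
            hist[gradient_to_mode(gx, gy)] += abs(gx) + abs(gy)

    best = max(range(2, 67), key=lambda m: hist[m])
    return best if hist[best] > 0 else None  # None: no mode could be inferred
```

Returning None when no gradient is found corresponds to the case where the representative mode cannot be inferred, so that a fallback mode can be chosen by the caller.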

Hereinafter, the operation of the derivation function getExtraIntraMode(x0, y0, width, height) according to the second method of Realization Example 1-2 will be described.

The derivation function sets the surrounding chroma pixel area from which to derive the representative mode based on the coordinates (x0,y0), width, and height of the upper left pixel of the current block and selects the range of reference pixels to predict the set area.

The derivation function creates a distortion histogram as follows.

In other words, the derivation function groups a preset prediction mode candidate and the distortion of that mode into a pair, sets up a vector-shaped distortion histogram with this as each component, and then initializes the distortion of all candidates to 0. The derivation function generates a predictor of the surrounding chroma pixel area from reference pixels using the prediction mode candidate, which is the first component in the vector. The derivation function uses a preset image similarity comparison measurement method to calculate the distortion value between the generated predictor and the restored pixel values of the surrounding chroma pixel area. The derivation function updates the second component in the vector with the derived distortion value. The derivation function repeats the above operation for all prediction mode candidates in the vector.

The derivation function outputs the prediction mode paired with the smallest distortion value from the distortion histogram as the representative mode.
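A minimal sketch of this second method follows. It assumes two rectangular template areas (t rows above and t columns left of the block), SAD as the distortion measure, and only three candidate modes (DC, horizontal, vertical) with deliberately simple predictors. The names recon, t, predict_region, and get_extra_intra_mode_timd are hypothetical, and the block is assumed to lie at least t + 1 pixels from the picture border.

```python
from typing import Dict, List, Sequence

DC, HOR, VER = 1, 18, 50  # mode indices as used in VVC intra prediction


def predict_region(mode: int, top_ref: Sequence[int], left_ref: Sequence[int],
                   w: int, h: int) -> List[List[int]]:
    """Predict a w x h region from the row above it and the column left of it."""
    if mode == VER:
        return [list(top_ref[:w]) for _ in range(h)]
    if mode == HOR:
        return [[left_ref[y]] * w for y in range(h)]
    if mode == DC:
        dc = (sum(top_ref[:w]) + sum(left_ref[:h])) // (w + h)
        return [[dc] * w for _ in range(h)]
    raise ValueError("candidate mode not supported in this sketch")


def sad(a: List[List[int]], b: List[List[int]]) -> int:
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def get_extra_intra_mode_timd(recon: List[List[int]], x0: int, y0: int,
                              width: int, height: int, t: int = 2,
                              candidates: Sequence[int] = (DC, HOR, VER)) -> int:
    # Template areas: (x, y, w, h) above and to the left of the current block.
    regions = [(x0, y0 - t, width, t), (x0 - t, y0, t, height)]
    distortion: Dict[int, int] = {m: 0 for m in candidates}  # initialized to 0
    for rx, ry, rw, rh in regions:
        top_ref = recon[ry - 1][rx:rx + rw]
        left_ref = [recon[ry + y][rx - 1] for y in range(rh)]
        actual = [recon[ry + y][rx:rx + rw] for y in range(rh)]
        for mode in candidates:
            pred = predict_region(mode, top_ref, left_ref, rw, rh)
            distortion[mode] += sad(pred, actual)
    # Smallest distortion wins; ties go to the earlier candidate.
    return min(candidates, key=lambda m: distortion[m])
```

MSE or SATD could be substituted for sad() without changing the structure of the search.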

The operation of the derivation function getExtraIntraMode() described above for the two representative mode derivation methods covers the case of generating one representative mode. If multiple representative modes are to be generated, the derivation function can be extended with an additional input numExtraMode, the number of representative modes.

Meanwhile, the representative mode inferred by getExtraIntraMode() may be the same type of prediction mode as the existing predictor's prediction mode, or the representative mode may not be inferred at all. In this case, the prediction mode with the next priority can be used as the representative mode in the inference process. Alternatively, the representative mode may be inferred using another inference method, or a preset mode may be used as the representative mode.

<Realization Example 1-3> Using the reconstructed information in and around the corresponding luma area

In this implementation, according to Equation 3, the video decoding device sets, as the prediction mode of the second predictor (pred_intra), a prediction mode (hereinafter referred to as 'representative mode') derived from reconstructed information inside and around the luma area corresponding to the current chroma block (hereinafter, 'corresponding luma area'), that is, from information such as the value/position/number of the pixels inside and around the corresponding luma area. At this time, the number of representative modes derived by the video decoding device depends on the number of second predictors that are weighted. However, since this implementation uses only the reconstructed information in and around the corresponding luma area, a prediction mode (e.g., CCLM mode) that generates a chroma predictor using information in the corresponding luma area cannot be inferred.

As a first method, the most dominant prediction mode among prediction modes derived from the values of pixels in and around the corresponding luma area using an edge detection filter can be set as the representative mode. For this purpose, the video decoding device can borrow the method used to derive the prediction mode in DIMD technology. First, as shown in the example of FIG. 17, a specific area among the pixels in the corresponding luma area is set as a 'luma pixel area'. Depending on the embodiment, the luma pixel area can also be set in various other ways, for example so as to include pixels surrounding the corresponding luma area.

Thereafter, the video decoding device may derive one or more representative modes in the same manner as the first method of deriving the representative mode in Realization Example 1-2.

As a second method, after comparing the distortion of each prediction mode using information on pixels in and around the corresponding luma area, the prediction mode with the smallest distortion can be set as the representative mode. For this purpose, the video decoding device can borrow the method used to derive the prediction mode in TIMD technology. First, as shown in the example of FIG. 18, a luma pixel area is set including pixels in the corresponding luma area and surrounding pixels. The video decoding device generates a predictor by performing prediction for each prediction mode candidate using surrounding pixels on the left and top of the corresponding area. In the example of FIG. 18, the surrounding pixels referred to when generating the predictor are limited to neighboring pixels adjacent to the set luma pixel area, but neighboring pixels slightly distant from the set area may also be used. In addition, in the example of FIG. 18, the luma pixel area is limited to pixels within the corresponding luma area, but depending on the embodiment the luma pixel area can be set in various ways, including both pixels within the corresponding luma area and surrounding pixels.

Thereafter, the video decoding device may derive one or more representative modes in the same manner as the second method of deriving the representative mode in Realization Example 1-2.

In order to derive a representative mode from the values of pixels in the corresponding luma area and surrounding pixels, the representative mode can be derived using various methods in addition to the two examples described above.

Meanwhile, when the second predictor to be weighted is pred_intra as shown in Equation 3, the example of FIG. 16A may show the process of generating the final predictor in this implementation. At this time, the derivation function getExtraIntraMode(), which infers the representative mode according to the various representative mode derivation methods of Realization Example 1-3, can be implemented in various ways. The upper-left pixel coordinates (x0, y0), width, and height of the current chroma block can be provided as the basic inputs to getExtraIntraMode().

Hereinafter, the operation of the derivation function getExtraIntraMode(x0, y0, width, height) according to the first method of Realization Example 1-3 will be described.

The derivation function sets the luma pixel area from which the representative mode is derived from the pixels inside the corresponding luma area and the surrounding pixels based on the coordinates (x0,y0), width, and height of the upper left pixel of the current block.

The derivation function creates an intensity histogram as follows.

That is, the derivation function generates an intensity histogram in the form of a list with a length equal to the number of preset directional mode candidates and initializes it to 0. The derivation function positions the filter within the luma pixel area according to the size of the preset filter and derives the gradient and intensity of the pixels that the filter overlaps. The derivation function maps the gradient to a directional mode index. The derivation function uses the directional mode index as the position index in the histogram list and accumulates the derived intensity at that position. The derivation function moves the center position of the filter and repeats the above operation until every point that can be a center position of the filter within the luma pixel area has been visited.

Hereinafter, the operation of the derivation function getExtraIntraMode(x0, y0, width, height) according to the second method of Realization Example 1-3 will be described.

The derivation function sets the luma pixel area from which the representative mode is derived, from the pixels inside the corresponding luma area and the surrounding pixels, based on the coordinates (x0,y0), width, and height of the upper left pixel of the current block, and selects the range of reference pixels used to predict the set area.

The derivation function creates a distortion histogram as follows.

In other words, the derivation function groups a preset prediction mode candidate and the distortion of that mode into a pair, sets up a vector-shaped distortion histogram with this as each component, and then initializes the distortion of all candidates to 0. The derivation function generates a predictor of the luma pixel area from reference pixels using the prediction mode candidate, which is the first component in the vector. The derivation function uses a preset image similarity comparison measurement method to calculate the distortion value between the generated predictor and the restored pixel values of the luma pixel area. The derivation function updates the second component in the vector with the derived distortion value. The derivation function repeats the above operation for all prediction mode candidates in the vector.

<Realization Example 1-4> Combination of Realization Examples 1-1 to 1-3

In this implementation, according to Equation 3 or Equation 4, when generating a plurality of second predictors, the video decoding device can select and use one of the methods presented in Realization Example 1-1 to Realization Example 1-3 to infer the prediction mode (hereinafter referred to as 'representative mode') for generating each second predictor. The realization examples above have already described how a plurality of representative modes can be inferred at once using one specific method. In this implementation, the video decoding device may instead infer the representative modes using a different method for each second predictor.

For example, when generating two additional predictors, the video decoding device can infer the first representative mode using Realization Example 1-1 and the second representative mode using Realization Example 1-2. Alternatively, the first representative mode may be inferred based on the first method in Realization Example 1-2 (deriving the prediction mode with the largest intensity), and the second representative mode may be inferred based on the second method in Realization Example 1-2 (deriving the prediction mode with the smallest distortion). Other combinations of inference methods are also possible, and as the number of additional predictors increases, a wider variety of combinations may be used.

<Realization Example 2> Method of setting weights for weighted combination

This implementation describes a method of forming, based on Equation 3, a weighted combination of a second predictor generated from the representative mode inferred according to Realization Example 1 and a first predictor generated according to existing CCLM prediction. The video decoding device may use a fixed, specific weight without receiving explicit signaling from the video encoding device. Alternatively, the video decoding device may infer the weight by considering at least one of the following: the width/height/area/aspect ratio/prediction mode/position/number/distance to the current chroma block of the chroma blocks surrounding the current chroma block; the value/position/number/distance to the current chroma block of the surrounding chroma pixels; the width/height/area/aspect ratio/prediction mode/position/number of the blocks included in the corresponding luma area and of its surrounding blocks; and the value/position/number of the luma pixels in and around the corresponding luma area.

The video decoding device can implement various weighted combining methods by appropriately setting w(i,j) in Equation 3. Hereinafter, the weighted combination methods are basically described for the case where, in addition to the first predictor generated in the existing CCLM mode (a predictor based on information (①) of the corresponding luma area), there is one second predictor (a predictor based on information (②) of the same channel). The same methods can also be applied when a plurality of additional predictors exist.

Hereinafter, Realization Examples 2-1 to 2-5 are methods for setting the same weight for all pixels in the predictor. The weighted combining method for the corresponding implementations is described without considering the influence of pixel coordinates (i,j) in the predictor, as shown in Equation 7.

In the case of a method of setting weights differently according to pixel coordinates (i,j) in the predictor, it can be described using the expression of Equation 3.

Meanwhile, the above can be equally applied to the implementation example using Equation 4.
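As a non-limiting illustration, the difference between a single weight for all pixels and a per-pixel weight w(i,j) may be sketched as follows. Equation 3 and Equation 7 are not reproduced in this text, so the sketch assumes that the weight multiplies the CCLM predictor and that the remainder (1 minus the weight) goes to the second predictor. The function names and sample values are hypothetical.

```python
def combine_uniform(pred_cclm, pred_intra, w_cclm):
    """Blend two equally sized 2-D predictors using one weight for every pixel."""
    return [
        [w_cclm * c + (1.0 - w_cclm) * s for c, s in zip(row_c, row_s)]
        for row_c, row_s in zip(pred_cclm, pred_intra)
    ]


def combine_per_pixel(pred_cclm, pred_intra, w):
    """Blend using a separate weight w[i][j] for each pixel position (i, j)."""
    return [
        [wij * c + (1.0 - wij) * s for wij, c, s in zip(row_w, row_c, row_s)]
        for row_w, row_c, row_s in zip(w, pred_cclm, pred_intra)
    ]


# Hypothetical 2x2 predictors (assumed values, not taken from the disclosure).
pred_cclm = [[100, 200], [120, 80]]
pred_intra = [[50, 100], [40, 160]]

uniform = combine_uniform(pred_cclm, pred_intra, 0.75)
per_pixel = combine_per_pixel(pred_cclm, pred_intra, [[1.0, 0.5], [0.5, 0.0]])
```

With a per-pixel weight of 1.0 the output equals the CCLM predictor at that position, and with 0.0 it equals the second predictor.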

<Implementation Example 2-1> Using predefined weights

In this implementation, according to Equation 3 or Equation 4, the video decoding device uses a predefined weight w_CCLM. The predefined weights may be equal weights, weights that favor the CCLM prediction (3:1, 7:1, ...), or weights that disfavor the CCLM prediction (1:3, 1:7, ...).

For example, as shown in Equation 8, an image decoding device can set equal weights for all predictors.

Alternatively, as shown in Equation 9, the image decoding device may set a higher weight to the first predictor according to CCLM prediction.
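As a non-limiting illustration, the predefined ratios listed above (1:1, 3:1, 1:7, and so on) all sum to a power of 2, so the blend can be computed with integer arithmetic and a shift. The sketch below assumes this form and rounds to nearest. Equations 8 and 9 are not reproduced in this text, and the function name is hypothetical.

```python
def blend_ratio(p_cclm, p_intra, a, b):
    """Blend two sample values with integer weights a:b (CCLM : second predictor).

    Assumes a + b is a power of two so the division becomes a right shift.
    """
    total = a + b
    shift = total.bit_length() - 1
    if total != 1 << shift:
        raise ValueError("a + b must be a power of two")
    # Adding half of the total before shifting rounds to the nearest integer.
    return (a * p_cclm + b * p_intra + (total >> 1)) >> shift


equal = blend_ratio(100, 60, 1, 1)        # equal weights
favor_cclm = blend_ratio(100, 60, 3, 1)   # higher weight for CCLM
favor_other = blend_ratio(100, 60, 1, 7)  # lower weight for CCLM
```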

<Implementation Example 2-2> Using information from chroma blocks surrounding the current chroma block

In this implementation, according to Equation 3, the video decoding device sets the weight using information on the chroma blocks surrounding the current chroma block, such as their width/height/area/prediction mode/position/number/distance to the current chroma block. In general, there may be a correlation between the current chroma block and the surrounding chroma blocks. Accordingly, this correlation can be quantified using the prediction mode (hereinafter referred to as 'representative mode') for generating the second predictor (pred_intra). Hereinafter, the quantified correlation is referred to as the peripheral pixel correlation r_C. The video decoding device can set the value of the weight w_CCLM using the peripheral pixel correlation r_C.

In this implementation, the representative mode may be one of the 67 intra prediction modes illustrated in FIG. 3A, as described above. Alternatively, the representative mode may be an intra prediction mode that collectively refers to all 67 intra prediction modes. If the representative mode is all intra prediction modes, prediction modes other than the representative mode may include MIP mode, CCLM mode, etc.

Meanwhile, when this implementation follows Equation 4, the peripheral pixel correlation r_C can be inferred using the prediction mode (hereinafter referred to as 'representative mode') for generating the predictor (pred_CCLM) of the current chroma block and information on the neighboring blocks of the current chroma block. The value of the weight w_intra can then be set using r_C.

The video decoding device can use one of the following three methods to derive the peripheral pixel correlation r_C.

The surrounding chroma blocks considered in the examples of the methods below include blocks adjacent to the current chroma block or blocks that are slightly distant, and the range of the surrounding chroma blocks may be set in various ways depending on the embodiment.

As a first method, r_C can be derived as the ratio, counted by number of blocks, of neighboring blocks that use the representative mode among the neighboring chroma blocks of the current chroma block.

For example, after calculating the ratio of the number of neighboring blocks using the representative mode to the total number of neighboring blocks of the current chroma block, this ratio may be set as r_C. The video decoding device can use this ratio as the weight of the second predictor, as shown in Equation 10, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.

Assume that the neighboring blocks and prediction modes of the current chroma block are distributed as shown in the example of FIG. 19. In the example of FIG. 19, there are a total of 5 neighboring blocks of the current chroma block, and 3 of these neighboring blocks use the representative mode, Planar mode. Therefore, the video decoding device may set 3/5 as the weight of the second predictor and 2/5 as the weight of the first predictor generated by CCLM mode according to Equation 10. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor.
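As a non-limiting illustration, the count-based derivation, including the case of a plurality of additional predictors, may be sketched as follows. The function name is hypothetical, and the modes assigned to the two non-Planar neighbors are assumed values, since FIG. 19 is not reproduced here.

```python
from fractions import Fraction


def weights_by_count(neighbor_modes, representative_modes):
    """Return (w_cclm, [w_extra, ...]) from the share of neighbors using each mode."""
    total = len(neighbor_modes)
    w_extra = [
        Fraction(sum(1 for m in neighbor_modes if m == rep), total)
        for rep in representative_modes
    ]
    # The CCLM predictor receives whatever weight the additional predictors leave.
    return 1 - sum(w_extra), w_extra


# Five neighbors, three of which use Planar (the other two modes are assumed).
neighbors = ["PLANAR", "PLANAR", "DC", "PLANAR", "ANGULAR_50"]

w_cclm, (w_second,) = weights_by_count(neighbors, ["PLANAR"])
w_cclm_multi, w_multi = weights_by_count(neighbors, ["PLANAR", "DC"])
```

Exact fractions are used here only to make the arithmetic easy to check. A practical decoder would more likely use fixed-point weights.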

As a second method, r_C can be derived as the ratio, measured by block area, of neighboring blocks that use the representative mode among the neighboring chroma blocks of the current chroma block.

For example, after calculating the ratio of the area of neighboring blocks using the representative mode to the total area of neighboring blocks of the current chroma block, this ratio may be set as r_C. The video decoding device can use this ratio as the weight of the second predictor, as shown in Equation 11, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.

Assume that the neighboring blocks and prediction modes of the current chroma block are distributed as shown in the example of FIG. 20. In the example of FIG. 20, the total area of neighboring blocks of the current chroma block is 68, and the area of neighboring blocks using Planar mode, the representative mode, is 28. Therefore, the video decoding device can set 28/68 as the weight of the second predictor and 40/68 as the weight of the first predictor generated by CCLM mode according to Equation 11. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor.
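As a non-limiting illustration, the area-based derivation may be sketched as follows. The block dimensions below are assumed values chosen only so that the totals match the figures quoted above (28 out of 68). The actual layout of FIG. 20 is not reproduced here, and the function name is hypothetical.

```python
from fractions import Fraction


def correlation_by_area(neighbors, rep):
    """Share of the total neighbor area covered by blocks that use mode `rep`."""
    total = sum(w * h for w, h, _ in neighbors)
    matching = sum(w * h for w, h, mode in neighbors if mode == rep)
    return Fraction(matching, total)


# (width, height, mode) of each neighboring block; all values are assumptions.
neighbors = [
    (4, 4, "PLANAR"),
    (4, 2, "PLANAR"),
    (2, 2, "PLANAR"),
    (8, 4, "DC"),
    (4, 2, "ANGULAR_18"),
]

w_second = correlation_by_area(neighbors, "PLANAR")  # weight of the second predictor
w_cclm = 1 - w_second                                # weight of the CCLM predictor
```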

As a third method, r_C can be derived as the ratio of the length of the sides along which neighboring blocks using the representative mode adjoin the current chroma block to the total length of the sides along which all neighboring chroma blocks adjoin the current chroma block.

For example, after calculating the ratio of the length of the sides of the current chroma block that adjoin neighboring blocks using the representative mode to the length of the sides of the current chroma block that adjoin any neighboring chroma block, this ratio may be set as r_C. The video decoding device can use this ratio as the weight of the second predictor, as shown in Equation 12, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.

Assume that the neighboring blocks and prediction modes of the current chroma block are distributed as shown in the example of FIG. 21. In the example of FIG. 21, the total length of the sides along which neighboring blocks adjoin the current chroma block is 16, and the length of the sides along which neighboring blocks using Planar mode, the representative mode, adjoin the current chroma block is 10. Therefore, according to Equation 12, the video decoding device can set 10/16 as the weight of the second predictor and 6/16 as the weight of the first predictor generated by CCLM mode. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor.
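As a non-limiting illustration, the side-length derivation may be sketched with blocks given as rectangles (x, y, width, height). The sketch assumes that only the top and left sides of the current block have reconstructed neighbors. The coordinates are assumed values chosen so that the totals match the figures quoted above (10 out of 16), and the function names are hypothetical.

```python
from fractions import Fraction


def overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1) and [b0, b1)."""
    return max(0, min(a1, b1) - max(a0, b0))


def shared_edge(cur, nb):
    """Length of the side that neighbor `nb` shares with the top or left of `cur`."""
    cx, cy, cw, ch = cur
    x, y, w, h = nb
    if y + h == cy:  # neighbor touches the row directly above the current block
        return overlap(x, x + w, cx, cx + cw)
    if x + w == cx:  # neighbor touches the column directly left of the current block
        return overlap(y, y + h, cy, cy + ch)
    return 0


def correlation_by_edge(cur, neighbors, rep):
    total = sum(shared_edge(cur, rect) for rect, _ in neighbors)
    matching = sum(shared_edge(cur, rect) for rect, mode in neighbors if mode == rep)
    return Fraction(matching, total)


current = (8, 8, 8, 8)  # assumed 8x8 current chroma block
neighbors = [
    ((8, 4, 4, 4), "PLANAR"),      # above, shares 4 samples
    ((12, 0, 4, 8), "DC"),         # above, shares 4 samples
    ((4, 8, 4, 2), "ANGULAR_18"),  # left, shares 2 samples
    ((0, 10, 8, 6), "PLANAR"),     # left, shares 6 samples
]

w_second = correlation_by_edge(current, neighbors, "PLANAR")
w_cclm = 1 - w_second
```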

In the examples of FIGS. 19 to 21, one intra prediction mode, that is, Planar mode, is used as the representative mode, but as described above, the representative mode may be an intra prediction mode that collectively refers to all 67 intra prediction modes.

In this implementation, in the process of calculating the peripheral pixel correlation r_C, if the total number of neighboring blocks or the total area of the neighboring blocks is not a power of 2, the division may greatly increase computational complexity when implemented in hardware. Therefore, in the process of deriving the peripheral pixel correlation, the video decoding device may approximate each denominator and numerator as a power of 2 using the operation shown in Equation 13, and then derive the peripheral pixel correlation using Equation 10 to Equation 12.
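As a non-limiting illustration, once the numerator and the denominator are both powers of 2, the ratio reduces to a single shift. Equation 13 is not reproduced in this text, so the sketch below assumes that each value is rounded down to the nearest power of 2. The actual rounding rule may differ. The function names and the fixed-point precision are hypothetical.

```python
def floor_log2(n):
    """Exponent of the largest power of two that does not exceed n (n >= 1)."""
    return n.bit_length() - 1


def approx_ratio_fixed(num, den, precision=6):
    """Approximate num/den in units of 1/2**precision without a division.

    Assumes 0 <= num <= den and den >= 1. Both values are rounded down to a
    power of two, so the ratio becomes a right shift.
    """
    if num <= 0:
        return 0
    shift = floor_log2(den) - floor_log2(num)
    return (1 << precision) >> shift


# Ratios taken from the examples above, expressed in 1/64 units.
by_count = approx_ratio_fixed(3, 5)    # 2/4
by_area = approx_ratio_fixed(28, 68)   # 16/64
by_edge = approx_ratio_fixed(10, 16)   # 8/16
```

Under this assumed rounding, 3/5 is approximated as 1/2 and 28/68 as 1/4, which shows the accuracy that is traded for the removal of the divider.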

Alternatively, in the process of calculating the peripheral pixel correlation r_C, the video decoding device may adjust the proportion of the peripheral pixel correlation by additionally multiplying it by a predetermined proportion value p. For example, by applying the proportion value p, the peripheral pixel correlation in Equation 10 can be expressed as Equation 14.

Alternatively, the video decoding device can approximate the derived weight to the nearest power of 1/2, such as 1/2, 1/4, or 1/8. Alternatively, the video decoding device may divide the weight interval between 0 and 1 into equal parts (2 parts, 4 parts, 8 parts, etc.) or into parts of variable length, select the values at the division positions as representative weight values, and then approximate the derived weight by the closest representative weight value. In addition, depending on the embodiment, the weights may be further adjusted using various conditional expressions or calculation formulas.
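As a non-limiting illustration, the two approximations may be sketched as follows. The candidate set for powers of 1/2, the inclusion of the end points 0 and 1 among the representative values, and the function names are assumptions.

```python
def nearest_power_of_half(w, max_exp=3):
    """Snap w to the closest of 1/2, 1/4, ..., 1/2**max_exp."""
    candidates = [2.0 ** -k for k in range(1, max_exp + 1)]
    return min(candidates, key=lambda c: abs(c - w))


def nearest_uniform_level(w, parts=4):
    """Snap w to the closest division position of [0, 1] split into equal parts."""
    return round(w * parts) / parts


w = 0.6  # for example, the 3/5 derived by the count-based method
snapped_half = nearest_power_of_half(w)
snapped_quarters = nearest_uniform_level(w, parts=4)
snapped_eighths = nearest_uniform_level(w, parts=8)
```

A finer partition keeps the approximated weight closer to the derived one, at the cost of more distinct weight values to support.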

<Implementation Example 2-3> Using information on blocks included in the corresponding luma area and their surrounding blocks

In this implementation, according to Equation 3, the video decoding device sets the weight using information on the blocks included in the luma area corresponding to the current chroma block (hereinafter, 'corresponding luma area') and their surrounding blocks, such as width/height/area/prediction mode/aspect ratio/position/number. There may be a certain correlation between the current chroma block and the corresponding luma area. Accordingly, the correlation between the current chroma block and the blocks in that area that use the prediction mode (hereinafter referred to as 'representative mode') for generating the second predictor (pred_intra) can be quantified. Hereinafter, the quantified correlation is referred to as the luma pixel correlation r_L. The video decoding device can set the value of the weight w_CCLM using the luma pixel correlation r_L. However, since this implementation uses only the information of the blocks included in the corresponding luma area and their surrounding blocks, the weight of the representative mode can be derived for a prediction mode (e.g., IPM) that generates the predictor based on surrounding pixel information.

The video decoding device can use one of the following two methods to derive the luma pixel correlation r_L.

The blocks considered in the examples of the methods below are blocks included in the corresponding luma area, but the range of blocks may be set in various ways depending on the embodiment, including blocks surrounding the corresponding luma area.

As a first method, r_L can be derived by calculating the ratio of blocks using the representative mode among blocks included in the corresponding luma area based on the number of blocks.

For example, after calculating the ratio of the number of blocks using the representative mode to the total number of blocks included in the corresponding luma area, this ratio may be set as r_L. The video decoding device can use this ratio as the weight of the second predictor, as shown in Equation 15, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.

Assume that the blocks and prediction modes included in the corresponding luma area are distributed as shown in the example of FIG. 22. In the example of FIG. 22, there are a total of 5 blocks included in the corresponding luma area, and 2 of these blocks use the representative mode, Planar mode. Therefore, the video decoding device may set 2/5 as the weight of the second predictor and 3/5 as the weight of the first predictor generated by CCLM mode according to Equation 15. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor.

As a second method, r_L can be derived by calculating the ratio of blocks using the representative mode among blocks included in the corresponding luma area based on the block area.

For example, after calculating the ratio of the area in which blocks using the representative mode overlap the corresponding luma area to the total area of the corresponding luma area, this ratio may be set as r_L. The video decoding device can use this ratio as the weight of the second predictor, as shown in Equation 16, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.

Assume that the blocks and prediction modes included in the corresponding luma area are distributed as shown in FIG. 23. In the example of FIG. 23, the total area of the corresponding luma area is 256, and the area in which blocks using Planar mode, the representative mode, overlap the corresponding luma area is 96. Therefore, the video decoding device can set 96/256 as the weight of the second predictor and 160/256 as the weight of the first predictor generated by CCLM mode according to Equation 16. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor.
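As a non-limiting illustration, the overlap-based derivation may be sketched with rectangles (x, y, width, height), where a luma block may extend beyond the corresponding luma area and only the overlapping part is counted. The block layout is an assumed one chosen so that the totals match the figures quoted above (96 out of 256). FIG. 23 is not reproduced here, and the function names are hypothetical.

```python
from fractions import Fraction


def overlap_area(region, block):
    """Area of the intersection of two rectangles given as (x, y, w, h)."""
    rx, ry, rw, rh = region
    bx, by, bw, bh = block
    w = min(rx + rw, bx + bw) - max(rx, bx)
    h = min(ry + rh, by + bh) - max(ry, by)
    return max(0, w) * max(0, h)


def luma_correlation_by_area(region, blocks, rep):
    """Share of the corresponding luma area covered by blocks using mode `rep`."""
    covered = sum(overlap_area(region, rect) for rect, mode in blocks if mode == rep)
    return Fraction(covered, region[2] * region[3])


region = (0, 0, 16, 16)  # assumed 16x16 corresponding luma area (area 256)
blocks = [
    ((0, 0, 8, 8), "PLANAR"),      # fully inside: 64
    ((8, 0, 8, 8), "DC"),
    ((0, 8, 8, 8), "ANGULAR_50"),
    ((8, 8, 16, 4), "PLANAR"),     # extends past the area: only 8x4 = 32 counts
    ((8, 12, 8, 4), "DC"),
]

r_l = luma_correlation_by_area(region, blocks, "PLANAR")
w_second, w_cclm = r_l, 1 - r_l
```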

In the examples of FIGS. 22 and 23, one intra prediction mode, that is, Planar mode, is used as the representative mode, but as described above, the representative mode may be an intra prediction mode that collectively refers to all 67 intra prediction modes.

In this implementation, in the process of calculating the luma pixel correlation r_L, if the total number of blocks or the total area overlapping the corresponding luma area is not a power of 2, the division may greatly increase computational complexity when implemented in hardware. Therefore, in the process of deriving the luma pixel correlation, the video decoding device may approximate each denominator and numerator as a power of 2 using an operation similar to that shown in Equation 13, and then derive the luma pixel correlation using Equation 15 and Equation 16.

Alternatively, in the process of calculating the luma pixel correlation r_L, the video decoding device may adjust the proportion of the luma pixel correlation r_L by additionally multiplying it by a predetermined proportion value p. For example, by applying the proportion value p, the luma pixel correlation in Equation 15 can be expressed as Equation 17.

Alternatively, the video decoding device may approximate the derived weight to the nearest power of 1/2, such as 1/2, 1/4, or 1/8. Alternatively, the video decoding device may divide the weight interval between 0 and 1 into equal parts (2 parts, 4 parts, 8 parts, etc.) or into parts of variable length, select the values at the division positions as representative weight values, and then approximate the derived weight by the closest representative weight value. In addition, depending on the embodiment, the weights may be further adjusted using various conditional expressions or calculation formulas.

<Implementation Example 2-4> Using restored chroma information around the current chroma block

In this implementation, the video decoding device sets the weight using the restored chroma information around the current chroma block, that is, information on the pixels around the current chroma block such as their value/position/number/distance to the current chroma block. The restored information around the current chroma block may also include the width/height/area/aspect ratio/prediction mode/position/number/distance to the current chroma block of the surrounding chroma blocks, but the method using these follows Realization Example 2-2. Therefore, this implementation describes a method mainly based on information on the pixels surrounding the current chroma block, such as their value/position/number/distance to the current chroma block. As described above, the area containing the surrounding pixels of the current chroma block is called the surrounding chroma pixel area.

Meanwhile, the weight setting method according to this implementation example can be applied when the second predictor is pred_CCLM or pred_intra. Hereinafter, a method for setting weights for the case where the second predictor is pred_intra is described.

The video decoding device can use one of the following methods to derive weights using the restored chroma information around the current chroma block.

As a first method, the video decoding device calculates prediction modes and their intensities from the values of pixels surrounding the current chroma block using an edge detection filter, and then sets the ratio of the intensity of the representative mode to the total intensity as the weight of the second predictor. For this purpose, the video decoding device can borrow the method used to derive the prediction mode in DIMD technology. This implementation is very similar in operation to the first method of Realization Example 1-2. As described above, the prediction modes that can be inferred using the edge detection filter are limited to directional prediction modes, so this implementation may be limited to the case where the second predictor is pred_intra according to Equation 3.

When the set representative mode is the directional mode, it is assumed that intensity histograms are generated for the directional modes as shown in the example of FIG. 24 according to the first method of Realization Example 1-2. The video decoding device can set weights using the intensity of the representative mode from the intensity histogram. Hereinafter, in the description for setting weights using the intensity histogram, representative mode M is assumed to be mode 19.

As an example, the ratio of the intensity of the representative mode to the total of the intensities in the intensity histogram can be set as the weight, as shown in Equation 18.

Therefore, in the intensity histogram illustrated in FIG. 24, the weight of the representative mode is set to 25/95.

As another example, as shown in Equation 19, the ratio of the intensity of the representative mode to the total of the n (n = 2, 3, 4, ...) top intensities in the intensity histogram may be set as the weight.

When n is 3, in the intensity histogram illustrated in FIG. 24, the weight of the representative mode is set to 25/65.

As another example, assume that the neighboring blocks and prediction modes of the current chroma block are distributed as shown in the example of FIG. 25. As shown in Equation 20, the total may be taken over the intensities, in the intensity histogram, of the representative mode and of the directional modes among the prediction modes used by the neighboring blocks of the current chroma block, and the ratio of the intensity of the representative mode to this total may be set as the weight.

Therefore, in the intensity histogram illustrated in FIG. 24, the weight of the representative mode is set to 25/56.

In addition, weights can be set using the intensity of the representative mode according to various methods. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor.
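As a non-limiting illustration, the three intensity-based ratios described above may be sketched as follows. FIG. 24 and FIG. 25 are not reproduced in this text, so the histogram and the neighbor modes below are assumed values, chosen only so that the totals match the quoted results (95, 65 and 56 with an intensity of 25 for mode 19). The sketch also assumes the VVC numbering in which modes 0 and 1 are Planar and DC and modes 2 and above are directional. The function names are hypothetical.

```python
from fractions import Fraction


def weight_all(hist, rep):
    """Intensity of the representative mode over the sum of all intensities."""
    return Fraction(hist[rep], sum(hist.values()))


def weight_top_n(hist, rep, n):
    """Intensity of the representative mode over the sum of the n largest intensities."""
    top = sorted(hist.values(), reverse=True)[:n]
    return Fraction(hist[rep], sum(top))


def weight_neighbor_modes(hist, rep, neighbor_modes):
    """Sum only over the representative mode and directional modes used by neighbors."""
    modes = {rep} | {m for m in neighbor_modes if m >= 2}  # drop Planar (0) and DC (1)
    return Fraction(hist[rep], sum(hist.get(m, 0) for m in modes))


# Assumed intensity histogram {mode: intensity}; the representative mode is 19.
hist = {19: 25, 18: 22, 50: 18, 34: 14, 2: 9, 10: 7}
neighbor_modes = [0, 18, 1, 2, 18]  # assumed modes of the neighboring blocks

w_all = weight_all(hist, 19)
w_top3 = weight_top_n(hist, 19, 3)
w_nb = weight_neighbor_modes(hist, 19, neighbor_modes)
```

Narrowing the set of modes in the denominator raises the weight given to the representative mode for the same histogram.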

As a second method, the video decoding device calculates the distortion of each prediction mode based on information on the surrounding pixels of the current chroma block, and then uses the ratio of the distortion of the representative mode to the total of the distortion values when setting the weight of the second predictor. For this purpose, the video decoding device can borrow the method used to derive the prediction mode in TIMD technology. This implementation is very similar in operation to the second method of Realization Example 1-2. The video decoding device calculates the distortion value D of each prediction mode candidate according to the second method of Realization Example 1-2, and then substitutes the value calculated according to Equation 21 or Equation 22 for each prediction mode candidate for the intensity value used in the first method of Realization Example 1-2.

Afterwards, the video decoding device can generate an intensity histogram in the same manner as the first method of this implementation and then set weights using it. The larger the intensity of each prediction mode, the more suitable it is as a representative mode, and conversely, the smaller the distortion, the more suitable it is as a representative mode. Therefore, intensity and distortion have an inverse relationship. Therefore, distortion can be replaced with intensity in various ways in addition to Equation 21 and Equation 22. In addition to the method of replacing distortion with intensity, the ratio of the distortion of the representative mode among the distortion values of the prediction mode candidates is calculated using various methods, and then the weight can be set using this ratio.
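As a non-limiting illustration, a distortion-to-intensity substitution may be sketched as follows. Equation 21 and Equation 22 are not reproduced in this text, so the mapping used here (reflecting each distortion about the range of observed distortions, so that the smallest distortion yields the largest intensity) is purely an assumed stand-in and not the disclosed formula. The distortion values and function names are hypothetical.

```python
from fractions import Fraction


def distortion_to_intensity(distortions):
    """Map {mode: distortion} to {mode: intensity} with an inverse relationship.

    Assumed mapping: intensity = D_max + D_min - D.
    """
    lo, hi = min(distortions.values()), max(distortions.values())
    return {mode: hi + lo - d for mode, d in distortions.items()}


def weight_from_distortion(distortions, rep):
    """Weight of the representative mode from the substituted intensities."""
    intensity = distortion_to_intensity(distortions)
    return Fraction(intensity[rep], sum(intensity.values()))


# Assumed template distortions for three candidate modes.
distortions = {0: 120, 1: 200, 18: 80}
intensity = distortion_to_intensity(distortions)
w_second = weight_from_distortion(distortions, 18)
```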

To set a weight from the values of surrounding pixels of the current chroma block, various methods other than the two examples described above can be used.

<Implementation Example 2-5> Using the restored information in and around the corresponding luma area

In this implementation, the video decoding device sets the weight of the second predictor using the reconstructed information inside and around the luma area corresponding to the current chroma block (hereinafter, 'corresponding luma area'), that is, information such as the value/position/number of pixels inside and around the corresponding luma area. The restored information in and around the corresponding luma area may also include the width/height/area/aspect ratio/prediction mode/position/number of the blocks included in the corresponding luma area and the blocks surrounding them, but the method of utilizing these follows Realization Example 2-3. Therefore, this implementation describes a method mainly based on information such as the value/position/number of pixels in and around the corresponding luma area. However, since this implementation only uses the reconstructed information in and around the corresponding luma area, the prediction mode (i.e., CCLM mode) that generates the chroma predictor using the information in the corresponding luma area cannot be inferred. As described above, an area containing pixels within and around the corresponding luma area is called a luma pixel area.

The image decoding device may use one of the following methods to derive weights using the reconstructed information in and around the corresponding luma area.

As a first method, the video decoding device calculates prediction modes and their intensities from the values of pixels in and around the corresponding luma area using an edge detection filter, and then sets the ratio of the intensity of the representative mode to the total intensity as the weight of the second predictor. For this purpose, the video decoding device can borrow the method used to derive the prediction mode in DIMD technology. This implementation is very similar in operation to the first method of Realization Example 1-3.

When the set representative mode is the directional mode, it is assumed that intensity histograms are generated for the directional modes as shown in the example of FIG. 24 according to the first method of Realization Example 1-3. The video decoding device can set weights using the intensity of the representative mode from the intensity histogram. Thereafter, in the same manner as the first method of Realization Example 2-4, the video decoding device can set the weight using the intensity.

As a second method, the video decoding device calculates the distortion of each prediction mode based on information on pixels in and around the corresponding luma area, and then uses the ratio of the distortion of the representative mode to the total of the distortion values when setting the weight of the second predictor. For this purpose, the video decoding device can borrow the method used to derive the prediction mode in TIMD technology. This implementation is very similar in operation to the second method of Realization Example 1-3. The video decoding device may calculate the distortion value D of each prediction mode candidate according to the second method of Realization Example 1-3, and then set the weight using the substituted intensity in the same manner as the second method of Realization Example 2-4.

Meanwhile, in order to set the weight from the values of pixels in and around the corresponding luma area, various methods other than the two examples described above can be used.

<Implementation Example 3> Method of signaling prediction mode and weight of second predictor

In this implementation, the video decoding device does not infer information for intra prediction of the current chroma block, but uses information signaled from the video encoding device. That is, information related to the prediction mode of the second predictor, information related to weighted combination, etc. are transmitted from the video encoding device to the video decoding device. Additionally, whether or not this embodiment is applied can be signaled from the video encoding device to the video decoding device.

<Implementation Example 3-1> Method of signaling information related to prediction mode

In this implementation, information related to the prediction mode (hereinafter referred to as 'representative mode') for generating the second predictor is directly signaled from the video encoding device to the video decoding device. At this time, related information includes the number of representative modes, representative mode derivation method, representative mode index, etc.

As a first method, the number of representative modes can be signaled as follows. As an example, the number of representative modes may be set in advance at a level higher than the CU, such as the SPS (Sequence Parameter Set)/VPS (Video Parameter Set)/PPS (Picture Parameter Set)/SH (Slice Header)/CTU (Coding Tree Unit). For example, as shown in Table 3, the number of representative modes sps_ccip_extra_mode_num may be defined in advance in the SPS.

Here, ccip in the variable name is an abbreviation for 'Cross CCLM Intra Prediction'. Hereinafter, ccip is inserted into the variable names of the signals related to this embodiment.

The video encoding device encodes a preset number of representative modes, includes them in the bitstream, and signals them to the video decoding device. The video decoding device parses sps_ccip_extra_mode_num in the bitstream. Afterwards, the number of representative modes to be derived when performing prediction is determined according to the value of sps_ccip_extra_mode_num.

As another example, the number of representative modes may be signaled each time prediction is performed at the CU level. The intra prediction mode parsing process of the chroma channel described above in Table 2 may be changed as shown in the examples in Tables 4 to 6. According to Table 4 or Table 5, the number of representative modes required when predicting each block can be signaled by additionally parsing ccip_extra_mode_num according to the type of prediction mode.

Alternatively, as shown in Table 6, ccip_extra_mode_num can be additionally parsed regardless of the type of prediction mode.

As a second method, the representative mode derivation method can be signaled as follows. First, the representative mode derivation methods presented in Realization Example 1 can be classified by index as illustrated in Table 7.

As an example, a representative mode derivation method may be set in advance at a level higher than the CU, such as SPS/VPS/PPS/SH/CTU. For example, as shown in Table 8, the index sps_ccip_mode_infer_idx of the representative mode derivation method may be defined in advance in the SPS.

The video encoding device encodes the index of the predefined representative mode derivation method and then includes it in the bitstream and signals it to the video decoding device. The video decoding device parses sps_ccip_mode_infer_idx in the bitstream. Afterwards, the representative mode derivation method to be used when performing prediction is determined according to the value of sps_ccip_mode_infer_idx.

As another example, the representative mode derivation method may be signaled each time prediction is performed at the CU level. In the intra prediction mode parsing process of the chroma channel, the representative mode derivation method required for predicting each block can be signaled by additionally parsing ccip_mode_infer_idx according to the type of prediction mode, as shown in Table 9 or Table 10.

Alternatively, as shown in Table 11, ccip_mode_infer_idx can be additionally parsed regardless of the type of prediction mode.

As a third method, the representative mode index can be signaled as follows. As an example, a representative mode index may be set in advance at a level higher than the CU, such as SPS/VPS/PPS/SH/CTU. For example, as shown in Table 12, the representative mode index sps_ccip_extra_mode_idx may be defined in advance in the SPS.

The video encoding device encodes a predefined representative mode index, includes it in the bitstream, and signals it to the video decoding device. The video decoding device parses sps_ccip_extra_mode_idx in the bitstream. Afterwards, the representative mode to be used when performing prediction is determined according to the value of sps_ccip_extra_mode_idx.

As another example, the representative mode index may be signaled each time prediction is performed at the CU level. In the intra prediction mode parsing process of the chroma channel, the representative mode index required for prediction of each block can be signaled by additionally parsing ccip_extra_mode_idx according to the type of prediction mode, as shown in Table 13 or Table 14.

Alternatively, as shown in Table 15, ccip_extra_mode_idx can be additionally parsed regardless of the type of prediction mode.

At this time, ccip_extra_mode_idx indicates a single index when the number of representative modes is one. When multiple representative modes are used, ccip_extra_mode_idx can be a list of multiple representative mode indices.

The items of information above may be combined in this implementation example. For example, the representative mode derivation method may be signaled together with information on the number of representative modes. Alternatively, a preset number may be used without signaling the number of representative modes, and the index of the representative mode may be signaled instead of information on the representative mode derivation method. In addition, various prediction methods can be formed by selecting different combinations of related information that is signaled and related information that is not signaled.

<Implementation Example 3-2> Method of signaling information related to weighted combination

In this implementation example, information related to the weighted combination is signaled from the video encoding device to the video decoding device. This information includes the weighted combination method, the weights of the weighted combination, the specific gravity value, and so on. Here, the specific gravity value is a relative importance factor: the value multiplied in when calculating the peripheral pixel correlation and the luma pixel correlation in Implementation Examples 2-2 and 2-3.

As a first method, the weighted combination method can be signaled as follows. First, the weighted combination methods presented in Implementation Example 2 can be classified by index as illustrated in Table 16.

As an example, a weighted combination method may be set in advance at a level higher than CU, such as SPS/VPS/PPS/SH/CTU. For example, as shown in Table 17, the index sps_ccip_weight_calc_mode_idx of the weighted combination method on SPS may be defined in advance.

The video encoding device encodes the index of a predefined weighted combination method and then includes it in the bitstream and signals it to the video decoding device. The video decoding device parses sps_ccip_weight_calc_mode_idx in the bitstream. Afterwards, the weighted combination method to be used when performing prediction is determined according to the value of sps_ccip_weight_calc_mode_idx.

As another example, the weighted combination method may be signaled each time prediction is performed at the CU level. In the intra prediction mode parsing process of the chroma channel, the weighted combination method to be used when predicting each block can be signaled by additionally parsing ccip_weight_calc_mode_idx according to the type of prediction mode, as shown in Table 18 or Table 19.

Alternatively, as shown in Table 20, ccip_weight_calc_mode_idx can be additionally parsed regardless of the type of prediction mode.

As a second way, the weights of the weighted combination can be signaled as follows. As an example, the weight of the weighted combination may be set in advance at a level higher than the CU, such as SPS/VPS/PPS/SH/CTU. For example, as shown in Table 21, the weight sps_ccip_pred_weight of the weighted combination may be defined in advance on the SPS.

The video encoding device encodes the weight of the predefined weighted combination and then includes it in the bitstream and signals it to the video decoding device. The video decoding device parses sps_ccip_pred_weight in the bitstream. Afterwards, the weight of the weighted combination to be used when performing prediction is determined according to the value of sps_ccip_pred_weight.

As another example, the weight of the weighted combination may be signaled each time prediction is performed at the CU level. In the intra prediction mode parsing process of the chroma channel, the weight of the weighted combination required for prediction of each block can be signaled by additionally parsing ccip_pred_weight according to the type of prediction mode, as shown in Table 22 or Table 23.

Alternatively, as shown in Table 24, ccip_pred_weight can be additionally parsed regardless of the type of prediction mode.

At this time, ccip_pred_weight represents one weight for the first predictor (or the second predictor) when the number of representative modes is one. When multiple representative modes are used, ccip_pred_weight is a list, and the number of weights in the list may increase with the number of representative modes.
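
How a list of signaled weights could drive the weighted combination is sketched below. Equations 3 and 4 are not reproduced here, so the example rests on stated assumptions: the weights are floating-point values that apply to the second predictors, the first predictor receives the remaining weight so that all weights sum to one, and predictors are plain nested lists. A codec would more likely use integer weights and a shift. The function name `combine_predictors` is hypothetical.

```python
from typing import List, Sequence

Block = List[List[float]]


def combine_predictors(
    first: Block,
    seconds: Sequence[Block],
    second_weights: Sequence[float],
) -> Block:
    """Weighted combination of one first predictor and N second predictors.

    Assumption: second_weights[i] applies to seconds[i], and the first
    predictor gets 1 - sum(second_weights).
    """
    if len(seconds) != len(second_weights):
        raise ValueError("one weight is needed per second predictor")
    if any(w < 0.0 for w in second_weights):
        raise ValueError("weights must be non-negative")
    first_weight = 1.0 - sum(second_weights)
    if first_weight < 0.0:
        raise ValueError("weights of the second predictors must not exceed 1")

    combined: Block = []
    for y, row in enumerate(first):
        out_row = []
        for x, value in enumerate(row):
            acc = first_weight * value
            for pred, w in zip(seconds, second_weights):
                acc += w * pred[y][x]
            out_row.append(acc)
        combined.append(out_row)
    return combined


if __name__ == "__main__":
    cclm_pred = [[100.0, 100.0]]        # first predictor (CCLM)
    mode_pred = [[50.0, 150.0]]         # second predictor (representative mode)
    print(combine_predictors(cclm_pred, [mode_pred], [0.25]))
```

With one representative mode the weight list has a single entry; with two representative modes it has two entries, matching the growth of the list described above.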

In addition, the specific gravity value multiplied in when calculating the peripheral pixel correlation and the luma pixel correlation in Implementation Examples 2-2 and 2-3 can also be signaled in the same manner as the weights of the weighted combination. In this case, the specific gravity value can be signaled by replacing ccip_pred_weight with ccip_relativity_importance in Tables 21 to 24.

<Implementation Example 3-3> Method of signaling whether the present invention is applied or not

Whether or not the present invention is applied can be signaled as follows. As an example, a flag may be set in advance at a level higher than the CU, such as SPS/VPS/PPS/SH/CTU, to indicate, relative to the existing predictor, the use of an improved predictor (i.e., the predictor pred_C illustrated in Equations 3 and 4) obtained by a combination of embodiments of the present invention. For example, as shown in Table 25, a flag sps_ccip_mode_flag indicating use of the improved predictor may be defined in advance in the SPS.

The video encoding device encodes a flag indicating the use of a predefined improved predictor and then includes it in the bitstream and signals it to the video decoding device. The video decoding device parses sps_ccip_mode_flag in the bitstream. If sps_ccip_mode_flag = 0, the video decoding device does not apply the present invention, and if sps_ccip_mode_flag = 1, the video decoding device may generate a predictor using the improved CCLM mode.

As an example of the present invention, a combination of Implementation Example 3-1 and Implementation Example 3-2 is possible. For example, once signaling of whether the present invention is applied indicates that it is applied, the methods of Implementation Example 3-1 and Implementation Example 3-2 can then be applied.

As another example, whether or not the present invention is applied may be signaled at a lower level. That is, application of the present invention can be determined using ccip_mode_flag at the CU level. If ccip_mode_flag is 0, the video decoding device does not apply the present invention, and if ccip_mode_flag is 1, the video decoding device may generate a final predictor by performing a weighted combination of the first predictor and the second predictor.

The video decoding device can parse ccip_mode_flag as shown in Tables 26 and 27, depending on the type of prediction mode.

Alternatively, as shown in Table 28, ccip_mode_flag may be parsed regardless of the type of prediction mode.

Meanwhile, if the prediction mode is one of the CCLM modes and ccip_mode_flag is 1, the representative mode according to the present invention may be a co-channel prediction mode that generates a predictor using information (②) of the same channel. On the other hand, if the prediction mode is one of the prediction modes using the same channel information and ccip_mode_flag is 1, the representative mode according to the present invention may be a cross-component prediction mode that generates a predictor using information (①) of the corresponding luma area.
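
The complementary choice described above can be written as a small decision function. This is a sketch under assumptions: the names `PredictorSource` and `representative_mode_source` are hypothetical, and every chroma prediction mode that is not a CCLM mode is treated as a mode using same-channel information.

```python
from enum import Enum
from typing import Optional


class PredictorSource(Enum):
    CROSS_COMPONENT = "cross_component"  # information (1): corresponding luma area
    SAME_CHANNEL = "same_channel"        # information (2): same chroma channel


def representative_mode_source(
    prediction_mode_is_cclm: bool,
    ccip_mode_flag: int,
) -> Optional[PredictorSource]:
    """Pick the kind of representative mode that complements the signaled mode.

    Returns None when ccip_mode_flag is 0, i.e. no second predictor is formed.
    """
    if ccip_mode_flag != 1:
        return None
    if prediction_mode_is_cclm:
        # The signaled mode already uses luma, so the representative mode
        # draws on the same channel.
        return PredictorSource.SAME_CHANNEL
    # The signaled mode uses same-channel information (assumed for all
    # non-CCLM modes), so the representative mode is cross-component.
    return PredictorSource.CROSS_COMPONENT


if __name__ == "__main__":
    print(representative_mode_source(True, 1))
    print(representative_mode_source(False, 1))
```

In both branches the two predictors that are combined draw on different sources of information, which is the point of the pairing described in the text.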

In the flowchart/timing diagram of this specification, each process is described as being executed sequentially, but this is merely an illustrative explanation of the technical idea of an embodiment of the present disclosure. In other words, a person skilled in the art to which an embodiment of the present disclosure pertains may, without departing from the essential characteristics of the embodiment, change the order described in the flowchart/timing diagram or execute one or more of the processes in parallel. Since such modifications and variations are possible, the flowchart/timing diagram is not limited to a time-series order.

It should be understood from the above description that the example embodiments may be implemented in many different ways. The functions or methods described in one or more examples may be implemented in hardware, software, firmware, or any combination thereof. It should be understood that the functional components described herein are labeled as "...units" to particularly emphasize their implementation independence.

Meanwhile, various functions or methods described in this embodiment may be implemented with instructions stored in a non-transitory recording medium that can be read and executed by one or more processors. Non-transitory recording media include, for example, all types of recording devices that store data in a form readable by a computer system. For example, non-transitory recording media include storage media such as erasable programmable read only memory (EPROM), flash drives, optical drives, magnetic hard drives, and solid state drives (SSD).

The above description is merely an illustrative explanation of the technical idea of the present embodiment, and those skilled in the art will be able to make various modifications and variations without departing from the essential characteristics of the present embodiment. Accordingly, the present embodiments are not intended to limit the technical idea of the present embodiment, but rather to explain it, and the scope of the technical idea of the present embodiment is not limited by these examples. The scope of protection of this embodiment should be interpreted in accordance with the claims below, and all technical ideas within the equivalent scope should be interpreted as being included in the scope of rights of this embodiment.

(Explanation of symbols)

122: Intra prediction unit

542: Intra prediction unit

802: input device

804: First predictor generator

806: Second predictor generator

808: Weighted combiner

CROSS-REFERENCE TO RELATED APPLICATION

This patent application claims priority to Patent Application No. 10-2022-0121555 filed in Korea on September 26, 2022 and Patent Application No. 10-2022-0167522 filed in Korea on December 5, 2022, the entire contents of which are incorporated into this patent application by reference.

Claims

1. An intra prediction method of a current chroma block performed by a video decoding device, the method comprising:

decoding a cross-component prediction mode for cross-component prediction of the current chroma block, wherein the cross-component prediction predicts the current chroma block using pixels of a corresponding luma area for the current chroma block and the corresponding luma area;

generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode;

inferring a representative mode from reconstructed information of a peripheral chroma pixel area, wherein the peripheral chroma pixel area includes pixels surrounding the current chroma block, and the reconstructed information includes the values of pixels in the peripheral chroma pixel area, the positions of the pixels, and the number of the pixels;

generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode;

deriving weights for the first predictor and the second predictor; and

generating an intra predictor of the current chroma block by performing a weighted combination of the first predictor and the second predictor using the weights.

2. The method of claim 1, wherein the inferring of the representative mode comprises:

setting the peripheral chroma pixel area;

applying an edge detection filter to the peripheral chroma pixel area to generate directional modes and corresponding intensities, and generating an intensity histogram using the intensities; and

setting the directional mode with the largest intensity value in the intensity histogram as the representative mode.

3. The method of claim 2, wherein the generating of the intensity histogram comprises applying the edge detection filter to each pixel in the peripheral chroma pixel area to calculate a gradient and a magnitude of the gradient, replacing the gradient with the corresponding directional mode, calculating the intensity of the directional mode using the magnitude of the gradient, and generating the intensity histogram by accumulating the intensities for each directional mode.

4. The method of claim 2, wherein the inferring of the representative mode comprises, when a plurality of representative modes are inferred, inferring the plurality of representative modes in order of the magnitude of the intensities in the intensity histogram.

5. The method of claim 1, wherein the inferring of the representative mode comprises, when a plurality of representative modes are inferred:

dividing the peripheral chroma pixel area into an upper area and a left area;

applying an edge detection filter to the upper area and the left area to generate directional modes and corresponding intensities, and generating intensity histograms of the upper area and the left area using the intensities; and

setting the directional mode with the largest intensity value in the intensity histogram of the upper area and the directional mode with the largest intensity value in the intensity histogram of the left area as the plurality of representative modes.

6. The method of claim 2, wherein the inferring of the representative mode comprises, when there are directional modes having the same intensity, setting the representative mode from among those directional modes using a preset priority.

7. The method of claim 1, wherein the inferring of the representative mode comprises:

setting the peripheral chroma pixel area;

generating a distortion of each prediction mode among all prediction modes using peripheral pixels on the left and top of the peripheral chroma pixel area, wherein each prediction mode generates a predictor of the peripheral chroma pixel area based on the peripheral pixels of the peripheral chroma pixel area; and

setting the prediction mode with the smallest distortion among all prediction modes as the representative mode.

8. The method of claim 7, wherein the generating of the distortion comprises generating a predictor for each prediction mode using the peripheral pixels on the left and top of the peripheral chroma pixel area, and then generating the distortion by calculating a similarity difference between the predictor and the reconstructed pixel values of the peripheral chroma pixel area.

9. The method of claim 2, wherein the deriving of the weights comprises setting, as the weight of the second predictor, the ratio of the intensity of the representative mode to the sum of all, or of an upper portion, of the intensities.

10. The method of claim 2, wherein the deriving of the weights comprises, with respect to the representative mode in the intensity histogram and the directional modes among the prediction modes used by neighboring blocks of the current chroma block, setting, as the weight of the second predictor, the ratio of the intensity of the representative mode to the total of the intensities of the directional modes.

11. The method of claim 7, wherein the deriving of the weights comprises converting the distortion of each prediction mode into an intensity based on an inverse proportionality between the distortion and the intensity.

12. The method of claim 11, wherein the deriving of the weights comprises setting, as the weight of the second predictor, the ratio of the intensity of the representative mode to the sum of all, or of an upper portion, of the intensities of all prediction modes.

13. An intra prediction method of a current chroma block performed by a video encoding device, the method comprising:

determining a cross-component prediction mode for cross-component prediction of the current chroma block, wherein the cross-component prediction predicts the current chroma block using pixels of a corresponding luma area for the current chroma block and the corresponding luma area;

generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode;

inferring a representative mode from reconstructed information of a peripheral chroma pixel area, wherein the peripheral chroma pixel area includes pixels surrounding the current chroma block, and the reconstructed information includes the values of pixels in the peripheral chroma pixel area, the positions of the pixels, and the number of the pixels;

generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode;

deriving weights for the first predictor and the second predictor; and

generating an intra predictor of the current chroma block by performing a weighted combination of the first predictor and the second predictor using the weights.

14. The method of claim 13, further comprising encoding the cross-component prediction mode.

15. A computer-readable recording medium storing a bitstream generated by an image encoding method, the image encoding method comprising:

determining a cross-component prediction mode for cross-component prediction of a current chroma block, wherein the cross-component prediction predicts the current chroma block using pixels of a corresponding luma area for the current chroma block and the corresponding luma area;

generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode;

inferring a representative mode from reconstructed information of a peripheral chroma pixel area, wherein the peripheral chroma pixel area includes pixels surrounding the current chroma block, and the reconstructed information includes the values of pixels in the peripheral chroma pixel area, the positions of the pixels, and the number of the pixels;

generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode;

deriving weights for the first predictor and the second predictor; and

generating an intra predictor of the current chroma block by performing a weighted combination of the first predictor and the second predictor using the weights.