CN116491114A - Image encoding and decoding method and apparatus using sub-block unit intra prediction - Google Patents


Info

Publication number
CN116491114A
CN116491114A (Application CN202180076461.5A)
Authority
CN
China
Prior art keywords
block
sub
prediction mode
mode
current block
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180076461.5A
Other languages
Chinese (zh)
Inventor
全炳宇
金范允
朴智允
朴胜煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sungkyunkwan University School Industry Cooperation
Hyundai Motor Co
Kia Corp
Original Assignee
Sungkyunkwan University School Industry Cooperation
Hyundai Motor Co
Kia Corp
Application filed by Sungkyunkwan University School Industry Cooperation, Hyundai Motor Co, Kia Corp
Priority claimed from PCT/KR2021/017256 (WO2022108417A1)
Publication of CN116491114A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/129Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to an image encoding and decoding method and apparatus using intra prediction on a per-sub-block basis. The present embodiment provides an image encoding/decoding method and apparatus that modify the intra prediction mode of a current block in a direction suitable for the sub-partitioned blocks by considering the shape of the sub-partitioned block, the sub-partition direction, the prediction direction of the current block, and the like, and that generate intra prediction modes for the sub-partitioned blocks so as to efficiently perform intra prediction in units of sub-blocks.

Description

Image encoding and decoding method and apparatus using sub-block unit intra prediction
Technical Field
The present invention relates to a video encoding/decoding method and apparatus using intra prediction on a per sub-block basis.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Since video data has a larger data amount than audio data or still image data, storing or transmitting uncompressed video data requires a large amount of hardware resources, including memory.
Accordingly, an encoder is typically used to compress video data before storage or transmission. A decoder receives the compressed video data, decompresses it, and plays the decompressed video data. Video compression techniques include H.264/AVC, High Efficiency Video Coding (HEVC), and Versatile Video Coding (VVC), which improves the coding efficiency of HEVC by about 30% or more.
However, as the image size, resolution, and frame rate gradually increase, the amount of data to be encoded also increases. Accordingly, a new compression technique providing higher coding efficiency and an improved image enhancement effect compared to existing compression techniques is required.
In video (image) coding, when an image is divided into Coding Units (CUs) and each CU is coded, all pixels in the block to be coded are intra-predicted using a single prediction mode. As the distance between a pixel and its reference pixel becomes larger, a large amount of energy may remain in the residual signal to be encoded. The residual-energy problem may become more serious for horizontally (or vertically) long rectangular blocks, where the distance between the pixel to be predicted and the reference pixel is longer, or when the block size is larger. The block may be partitioned further to solve this problem, but doing so causes another problem: the burden of transmitting an intra prediction mode for each sub-divided block increases.
On the other hand, there is another way to address this increased signaling burden. In the prior art, the block to be encoded is divided into uniformly sized smaller blocks on which prediction is performed, which reduces the burden while improving intra prediction efficiency: only a single prediction mode is transmitted for the original block before sub-partitioning, and that single prediction mode is commonly applied to the sub-partitioned small blocks. This technique is referred to as Intra Sub-Partition (ISP).
When the ISP is applied to intra prediction of the current block, the video encoding and decoding apparatus may signal a single intra prediction mode while predicting each sub-partitioned block using reference pixel values of blocks adjacent to that sub-partition. On the other hand, when the ISP technique is applied, a problem arises in that the intra prediction mode applied to the current block may not be the best mode for the sub-partitioned blocks. Therefore, in terms of coding efficiency, a method for efficiently encoding the prediction mode of a sub-block is required.
Disclosure of Invention
Technical problem
The present invention in some embodiments seeks to provide a video encoding/decoding method and apparatus for modifying an intra prediction mode of a current block in a direction suitable for a block of a sub-partition by considering a shape of the block of the sub-partition, a direction of the sub-partition, and a prediction direction of the current block. Based on the modified intra prediction mode of the current block, the video encoding/decoding method and apparatus generate an intra prediction mode of a sub-partitioned block to efficiently perform intra prediction on a per sub-block basis.
Solution method
At least one aspect of the present invention provides an intra prediction method performed by a video decoding apparatus for generating a prediction mode of a modified sub-block. The method includes decoding an intra prediction mode of a current block, information of the current block, and sub-block information from a bitstream. The sub-block information provides information related to a sub-block obtained by partitioning the current block. The method further includes selecting a method for modifying a prediction mode based on the information of the current block and the sub-block information. The method further includes generating a modified prediction mode by modifying an intra prediction mode of the current block based on the method for modifying the prediction mode.
Another aspect of the present invention provides a video decoding apparatus that generates a prediction mode of a modified sub-block. The apparatus includes an entropy decoder configured to decode an intra prediction mode of a current block, information of the current block, and sub-block information from a bitstream. The sub-block information provides information related to a sub-block obtained by partitioning the current block. The apparatus further includes an intra predictor configured to select a method for modifying a prediction mode based on the information of the current block and the sub-block information, and generate a modified prediction mode by modifying the intra prediction mode of the current block based on the method for modifying the prediction mode.
Yet another aspect of the present invention provides an intra prediction method performed by a video encoding apparatus for generating a prediction mode of a modified sub-block. The method includes obtaining an intra prediction mode of a current block, information of the current block, and sub-block information. The sub-block information provides information related to a sub-block obtained by partitioning the current block. The method further includes selecting a method for modifying a prediction mode based on the information of the current block and the sub-block information. The method further includes generating a modified prediction mode by modifying an intra prediction mode of the current block based on the method for modifying the prediction mode.
Effects of the invention
As described above, the present embodiment provides a video encoding/decoding method and apparatus for modifying an intra prediction mode of a current block in a direction suitable for a block of a sub-partition by considering a shape of the block of the sub-partition, a direction of the sub-partition, and a prediction direction of the current block. Based on the modified intra prediction mode of the current block, the video encoding/decoding method and apparatus generate an intra prediction mode of a sub-partitioned block to improve the encoding efficiency of intra prediction.
Further, the present embodiment provides a video encoding/decoding method and apparatus for modifying an intra prediction mode of a current block in a direction suitable for a block of a sub-partition by considering a shape of the block of the sub-partition, a direction of the sub-partition, and a prediction direction of the current block. Based on the modified intra prediction mode of the current block, the video encoding/decoding method and apparatus generate an intra prediction mode of a sub-partitioned block to improve the image quality of the decoded image.
Drawings
Fig. 1 is a block diagram of a video encoding device in which the techniques of the present invention may be implemented.
Fig. 2 illustrates a method of partitioning a block using a quadtree plus binary tree ternary tree (QTBTTT) structure.
Fig. 3a and 3b illustrate a plurality of intra prediction modes including a wide-angle intra prediction mode.
Fig. 4 shows neighboring blocks of the current block.
Fig. 5 is a block diagram of a video decoding apparatus in which the techniques of the present invention may be implemented.
Fig. 6 shows a current block and sub-blocks of a sub-partition.
Fig. 7 illustrates a problem of the intra sub-partition (ISP) technique that arises when the wide-angle intra prediction (WAIP) technique is applied.
Fig. 8 illustrates sub-blocks having various shapes according to one embodiment of the present invention.
Fig. 9 conceptually illustrates a prediction mode modifier according to one embodiment of the present invention.
Fig. 10a and 10b show embodiments in which the partition direction of a sub-block is used as information of the sub-block.
Fig. 11 conceptually illustrates a prediction mode modifier according to another embodiment of the present invention.
Fig. 12a to 12e illustrate a method for selecting a representative block according to one embodiment of the present invention.
Fig. 13 illustrates selecting representative blocks and modifying prediction modes according to one embodiment of the present invention.
Fig. 14 shows conditions regarding a specific size of a representative block according to an embodiment of the present invention.
Fig. 15a and 15b illustrate modifying a prediction mode of a sub-block based on a size of a representative block according to an embodiment of the present invention.
Fig. 16a and 16b illustrate conditions for modifying a prediction mode of a sub-block based on a shape of a representative block according to one embodiment of the present invention.
Fig. 17 illustrates modifying a prediction mode of a sub-block based on a shape of a representative block according to an embodiment of the present invention.
Fig. 18 illustrates modifying a prediction mode of a sub-block based on a position of a representative block according to an embodiment of the present invention.
Fig. 19 illustrates modifying a prediction mode of a sub-block based on a prediction direction according to an embodiment of the present invention.
Fig. 20 conceptually illustrates a prediction mode modifier according to another embodiment of the present invention.
Fig. 21 illustrates modifying an intra prediction mode of each sub-block according to one embodiment of the present invention.
Fig. 22 illustrates modifying an intra prediction mode of each sub-block based on a size of a representative block according to one embodiment of the present invention.
Fig. 23 shows an order of encoding (or decoding) sub-blocks within a current block.
Fig. 24 shows an order of modifying prediction modes of sub-blocks according to one embodiment of the present invention.
Fig. 25 illustrates modifying a prediction mode of a sub-block based on a preset model according to an embodiment of the present invention.
Fig. 26 conceptually illustrates a prediction mode modifier according to still another embodiment of the present invention.
Fig. 27 shows a mode modification flag according to still another embodiment of the present invention.
Fig. 28 is a flowchart illustrating a method of modifying a prediction mode of a sub-block performed by a video decoding apparatus according to one embodiment of the present invention.
Fig. 29 is a flowchart illustrating a method for modifying a prediction mode of a sub-block performed by a video decoding apparatus according to another embodiment of the present invention.
Fig. 30 is a flowchart illustrating a method for modifying a prediction mode of a sub-block performed by a video encoding apparatus according to one embodiment of the present invention.
Fig. 31 is a flowchart illustrating a method for modifying a prediction mode of a sub-block performed by a video encoding apparatus according to another embodiment of the present invention.
Detailed Description
Hereinafter, some embodiments of the present invention will be described in detail with reference to the accompanying illustrative drawings. In the following description, like reference numerals denote like elements, even when the elements are shown in different drawings. Furthermore, in the following description of some embodiments, detailed descriptions of related known components and functions are omitted for clarity and conciseness when they could obscure the subject matter of the present invention.
Fig. 1 is a block diagram of a video encoding device in which the techniques of the present invention may be implemented. Hereinafter, a video encoding apparatus and sub-components of the apparatus are described with reference to the illustration of fig. 1.
The encoding apparatus may include: an image divider 110, a predictor 120, a subtractor 130, a transformer 140, a quantizer 145, a reordering unit 150, an entropy encoder 155, an inverse quantizer 160, an inverse transformer 165, an adder 170, a loop filtering unit 180, and a memory 190.
Each component of the encoding apparatus may be implemented as hardware or software, or as a combination of hardware and software. In addition, the function of each component may be implemented as software, and the microprocessor may also be implemented to execute the function of the software corresponding to each component.
A video is made up of one or more sequences comprising a plurality of images. Each image is divided into a plurality of regions, and encoding is performed on each region. For example, an image is segmented into one or more tiles (tiles) or/and slices (slices). Here, one or more tiles may be defined as a tile set. Each tile or/and slice is partitioned into one or more Coding Tree Units (CTUs). In addition, each CTU is partitioned into one or more Coding Units (CUs) by a tree structure. Information applied to each CU is encoded as a syntax of the CU, and information commonly applied to CUs included in one CTU is encoded as a syntax of the CTU. In addition, information commonly applied to all blocks in one slice is encoded as syntax of a slice header, and information applied to all blocks constituting one or more pictures is encoded as a picture parameter set (Picture Parameter Set, PPS) or a picture header. Furthermore, information commonly referred to by the plurality of images is encoded as a sequence parameter set (Sequence Parameter Set, SPS). In addition, information commonly referenced by the one or more SPS is encoded as a set of video parameters (Video Parameter Set, VPS). Furthermore, information commonly applied to one tile or group of tiles may also be encoded as syntax of the tile or group of tiles header. The syntax included in the SPS, PPS, slice header, tile, or tile set header may be referred to as a high level syntax.
The image divider 110 determines the size of the CTU. Information about the size of the CTU (CTU size) is encoded as a syntax of the SPS or PPS and transmitted to the video decoding apparatus.
The image divider 110 divides each image constituting a video into a plurality of CTUs having a predetermined size, and then recursively divides the CTUs by using a tree structure. Leaf nodes in the tree structure become CUs, which are the basic units of coding.
The tree structure may be a quadtree (QT), in which a higher node (or parent node) is partitioned into four lower nodes (or child nodes) of the same size. The tree structure may also be a binary tree (BT), in which a higher node is split into two lower nodes. The tree structure may also be a ternary tree (TT), in which a higher node is split into three lower nodes at a ratio of 1:2:1. The tree structure may also be a structure in which two or more of the QT, BT, and TT structures are mixed. For example, a quadtree plus binary tree (QTBT) structure may be used, or a quadtree plus binary tree ternary tree (QTBTTT) structure may be used. Here, the BT and TT added to the QT structure are collectively referred to as a multiple-type tree (MTT).
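For illustration only, the following Python sketch (not part of the disclosed apparatus; the function and its names are assumptions for exposition) shows how the QT, BT, and TT splits described above divide a W×H block into child blocks.

```python
# A minimal sketch of the child-block sizes produced by QT, BT, and TT splits.
# The 1:2:1 ratio of the TT split follows the description above.

def split_children(width: int, height: int, split_type: str):
    """Return (width, height) of each child block for the given split type."""
    if split_type == "QT":        # quadtree: four children, halved in both directions
        return [(width // 2, height // 2)] * 4
    if split_type == "BT_HOR":    # binary horizontal: two half-height children
        return [(width, height // 2)] * 2
    if split_type == "BT_VER":    # binary vertical: two half-width children
        return [(width // 2, height)] * 2
    if split_type == "TT_HOR":    # ternary horizontal: 1:2:1 in height
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    if split_type == "TT_VER":    # ternary vertical: 1:2:1 in width
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    raise ValueError(f"unknown split type: {split_type}")

print(split_children(32, 32, "TT_VER"))   # [(8, 32), (16, 32), (8, 32)]
```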
Fig. 2 is a schematic diagram for describing a method of dividing a block by using the QTBTTT structure.
As shown in fig. 2, the CTU may be first partitioned into QT structures. Quadtree partitioning may be recursive until the size of the partitioned block reaches the minimum block size (MinQTSize) of leaf nodes allowed in QT. A first flag (qt_split_flag) indicating whether each node of the QT structure is partitioned into four lower-layer nodes is encoded by the entropy encoder 155 and signaled to the video decoding apparatus. When the leaf node of QT is not greater than the maximum block size (MaxBTSize) of the root node allowed in BT, the leaf node may be further divided into at least one of BT structure or TT structure. There may be multiple directions of segmentation in the BT structure and/or the TT structure. For example, there may be two directions, i.e., a direction of dividing the block of the corresponding node horizontally and a direction of dividing the block of the corresponding node vertically. As shown in fig. 2, when the MTT division starts, a second flag (MTT _split_flag) indicating whether a node is divided, and a flag additionally indicating a division direction (vertical or horizontal) and/or a flag indicating a division type (binary or trigeminal) in the case that a node is divided are encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
Alternatively, a CU partition flag (split_cu_flag) indicating whether a node is partitioned may be further encoded before encoding a first flag (qt_split_flag) indicating whether each node is partitioned into four nodes of a lower layer. When the value of the CU partition flag (split_cu_flag) indicates that each node is not partitioned, the block of the corresponding node becomes a leaf node in the partition tree structure and becomes a CU, which is a basic unit of encoding. When the value of the CU partition flag (split_cu_flag) indicates that each node is partitioned, the video encoding apparatus first starts encoding the first flag in the above scheme.
When QTBT is used as another example of the tree structure, there may be two types, i.e., a type of horizontally dividing a block of a corresponding node into two blocks having the same size (i.e., symmetrical horizontal division) and a type of vertically dividing a block of a corresponding node into two blocks having the same size (i.e., symmetrical vertical division). A partition flag (split_flag) indicating whether each node of the BT structure is partitioned into lower-layer blocks and partition type information indicating a partition type are encoded by the entropy encoder 155 and transmitted to the video decoding apparatus. On the other hand, there may additionally be a type in which a block of a corresponding node is divided into two blocks in an asymmetric form to each other. The asymmetric form may include a form in which a block of a corresponding node is divided into two rectangular blocks having a size ratio of 1:3, or may also include a form in which a block of a corresponding node is divided in a diagonal direction.
A CU may have various sizes according to QTBT or QTBTTT divided from CTUs. Hereinafter, a block corresponding to a CU to be encoded or decoded (i.e., a leaf node of QTBTTT) is referred to as a "current block". When QTBTTT segmentation is employed, the shape of the current block may also be rectangular in shape, in addition to square shape.
The predictor 120 predicts the current block to generate a predicted block. Predictor 120 includes an intra predictor 122 and an inter predictor 124.
In general, each of the current blocks in the image may be predictively encoded. In general, prediction of a current block may be performed by using an intra prediction technique using data from an image including the current block or an inter prediction technique using data from an image encoded before the image including the current block. Inter prediction includes both unidirectional prediction and bi-directional prediction.
The intra predictor 122 predicts pixels in the current block by using pixels (reference pixels) located adjacent to the current block in the current image including the current block. Depending on the prediction direction, there are multiple intra prediction modes. For example, as shown in fig. 3a, the plurality of intra prediction modes may include two non-directional modes including a planar (planar) mode and a DC mode, and may include 65 directional modes. The neighboring pixels and algorithm equations to be used are defined differently according to each prediction mode.
For efficient directional prediction of a current block having a rectangular shape, directional modes indicated by dotted arrows in fig. 3b (intra prediction modes #67 to #80 and #-1 to #-14) may additionally be used. These directional modes may be referred to as "wide-angle intra prediction modes". In fig. 3b, the arrows indicate the reference samples used for prediction rather than the prediction directions; the prediction direction is opposite to the direction indicated by the arrow. When the current block has a rectangular shape, a wide-angle intra prediction mode is a mode in which prediction is performed in the direction opposite to a specific directional mode without additional bit transmission. In this case, among the wide-angle intra prediction modes, the modes available for the current block may be determined by the ratio of the width to the height of the rectangular current block. For example, when the current block has a rectangular shape whose height is smaller than its width, wide-angle intra prediction modes having angles smaller than 45 degrees (intra prediction modes #67 to #80) are available. When the current block has a rectangular shape whose height is greater than its width, wide-angle intra prediction modes having angles greater than -135 degrees (intra prediction modes #-1 to #-14) are available.
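As an illustrative sketch of the wide-angle mode replacement described above (the thresholds follow a VVC-style mapping and are assumptions here, not a normative definition), a signaled directional mode may be remapped for a rectangular block as follows.

```python
from math import log2

def map_wide_angle_mode(pred_mode: int, width: int, height: int) -> int:
    """Remap a signaled directional mode (2..66) to a wide-angle mode
    (#-14..#-1 or #67..#80) for rectangular blocks; square blocks and
    non-directional modes (planar/DC) are returned unchanged."""
    if width == height or pred_mode in (0, 1):
        return pred_mode
    wh_ratio = abs(log2(width / height))
    if width > height:                                   # wide block: low modes become >66
        upper = 8 + 2 * wh_ratio if wh_ratio > 1 else 8
        if 2 <= pred_mode < upper:
            return pred_mode + 65
    else:                                                # tall block: high modes become <2
        lower = 60 - 2 * wh_ratio if wh_ratio > 1 else 60
        if lower < pred_mode <= 66:
            return pred_mode - 67
    return pred_mode

# Example corresponding to Fig. 7: a vertically long block with signaled mode 66
print(map_wide_angle_mode(66, 8, 16))   # -1
```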
The intra predictor 122 may determine the intra prediction mode to be used for encoding the current block. In some examples, the intra predictor 122 may encode the current block by using a plurality of intra prediction modes and may select an appropriate intra prediction mode to use from among the tested modes. For example, the intra predictor 122 may calculate rate-distortion values by using rate-distortion analysis of the plurality of tested intra prediction modes and may select the intra prediction mode having the best rate-distortion characteristics among the tested modes.
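A minimal sketch of such a rate-distortion based selection is shown below; the cost callbacks and the Lagrange multiplier are assumptions for exposition, not the encoder's actual implementation.

```python
def select_intra_mode(candidate_modes, distortion_fn, rate_fn, lagrange_lambda):
    """Pick the intra mode minimizing the Lagrangian cost J = D + lambda * R.
    distortion_fn(mode) and rate_fn(mode) are assumed encoder-side callbacks."""
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:
        cost = distortion_fn(mode) + lagrange_lambda * rate_fn(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```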
The intra predictor 122 selects one intra prediction mode among a plurality of intra prediction modes, and predicts the current block by using neighboring pixels (reference pixels) determined according to the selected intra prediction mode and an algorithm equation. Information about the selected intra prediction mode is encoded by the entropy encoder 155 and transmitted to a video decoding device.
The inter predictor 124 generates a prediction block of the current block by using a motion compensation process. The inter predictor 124 searches for a block most similar to the current block in a reference picture that has been encoded and decoded earlier than the current picture, and generates a predicted block of the current block by using the searched block. In addition, a Motion Vector (MV) is generated, which corresponds to a displacement (displacement) between a current block in the current image and a prediction block in the reference image. In general, motion estimation is performed on a luminance (luma) component, and a motion vector calculated based on the luminance component is used for both the luminance component and the chrominance component. Motion information including information of the reference picture and information on a motion vector for predicting the current block is encoded by the entropy encoder 155 and transmitted to a video decoding device.
The inter predictor 124 may also perform interpolation of reference pictures or reference blocks to increase the accuracy of prediction. In other words, the sub-samples are interpolated between two consecutive integer samples by applying the filter coefficients to a plurality of consecutive integer samples comprising the two integer samples. When the process of searching for a block most similar to the current block is performed on the interpolated reference image, the decimal-unit precision may be represented for the motion vector instead of the integer-sample-unit precision. The precision or resolution of the motion vector may be set differently for each target region to be encoded, e.g., a unit such as a slice, tile, CTU, CU, etc. When such adaptive motion vector resolution (adaptive motion vector resolution, AMVR) is applied, information on the motion vector resolution to be applied to each target area should be signaled for each target area. For example, when the target area is a CU, information about the resolution of a motion vector applied to each CU is signaled. The information on the resolution of the motion vector may be information representing the accuracy of a motion vector difference to be described below.
On the other hand, the inter predictor 124 may perform inter prediction by using bi-directional prediction. In the case of bi-prediction, two reference pictures and two motion vectors representing block positions most similar to the current block in each reference picture are used. The inter predictor 124 selects a first reference picture and a second reference picture from the reference picture list0 (RefPicList 0) and the reference picture list1 (RefPicList 1), respectively. The inter predictor 124 also searches for a block most similar to the current block in the corresponding reference picture to generate a first reference block and a second reference block. Further, a prediction block of the current block is generated by averaging or weighted-averaging the first reference block and the second reference block. Further, motion information including information on two reference pictures for predicting the current block and information on two motion vectors is transmitted to the entropy encoder 155. Here, the reference image list0 may be constituted by an image preceding the current image in display order among the pre-restored images, and the reference image list1 may be constituted by an image following the current image in display order among the pre-restored images. However, although not particularly limited thereto, a pre-restored image following the current image in the display order may be additionally included in the reference image list 0. Conversely, a pre-restored image preceding the current image may be additionally included in the reference image list 1.
In order to minimize the amount of bits consumed for encoding motion information, various methods may be used.
For example, when a reference image and a motion vector of a current block are identical to those of a neighboring block, information capable of identifying the neighboring block is encoded to transmit motion information of the current block to a video decoding apparatus. This method is called merge mode (merge mode).
In the merge mode, the inter predictor 124 selects a predetermined number of merge candidate blocks (hereinafter, referred to as "merge candidates") from neighboring blocks of the current block.
As the neighboring blocks used to derive the merge candidates, all or some of the left block A0, the lower left block A1, the upper block B0, the upper right block B1, and the upper left block B2 adjacent to the current block in the current image may be used, as shown in fig. 4. In addition, in addition to the current picture in which the current block is located, a block located within a reference picture (which may be the same as or different from the reference picture used to predict the current block) may also be used as a merging candidate. For example, a co-located block (co-located block) of a current block within a reference picture or a block adjacent to the co-located block may additionally be used as a merging candidate. If the number of merging candidates selected by the above method is less than a preset number, a zero vector is added to the merging candidates.
The inter predictor 124 configures a merge list including a predetermined number of merge candidates by using neighboring blocks. A merge candidate to be used as motion information of the current block is selected from among the merge candidates included in the merge list, and merge index information for identifying the selected candidate is generated. The generated merging index information is encoded by the entropy encoder 155 and transmitted to a video decoding apparatus.
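For illustration, a simplified sketch of building such a merge list is given below; the availability checks and pruning rules are simplified assumptions rather than the normative construction process.

```python
def build_merge_list(spatial_candidates, temporal_candidates, max_candidates):
    """Collect available spatial candidates, then temporal ones, then pad
    with zero motion vectors, as described above."""
    merge_list = []
    for cand in spatial_candidates + temporal_candidates:
        if cand is not None and cand not in merge_list:   # skip unavailable / duplicate motion
            merge_list.append(cand)
        if len(merge_list) == max_candidates:
            return merge_list
    while len(merge_list) < max_candidates:                # zero-vector padding
        merge_list.append((0, 0))
    return merge_list

# Hypothetical usage: spatial motion vectors with one unavailable candidate (None)
print(build_merge_list([(3, 1), None, (3, 1), (0, 2)], [(1, 1)], 6))
```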
The merge skip mode is a special case of the merge mode. After quantization, when all transform coefficients used for entropy coding are near zero, only neighboring block selection information is transmitted without transmitting a residual signal. By using the merge skip mode, relatively high encoding efficiency can be achieved for images with slight motion, still images, screen content images, and the like.
Hereinafter, the merge mode and the merge skip mode are collectively referred to as a merge/skip mode.
Another method for encoding motion information is advanced motion vector prediction (advanced motion vector prediction, AMVP) mode.
In the AMVP mode, the inter predictor 124 derives a motion vector prediction candidate for a motion vector of a current block by using neighboring blocks of the current block. As the neighboring blocks used to derive the motion vector prediction candidates, all or some of the left block A0, the lower left block A1, the upper side block B0, the upper right block B1, and the upper left block B2 adjacent to the current block in the current image shown in fig. 4 may be used. In addition, in addition to the current picture in which the current block is located, a block located within a reference picture (which may be the same as or different from a reference picture used to predict the current block) may also be used as a neighboring block used to derive a motion vector prediction candidate. For example, a co-located block of the current block within the reference picture or a block adjacent to the co-located block may be used. If the number of motion vector candidates selected by the above method is less than a preset number, a zero vector is added to the motion vector candidates.
The inter predictor 124 derives a motion vector prediction candidate by using the motion vector of the neighboring block, and determines a motion vector prediction of the motion vector of the current block by using the motion vector prediction candidate. In addition, a motion vector difference is calculated by subtracting a motion vector prediction from a motion vector of the current block.
Motion vector prediction may be obtained by applying a predefined function (e.g., median and average calculations, etc.) to the motion vector prediction candidates. In this case, the video decoding device is also aware of the predefined function. Further, since the neighboring block used to derive the motion vector prediction candidates is a block for which encoding and decoding have been completed, the video decoding apparatus may also already know the motion vector of the neighboring block. Therefore, the video encoding device does not need to encode information for identifying motion vector prediction candidates. Accordingly, in this case, information on a motion vector difference and information on a reference image for predicting a current block are encoded.
On the other hand, motion vector prediction may also be determined by selecting a scheme of any one of the motion vector prediction candidates. In this case, the information for identifying the selected motion vector prediction candidates is additionally encoded together with the information about the motion vector difference and the information about the reference picture for predicting the current block.
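A minimal sketch of the AMVP-style signaling described above is shown below; choosing the predictor by minimum absolute difference is a simplification of an actual rate-based selection.

```python
def amvp_encode(mv, mvp_candidates):
    """Select a motion vector predictor, then encode its index and the
    motion vector difference (MVD = MV - MVP)."""
    costs = [abs(mv[0] - c[0]) + abs(mv[1] - c[1]) for c in mvp_candidates]
    idx = costs.index(min(costs))
    mvp = mvp_candidates[idx]
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
    return idx, mvd

def amvp_decode(idx, mvd, mvp_candidates):
    """Reconstruct the motion vector on the decoder side: MV = MVP + MVD."""
    mvp = mvp_candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

idx, mvd = amvp_encode((5, -3), [(4, -1), (0, 0)])
print(idx, mvd, amvp_decode(idx, mvd, [(4, -1), (0, 0)]))   # 0 (1, -2) (5, -3)
```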
The subtractor 130 generates a residual block by subtracting the current block from the prediction block generated by the intra predictor 122 or the inter predictor 124.
The transformer 140 transforms a residual signal in a residual block having pixel values of a spatial domain into transform coefficients of a frequency domain. The transformer 140 may transform a residual signal in a residual block by using the entire size of the residual block as a transform unit, or may divide the residual block into a plurality of sub-blocks and perform the transform by using the sub-blocks as transform units. Alternatively, the residual block is divided into two sub-blocks, i.e., a transform region and a non-transform region, to transform the residual signal by using only the transform region sub-block as a transform unit. Here, the transform region sub-block may be one of two rectangular blocks having a size ratio of 1:1 based on a horizontal axis (or a vertical axis). In this case, a flag (cu_sbt_flag) indicating that only the sub-block is transformed, and direction (vertical/horizontal) information (cu_sbt_horizontal_flag) and/or position information (cu_sbt_pos_flag) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus. In addition, the size of the transform region sub-block may have a size ratio of 1:3 based on the horizontal axis (or vertical axis). In this case, a flag (cu_sbt_quad_flag) dividing the corresponding division is additionally encoded by the entropy encoder 155 and signaled to the video decoding device.
On the other hand, the transformer 140 may transform the residual block separately in the horizontal direction and the vertical direction. For this transformation, various types of transform functions or transform matrices may be used. For example, pairs of transform functions for the horizontal and vertical transforms may be defined as a multiple transform set (MTS). The transformer 140 may select the transform function pair having the highest transform efficiency in the MTS and transform the residual block in each of the horizontal and vertical directions. Information (mts_idx) on the transform function pair selected in the MTS is encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
The quantizer 145 quantizes the transform coefficient output from the transformer 140 using a quantization parameter, and outputs the quantized transform coefficient to the entropy encoder 155. The quantizer 145 may also immediately quantize the relevant residual block without transforming any block or frame. The quantizer 145 may also apply different quantization coefficients (scaling values) according to the positions of the transform coefficients in the transform block. A quantization matrix applied to quantized transform coefficients arranged in two dimensions may be encoded and signaled to a video decoding apparatus.
The reordering unit 150 may perform the rearrangement of the coefficient values on the quantized residual values.
The reordering unit 150 may change the 2D coefficient array into a 1D coefficient sequence by using coefficient scanning. For example, the reordering unit 150 may scan from the DC coefficient toward coefficients in the high-frequency region using a zig-zag scan or a diagonal scan to output a 1D coefficient sequence. Instead of the zig-zag scan, a vertical scan that scans the 2D coefficient array in the column direction or a horizontal scan that scans the 2D block-type coefficients in the row direction may also be used, depending on the size of the transform unit and the intra prediction mode. In other words, the scanning method to be used may be determined among the zig-zag scan, the diagonal scan, the vertical scan, and the horizontal scan according to the size of the transform unit and the intra prediction mode.
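The following sketch illustrates one possible diagonal scan that converts a 2D coefficient array into a 1D sequence; the coefficient-group subdivision used in actual codecs is intentionally omitted.

```python
def diagonal_scan_order(width: int, height: int):
    """Positions (x, y) scanned from the DC coefficient toward the
    high-frequency corner along anti-diagonals."""
    order = []
    for d in range(width + height - 1):                 # low-frequency diagonals first
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                order.append((x, y))
    return order

def scan_coefficients(block):
    """Flatten a 2D coefficient block into a 1D sequence using the scan above."""
    height, width = len(block), len(block[0])
    return [block[y][x] for (x, y) in diagonal_scan_order(width, height)]

print(scan_coefficients([[9, 2], [3, 0]]))   # [9, 3, 2, 0]
```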
The entropy encoder 155 encodes the sequence of the 1D quantized transform coefficients output from the rearrangement unit 150 by using various encoding schemes including Context-based adaptive binary arithmetic coding (Context-based Adaptive Binary Arithmetic Code, CABAC), exponential golomb (Exponential Golomb), and the like to generate a bitstream.
Further, the entropy encoder 155 encodes information related to block division (e.g., CTU size, CTU division flag, QT division flag, MTT division type, MTT division direction, etc.) so that the video decoding apparatus can divide blocks equally to the video encoding apparatus. Further, the entropy encoder 155 encodes information on a prediction type indicating whether the current block is encoded by intra prediction or inter prediction. The entropy encoder 155 encodes intra prediction information (i.e., information about an intra prediction mode) or inter prediction information (a merge index in the case of a merge mode, and information about a reference picture index and a motion vector difference in the case of an AMVP mode) according to a prediction type. Further, the entropy encoder 155 encodes information related to quantization (i.e., information about quantization parameters and information about quantization matrices).
The inverse quantizer 160 inversely quantizes the quantized transform coefficient output from the quantizer 145 to generate a transform coefficient. The inverse transformer 165 transforms the transform coefficients output from the inverse quantizer 160 from the frequency domain to the spatial domain to restore a residual block.
The adder 170 adds the restored residual block and the prediction block generated by the predictor 120 to restore the current block. The pixels in the restored current block are used as reference pixels when intra-predicting the next block.
The loop filtering unit 180 performs filtering on the restored pixels to reduce block artifacts (blocking artifacts), ringing artifacts (ringing artifacts), blurring artifacts (blurring artifacts), etc., which occur due to block-based prediction and transform/quantization. The loop filtering unit 180 as an in-loop filter may include all or some of a deblocking filter 182, a sample adaptive offset (sample adaptive offset, SAO) filter 184, and an adaptive loop filter (adaptive loop filter, ALF) 186.
The deblocking filter 182 filters boundaries between restored blocks to remove block artifacts that occur due to block-unit encoding/decoding, and the SAO filter 184 and the ALF 186 additionally filter the deblocking-filtered video. The SAO filter 184 and the ALF 186 are filters used to compensate for differences between restored pixels and original pixels that occur due to lossy coding. The SAO filter 184 applies an offset on a CTU basis to enhance subjective image quality and coding efficiency. In contrast, the ALF 186 performs filtering on a block basis and compensates for distortion by applying different filters according to the boundary of each block and the degree of variation. Information about the filter coefficients to be used for the ALF may be encoded and signaled to the video decoding apparatus.
The restored blocks filtered by the deblocking filter 182, the SAO filter 184, and the ALF 186 are stored in the memory 190. When all blocks in one image are restored, the restored image may be used as a reference image for inter-predicting blocks within a picture to be subsequently encoded.
Fig. 5 is a functional block diagram of a video decoding apparatus in which the techniques of the present invention may be implemented. Hereinafter, with reference to fig. 5, a video decoding apparatus and sub-components of the apparatus are described.
The video decoding apparatus may be configured to include an entropy decoder 510, a reordering unit 515, an inverse quantizer 520, an inverse transformer 530, a predictor 540, an adder 550, a loop filtering unit 560, and a memory 570.
Similar to the video encoding apparatus of fig. 1, each component of the video decoding apparatus may be implemented as hardware or software, or as a combination of hardware and software. In addition, the function of each component may be implemented as software, and the microprocessor may also be implemented to execute the function of the software corresponding to each component.
The entropy decoder 510 extracts information related to block segmentation by decoding a bitstream generated by a video encoding apparatus to determine a current block to be decoded, and extracts prediction information required to restore the current block and information on a residual signal.
The entropy decoder 510 determines the size of CTUs by extracting information about the CTU size from a Sequence Parameter Set (SPS) or a Picture Parameter Set (PPS), and partitions a picture into CTUs having the determined size. Further, the CTU is determined as the highest layer (i.e., root node) of the tree structure, and the partition information of the CTU is extracted to partition the CTU by using the tree structure.
For example, when dividing a CTU by using the QTBTTT structure, a first flag (qt_split_flag) related to the QT split is first extracted to split each node into four nodes of the lower layer. In addition, for a node corresponding to a leaf node of the QT, a second flag (mtt_split_flag), a split direction (vertical/horizontal), and/or a split type (binary/ternary) related to the MTT split are extracted to split the corresponding leaf node into the MTT structure. As a result, each node below the leaf node of the QT is recursively split into a BT or TT structure.
As another example, when a CTU is divided by using the QTBTTT structure, a CU division flag (split_cu_flag) indicating whether to divide the CU is extracted. When the corresponding block is partitioned, a first flag (qt_split_flag) may also be extracted. During the segmentation process, recursive MTT segmentation of 0 or more times may occur after recursive QT segmentation of 0 or more times for each node. For example, for CTUs, MTT partitioning may occur immediately, or conversely, QT partitioning may occur only multiple times.
As another example, when dividing the CTU by using the QTBT structure, a first flag (qt_split_flag) related to the division of QT is extracted to divide each node into four nodes of the lower layer. In addition, a split flag (split_flag) indicating whether or not a node corresponding to a leaf node of QT is further split into BT and split direction information are extracted.
On the other hand, when the entropy decoder 510 determines the current block to be decoded by using the partition of the tree structure, the entropy decoder 510 extracts information on a prediction type indicating whether the current block is intra-predicted or inter-predicted. When the prediction type information indicates intra prediction, the entropy decoder 510 extracts syntax elements for intra prediction information (intra prediction mode) of the current block. When the prediction type information indicates inter prediction, the entropy decoder 510 extracts information representing syntax elements of the inter prediction information, i.e., a motion vector and a reference picture to which the motion vector refers.
Further, the entropy decoder 510 extracts quantization-related information and extracts information on transform coefficients of the quantized current block as information on a residual signal.
The reordering unit 515 may change the sequence of the 1D quantized transform coefficients entropy-decoded by the entropy decoder 510 into a 2D coefficient array (i.e., block) again in the reverse order of the coefficient scan order performed by the video encoding device.
The inverse quantizer 520 inversely quantizes the quantized transform coefficients by using a quantization parameter. The inverse quantizer 520 may also apply different quantization coefficients (scaling values) to the quantized transform coefficients arranged in 2D. The inverse quantizer 520 may perform inverse quantization by applying a matrix of quantization coefficients (scaling values) signaled from the video encoding apparatus to the 2D array of quantized transform coefficients.
The inverse transformer 530 restores a residual signal by inversely transforming the inversely quantized transform coefficients from the frequency domain to the spatial domain to generate a residual block of the current block.
Further, when the inverse transformer 530 inversely transforms a partial region (sub-block) of the transform block, the inverse transformer 530 extracts a flag (cu_sbt_flag) transforming only the sub-block of the transform block, direction (vertical/horizontal) information (cu_sbt_horizontal_flag) of the sub-block, and/or position information (cu_sbt_pos_flag) of the sub-block. The inverse transformer 530 also inversely transforms transform coefficients of the corresponding sub-block from the frequency domain to the spatial domain to restore a residual signal, and fills the region that is not inversely transformed with a value of "0" as the residual signal to generate a final residual block of the current block.
Further, when applying MTS, the inverse transformer 530 determines a transform index or a transform matrix to be applied in each of the horizontal direction and the vertical direction by using MTS information (mts_idx) signaled from the video encoding apparatus. The inverse transformer 530 also performs inverse transformation on the transform coefficients in the transform block in the horizontal direction and the vertical direction by using the determined transform function.
The predictor 540 may include an intra predictor 542 and an inter predictor 544. The intra predictor 542 is activated when the prediction type of the current block is intra prediction, and the inter predictor 544 is activated when the prediction type of the current block is inter prediction.
The intra predictor 542 determines an intra prediction mode of the current block among the plurality of intra prediction modes according to syntax elements of the intra prediction mode extracted from the entropy decoder 510. The intra predictor 542 also predicts the current block by using neighboring reference pixels of the current block according to an intra prediction mode.
The inter predictor 544 determines a motion vector of the current block and a reference picture to which the motion vector refers by using syntax elements of the inter prediction mode extracted from the entropy decoder 510.
The adder 550 restores the current block by adding the residual block output from the inverse transformer 530 to the prediction block output from the inter predictor 544 or the intra predictor 542. In intra prediction of a block to be decoded later, pixels within the restored current block are used as reference pixels.
The loop filtering unit 560, which is an in-loop filter, may include a deblocking filter 562, an SAO filter 564, and an ALF 566. Deblocking filter 562 performs deblocking filtering on boundaries between restored blocks to remove block artifacts occurring due to block unit decoding. The SAO filter 564 and ALF 566 perform additional filtering on the restored block after deblocking filtering to compensate for differences between restored pixels and original pixels that occur due to lossy encoding. The filter coefficients of the ALF are determined by using information on the filter coefficients decoded from the bitstream.
The restored blocks filtered by the deblocking filter 562, the SAO filter 564, and the ALF 566 are stored in the memory 570. When all blocks in one image are restored, the restored image may be used as a reference image for inter-predicting blocks within a picture to be subsequently encoded.
The present embodiment relates to encoding and decoding of an image (video) as described above. More specifically, the present embodiment provides a video encoding/decoding method and apparatus for modifying an intra prediction mode of a current block in a direction suitable for a block of a sub-partition by considering a shape of the block of the sub-partition, a direction of the sub-partition, and a prediction direction of the current block. Based on the modified intra prediction mode of the current block, the video encoding/decoding method and apparatus generate an intra prediction mode of a block of a sub-partition to efficiently perform intra prediction based on the block of each sub-partition.
In the following description, the aspect ratio of a block is defined as a value obtained by dividing the horizontal length of the block by the vertical length, i.e., a ratio between the horizontal length and the vertical length. In general, the shape of the current block may be different from the shape of the subdivided block. The aspect ratio of a block may quantify the shape of the block. In the following description, when the shapes of two blocks are the same, this indicates that the aspect ratios of the two blocks are also the same. Furthermore, the similarity of the shapes of the two blocks indicates a similar aspect ratio between the two blocks.
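As a minimal sketch of the definitions above (the function names are assumptions for exposition), the aspect ratio and the shape comparison can be expressed as follows.

```python
def aspect_ratio(width: int, height: int) -> float:
    """Aspect ratio as defined above: horizontal length divided by vertical length."""
    return width / height

def same_shape(block_a, block_b) -> bool:
    """Two blocks have the same shape when their aspect ratios are equal."""
    return aspect_ratio(*block_a) == aspect_ratio(*block_b)

print(aspect_ratio(16, 4))            # 4.0 -> horizontally long block
print(same_shape((16, 4), (8, 2)))    # True: same shape despite different sizes
```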
I. Intra prediction and Intra Sub-Partition (ISP)
In the VVC technique, the intra prediction modes of a luma block include finely divided directional modes (i.e., #-14 to #80) in addition to the non-directional modes (i.e., the planar mode and the DC mode), as shown in figs. 3a and 3b. Based on these prediction modes, several techniques are available for improving the coding efficiency of intra prediction. After sub-partitioning the current block into small blocks of the same size, the ISP technique shares one intra prediction mode among all sub-blocks. However, the ISP technique may apply a transform to each sub-block. Here, the sub-partitioning of the block may be performed in the horizontal or vertical direction.
In the following description, as shown in fig. 6, a large block before being sub-partitioned is referred to as a current block, and a small block of each sub-partition is referred to as a sub-block.
ISP technology operates as follows.
The video encoding apparatus signals to the video decoding apparatus intra_subpartitions_mode_flag, which indicates whether the ISP is applied, and intra_subpartitions_split_flag, which indicates the sub-partitioning method. The partition types for sub-partitions according to intra_subpartitions_mode_flag and intra_subpartitions_split_flag are shown in Table 1.
TABLE 1
IntraSubPartitionsSplitType    Name of IntraSubPartitionsSplitType
0                              ISP_NO_SPLIT
1                              ISP_HOR_SPLIT
2                              ISP_VER_SPLIT
The ISP technique sets the partition type IntraSubPartitionsSplitType as follows.
When intra_subpartitions_mode_flag is 0, IntraSubPartitionsSplitType is set to 0, and sub-block division is not performed. In other words, the ISP is not applied.
If intra_subpartitions_mode_flag is not 0, the ISP is applied. Here, IntraSubPartitionsSplitType is set to the value of 1 + intra_subpartitions_split_flag, and sub-block division is performed according to the division type. Horizontal sub-block division (ISP_HOR_SPLIT) is performed if IntraSubPartitionsSplitType = 1, and vertical sub-block division (ISP_VER_SPLIT) is performed if IntraSubPartitionsSplitType = 2. In other words, intra_subpartitions_split_flag indicates the direction of sub-block division.
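The flag-to-split-type mapping described above can be sketched as follows; the Python form and the function name are illustrative only and are not taken from any reference implementation.

```python
# Non-normative sketch of the IntraSubPartitionsSplitType derivation above.
ISP_NO_SPLIT, ISP_HOR_SPLIT, ISP_VER_SPLIT = 0, 1, 2

def derive_split_type(intra_subpartitions_mode_flag: int,
                      intra_subpartitions_split_flag: int) -> int:
    """Return IntraSubPartitionsSplitType from the two signaled flags (Table 1)."""
    if intra_subpartitions_mode_flag == 0:
        return ISP_NO_SPLIT                       # ISP is not applied
    return 1 + intra_subpartitions_split_flag     # 1: horizontal, 2: vertical

assert derive_split_type(0, 0) == ISP_NO_SPLIT
assert derive_split_type(1, 0) == ISP_HOR_SPLIT   # example in the next paragraph
assert derive_split_type(1, 1) == ISP_VER_SPLIT
```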
For example, when the ISP mode that sub-partitions a block in the horizontal direction is applied to the current block, IntraSubPartitionsSplitType is 1, intra_subpartitions_mode_flag is 1, and intra_subpartitions_split_flag is 0.
In the following description, intra_subpartitions_mode_flag is referred to as the sub-block partition application flag, intra_subpartitions_split_flag as the sub-block partition direction flag, and IntraSubPartitionsSplitType as the sub-block partition type.
Also, ISP_HOR_SPLIT is used interchangeably with horizontal split, and ISP_VER_SPLIT with vertical split.
When the current block is divided horizontally or vertically, the application of the ISP may be restricted according to the size of the current block to prevent excessively small blocks from being produced by the division. In other words, when the current block size is 4×4, the ISP is not applied. A block of size 4×8 or 8×4 may be divided into two sub-blocks of the same shape and size, which is called a Half Split. Blocks of other sizes may be divided into four sub-blocks of the same shape and size, which is called a Quarter Split.
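A minimal sketch of this size-dependent restriction and of the resulting sub-block dimensions is given below; the function names are illustrative only.

```python
# Non-normative sketch of the Half Split / Quarter Split rule above.
def num_isp_subblocks(width: int, height: int) -> int:
    """Number of sub-blocks when ISP is applied to a width x height luma block."""
    if (width, height) == (4, 4):
        return 1                      # ISP is not applied to 4x4 blocks
    if (width, height) in ((4, 8), (8, 4)):
        return 2                      # Half Split
    return 4                          # Quarter Split

def isp_subblock_size(width, height, split_type):
    """Sub-block dimensions for split_type 1 (horizontal) or 2 (vertical)."""
    n = num_isp_subblocks(width, height)
    if split_type == 1:               # ISP_HOR_SPLIT: sub-blocks stacked vertically
        return width, height // n
    return width // n, height         # ISP_VER_SPLIT: sub-blocks placed side by side

print(isp_subblock_size(16, 16, 2))   # -> (4, 16): four vertically long sub-blocks
```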
The video encoding apparatus sequentially encodes each sub-block. Here, all sub-blocks share the same intra prediction information. In the intra prediction used to encode each sub-block, the video encoding apparatus may improve compression efficiency by using the reconstructed pixels of previously encoded sub-blocks as reference pixels for predicting the subsequent sub-blocks.
However, as described above, the existing method of dividing one block into a plurality of sub-blocks while sharing one prediction mode among the sub-blocks is inefficient in some respects. The intra prediction direction applied to the current block may not be the optimal prediction direction for the sub-partitioned blocks, and the conventional ISP technique has no effective means of addressing this phenomenon. The problem becomes more pronounced when the wide-angle intra prediction (WAIP) technique is used, in which the prediction direction is determined by considering the aspect ratio of the block.
For example, as shown in fig. 7, consider an intra prediction case in which the current block is a vertically long rectangular block and prediction mode 66 is signaled. In this case, the video encoding apparatus determines the prediction mode used for actual encoding to be direction −1 by applying the WAIP technique. In other words, since 67 is subtracted from prediction mode 66, indicated as the signaled direction in the example of fig. 7, the prediction mode used for actual encoding is modified to −1, indicated as the prediction direction in the example of fig. 7. As shown in the right example of fig. 7, when the current block is sub-partitioned, the video encoding apparatus performs intra prediction on all sub-blocks in the −1 direction. Thus, when the sub-blocks are encoded sequentially, there is a technical problem that the video encoding apparatus cannot use the reconstructed pixels in sub-block (1) as reference pixels for encoding sub-block (2).
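The wide-angle remapping invoked in this example (signaled mode 66 becoming direction −1 for a vertically long block) can be sketched as follows. The thresholds follow the VVC-style rule, but their exact values should be taken as an assumption of this sketch rather than as normative text.

```python
# Non-normative sketch of the WAIP remapping described above.
import math

def waip_remap(mode: int, width: int, height: int) -> int:
    """Replace modes outside the useful angular range of a non-square block
    with wide-angle modes (above 66 or below 2)."""
    if width == height or mode < 2:               # square block or non-directional mode
        return mode
    wh_ratio = abs(int(math.log2(width / height)))
    if width > height and 2 <= mode < (8 + 2 * wh_ratio if wh_ratio > 1 else 8):
        return mode + 65                          # horizontally long: map to modes > 66
    if height > width and mode > (60 - 2 * wh_ratio if wh_ratio > 1 else 60):
        return mode - 67                          # vertically long: map to modes < 2
    return mode

print(waip_remap(66, 4, 16))   # signaled mode 66 on a vertically long block -> -1 (fig. 7)
```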
In the following description, various embodiments are disclosed to solve the above technical problems.
On the other hand, the intra predictor 122 of the video encoding apparatus and the intra predictor 542 of the video decoding apparatus may perform the following embodiments. In the following description, in order to avoid repetition of the description, the present embodiment is described from the viewpoint of the intra predictor 542 within the video decoding apparatus.
Sub-blocks of various shapes and sizes
The present embodiment can solve the above technical problems by providing sub-blocks having more diverse shapes. In the prior art, when sub-partitioning a block, the block is partitioned only in a fixed horizontal or vertical direction. On the other hand, in the present embodiment, as shown in fig. 8, a series of sub-partition shapes can be used to comprehensively accommodate various situations. In the following description, information related to the example of fig. 8 is referred to as sub-partition information.
On the other hand, fig. 3b shows an intra prediction mode for intra prediction, but it should be noted that other prediction modes exist as well.
Modification of sub-block based intra prediction modes
In this embodiment, the intra predictor 542 within the video decoding apparatus may modify the intra prediction mode based on the sub-blocks obtained by sub-partitioning the current block.
Fig. 9 conceptually illustrates a prediction mode modifier according to one embodiment of the present invention.
The prediction mode modifier 910 according to the present embodiment modifies the prediction mode of the current block using the sub-block information and the modification method to generate a modified prediction mode for intra prediction of the sub-block. The prediction mode modifier 910 may be included in an intra predictor 542 within the video decoding device. The intra predictor 542 may perform intra prediction of the sub-block using the modified prediction mode.
The sub-block information may include all or part of the size, width, height, aspect ratio, and division direction of the sub-blocks, and the number of sub-blocks. In this embodiment, the sub-blocks have a uniform shape.
The prediction mode modifier 910 may utilize information of the current block in addition to the intra prediction mode of the current block. Here, the information about the current block may include all or part of the size, width, height, and aspect ratio of the current block.
Various implementations of the present embodiment are possible depending on the sub-block information, the modification method, and the target application. In the following description, various embodiments will be described. For convenience of explanation, it is assumed that the current block is square.
In the following description, an embodiment in which the prediction mode modifier 910 uses the aspect ratio of a sub-block as the sub-block information is described. As described above, the aspect ratio of the current block and the aspect ratio of its sub-blocks may be different. Thus, unlike the existing method, which is applied to the current block before sub-partitioning, the prediction mode modifier 910 applies a block-aspect-ratio-based technique (e.g., WAIP) to each sub-block and modifies the prediction mode of the sub-block. Coding efficiency can thereby be improved compared with the existing method.
For example, as shown in the examples of figs. 10a and 10b, consider a case where the prediction mode of the current block is 66 and the current block is sub-partitioned into sub-blocks (1), (2), (3), and (4) having aspect ratios of 1:4. In this case, the prediction mode 66 of the current block is inefficient for intra prediction of the sub-blocks. Accordingly, the intra prediction mode may be modified to prevent such inefficient prediction. For example, if a sub-block is long in the horizontal direction, the prediction mode modifier 910 may modify the intra prediction mode to one of the prediction modes belonging to the vertical group; if a sub-block is long in the vertical direction, the prediction mode modifier 910 may modify the intra prediction mode to one of the prediction modes belonging to the horizontal group. As described above, the prediction mode modifier 910 may improve coding efficiency by modifying the intra prediction mode on a sub-block basis.
Here, the horizontal group represents a group of prediction modes smaller than or equal to the prediction mode 34 corresponding to the upper left diagonal line in the example of fig. 3b, and the vertical group represents a group of prediction modes larger than the prediction mode 34.
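The grouping and the sub-block shape rule described above can be sketched as follows; the choice of modes 18 and 50 as the representative horizontal/vertical modes is an assumption of this sketch, not part of the original text.

```python
# Non-normative sketch of the horizontal/vertical grouping and the shape rule above.
MODE_DIAG_TOP_LEFT = 34                  # upper-left diagonal mode in fig. 3b
MODE_HOR, MODE_VER = 18, 50              # horizontal and vertical modes

def mode_group(mode: int) -> str:
    """Directional modes <= 34 belong to the horizontal group, > 34 to the vertical group."""
    return "horizontal" if mode <= MODE_DIAG_TOP_LEFT else "vertical"

def modify_for_subblock_shape(mode: int, sub_w: int, sub_h: int) -> int:
    """Horizontally long sub-blocks prefer a vertical-group mode and vice versa."""
    if mode < 2:                         # planar / DC: nothing to modify
        return mode
    if sub_w > sub_h and mode_group(mode) == "horizontal":
        return MODE_VER
    if sub_h > sub_w and mode_group(mode) == "vertical":
        return MODE_HOR
    return mode

print(modify_for_subblock_shape(66, 4, 16))   # vertically long sub-block -> 18
```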
The prediction mode modifier 910 may modify the intra prediction mode by comparing the aspect ratio of the sub-block with a preset threshold. For example, the prediction mode modifier 910 may modify the intra prediction mode when the aspect ratio is greater than the threshold (i.e., when the sub-block is significantly long in the horizontal direction) or smaller than the threshold (i.e., when the sub-block is significantly long in the vertical direction).
On the other hand, the prediction mode modifier 910 may modify the prediction mode by selecting one of the following methods.
As shown in fig. 10a, the prediction mode modifier 910 may generate the modified prediction mode by rotating the prediction mode by a specific angle S. Here, the angle S may be 0, 45, 90, 135, 180, 225, 270, 315 degrees, etc., depending on the implementation. Fig. 10a shows the case where S is 180 degrees.
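A minimal sketch of such a rotation is given below. In the numbering of fig. 3b, the 64 steps between modes 2 and 66 span 180 degrees, so a rotation by S degrees corresponds to an offset of S/180 × 64 mode indices; the clamping to the extended range −14 to 80 is an assumption of this sketch.

```python
# Non-normative sketch of "rotating" a directional mode by an angle S.
def rotate_mode(mode: int, angle_s: int) -> int:
    if mode < 2:                      # planar / DC have no direction to rotate
        return mode
    offset = round(angle_s / 180 * 64)
    for candidate in (mode - offset, mode + offset):
        if -14 <= candidate <= 80:    # extended (wide-angle) mode range
            return candidate
    return mode

print(rotate_mode(66, 180))           # -> 2, as in fig. 10a
print(rotate_mode(4, 180))            # -> 68, as in the example of fig. 19
```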
Alternatively, the prediction mode modifier 910 may set the modified prediction mode to a specific mode X, as shown in fig. 10b. The specific mode X may be one of the prediction modes grouped according to a predetermined criterion or may be a predetermined mode.
Here, the predetermined criterion for grouping the prediction modes may be one of a vertical group, a horizontal group, a diagonal direction of the current block, a diagonal direction of the sub-block, a direction mode group, a non-direction mode group, a prediction mode group calculated according to machine learning, or a combination of some of the above.
Further, the predetermined mode may be designated as one of a planar mode, a DC mode, a diagonal direction mode of a sub-block, a diagonal direction mode of a current block, a mode calculated through machine learning, a vertical mode, a horizontal mode, or a diagonal direction mode.
Here, the diagonal refers to the upper-right diagonal of a block, and the diagonal group refers to the group of directional modes corresponding to the upper-right diagonals of blocks having different aspect ratios. For example, the diagonal direction mode may be mode 66, which is the upper-right direction mode in the example of fig. 3b.
In the following description, an embodiment in which the prediction mode modifier 910 uses the partition direction of a sub-block as information about the sub-block is described.
The prediction mode modifier 910 may select a prediction direction advantageous for intra prediction according to the sub-partitioning direction. As shown in fig. 10a, when the sub-partitioning is performed in the vertical direction, using the prediction mode in direction 2 can improve coding efficiency compared to the prediction mode in direction 66. This gain results from the availability of reconstructed reference pixels for prediction. Thus, according to the present embodiment, the prediction mode modifier 910 may modify the prediction mode according to the direction of sub-partitioning. In other words, the prediction mode modifier 910 may use different modification methods for the cases in which the sub-partitioning is performed in the vertical and horizontal directions. For example, when the sub-partitioning is performed vertically, the prediction mode modifier 910 modifies the prediction mode to a mode belonging to the horizontal group; when the sub-partitioning is performed horizontally, the prediction mode modifier 910 modifies the prediction mode to a mode belonging to the vertical group. Here, the prediction mode modifier 910 may use the methods for modifying the prediction mode described above.
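The split-direction rule above can be sketched as follows; mapping to the opposite diagonal (mode 2 or 66) is an assumption of this sketch, chosen to reproduce the example of fig. 10a.

```python
# Non-normative sketch of the split-direction rule above.
ISP_HOR_SPLIT, ISP_VER_SPLIT = 1, 2

def modify_for_split_direction(mode: int, split_type: int) -> int:
    if mode < 2:                                   # non-directional modes are kept
        return mode
    if split_type == ISP_VER_SPLIT and mode > 34:  # vertical split, vertical-group mode
        return 2                                   # e.g. 66 -> 2, as in fig. 10a
    if split_type == ISP_HOR_SPLIT and mode <= 34: # horizontal split, horizontal-group mode
        return 66
    return mode

print(modify_for_split_direction(66, ISP_VER_SPLIT))   # -> 2
```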
The above implementations assume that the sub-blocks have a uniform shape. In another embodiment, as shown in fig. 8, when the uniformity of the sub-blocks cannot be guaranteed, the intra prediction mode may be modified with reference to a representative block among the sub-blocks.
Fig. 11 conceptually illustrates a prediction mode modifier according to another embodiment of the present invention.
The prediction mode modifier 910 according to another embodiment modifies the prediction mode of the current block using the representative block information and the modification method to generate a modified prediction mode for intra prediction of the sub-block. Here, various sub-partition structures may be utilized, as shown in fig. 8. In particular, the present embodiment can be applied even when the sub-blocks have different shapes.
On the other hand, the representative block may be selected according to at least one of the examples in fig. 12a to 12 e.
As shown in fig. 12a, a sub-block at a specific position within the current block may be selected as a representative block. Here, the representative blocks may be positioned at various points, including center, left side, right side, above, below, upper left side, lower right side, and edges.
As shown in fig. 12b, the largest sub-block within the current block may be selected as the representative block.
As shown in fig. 12c, the smallest sub-block within the current block may be selected as the representative block.
As shown in fig. 12d, a sub-block having the same shape as the current block may be selected as the representative block. Depending on the application, it may be implemented in a modified form such that the sub-block having the most similar shape to the current block is selected as the representative block.
As shown in fig. 12e, the sub-block having the most frequently occurring shape among the sub-blocks may be selected as the representative block. In other words, among the sub-partitioned sub-blocks, the sub-block whose shape occurs most frequently is selected as the representative block. For example, when the current block is sub-partitioned into six sub-blocks whose sizes include 4×16, 8×16, and 4×4, and 4×4 is the most frequently occurring size, a sub-block of size 4×4 is selected as the representative block.
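A minimal sketch of this counting rule is given below; the six-sub-block layout used in the example is an assumption for illustration only.

```python
# Non-normative sketch of the "most frequently occurring shape" rule of fig. 12e.
from collections import Counter

def most_frequent_subblock_size(subblocks):
    """subblocks: list of (width, height).  Returns the most frequent size;
    any sub-block of this size may serve as the representative block."""
    return Counter(subblocks).most_common(1)[0][0]

print(most_frequent_subblock_size(
    [(4, 16), (8, 16), (4, 4), (4, 4), (4, 4), (4, 4)]))   # -> (4, 4)
```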
Alternatively, the video encoding device may select the representative block based on optimization of bit rate distortion. After selecting the representative block using the representative block selection method based on the sub-partition information, the video encoding apparatus may transmit the representative block information to the video decoding apparatus.
The representative block information may describe characteristics of the representative block selected from the sub-partitioned sub-blocks, including at least one of the position, size, width, height, shape (or aspect ratio), or prediction mode of the representative block. Here, the prediction mode of the representative block may be the same as the prediction mode of the current block.
As another embodiment, information indicating a method for selecting a representative block may be signaled from a video encoding apparatus to a video decoding apparatus. After selecting the representative block using the indicated selection method, the video decoding apparatus may derive representative block information.
Fig. 13 illustrates selecting a representative block and modifying the prediction mode according to one embodiment of the present invention.
Using the representative block information, the prediction mode modifier 910 modifies the prediction modes of the representative block and the remaining sub-blocks according to the modification methods described above. The video decoding apparatus may decode all the sub-blocks in the modified prediction mode. Here, the representative block information used by the prediction mode modifier 910 and the modification methods according to that information are as follows.
In the following description, an embodiment in which the prediction mode modifier 910 uses the size of the representative block as the representative block information is described.
In some cases, the size, height, or width of the sub-blocks may be significantly reduced. In such cases, applying the existing technique, which employs many intra prediction directions, may lead to suboptimal coding efficiency. Accordingly, the prediction mode modifier 910 may modify the intra prediction mode based on one or more conditions on the size, height, and width of the block. In other words, the prediction mode modifier 910 modifies the prediction mode when the block size satisfies a specific condition, thereby preventing the coding efficiency from deteriorating unnecessarily.
In the following description, specific conditions concerning the size are described with reference to the example of fig. 14.
The prediction mode modifier 910 may modify the prediction mode when the height or width of the representative block is greater than N1. Here, N1 may be 1, 2, 4, 8, 16, 32, 64, or 128, depending on the implementation.
The prediction mode modifier 910 may modify the prediction mode when the size of the representative block is greater than N2. Here, N2 may be 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, or 1024, depending on the implementation. Furthermore, the size of a block may be calculated as the product of the height and width of the block.
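The two size conditions above can be combined as sketched below; the concrete values chosen for N1 and N2 are assumptions (the text only lists the allowed powers of two).

```python
# Non-normative sketch of the size conditions on the representative block.
N1 = 16      # threshold on the height or width of the representative block
N2 = 256     # threshold on the size (width * height) of the representative block

def size_condition_met(rep_w: int, rep_h: int) -> bool:
    """True when the representative block triggers a prediction mode modification."""
    return rep_w > N1 or rep_h > N1 or rep_w * rep_h > N2

print(size_condition_met(4, 32))   # height 32 > N1 -> True
print(size_condition_met(4, 4))    # -> False
```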
When the above specific condition is satisfied, the prediction mode modifier 910 may modify the prediction mode of the sub-block to an intra prediction mode different from the intra prediction mode of the current block. Here, the prediction mode modifier 910 may utilize the modification method described above.
For example, assume that the prediction mode of the current block is 52 and the current block is sub-partitioned as shown in fig. 15a. In this case, the video decoding apparatus may set the representative block and then modify the prediction direction according to the condition, as shown in fig. 15a. If the representative block size is greater than N2, the prediction mode modifier 910 may divide the prediction modes into a non-directional mode group and a directional mode group, as shown in fig. 15a, and then modify the prediction modes of the representative block and the remaining sub-blocks to the planar mode included in the non-directional mode group. The video decoding apparatus may perform intra prediction of the sub-blocks using the modified prediction mode.
As another example, as shown in fig. 15b, when the preset mode is the vertical mode (VER), the prediction mode modifier 910 may modify the prediction modes of the representative block and the remaining sub-blocks to the vertical direction. Alternatively, the prediction mode modifier 910 may modify the prediction modes of the representative block and the remaining sub-blocks to the VER mode obtained by rotating the prediction mode of the current block by S. As the two examples illustrated in fig. 15b show, the prediction mode modifier 910 may derive the same result regardless of the implementation chosen.
In the following description, an embodiment in which the prediction mode modifier 910 uses the shape of the representative block as the representative block information is described.
The shape of the sub-partitioned sub-blocks may differ from the shape of the current block. Accordingly, when the prediction mode signaled for the current block is applied to both the current block and the sub-blocks, the coding efficiency may vary. This problem becomes more severe when the WAIP technique is used, since whether newly restored reference pixels are available or only existing reference pixels can be used may significantly affect coding efficiency. Accordingly, the prediction mode modifier 910 may modify the prediction modes of the representative block and the remaining sub-blocks according to the shape of the representative block, and the video decoding apparatus may perform intra prediction of the representative block and the remaining sub-blocks using the modified prediction modes.
In the following description, a condition for determining whether modification is necessary is described with reference to the examples of fig. 16a and 16 b.
Fig. 16a and 16b illustrate conditions for modifying a prediction mode of a sub-block based on a shape of a representative block according to one embodiment of the present invention.
When the shape of the representative block is different from the shape of the current block, the prediction mode modifier 910 may modify the prediction mode of the sub-block, as shown in fig. 16 a.
When the shape of the representative block is not square, the prediction mode modifier 910 may modify the prediction mode of the sub-block, as shown in fig. 16 b.
When the modification condition is satisfied, the prediction mode modifier 910 may modify the prediction mode of the sub-block into an intra prediction mode having a value different from that of the current block. Here, the prediction mode modifier 910 may utilize the modification method described above.
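The two modification conditions of figs. 16a and 16b can be expressed compactly as sketched below; shapes are compared by aspect ratio, and cross-multiplication is used only to avoid floating-point comparisons.

```python
# Non-normative sketch of the shape conditions of figs. 16a and 16b.
def shape_condition_16a(cur_w, cur_h, rep_w, rep_h) -> bool:
    """fig. 16a: modify when the representative block's shape differs from the current block's."""
    return cur_w * rep_h != cur_h * rep_w

def shape_condition_16b(rep_w, rep_h) -> bool:
    """fig. 16b: modify when the representative block is not square."""
    return rep_w != rep_h

# Example of fig. 17: a square current block with a vertically long representative block
print(shape_condition_16a(16, 16, 4, 16))   # -> True
print(shape_condition_16b(4, 16))           # -> True
```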
For example, assume that the prediction mode of the current block is 66 and the current block is sub-partitioned as shown in fig. 17. At this time, the video decoding apparatus may set the representative block and then modify the prediction direction according to the condition, as in the example of fig. 17. Since the example of fig. 17 satisfies the condition that the shape of the representative block is not square, which is the modification condition of fig. 16b, the prediction mode modifier 910 may modify the intra prediction mode. The prediction mode modifier 910 may rotate the prediction direction by 180 degrees, which is one of the methods for modifying the prediction mode described above, and set the prediction modes of the representative block and the remaining sub-blocks to prediction mode 2, as shown in the example of fig. 17. The video decoding apparatus may perform intra prediction of the representative block and the remaining sub-blocks using the modified prediction mode.
On the other hand, in order to quantify the shape of a block (or sub-block), the prediction mode modifier 910 may use the aspect ratio of the block (or sub-block).
In the following description, an embodiment in which the prediction mode modifier 910 uses the position of the representative block as information of the representative block is described.
When the sub-blocks of a sub-partition are reconstructed in sequential order, a sub-block reconstructed later may refer to the pixel values of the most recently reconstructed sub-block. In this case, the coding efficiency can be further improved by using pixel values closer or more similar to the original pixel values as reference pixels. Thus, during reconstruction according to intra prediction, if the most recently reconstructed pixel values are not available because the representative block is located at a specific position, the prediction mode modifier 910 may modify the prediction mode of the sub-block using the modification methods described above.
On the other hand, the specific position may be the center, above, below, right side, left side, upper right side, lower left side, or lower right side of the current block.
For example, as shown in fig. 18, when the representative block is located at the center of the current block corresponding to a specific position, the prediction mode modifier 910 may modify the prediction modes of the representative block and the remaining sub-blocks into a non-directional mode, which is one of the modification methods described above, i.e., a planar mode.
In the following description, an embodiment in which the prediction mode modifier 910 uses the prediction mode of the representative block as the representative block information is described. As described above, the prediction mode of the representative block transmitted from the video encoding apparatus may be the same as the prediction mode of the current block.
When the sub-blocks of a sub-partition are reconstructed in sequential order, a sub-block reconstructed later may refer to the pixel values of the most recently reconstructed sub-block. In this case, the coding efficiency can be further improved by using pixel values closer or more similar to the original pixel values as reference pixels. Thus, during reconstruction according to intra prediction, if the representative block does not use the most recently reconstructed pixel values, the prediction mode modifier 910 may modify the prediction mode of the sub-block using the modification methods described above.
For example, assume that the intra prediction mode of the current block is 4 and the current block is sub-partitioned as shown in fig. 19. At this time, no sub-block uses the most recently reconstructed pixel values. Accordingly, the prediction mode modifier 910 may modify the prediction mode of the sub-blocks using the modification methods described above. As shown in the example of fig. 19, the prediction mode modifier 910 may modify the prediction modes of the representative block and the remaining sub-blocks to a non-directional mode (i.e., the planar mode), a specific directional mode (i.e., mode 66), or the directional mode rotated by 180 degrees (i.e., mode 68).
As another embodiment, a different modified prediction mode may be generated for each sub-block.
Fig. 20 conceptually illustrates a prediction mode modifier according to another embodiment of the present invention.
The prediction mode modifier 910 according to another embodiment modifies the prediction mode of the current block using the sub-block information and the modification method, and thus may generate a modified prediction mode for each sub-block. In other words, in the examples of fig. 9 and 11, the intra prediction mode is modified such that all sub-blocks included in the current block use the same prediction mode. However, as shown in fig. 21, the prediction mode modifier 910 according to the present embodiment may modify the intra prediction mode to be different for the corresponding sub-block.
When the present embodiment is applied, since each sub-block uses a different prediction mode, the video decoding apparatus can adaptively select a filter and apply it to the boundary of each sub-block.
On the other hand, the prediction mode modifier 910 may adaptively modify the prediction mode of each sub-block from the single signaled prediction mode, without signaling a separate mode for each sub-block. At this time, the prediction mode modifier 910 may use the methods for modifying the prediction mode described above.
For example, assume that the intra prediction mode of the current block is 4 and the current block is sub-partitioned as shown in fig. 22. The prediction mode modifier 910 may derive the result shown in the example of fig. 22 by applying the modification method to each sub-block. When the height of a sub-block is greater than N1, the condition for modifying the prediction mode used by the prediction mode modifier 910 is satisfied, and the employed modification method changes the prediction mode to the planar mode, which is a non-directional mode.
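A minimal sketch of this per-sub-block rule is given below; the sub-block layout and the value of N1 are assumptions for illustration only.

```python
# Non-normative sketch of the per-sub-block modification in the example of fig. 22.
PLANAR = 0
N1 = 8

def per_subblock_modes(signaled_mode, subblocks):
    """subblocks: list of (width, height).  Returns one (possibly modified) mode per sub-block."""
    return [PLANAR if h > N1 else signaled_mode for (_, h) in subblocks]

print(per_subblock_modes(4, [(4, 16), (4, 16), (8, 4), (8, 4)]))
# -> [0, 0, 4, 4]: the tall sub-blocks use planar, the others keep mode 4
```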
On the other hand, whether to utilize the modified prediction mode for each sub-block may be determined according to a previous protocol between the video encoding apparatus and the video decoding apparatus. Alternatively, the video encoding apparatus may transmit a flag indicating whether to utilize the modified prediction mode for each sub-block to the video decoding apparatus.
In the following description, an embodiment of modifying a prediction mode of a sub-block using a preset model according to a preset order is described.
As another embodiment, after a model for modifying a prediction mode is preset, the prediction mode modifier 910 may modify the prediction mode of the sub-block using the preset model. Here, the preset model that can be utilized is as follows.
For example, the prediction mode modifier 910 may utilize a model such as increasing or decreasing the prediction mode by a variation according to a preset order. Here, the variation may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or the like. In addition, whether the prediction mode increases or decreases within the model may be preset or signaled.
The amount of change in the model may be set in advance between the video encoding apparatus and the video decoding apparatus. As another example, the video decoding apparatus may derive the variation amount by referring to sub-block information, sub-partition information related to the example of fig. 8, or representative block information. Here, the sub-partition information is related to the example of fig. 8, the sub-block information is related to the example of fig. 9, and the representative block information is related to the example of fig. 11.
On the other hand, whether to utilize the preset model may be determined according to a previous protocol between the video encoding apparatus and the video decoding apparatus. Alternatively, the video encoding apparatus may transmit a flag indicating whether to utilize the preset model to the video decoding apparatus.
On the other hand, the preset order refers to an order of encoding (or decoding) sub-blocks within the current block, and may be set as one of the examples of fig. 23.
For example, when the preset order is the same as (1) shown in fig. 23, the prediction mode modifier 910 may select a prediction mode modification order for the sub-blocks as in the example of fig. 24.
In another example, it is assumed that the intra prediction mode of the current block is 66, and the current block is divided into four sub-blocks as shown in fig. 25. As shown in fig. 25, the prediction mode modifier 910 may modify the prediction mode to start at 66 and decrease by 2.
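The preset-model rule of fig. 25 can be sketched as follows; whether the mode increases or decreases, and the amount of the variation, are preset or signaled as described above.

```python
# Non-normative sketch of the preset-model rule of fig. 25.
def preset_model_modes(start_mode, num_subblocks, variation=2, decreasing=True):
    """Modes assigned to sub-blocks in the preset (decoding) order."""
    step = -variation if decreasing else variation
    return [start_mode + i * step for i in range(num_subblocks)]

print(preset_model_modes(66, 4))   # -> [66, 64, 62, 60], as in fig. 25
```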
In yet another example, a prediction mode modification flag may be used to modify the prediction mode of a sub-block.
Fig. 26 conceptually illustrates a prediction mode modifier according to still another embodiment of the present invention.
The prediction mode modifier 910 according to yet another embodiment may modify the prediction mode of the sub-blocks by selectively applying one of the implementations described above based on a prediction mode modification flag. The video encoding apparatus may indicate how to modify the prediction mode of all sub-blocks or of each sub-block by transmitting the mode modification flag (denoted as sub_pred_mode_flag in the following description), as shown in fig. 26, to the video decoding apparatus.
As in the example of fig. 27, when sub_pred_mode_flag is 0, the existing ISP technique is used. When sub_pred_mode_flag is 1, the prediction mode modifier 910 may perform a preset implementation. Here, one of the several implementation examples described above may be designated as the preset implementation according to a prior agreement between the video encoding apparatus and the video decoding apparatus. In the example of fig. 27, the preset implementation generates a modified prediction mode for each sub-block.
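The branching on sub_pred_mode_flag can be sketched as follows; the preset implementation assumed here (vertically long sub-blocks falling back to planar) is only one of the implementations described above and is chosen purely for illustration.

```python
# Non-normative sketch of the sub_pred_mode_flag branching of figs. 26 and 27.
PLANAR = 0

def decode_subblock_modes(sub_pred_mode_flag, signaled_mode, subblocks):
    """subblocks: list of (width, height)."""
    if sub_pred_mode_flag == 0:
        # existing ISP behaviour: every sub-block shares the signaled mode
        return [signaled_mode] * len(subblocks)
    # preset implementation: generate a modified prediction mode for each sub-block
    return [PLANAR if h > w else signaled_mode for (w, h) in subblocks]

print(decode_subblock_modes(0, 66, [(4, 16)] * 4))   # -> [66, 66, 66, 66]
print(decode_subblock_modes(1, 66, [(4, 16)] * 4))   # -> [0, 0, 0, 0]
```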
In another aspect, the description above relates to a method of modifying a prediction mode of a sub-block in an intra predictor 542 of a video decoding device. However, the same description may be applied to the intra predictor 122 of the video encoding apparatus. In other words, to modify the prediction mode of the sub-block, the intra predictor 122 may further include a prediction mode modifier.
In the following description, a method of modifying a prediction mode of a sub-block by a video decoding apparatus based on sub-block information is described with reference to fig. 28.
Fig. 28 is a flowchart illustrating a method of modifying a prediction mode of a sub-block performed by a video decoding apparatus according to one embodiment of the present invention.
The entropy decoder 510 within the video decoding apparatus decodes the intra prediction mode of the current block, the information of the current block, and the sub-block information from the bitstream (S2800). Here, the sub-block information provides information related to the sub-blocks obtained by partitioning the current block.
The sub-block information may include all or part of information about the size, width, height, aspect ratio, division direction, and number of sub-blocks.
The information about the current block may include all or part of information about the size, width, height, and aspect ratio of the current block.
The intra predictor 542 within the video decoding apparatus selects a method for modifying the prediction mode based on the information of the current block and the sub-block information (S2802).
The intra predictor 542 may select a method for modifying a prediction mode as follows.
The intra predictor 542 may use a method of rotating the intra prediction mode of the current block by a specific angle S.
Alternatively, the intra predictor 542 may set the modified prediction mode to a specific mode X. The specific mode X may be one of the prediction modes grouped according to a predetermined criterion or may be a predetermined mode.
Here, the predetermined criterion for grouping prediction modes may be one of a vertical group, a horizontal group, a diagonal direction of the current block, a diagonal direction of the sub-block, a direction mode group, a non-direction mode group, a prediction mode group calculated according to machine learning, or a combination of some of the above.
Further, the predetermined mode may be designated as one of a planar mode, a DC mode, a diagonal direction mode of a sub-block, a diagonal direction mode of a current block, a mode calculated through machine learning, a vertical mode, a horizontal mode, or a diagonal direction mode.
The intra predictor 542 modifies the intra prediction mode of the current block based on the method for modifying the prediction mode to generate the modified prediction mode of the sub-blocks (S2804).
In the following description, steps S2806 to S2814, which correspond to the generation of the modified prediction mode (S2804), are described in detail.
The intra predictor 542 checks whether a modified prediction mode is applied per sub-block according to the prior agreement (S2806). When the modified prediction mode is not applied per sub-block (No in S2806), the intra predictor 542 generates the same modified prediction mode for all sub-blocks (S2808).
When the modified prediction mode is applied per sub-block (Yes in S2806), the intra predictor 542 checks whether a preset model is applied according to the prior agreement (S2810). When the preset model is not applied (No in S2810), the intra predictor 542 generates a modified prediction mode for each sub-block (S2812).
When the preset model is applied (Yes in S2810), the intra predictor 542 generates the modified prediction modes of the sub-blocks using the preset model according to a predetermined order (S2814).
In the following description, a method of modifying a prediction mode of a sub-block by a video decoding apparatus based on representative block information is described with reference to fig. 29.
Fig. 29 is a flowchart illustrating a method for modifying a prediction mode of a sub-block performed by a video decoding apparatus according to another embodiment of the present invention.
The entropy decoder 510 within the video decoding apparatus decodes the intra prediction mode of the current block, the information on the current block, and the representative block information from the bitstream (S2900). On the other hand, the representative block is a block selected from the sub-blocks and may be selected by the video encoding apparatus according to a representative block selection method.
The representative block information may describe characteristics of the representative block including at least one of a position, a size, a width, a height, or a shape (or aspect ratio) of the representative block.
The intra predictor 542 within the video decoding apparatus selects a method for modifying the prediction mode based on the information of the current block and the representative block information (S2902).
When the representative block size satisfies a specific condition, the intra predictor 542 may select a method for modifying the prediction mode as described above. Here, the specific condition may be satisfied when the height or width of the representative block is greater than N1 or when the size of the representative block is greater than N2.
The intra predictor 542 may select a method for modifying the prediction mode according to the shape of the representative block. For example, when the shape of the representative block is different from the shape of the current block or is not square, the intra predictor 542 may select a method for modifying the prediction mode.
The intra predictor 542 may select a method for modifying the prediction mode according to the position of the representative block. When the most recently reconstructed pixel values are not available because the representative block is located at a specific position during reconstruction according to intra prediction, the intra predictor 542 may select a method for modifying the prediction mode. Here, the specific position may be the center, above, below, the right side, the left side, the upper right side, the lower left side, or the lower right side of the current block.
The intra predictor 542 may select a method for modifying the prediction mode according to the prediction mode of the representative block. When the representative block does not use the most recently reconstructed pixel values during reconstruction according to intra prediction, the intra predictor 542 may select a method for modifying the prediction mode.
The intra predictor 542 modifies the intra prediction mode of the current block based on the method for modifying the prediction mode to generate the modified prediction mode of the sub-blocks (S2904).
Since the steps corresponding to step S2904 of generating the modified prediction mode are the same as steps S2806 to S2814 of fig. 28, further description is omitted.
In the following description, a method of modifying a prediction mode of a sub-block by a video encoding apparatus based on sub-block information is described with reference to fig. 30.
Fig. 30 is a flowchart illustrating a method for modifying a prediction mode of a sub-block performed by a video encoding apparatus according to one embodiment of the present invention.
As described above, the method for modifying the prediction mode of a sub-block according to the present embodiment may also be performed by the intra predictor 122 of the video encoding apparatus for bit rate distortion analysis. In this case, during the bit rate distortion analysis, the video encoding apparatus searches for the optimal intra prediction mode of the current block, the current block information, and the sub-block information. During the search, the intra predictor 122 obtains the intra prediction mode of the current block, the current block information, and the sub-block information (S3000).
In the illustration of fig. 30, step S3002 of selecting a method for modifying the prediction mode and step S3004 of generating the modified prediction mode perform the same operations as the corresponding steps in fig. 28. Therefore, a description of the overlapping steps is omitted.
The video encoding apparatus may encode the optimal intra prediction mode of the current block according to the bit rate distortion analysis, the current block information, and the sub-block information, and then transmit the encoded information to the video decoding apparatus.
In the following description, a method of modifying a prediction mode of a sub-block by a video encoding apparatus based on representative block information is described with reference to fig. 31.
Fig. 31 is a flowchart illustrating a method for modifying a prediction mode of a sub-block performed by a video encoding apparatus according to another embodiment of the present invention.
During the bit rate distortion analysis, the video encoding apparatus searches for the optimal intra prediction mode of the current block, the current block information, and the sub-block information. During the search, the intra predictor 122 obtains the intra prediction mode of the current block, the current block information, and the sub-block information (S3100).
In the illustration of fig. 31, step S3102 of selecting a method for modifying the prediction mode and step S3104 of generating the modified prediction mode perform the same operations as the corresponding steps in fig. 29. Therefore, a description of the overlapping steps is omitted.
The video encoding apparatus may encode the optimal intra prediction mode of the current block according to the bit rate distortion analysis, the current block information, and the sub-block information, and then transmit the encoded information to the video decoding apparatus.
Although steps in the respective flowcharts are described as sequentially performed, these steps merely exemplify the technical ideas of some embodiments of the present invention. Accordingly, one of ordinary skill in the relevant art may perform the steps by changing the order depicted in the various figures or by performing two or more steps in parallel. Accordingly, the steps in the various flowcharts are not limited to the order in which they occur as shown.
It should be understood that the foregoing description presents illustrative embodiments that may be implemented in various other ways. The functions described in some embodiments may be implemented by hardware, software, firmware, and/or combinations thereof. It should also be understood that the functional components described in this specification are labeled as "...unit" to highlight the possibility of their independent implementation.
On the other hand, the various methods or functions described in some embodiments may be implemented as instructions stored in a non-volatile recording medium, which may be read and executed by one or more processors. Non-volatile recording media include, for example, all types of recording devices that store data in a form readable by a computer system. For example, the non-volatile recording medium may include storage media such as erasable programmable read-only memory (EPROM), flash memory drives, optical disk drives, magnetic hard disk drives, and solid state drives (SSDs).
Although the exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention. Accordingly, embodiments of the present invention have been described for brevity and clarity. The scope of the technical idea of the embodiment of the invention is not limited by the illustration. Accordingly, it will be understood by those of ordinary skill that the scope of the present invention is not limited by the embodiments explicitly described above, but is instead limited by the claims and their equivalents.
(reference numerals)
122: intra-frame predictor
510: entropy decoder
542: intra-frame predictor
910: prediction mode modifier
Cross Reference to Related Applications
The present application claims priority from Korean Patent Application No. 10-2020-0157792, filed on November 23, 2020, and Korean Patent Application No. 10-2021-0162006, filed on November 23, 2021, the entire contents of which are incorporated herein by reference.

Claims (19)

1. An intra prediction method performed by a video decoding device for generating a prediction mode of a modified sub-block, the method comprising:
decoding an intra prediction mode of a current block, information of the current block, and sub-block information, which provides information related to a sub-block obtained by partitioning the current block, from a bitstream;
Selecting a method for modifying a prediction mode based on information of a current block and sub-block information; and
based on the method for modifying the prediction mode, a modified prediction mode is generated by modifying the intra prediction mode of the current block.
2. The method of claim 1, wherein the sub-block information includes all or part of information about a size, a width, a height, an aspect ratio, a division direction, or a number of sub-blocks.
3. The method of claim 1, wherein the information of the current block includes all or part of information regarding a size, a width, a height, or an aspect ratio of the current block.
4. The method of claim 1, wherein the method for modifying the prediction mode rotates the intra prediction mode of the current block by a specific angle S.
5. The method of claim 1, wherein the method for modifying the prediction mode sets the modified prediction mode to a specific mode, wherein the specific mode is one of prediction modes grouped according to a predetermined criterion or is a predetermined mode.
6. The method of claim 5, wherein the predetermined criterion is one of a vertical group, a horizontal group, a diagonal direction of a current block, a diagonal direction of a sub-block, a directional mode group, a non-directional mode group, a prediction mode group calculated from machine learning, or a combination of the above.
7. The method of claim 5, wherein the predetermined mode is one of a planar mode, a DC mode, a diagonal direction mode of a sub-block, a diagonal direction mode of a current block, a mode calculated by machine learning, a vertical mode, a horizontal mode, or a diagonal direction mode.
8. The method of claim 1, wherein generating the modified prediction mode comprises generating the same modified prediction mode for the sub-block.
9. The method of claim 1, wherein generating a modified prediction mode comprises: when the application of the modified prediction mode is agreed in advance for each sub-block, the modified prediction mode is generated for each sub-block.
10. The method of claim 9, wherein generating a modified prediction mode comprises: when the preset model is pre-agreed to be applied, the prediction modes of the modified sub-blocks are generated using the preset model according to a predetermined order.
11. The method of claim 10, wherein the predetermined order is an order in which sub-blocks within a current block are decoded.
12. A video decoding apparatus that generates a prediction mode of a modified sub-block, the apparatus comprising:
an entropy decoder configured to decode an intra prediction mode of a current block, information of the current block, and sub-block information from a bitstream, wherein the sub-block information provides information related to a sub-block obtained by partitioning the current block; and
An intra predictor configured to select a method for modifying a prediction mode based on information of a current block and sub-block information, and generate a modified prediction mode by modifying the intra prediction mode of the current block based on the method for modifying the prediction mode.
13. The apparatus of claim 12, wherein the sub-block information comprises all or part of information about a size, width, height, aspect ratio, splitting direction, or number of sub-blocks.
14. The apparatus of claim 12, wherein the information of the current block comprises all or part of information regarding a size, a width, a height, or an aspect ratio of the current block.
15. The apparatus of claim 12, wherein the means for modifying the prediction mode rotates the prediction mode of the current block by a specific angle S.
16. The apparatus of claim 12, wherein the means for modifying the prediction mode sets the modified prediction mode to a specific mode, wherein the specific mode is one of prediction modes grouped according to a predetermined criterion or is a predetermined mode.
17. The apparatus of claim 16, wherein the predetermined criterion is one of a vertical group, a horizontal group, a diagonal direction of a current block, a diagonal direction of a sub-block, a directional mode group, a non-directional mode group, a prediction mode group calculated from machine learning, or a combination of the above.
18. The apparatus of claim 16, wherein the predetermined mode is one of a planar mode, a DC mode, a diagonal direction mode of a sub-block, a diagonal direction mode of a current block, a mode calculated by machine learning, a vertical mode, a horizontal mode, or a diagonal direction mode.
19. An intra prediction method performed by a video encoding device for generating a prediction mode for a modified sub-block, the method comprising:
obtaining an intra prediction mode of a current block, information of the current block, and sub-block information, wherein the sub-block information provides information related to a sub-block obtained by partitioning the current block;
selecting a method for modifying a prediction mode based on information of a current block and sub-block information; and
based on the method for modifying the prediction mode, a modified prediction mode is generated by modifying the intra prediction mode of the current block.
CN202180076461.5A 2020-11-23 2021-11-23 Image encoding and decoding method and apparatus using sub-block unit intra prediction Pending CN116491114A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR10-2020-0157792 2020-11-23
KR20200157792 2020-11-23
PCT/KR2021/017256 WO2022108417A1 (en) 2020-11-23 2021-11-23 Image encoding and decoding method and apparatus using sub-block unit intra prediction

Publications (1)

Publication Number Publication Date
CN116491114A true CN116491114A (en) 2023-07-25

Family

ID=81780144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180076461.5A Pending CN116491114A (en) 2020-11-23 2021-11-23 Image encoding and decoding method and apparatus using sub-block unit intra prediction

Country Status (2)

Country Link
KR (1) KR20220071131A (en)
CN (1) CN116491114A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024034886A1 (en) * 2022-08-09 2024-02-15 현대자동차주식회사 Method and device for video coding using rearrangement of prediction signals in intra block copy mode

Also Published As

Publication number Publication date
KR20220071131A (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN116472709A (en) Apparatus and method for video encoding and decoding
CN116530082A (en) Method and apparatus for video coding using intra prediction
CN113892268A (en) Intra-frame prediction device and method based on prediction mode estimation
CN116491114A (en) Image encoding and decoding method and apparatus using sub-block unit intra prediction
KR20230105646A (en) Method for Template-based Intra Mode Derivation for Chroma Component
KR20220118334A (en) Video Coding Method and Apparatus Using Intra Prediction Based on Subblock Partitioning
CN116941241A (en) Video encoding and decoding method and apparatus using matrix-based cross component prediction
US20230319307A1 (en) Video encoding and decoding method and apparatus using subblock based intra prediction
US20230291928A1 (en) Video encoding and decoding method and apparatus using selective subblock split information signaling
US20240007623A1 (en) Block splitting structure for efficient prediction and transform, and method and appartus for video encoding and decoding using the same
US20240031564A1 (en) Method and apparatus for video coding using adaptive intra prediction precision
US20230388541A1 (en) Method and apparatus for video coding using intra prediction based on subblock partitioning
US20240114148A1 (en) Video encoding and decoding method for adaptively determining chroma intra directional prediction mode
US20230283768A1 (en) Method for predicting quantization parameter used in a video encoding/decoding apparatus
US20240114131A1 (en) Video encoding/decoding method and apparatus
CN116671108A (en) Method and apparatus for encoding and decoding video using selective sub-block partition information transmission
US20230179762A1 (en) Video encoding and decoding using arbitrary block partitioning
US20230412811A1 (en) Method and apparatus for video coding using spiral scan order
US20240007649A1 (en) Method and apparatus for video coding using motion vector with component-wise adaptive spatial resolution
CN118160304A (en) Video encoding method and apparatus using various block division structures
CN116648907A (en) Block partitioning structure for efficient prediction and transformation, and method and apparatus for video encoding and decoding using the same
CN118251891A (en) Method and apparatus for video coding and decoding using template matching-based intra prediction
CN117044200A (en) Method and apparatus for video encoding and decoding using spiral scan order
CN117581534A (en) Video encoding/decoding method and apparatus
KR20230059135A (en) Video Coding Method And Apparatus Using Various Block Partitioning Structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination