CN116134812A - Video encoding and decoding using arbitrary block partitioning

Video encoding and decoding using arbitrary block partitioning

Info

Publication number: CN116134812A
Application number: CN202180057609.0A
Authority: CN (China)
Prior art keywords: block, boundary line, intra prediction, rectangular, blocks
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 沈东圭, 朴时奈, 朴俊泽, 崔韩松, 朴胜煜, 林和平
Current assignee: Hyundai Motor Co, Kia Corp, Industry Academic Collaboration Foundation of Kwangwoon University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Hyundai Motor Co, Kia Corp, Industry Academic Collaboration Foundation of Kwangwoon University
Application filed by: Hyundai Motor Co, Industry Academic Collaboration Foundation of Kwangwoon University, Kia Corp
Priority claimed from: PCT/KR2021/010248 (WO2022031018A1)

Classifications

All classifications fall under H ELECTRICITY → H04 ELECTRIC COMMUNICATION TECHNIQUE → H04N PICTORIAL COMMUNICATION, e.g. TELEVISION → H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:

    • H04N 19/119 — Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N 19/109 — Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N 19/124 — Quantisation
    • H04N 19/132 — Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N 19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N 19/503 — Predictive coding involving temporal prediction
    • H04N 19/513 — Motion estimation or motion compensation; processing of motion vectors
    • H04N 19/593 — Predictive coding involving spatial prediction techniques
    • H04N 19/60 — Transform coding
    • H04N 19/70 — Syntax aspects related to video coding, e.g. related to compression standards
    • H04N 19/82 — Details of filtering operations specially adapted for video compression, involving filtering within a prediction loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention provides a video decoding method for decoding a target block using intra prediction. The method comprises: decoding, from a bitstream, boundary line information specifying at least one boundary line for dividing the target block, wherein the boundary line information allows the target block to be divided into a plurality of non-rectangular blocks; determining intra prediction modes of the non-rectangular blocks based on the boundary line information; generating a prediction block of the target block by performing intra prediction on each of the non-rectangular blocks using the intra prediction modes; reconstructing a residual block of the target block from the bitstream; and reconstructing the target block by adding the prediction block and the residual block.

Description

Video encoding and decoding using arbitrary block partitioning
Technical Field
The present invention relates to video encoding and decoding using arbitrary block partitioning.
Background
Since video data is much larger than audio or still-image data, storing or transmitting video data without compression requires a large amount of hardware resources, including memory.
Accordingly, when video data is stored or transmitted, an encoder is generally used to compress it, and a decoder receives the compressed video data, decompresses it, and plays it back. Such video compression techniques include H.264/AVC, High Efficiency Video Coding (HEVC), and Versatile Video Coding (VVC), which improves coding efficiency by about 30% or more over HEVC.
However, as picture size, resolution, and frame rate gradually increase, the amount of data to be encoded also increases. Accordingly, a new compression technique is needed that provides higher coding efficiency and better picture quality than existing techniques. In particular, a compression technique is needed that can more efficiently encode pictures with complex textures, such as pictures containing edges in various directions (boundaries between objects) due to the presence of multiple objects.
Disclosure of Invention
Technical problem
The present disclosure proposes a method of efficiently encoding or decoding video including edges in different directions. More specifically, the present disclosure proposes a method of predicting and transforming a block comprising edges that are not horizontally or vertically oriented.
Technical proposal
According to an aspect of the present invention, there is provided a video decoding method for decoding a target block using intra prediction. The method comprises: decoding, from a bitstream, boundary line information specifying at least one boundary line for dividing the target block, wherein the boundary line information allows the target block to be divided into a plurality of non-rectangular blocks; determining intra prediction modes of the non-rectangular blocks based on the boundary line information; generating a prediction block of the target block by performing intra prediction on each of the non-rectangular blocks using the intra prediction modes; reconstructing a residual block of the target block from the bitstream; and reconstructing the target block by adding the prediction block and the residual block.
Reconstructing the residual block may include: reconstructing a plurality of rectangular transform coefficient blocks from the bitstream; generating residual sub-blocks by inversely transforming each of the transform coefficient blocks; dividing the target block, based on the boundary line specified by the boundary line information, the number of transform coefficient blocks, and the size of each transform coefficient block, into a plurality of regions comprising a boundary region that contains the boundary line and is formed around it and one or more non-boundary regions that do not contain the boundary line; and reconstructing the residual block by rearranging the residual signal of each residual sub-block into its corresponding region.
According to another aspect of the present invention, there is provided a video encoding method for encoding a target block using intra prediction. The method comprises the following steps: dividing the target block using at least one boundary line, wherein the boundary line allows the target block to be divided into a plurality of non-rectangular blocks; determining an intra prediction mode of the non-rectangular block based on the boundary line; generating a prediction block of the target block by performing intra prediction on each of the non-rectangular blocks; generating a residual block of the target block by subtracting the prediction block from the target block; and encoding boundary line information for specifying the boundary line and a residual signal in the residual block.
Encoding the residual signal in the residual block may include: dividing the target block into a plurality of regions, wherein the plurality of regions comprise a boundary region that contains the boundary line and is formed around it and one or more non-boundary regions that do not contain the boundary line; and, for each of the plurality of regions, generating a rectangular residual sub-block by rearranging the residual signals in the region and transforming the residual sub-block.
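For illustration, the rearrangement step can be sketched in Python as follows. This is a minimal sketch assuming raster-scan packing with zero padding; the actual packing rule and sub-block sizes are not fixed by the summary above, and gather_region/scatter_region are hypothetical helper names.

    import numpy as np

    def gather_region(residual, mask, shape):
        # Gather the residual samples of one (possibly non-rectangular) region
        # in raster-scan order and pack them into a rectangular sub-block so
        # that an ordinary 2-D transform can be applied to it.
        samples = residual[mask]                      # boolean indexing is raster order
        rect = np.zeros(shape[0] * shape[1], dtype=residual.dtype)
        rect[:samples.size] = samples                 # zero-pad any leftover positions
        return rect.reshape(shape)

    def scatter_region(rect, mask, out):
        # Decoder-side inverse: return the sub-block samples to their region.
        out[mask] = rect.reshape(-1)[:np.count_nonzero(mask)]

On the encoder side each region is packed and transformed; on the decoder side the inverse-transformed sub-blocks are scattered back, matching the region-wise reconstruction described above.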
Drawings
Fig. 1 is an exemplary block diagram of a video encoding device capable of implementing the techniques of this disclosure.
Fig. 2 is a diagram for explaining a method of dividing blocks using the QTBTTT structure.
Fig. 3 is a view illustrating a plurality of intra prediction modes.
Fig. 4 is a diagram of surrounding blocks of a current block.
Fig. 5 is an exemplary block diagram of a video decoding device capable of implementing the techniques of this disclosure.
Fig. 6 is a diagram of a geometric partitioning target block in accordance with an aspect of the present disclosure.
Fig. 7 is a diagram illustrating various predefined boundary lines according to an aspect of the present disclosure.
Fig. 8 is a diagram for explaining a method of configuring reference pixels for respective sub-blocks according to an aspect of the present disclosure.
Fig. 9 is another diagram for explaining a method of configuring reference pixels for respective sub-blocks according to an aspect of the present disclosure.
Fig. 10 is a diagram for explaining a method of determining an intra prediction mode of each sub-block according to an aspect of the present disclosure.
Fig. 11 is a diagram for explaining a method of encoding an intra prediction mode of each sub-block according to an aspect of the present disclosure.
Fig. 12 is a diagram for explaining a method of transforming a target block geometrically divided into a plurality of sub-blocks according to an aspect of the present disclosure.
Fig. 13 is a flowchart illustrating a method of encoding a residual signal in a target block geometrically divided into a plurality of sub-blocks according to an aspect of the present disclosure.
Fig. 14 and 15 are diagrams for explaining a method of dividing a target block into non-rectangular areas and rearranging residual signals within the non-rectangular areas according to an aspect of the present disclosure.
Fig. 16 is a flowchart showing a decoding method performed by the video decoding apparatus corresponding to the encoding method of fig. 13.
Fig. 17 is another diagram for explaining a method of dividing a target block into non-rectangular areas and rearranging residual signals within the non-rectangular areas according to an aspect of the present disclosure.
Fig. 18 is a diagram for explaining a method of determining quantization parameters according to an aspect of the present disclosure.
Detailed Description
Hereinafter, some embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that the same reference numeral designates the same component even when the component appears in different figures. Further, in describing the present invention, detailed descriptions of related known configurations or functions are omitted where they could obscure the gist of the invention.
Fig. 1 is an exemplary block diagram of a video encoding device capable of implementing the techniques of this disclosure. Hereinafter, a video encoding apparatus and components of the apparatus will be described with reference to fig. 1.
The video encoding apparatus may include a block divider 110, a predictor 120, a subtractor 130, a transformer 140, a quantizer 145, a reordering unit 150, an entropy encoder 155, an inverse quantizer 160, an inverse transformer 165, an adder 170, a loop filtering unit 180, and a memory 190.
Each component of the video encoding device may be implemented as hardware or software or a combination of hardware and software. The functions of the respective components may be implemented as software, and the microprocessor may be implemented to perform the software functions corresponding to the respective components.
An image (video) is composed of one or more sequences containing a plurality of pictures. Each picture is divided into a plurality of regions, and encoding is performed on each region. For example, a picture is divided into one or more tiles and/or slices. Here, one or more tiles may be defined as a tile group. Each tile or slice is divided into one or more Coding Tree Units (CTUs), and each CTU is divided into one or more Coding Units (CUs) by a tree structure. Information applied to each CU is encoded as the syntax of the CU, and information commonly applied to the CUs included in one CTU is encoded as the syntax of the CTU. In addition, information commonly applied to all blocks in one slice is encoded as the syntax of a slice header, and information applied to all blocks constituting one or more pictures is encoded in a Picture Parameter Set (PPS) or a picture header. Furthermore, information commonly applied to a sequence of multiple pictures is encoded in a Sequence Parameter Set (SPS). In addition, information commonly applied to one tile or tile group may be encoded as the syntax of a tile or tile-group header. The syntax included in the SPS, PPS, slice header, and tile or tile-group header may be referred to as high-level syntax.
Additionally, the bitstream may include one or more Adaptive Parameter Sets (APS) that include parameters referenced by a picture or a group of pixels (e.g., a slice) smaller than the picture. The picture header or slice header includes an ID for identifying the APS to be used in the corresponding picture or slice. Pictures referring to different PPS or slices referring to different picture headers may share the same parameters through the same APS ID.
Each of the plurality of pictures may be divided into a plurality of sub-pictures that can be independently encoded/decoded and/or independently displayed. When sub-picture division is applied, information about the layout of the sub-pictures within the picture is signaled.
The block divider 110 determines the size of a Coding Tree Unit (CTU). Information about the size of the CTU (CTU size) is encoded as syntax of the SPS or PPS and transmitted to the video decoding apparatus.
The block divider 110 divides each picture constituting the video into a plurality of CTUs having a predetermined size and then recursively divides the CTUs using a tree structure. In the tree structure, leaf nodes become Coding Units (CUs), the basic units of coding.
The tree structure may be a quadtree (QT), in which an upper node (or parent node) is divided into four lower nodes (or child nodes) of the same size; a binary tree (BT), in which an upper node is divided into two child nodes; a ternary tree (TT), in which an upper node is divided into three child nodes at a ratio of 1:2:1; or a structure combining two or more of the QT, BT, and TT structures. For example, a quadtree plus binary tree (QTBT) structure may be used, or a quadtree plus binary tree and ternary tree (QTBTTT) structure may be used. Here, BT and TT may be collectively referred to as a multi-type tree (MTT).
Fig. 2 is a diagram for explaining a method of dividing blocks using the QTBTTT structure. As shown in fig. 2, a CTU may first be divided in the QT structure. QT division may be repeated until the size of the divided block reaches the minimum block size (MinQTSize) allowed for a leaf node of the QT. A first flag (qt_split_flag), indicating whether each node of the QT structure is divided into four nodes of a lower layer, is encoded by the entropy encoder 155 and signaled to the video decoding apparatus. When a leaf node of the QT is not larger than the maximum block size (MaxBTSize) allowed for a root node of the BT, it may be further divided into one or more of the BT structure or the TT structure. The BT structure and/or the TT structure may have a plurality of division directions. For example, there may be two directions: dividing the block of the node horizontally and dividing it vertically. As shown in fig. 2, when MTT division starts, a second flag (mtt_split_flag) indicating whether a node is divided, a flag indicating the division direction (vertical or horizontal), and/or a flag indicating the division type (binary or ternary) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus. Alternatively, a CU division flag (split_cu_flag) indicating whether a node is divided may be encoded before the first flag (qt_split_flag) indicating whether each node is divided into four nodes of a lower layer. When the value of the CU division flag (split_cu_flag) indicates that no division is performed, the block of the node becomes a leaf node in the division tree structure and becomes a Coding Unit (CU), the basic unit of coding. When the value of the CU division flag (split_cu_flag) indicates that division has been performed, the video encoding apparatus encodes the flags starting from the first flag in the manner described above.
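The flag-driven recursion described above can be sketched in Python as follows; read_flag is a hypothetical stand-in for entropy-decoding one syntax element, and min_size is an assumed minimum CU size rather than the normative MinQTSize/MaxBTSize handling.

    def parse_cu_tree(x, y, w, h, read_flag, leaves, min_size=8):
        # Recursively divide a (w x h) node according to decoded split flags.
        if w > min_size and h > min_size and read_flag("qt_split_flag"):
            # QT division: four lower-layer nodes of equal size
            for dy in (0, h // 2):
                for dx in (0, w // 2):
                    parse_cu_tree(x + dx, y + dy, w // 2, h // 2, read_flag, leaves, min_size)
        elif max(w, h) > min_size and read_flag("mtt_split_flag"):
            vertical = read_flag("mtt_vertical_flag")     # division direction
            if read_flag("mtt_binary_flag"):              # binary division: two halves
                if vertical:
                    parse_cu_tree(x, y, w // 2, h, read_flag, leaves, min_size)
                    parse_cu_tree(x + w // 2, y, w // 2, h, read_flag, leaves, min_size)
                else:
                    parse_cu_tree(x, y, w, h // 2, read_flag, leaves, min_size)
                    parse_cu_tree(x, y + h // 2, w, h // 2, read_flag, leaves, min_size)
            else:                                         # ternary division at 1:2:1
                if vertical:
                    for ox, ow in ((0, w // 4), (w // 4, w // 2), (3 * w // 4, w // 4)):
                        parse_cu_tree(x + ox, y, ow, h, read_flag, leaves, min_size)
                else:
                    for oy, oh in ((0, h // 4), (h // 4, h // 2), (3 * h // 4, h // 4)):
                        parse_cu_tree(x, y + oy, w, oh, read_flag, leaves, min_size)
        else:
            leaves.append((x, y, w, h))                   # leaf node: one CU

For example, calling parse_cu_tree(0, 0, 128, 128, read_flag, []) with a read_flag that always returns False yields a single 128x128 CU.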
When QTBT is used as another example of the tree structure, there may be two division types: dividing a block horizontally into two blocks of the same size (i.e., symmetric horizontal division) and dividing it vertically into two blocks of the same size (i.e., symmetric vertical division). A division flag (split_flag), indicating whether each node of the BT structure is divided into blocks of a lower layer, and division type information, indicating the division type, are encoded by the entropy encoder 155 and transmitted to the video decoding apparatus. There may also be additional types that divide the block of a node into two asymmetric blocks. The asymmetric division types may include dividing a block into two rectangular blocks at a size ratio of 1:3 and dividing the block of a node diagonally.
A CU may have various sizes according to QTBT or QTBTTT partitions of the CTU. Hereinafter, a block corresponding to a CU to be encoded or decoded (i.e., a leaf node of QTBTTT) is referred to as a "current block". Since QTBTTT partitioning is employed, the shape of the current block may be square or rectangular.
The predictor 120 predicts a current block to generate a predicted block. Predictor 120 includes an intra predictor 122 and an inter predictor 124.
The intra predictor 122 configures reference samples from pre-reconstructed samples located around the current block in the current picture containing the current block and predicts the samples in the current block using the reference samples. There are a variety of intra prediction modes depending on the prediction direction. For example, as shown in fig. 3, the plurality of intra prediction modes may include two non-directional modes, a planar mode and a DC mode, and 65 directional modes. The reference samples and the computational formula to be used are defined differently for each prediction mode.
The intra predictor 122 may determine the intra prediction mode to be used for encoding the current block. In some examples, the intra predictor 122 may encode the current block using several intra prediction modes and select an appropriate intra prediction mode from among the tested modes. For example, the intra predictor 122 may calculate rate-distortion values using rate-distortion analysis of the several tested intra prediction modes and select the intra prediction mode with the best rate-distortion characteristics among them.
The intra predictor 122 selects one intra prediction mode from among a plurality of intra prediction modes, and predicts the current block using a reference sample and a calculation formula determined according to the selected intra prediction mode. Information about the selected intra prediction mode is encoded by the entropy encoder 155 and transmitted to the video decoding device.
The inter predictor 124 generates a prediction block of the current block through motion compensation. The inter predictor 124 searches for a block most similar to the current block among reference pictures encoded and decoded earlier than the current picture, and generates a predicted block of the current block using the searched block. Then, the inter predictor 124 generates a motion vector corresponding to a displacement between the current block in the current picture and the predicted block in the reference picture. In general, motion estimation is performed on a luminance (luma) component, and a motion vector calculated based on the luminance component is used for the luminance component and the chrominance component. Motion information including information on a reference picture and information on a motion vector for predicting a current block is encoded by the entropy encoder 155 and transmitted to a video decoding apparatus.
The inter predictor 124 may perform interpolation on the reference picture or the reference block to increase prediction accuracy. That is, sub-samples between two consecutive integer samples are interpolated by applying filter coefficients to a plurality of consecutive integer samples comprising the two integer samples. When an operation of searching for a block most similar to the current block is performed on the interpolated reference picture, the motion vector may be represented at a precision level of a fractional sample unit instead of a precision level of an integer sample unit. A different precision or resolution of the motion vector may be set for each target region to be encoded, e.g. for each unit such as a slice, a tile, a CTU or a CU. When such adaptive motion vector resolution is applied, information on the motion vector resolution to be applied to each target area should be signaled for each target area. For example, when the target area is a CU, information about the resolution of a motion vector applied to each CU is signaled.
The inter predictor 124 may perform inter prediction using bi-prediction. In bi-prediction, the inter predictor 124 uses two reference pictures and two motion vectors representing the positions of the blocks most similar to the current block in the respective reference pictures. The inter predictor 124 selects a first reference picture and a second reference picture from reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1), respectively, searches for blocks similar to the current block in the respective reference pictures, and generates a first reference block and a second reference block. It then generates a prediction block for the current block by averaging or weighting the first reference block and the second reference block, and passes motion information including information about the two reference pictures and the two motion vectors used to predict the current block to the entropy encoder 155. Here, RefPicList0 may be composed of pictures whose display order is before the current picture among the pre-reconstructed pictures, and RefPicList1 may be composed of pictures whose display order is after the current picture. However, the embodiment is not limited thereto: pre-reconstructed pictures whose display order is after the current picture may additionally be included in RefPicList0, and conversely, pre-reconstructed pictures before the current picture may additionally be included in RefPicList1.
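As a minimal numerical sketch of the combination step (assuming 8-bit samples held in NumPy arrays; the weights here are illustrative, not a normative weighting):

    import numpy as np

    def bi_predict(ref_block0, ref_block1, w0=0.5, w1=0.5):
        # Weighted combination of the two motion-compensated reference blocks;
        # equal weights give the plain average described above.
        pred = w0 * ref_block0.astype(np.float64) + w1 * ref_block1.astype(np.float64)
        return np.clip(np.rint(pred), 0, 255).astype(np.uint8)   # 8-bit samples assumed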
The motion information (motion vectors, reference pictures) should be signaled to the video decoding device. Various methods may be used to minimize the number of bits required to encode motion information.
For example, when the reference picture and motion vector of the current block are identical to those of a surrounding block, the motion information of the current block can be signaled to the video decoding apparatus by encoding information identifying that surrounding block. This method is called the "merge mode".
In the merge mode, the inter predictor 124 selects a predetermined number of merge candidate blocks (hereinafter referred to as "merge candidates") from among surrounding blocks of the current block.
As shown in fig. 4, all or part of the left block L, the upper block A, the upper-right block AR, the lower-left block BL, and the upper-left block AL adjacent to the current block in the current picture may be used as surrounding blocks for deriving merge candidates. In addition, blocks located within a reference picture (which may be the same as or different from the reference picture used to predict the current block) other than the current picture in which the current block is located may be used as merge candidates. For example, the co-located block, i.e., the block in the reference picture at the same position as the current block, or blocks adjacent to the co-located block may additionally be used as merge candidates.
The inter predictor 124 configures a merge list including a predetermined number of merge candidates using such surrounding blocks. The inter predictor 124 selects the merge candidate to be used as the motion information of the current block from among the merge candidates included in the merge list and generates merge index information for identifying the selected candidate. The generated merge index information is encoded by the entropy encoder 155 and transmitted to the video decoding apparatus.
Another method of encoding motion information is AMVP mode.
In the AMVP mode, the inter predictor 124 derives predicted motion vector candidates for the motion vector of the current block using the surrounding blocks of the current block. All or part of the left block L, the upper block A, the upper-right block AR, the lower-left block BL, and the upper-left block AL adjacent to the current block in the current picture in fig. 4 may be used as surrounding blocks for deriving the predicted motion vector candidates. In addition, blocks located within a reference picture (which may be the same as or different from the reference picture used to predict the current block) other than the current picture containing the current block may be used as surrounding blocks for deriving predicted motion vector candidates. For example, the co-located block, or blocks adjacent to the co-located block, in the reference picture may be used.
The inter predictor 124 derives predicted motion vector candidates using the motion vectors of the surrounding blocks and determines the predicted motion vector for the motion vector of the current block using the predicted motion vector candidates. A motion vector difference is then calculated by subtracting the predicted motion vector from the motion vector of the current block.
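The relationship between the motion vector, its predictor, and the coded difference can be sketched as follows; choosing the candidate with the smallest absolute difference is just one illustrative encoder policy:

    def amvp_encode(mv, candidates):
        # Pick the predictor that minimizes the coded difference and return
        # the candidate index together with the motion vector difference (MVD).
        costs = [abs(mv[0] - cx) + abs(mv[1] - cy) for cx, cy in candidates]
        idx = min(range(len(costs)), key=costs.__getitem__)
        mvp = candidates[idx]
        return idx, (mv[0] - mvp[0], mv[1] - mvp[1])

    def amvp_decode(idx, mvd, candidates):
        # Decoder side: reconstruct the motion vector from predictor + MVD.
        mvp = candidates[idx]
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])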
The predicted motion vector may be obtained by applying a predefined function (e.g., a median or average calculation) to the predicted motion vector candidates. In this case, the video decoding apparatus also knows the predefined function. Since the surrounding blocks used to derive the predicted motion vector candidates have already been encoded and decoded, the video decoding apparatus already knows their motion vectors. The video encoding apparatus therefore does not need to encode information for identifying the predicted motion vector candidates. In this case, therefore, the information on the motion vector difference and the information on the reference picture used to predict the current block are encoded.
The predicted motion vector may be determined by selecting any one of the predicted motion vector candidates. In this case, information for identifying the selected predicted motion vector candidate is further encoded together with information on the motion vector difference and information on a reference picture to be used for predicting the current block.
The subtractor 130 subtracts the prediction block generated by the intra predictor 122 or the inter predictor 124 from the current block to generate a residual block.
The transformer 140 may transform the residual signal in the residual block. The two-dimensional size of the residual block may be used as a Transform Unit (TU), which is a block size for performing a transform. Alternatively, the residual block may be divided into a plurality of sub-blocks, and residual signals in the respective sub-blocks may be transformed by using each sub-block as a TU.
The transformer 140 divides the residual block into one or more sub-blocks and applies the transform to the one or more sub-blocks, transforming the residual values of the sub-blocks from the pixel domain to the frequency domain. In the frequency domain, the transformed block is referred to as a coefficient block or a transform block and contains one or more transform coefficient values. A two-dimensional transform kernel may be used for the transform, and one-dimensional transform kernels may be used for the horizontal transform and the vertical transform, respectively. The transform kernels may be based on the Discrete Cosine Transform (DCT) or the Discrete Sine Transform (DST). A transform kernel may also be referred to as a transform matrix.
The transformer 140 may transform the residual block independently in the horizontal direction and the vertical direction. Various types of transform kernels or transform matrices may be used for the transform. For example, pairs of transform kernels for the horizontal and vertical transforms may be defined as a Multiple Transform Set (MTS). The transformer 140 may select the pair of transform kernels with the best transform efficiency from the MTS and transform the residual block in the horizontal and vertical directions, respectively. Information (mts_idx) on the transform kernel pair selected from the MTS is encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
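A floating-point sketch of such a separable transform with per-direction kernel selection is shown below; real codecs use fixed-point integer approximations of these matrices, so this is illustrative only.

    import numpy as np

    def dct2_matrix(n):
        # Orthonormal DCT-II basis (a DCT-based kernel).
        k = np.arange(n)
        m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        m[0] /= np.sqrt(2)
        return m * np.sqrt(2.0 / n)

    def dst7_matrix(n):
        # Orthonormal DST-VII basis (a DST-based kernel).
        k, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        return np.sqrt(4.0 / (2 * n + 1)) * np.sin(np.pi * (k + 1) * (2 * j + 1) / (2 * n + 1))

    def transform_2d(block, ker_v, ker_h):
        # Separable 2-D transform; the vertical and horizontal directions may
        # use different kernels, e.g. a pair selected from an MTS set.
        return ker_v @ block @ ker_h.T

    # Example: DCT-II columns (vertical) with DST-VII rows (horizontal).
    residual = np.random.randn(8, 8)
    coeff = transform_2d(residual, dct2_matrix(8), dst7_matrix(8))
    recon = transform_2d(coeff, dct2_matrix(8).T, dst7_matrix(8).T)  # transpose inverts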
The quantizer 145 quantizes the transform coefficients output from the transformer 140 using a quantization parameter and outputs the quantized transform coefficients to the entropy encoder 155. For some blocks or frames, the quantizer 145 may directly quantize the related residual block without transformation. The quantizer 145 may apply different quantization coefficients (scaling values) according to the positions of the transform coefficients in the transform block. The quantization coefficient matrix applied to the two-dimensional array of quantized transform coefficients may be encoded and signaled to the video decoding apparatus.
The reordering unit 150 may rearrange the coefficient values of the quantized residual values. The reordering unit 150 may change the two-dimensional coefficient array into a one-dimensional coefficient sequence by coefficient scanning. For example, the reordering unit 150 may use a zig-zag scan or a diagonal scan to scan the coefficients from the DC coefficient toward the high-frequency region and output a one-dimensional coefficient sequence. Depending on the size of the transform unit and the intra prediction mode, a vertical scan, which scans the two-dimensional coefficient array in the column direction, or a horizontal scan, which scans it in the row direction, may be used instead of the zig-zag scan. That is, the scan pattern to be used may be determined among the zig-zag, diagonal, vertical, and horizontal scans according to the size of the transform unit and the intra prediction mode.
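For reference, the positions visited by an up-right diagonal scan can be generated as in the sketch below (the actual scan tables in a codec also account for coefficient-group structure, which is omitted here):

    def diagonal_scan_order(w, h):
        # Up-right diagonal scan from the DC coefficient (0, 0) toward the
        # high-frequency corner: each anti-diagonal is walked bottom-left to
        # top-right, yielding (row, col) positions in scan order.
        order = []
        for s in range(w + h - 1):
            for y in range(min(s, h - 1), -1, -1):
                x = s - y
                if x < w:
                    order.append((y, x))
        return order

For a 4x4 block, the scan visits (0,0), (1,0), (0,1), (2,0), (1,1), (0,2), and so on.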
The entropy encoder 155 encodes the one-dimensional quantized transform coefficients output from the reordering unit 150 using various encoding techniques such as context-adaptive binary arithmetic coding (CABAC) and Exponential-Golomb coding to generate a bitstream.
The entropy encoder 155 encodes information related to block division, such as the CTU size, the CU division flag, the QT division flag, the MTT division type, and the MTT division direction, so that the video decoding apparatus can divide blocks in the same manner as the video encoding apparatus. In addition, the entropy encoder 155 encodes information on the prediction type indicating whether the current block is encoded by intra prediction or inter prediction and, according to the prediction type, encodes intra prediction information (i.e., information on the intra prediction mode) or inter prediction information (a merge index for the merge mode, or a reference picture index and a motion vector difference for the AMVP mode). The entropy encoder 155 also encodes information related to quantization, i.e., information on the quantization parameters and information on the quantization matrix.
The inverse quantizer 160 inversely quantizes the quantized transform coefficient output from the quantizer 145 to generate a transform coefficient. The inverse transformer 165 transforms the transform coefficients output from the inverse quantizer 160 from the frequency domain to the spatial domain and reconstructs a residual block.
The adder 170 adds the reconstructed residual block to the prediction block generated by the predictor 120 to reconstruct the current block. The samples in the reconstructed current block are used as reference samples for performing intra prediction of the next block.
The loop filtering unit 180 filters the reconstructed samples to reduce blocking artifacts, ringing artifacts, and blurring artifacts generated by block-based prediction and transform/quantization. The loop filtering unit 180 may include at least one of a deblocking filter 182, a Sample Adaptive Offset (SAO) filter 184, and an Adaptive Loop Filter (ALF) 186.
The deblocking filter 182 filters the boundaries between reconstructed blocks to remove blocking artifacts resulting from block-wise encoding/decoding, and the SAO filter 184 performs additional filtering on the deblocking-filtered video. The SAO filter 184 compensates for differences between reconstructed samples and original samples caused by lossy coding by adding a corresponding offset to each reconstructed sample. The ALF 186 filters a target sample by applying filter coefficients to the target sample and its surrounding samples. The ALF 186 may divide the samples included in a picture into predetermined groups and then determine one filter to be applied to each group, filtering each group differently. Information on the filter coefficients to be used for the ALF may be encoded and signaled to the video decoding apparatus.
The reconstructed block filtered by the loop filtering unit 180 is stored in the memory 190. When all blocks in a picture have been reconstructed, the reconstructed picture may be used as a reference picture for inter prediction of blocks in pictures to be encoded subsequently.
Fig. 5 is an exemplary functional block diagram of a video decoding device capable of implementing the techniques of this disclosure. Hereinafter, a video decoding apparatus and components of the apparatus will be described with reference to fig. 5.
The video decoding apparatus may include an entropy decoder 510, a reordering unit 515, an inverse quantizer 520, an inverse transformer 530, a predictor 540, an adder 550, a loop filtering unit 560, and a memory 570.
Similar to the video encoding device of fig. 1, each component of the video decoding device may be implemented as hardware, software, or a combination of hardware and software. Further, the functions of each component may be implemented in software, and the microprocessor may be implemented to execute the functions of the software corresponding to each component.
The entropy decoder 510 decodes the bitstream generated by the video encoding apparatus, extracts information related to block division to determine the current block to be decoded, and extracts the prediction information and the information on the residual signal required to reconstruct the current block.
The entropy decoder 510 extracts information about the size of CTUs from a Sequence Parameter Set (SPS) or a Picture Parameter Set (PPS), determines the size of the CTUs, and divides the picture into CTUs of the determined size. Then, the decoder determines the CTU as the uppermost layer of the tree structure, i.e., the root node, and extracts partition information about the CTU to partition the CTU using the tree structure.
For example, when CTUs are divided using the QTBTTT structure, a first flag (qt_split_flag) related to QT division is extracted to divide each node into four nodes of a lower layer. For a node corresponding to a leaf node of QT, information about a second flag (mtt_split_flag) and a division direction (vertical/horizontal) and/or a division type (binary/ternary) related to division of MTT is extracted to divide the leaf node into MTT structures. Thus, each node below the leaf node of QT is recursively divided into BT or TT structures.
As another example, when CTUs are divided using the QTBTTT structure, a CU division flag (split_cu_flag) indicating whether a CU is divided may be extracted. When the corresponding block is divided, a first flag (qt_split_flag) may be extracted. In the division operation, zero or more recursive MTT divisions may occur at each node after zero or more recursive QT divisions. For example, a CTU may undergo MTT division directly without QT division, or may undergo only QT division multiple times.
For another example, when CTUs are divided using the QTBT structure, a first flag (qt_split_flag) related to QT division is extracted, and each node is divided into four nodes of a lower layer. For a node corresponding to a leaf node of the QT, a division flag (split_flag) indicating whether the node is further divided in the BT structure and division direction information are extracted.
Once the current block to be decoded is determined through the tree-structure division, the entropy decoder 510 extracts information on the prediction type indicating whether the current block is coded by intra prediction or inter prediction. When the prediction type information indicates intra prediction, the entropy decoder 510 extracts the syntax elements of the intra prediction information (intra prediction mode) of the current block. When the prediction type information indicates inter prediction, the entropy decoder 510 extracts the syntax elements of the inter prediction information, i.e., information indicating the motion vector and the reference picture to which the motion vector refers.
The entropy decoder 510 also extracts information related to quantization and information on quantized transform coefficients of the current block as information on a residual signal.
The reordering unit 515 may change the sequence of the one-dimensional quantized transform coefficients entropy-decoded by the entropy decoder 510 into a two-dimensional coefficient array (i.e., block) in the reverse order of the coefficient scan performed by the video encoding apparatus.
The inverse quantizer 520 inversely quantizes the quantized transform coefficients using quantization parameters. The inverse quantizer 520 may apply different quantized coefficients (scaling values) to quantized transform coefficients arranged in two dimensions. The inverse quantizer 520 may perform inverse quantization by applying a matrix of quantized coefficients (scaled values) from the video encoding apparatus to a two-dimensional array of quantized transform coefficients.
The inverse transformer 530 inversely transforms the inversely quantized transform coefficients from the frequency domain to the spatial domain to reconstruct the residual signal, thereby generating a reconstructed residual block of the current block. In addition, when applying MTS, the inverse transformer 530 determines transform kernels or transform matrices to be applied in the horizontal direction and the vertical direction, respectively, using MTS information (mts_idx) signaled from the video encoding apparatus, and inversely transforms transform coefficients in the transform blocks in the horizontal direction and the vertical direction using the determined transform kernels.
The predictor 540 may include an intra predictor 542 and an inter predictor 544. The intra predictor 542 is activated when the prediction type of the current block is intra prediction, and the inter predictor 544 is activated when the prediction type of the current block is inter prediction.
The intra predictor 542 determines an intra prediction mode of the current block among a plurality of intra prediction modes based on syntax elements of the intra prediction mode extracted from the entropy decoder 510, and predicts the current block using reference samples around the current block according to the intra prediction mode.
The inter predictor 544 determines a motion vector of the current block and a reference picture to which the motion vector refers based on syntax elements of the inter prediction mode extracted from the entropy decoder 510, and predicts the current block based on the motion vector and the reference picture.
The inter predictor 544 may perform interpolation filtering according to a motion vector of the current block. In other words, when the fractional part of the motion vector is not 0, the inter predictor 544 generates a sub-sample or fractional sample indicated by the fractional part through interpolation. Interpolation is performed in the horizontal direction when the fractional part of the horizontal component (x component) of the motion vector is not 0, and interpolation is performed in the vertical direction when the fractional part of the vertical component (y component) of the motion vector is not 0.
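The per-phase filtering can be sketched as follows; the 4-tap coefficients are illustrative stand-ins, not the normative interpolation filters of any particular standard.

    import numpy as np

    # One illustrative 4-tap filter per quarter-sample phase (taps sum to 64).
    PHASE_FILTERS = {
        0: np.array([0, 64, 0, 0]),     # integer position: pass-through
        1: np.array([-4, 54, 16, -2]),  # 1/4 phase
        2: np.array([-8, 40, 40, -8]),  # 1/2 phase
        3: np.array([-2, 16, 54, -4]),  # 3/4 phase
    }

    def interp_row(samples, phase):
        # Horizontal interpolation of one row of reference samples at the given
        # fractional phase; the vertical direction is filtered the same way when
        # the y-component of the motion vector has a non-zero fractional part.
        taps = PHASE_FILTERS[phase]
        padded = np.pad(samples.astype(np.int32), (1, 2), mode="edge")
        out = np.array([np.dot(taps, padded[i:i + 4]) for i in range(len(samples))])
        return np.clip((out + 32) >> 6, 0, 255)   # normalize by 64, clip to 8 bits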
The adder 550 reconstructs the current block by adding the residual block output from the inverse transformer 530 to the prediction block output from the inter predictor or the intra predictor. The samples in the reconstructed current block are used as reference samples for intra prediction of the next block to be decoded.
The loop filtering unit 560 may include at least one of a deblocking filter 562, an SAO filter 564, and an ALF 566. Deblocking filter 562 deblocking filters boundaries between reconstructed blocks to remove block artifacts resulting from block-by-block decoding. The SAO filter 564 filters in a manner that adds a corresponding offset to the reconstructed block after deblocking filtering to compensate for differences between reconstructed samples and original samples caused by lossy encoding. The ALF 566 performs filtering on the target sample to be filtered by applying filter coefficients to the target sample and surrounding samples of the target sample. ALF 566 may divide samples in a picture into predetermined groups and then determine one filter to be applied to the corresponding group to differentially filter each group. The filter coefficients of the ALF are determined based on information about the filter coefficients decoded from the bitstream.
The reconstructed block filtered by the loop filtering unit 560 is stored in the memory 570. When all blocks in one picture are reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in a picture to be encoded next.
As described above, conventional video encoding and decoding apparatuses encode and decode pictures in units of square or rectangular blocks. However, when a picture contains an edge that is neither horizontal nor vertical (e.g., a boundary between objects within the picture), encoding based on square or rectangular blocks may reduce coding efficiency. For example, a CTU to be divided into a tree structure may include an edge in a diagonal direction. To encode the CTU efficiently, the CUs within the CTU that contain the edge must be recursively divided into smaller square or rectangular CUs. However, such repeated division increases the amount of division information to be encoded for the tree-structure division of the CTU.
On the other hand, to reduce the coding amount of the division information, or because of a limit on the level or depth to which a CTU can be divided in the tree structure, the video encoding apparatus may be unable to divide a CU containing a diagonal edge. Coding such a CU on the basis of square or rectangular blocks increases the amount of coded data for the CU, i.e., the number of bits required to encode the residual signal.
The present disclosure provides a method of efficiently encoding pictures including edges in various directions using arbitrary block partitioning, i.e., geometric block partitioning.
1. Block partitioning
The video encoding apparatus divides a square or rectangular target block into non-rectangular blocks. Here, a rectangular block refers to a square block or a rectangular block, and a non-rectangular block refers to a block that is not square or rectangular.
When there are edges inside the target block, i.e. when the target block comprises boundaries between objects, the video encoding device may geometrically divide the target block into a plurality of non-rectangular sub-blocks based on the edges.
Fig. 6 illustrates geometric division of a target block according to one embodiment of the present disclosure.
As shown in fig. 6 (a), when an edge 602 exists in the target block, the video encoding apparatus may divide the target block into sub-blocks A and B along the edge 602. As shown in fig. 6 (b), when two edges 612 and 614 exist in the target block, the video encoding apparatus may divide the target block into three sub-blocks A, B, and C.
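One way to realize such a division is a per-pixel side test against the boundary line, sketched below under the assumption that the line is given by its two endpoints in pixel coordinates:

    import numpy as np

    def partition_mask(w, h, p0, p1):
        # Label each pixel of a w x h block by the side of the boundary line
        # through p0 and p1 it falls on: True pixels form sub-block A, False
        # pixels form sub-block B. Pixel centres are taken at (+0.5, +0.5).
        ys, xs = np.mgrid[0:h, 0:w]
        cross = ((p1[0] - p0[0]) * (ys + 0.5 - p0[1])
                 - (p1[1] - p0[1]) * (xs + 0.5 - p0[0]))
        return cross >= 0

Applying a second line to one of the two sub-blocks in the same way yields the three sub-blocks A, B, and C of fig. 6 (b).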
The video encoding apparatus encodes partition information related to geometric partitions of a target block. The partition information may include a flag indicating whether a geometric partition, i.e., an arbitrary block partition, has been applied to the target block. Also, when the flag indicates a geometric division, the division information includes division pattern information related to a geometric division pattern of the target block.
The division pattern information includes information for representing boundary lines between the sub-blocks and may further include information for indicating the number of boundary lines.
As one example, the information about the boundary line may be defined as the position or coordinates of two end points at which the boundary line intersects the boundary of the target block. The positions of the two endpoints may be represented with different accuracies. The accuracy of the two end point positions of the boundary line may be expressed in units of integer pixels or in units of fractional pixels, for example, in units of 1/2, 1/4, 1/8, 1/16, or 1/32. The video encoding device may signal the locations of the two endpoints directly or an index indicating the coordinates of the two endpoints. For example, each side of the target block may be divided at a predefined number of equal intervals, and an index may be assigned to the division point of each side. The video encoding apparatus may encode indexes of the division points corresponding to both end points of the boundary line of the target block. The index of the dividing point may be allocated in various ways. For example, an index may be allocated to each division point in the clockwise direction from the division point of the upper left corner of the target block. Alternatively, each edge of the target block may be independently assigned an index of the division point. For example, each edge of the target block may be assigned an index starting from 0. In this case, the video encoding apparatus encodes a flag indicating whether or not a division point of the boundary line exists for each side of the target block, and then encodes an index of the division point of the side where the division point exists.
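Under the per-side indexing variant (each side indexed from 0), the mapping from a (side, index) pair back to endpoint coordinates might look like the following sketch, where the number of equal divisions per side is an assumed example value:

    def endpoint_from_index(w, h, side, idx, divisions=4):
        # Coordinates of an equally spaced division point on one side of the
        # target block; side 0..3 = top, right, bottom, left, walked clockwise,
        # and idx starts from 0 on each side.
        t = (idx + 1) / divisions            # fractional position along the side
        if side == 0:
            return (w * t, 0.0)              # top edge, left to right
        if side == 1:
            return (float(w), h * t)         # right edge, top to bottom
        if side == 2:
            return (w * (1.0 - t), float(h)) # bottom edge, right to left
        return (0.0, h * (1.0 - t))          # left edge, bottom to top

Two such (side, index) pairs then give the endpoints p0 and p1 used to divide the block.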
As another embodiment, the information about the boundary line may be information indicating the angle and position of the boundary line. The position of the boundary line may be expressed in terms of a distance from the center of the target block to the boundary line. The angle and distance of the boundary line can be expressed with different accuracies. For example, the angle of the boundary line may be expressed in units of integers, for example, in units of 1 degree or 2 degrees, or in units of fractions, for example, in units of 1/2, 1/4, 1/8, 1/16, or 1/32. The distance of the boundary line may also be expressed in integer pixel or fractional pixel precision. The video encoding apparatus may directly signal the angle and the distance of the boundary line or may signal indexes corresponding to the angle and the distance of the boundary line of the target block among the predefined angles and distances, respectively.
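The angle-plus-distance representation determines the same boundary line differently; a sketch of recovering a point-and-normal form from it is shown below (degrees and pixel units are assumed, and the angle is interpreted here as the direction of the line's normal, one possible convention):

    import math

    def line_from_angle_distance(w, h, angle_deg, dist):
        # Boundary line given by the angle of its normal and its distance from
        # the block centre; returns a point on the line and the unit normal.
        # A pixel mask follows by testing the sign of the dot product
        # (pixel - point) . normal for each pixel, as in partition_mask above.
        nx = math.cos(math.radians(angle_deg))
        ny = math.sin(math.radians(angle_deg))
        cx, cy = w / 2.0, h / 2.0
        return (cx + dist * nx, cy + dist * ny), (nx, ny)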
The angle and distance of the two end points of the boundary line or the boundary line may be predictively encoded. For example, the video encoding apparatus may select a block having the same or similar division pattern as the target block among the pre-encoded blocks and encode a difference between both end points of the boundary line of the selected block and both end points of the boundary line of the target block or a difference between the distance and angle of the boundary line of the selected block and the distance and angle of the boundary line of the target block. Also, selection information indicating which block has been selected among the pre-encoded blocks may be encoded. Here, the pre-encoded block may be selected from surrounding blocks adjacent to the target block, as shown in fig. 4; alternatively or additionally, the pre-encoded blocks may be selected from a predefined number of blocks that have been recently encoded using geometric partitioning. Since the video encoding apparatus and the video decoding apparatus can select the pre-encoded block in the same manner, there is no need to signal additional information for configuring the pre-encoded block.
As another embodiment, the information about the boundary line may be expressed as an index indicating the boundary line in the target block among the boundary lines along various predefined directions. Fig. 7 is a diagram showing various predefined boundary lines. Fig. 7 shows 64 borderlines. The video encoding device encodes an index corresponding to a boundary line of the target block among the 64 boundary lines. When the target block is allowed to be divided using a plurality of boundary lines, the video encoding apparatus may signal the number of boundary lines and as many indexes as the number of boundary lines.
The video encoding device encodes the above-described division information and signals the encoded division information to the video decoding device. The video decoding apparatus divides the target block into sub-blocks using the division information in the same manner as the video encoding apparatus.
On the other hand, the video encoding apparatus may signal a flag indicating whether geometric partitioning is enabled at a level higher than the target block, for example at the SPS, PPS, slice, or tile level. The division information described above is signaled for the blocks in the image area corresponding to that level only when the flag indicates that geometric partitioning is enabled.
2. Prediction
The above-described geometric partitioning can be applied to inter-prediction encoding and intra-prediction encoding. In the case of geometrically dividing the target block into a plurality of sub-blocks, the predictor 120 of the video encoding apparatus performs inter prediction or intra prediction on each sub-block. Prediction information for inter prediction or intra prediction is signaled to the video decoding apparatus, and the predictor 540 of the video decoding apparatus predicts each sub-block using the prediction information in the same manner as the predictor 120 of the video encoding apparatus.
2-1) inter prediction
The inter predictor 124 of the video encoding apparatus determines a motion vector for each sub-block and generates a predicted sub-block for each sub-block using the corresponding motion vector. Then, the inter predictor 124 generates the prediction block of the target block by combining the predicted sub-blocks. To remove artifacts at the boundaries between sub-blocks, the inter predictor 124 may perform additional filtering on the predicted pixels near those boundaries. The motion information of each sub-block is encoded by the encoder 155 and transmitted to the video decoding apparatus. The inter predictor 544 of the video decoding apparatus generates the prediction block of the target block by predicting each sub-block using the received motion information in the same manner as the inter predictor 124 of the video encoding apparatus.
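For illustration only, the following Python sketch shows how two per-sub-block predictions could be combined into one prediction block with a partition mask, with a simple average applied in a narrow band around the boundary as the additional filtering mentioned above. The mask construction and the 3x3 boundary test are assumptions for the example, not the disclosed filter.

```python
import numpy as np

# Hypothetical sketch: combine two sub-block predictions with a 0/1
# partition mask, then blend a narrow band around the boundary.

def partition_mask(h, w, p, q):
    """1.0 where a pixel lies left of the directed line p->q, else 0.0."""
    ys, xs = np.mgrid[0:h, 0:w]
    cross = (q[0] - p[0]) * (ys - p[1]) - (q[1] - p[1]) * (xs - p[0])
    return (cross > 0).astype(np.float64)

def combine_predictions(pred_a, pred_b, mask):
    hard = mask * pred_a + (1.0 - mask) * pred_b
    # Pixels whose 3x3 neighbourhood mixes both regions sit on the boundary.
    pad = np.pad(mask, 1, mode="edge")
    local = np.stack([pad[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
                      for dy in range(3) for dx in range(3)])
    near_boundary = local.min(axis=0) != local.max(axis=0)
    return np.where(near_boundary, 0.5 * (pred_a + pred_b), hard)

mask = partition_mask(8, 8, p=(0, 8), q=(8, 0))
pred = combine_predictions(np.full((8, 8), 100.0), np.full((8, 8), 50.0), mask)
```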
On the other hand, the geometric division may be applied only when the target block is encoded in a merge or skip mode. In this case, as described above, the inter predictor 124 constructs a merge list including a predetermined number of merge candidates from surrounding blocks of the target block. Subsequently, the inter predictor 124 selects a merge candidate to be used as a motion vector of each sub-block among the merge candidates included in the merge list, and generates merge index information for identifying the selected candidate.
The merge index information of each sub-block is encoded by the encoder 155 and transmitted to the video decoding apparatus. On receiving the merge index information of each sub-block, the inter predictor 544 of the video decoding apparatus constructs a merge list in the same manner as the inter predictor 124 of the video encoding apparatus and predicts each sub-block using the merge candidate indicated by the corresponding merge index information.
2-2) intra prediction
The intra predictor 122 of the video encoding apparatus constructs reference pixels (reference samples) from pre-reconstructed pixels around the target block and determines an intra prediction mode for each sub-block. The intra predictor 122 then predicts each sub-block from the reference pixel according to the determined intra prediction mode. The intra predictor 542 of the video decoding apparatus predicts each sub-block using information received from the video encoding apparatus in the same manner as the intra predictor 122 of the video encoding apparatus.
Since the operation of the intra predictor 542 of the video decoding apparatus is the same as that of the intra predictor 122 of the video encoding apparatus, the operation of the video encoding apparatus will be mainly described below.
The reference pixels of each sub-block may be configured differently based on one or more of the angle (or direction) of the boundary line, the position of the boundary line in the target block (or the distance from the center of the target block to the boundary line), the position of the sub-block relative to the boundary line, and the position of the peripheral line (column and/or row of pixels) of the target block used to configure the reference pixels. When the target block is divided along the boundary between objects, the sub-blocks are likely to have different texture properties because they belong to different objects. Conversely, blocks located on the same side of the boundary line are likely to have similar texture features. Accordingly, the reference pixels of each sub-block are configured using the surrounding pixels located on the same side of the boundary line as that sub-block.
Fig. 8 is a diagram illustrating a method of configuring reference pixels of respective sub-blocks.
The reference pixels may be configured using surrounding pixels included in the row adjacent to the upper side of the target block and the column adjacent to its left side. As shown in fig. 8, the reference pixels of sub-block A consist of the surrounding pixels 802, 804 that lie on the same side of the boundary line as sub-block A among the surrounding pixels in the upper row and the left column. Similarly, the reference pixels of sub-block B consist of the surrounding pixels 806 that lie on the same side of the boundary line as sub-block B. The example of fig. 8 assumes that encoding/decoding of blocks proceeds from left to right in the horizontal direction and from top to bottom in the vertical direction, as in raster-scan order. However, a different encoding/decoding order may be used, in which case surrounding pixels adjacent to the right or lower side of the target block may serve as reference pixels.
The intra predictor 122 may specify the boundary line within the target block using its angle (or direction) and position, and calculate the positions (P, Q) at which the boundary line intersects the upper row and the left column. Using the positions P and Q, the intra predictor 122 determines the reference pixels located on the same side of the boundary line as each sub-block.
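For illustration only, the following Python sketch computes such crossing points for a boundary line given by its angle and its signed distance from the block center, taking the reference row above the block as y = -1 and the reference column to its left as x = -1. The parameterization is an assumption for the example.

```python
import math

# Hypothetical sketch: locate the points P and Q where a boundary line,
# given by its angle and its signed distance from the block center,
# crosses the reference row (y = -1) and reference column (x = -1).

def boundary_crossings(block_w, block_h, angle_deg, dist):
    """Return x at y = -1 (P) and y at x = -1 (Q); None if parallel."""
    cx, cy = block_w / 2.0, block_h / 2.0
    theta = math.radians(angle_deg)
    # A point on the line (center shifted by dist along the normal)
    # and the line's direction vector.
    px = cx + dist * math.cos(theta + math.pi / 2)
    py = cy + dist * math.sin(theta + math.pi / 2)
    dx, dy = math.cos(theta), math.sin(theta)
    p = px + (-1 - py) / dy * dx if dy != 0 else None   # crossing of y = -1
    q = py + (-1 - px) / dx * dy if dx != 0 else None   # crossing of x = -1
    return p, q

# A 45-degree line through the center of an 8x8 block crosses both
# reference lines at the above-left corner position (-1, -1).
p, q = boundary_crossings(8, 8, 45.0, 0.0)
assert round(p) == -1 and round(q) == -1
```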
As shown in fig. 8, the reference pixels may be configured using surrounding pixels included in an upper row and a left column immediately adjacent to the target block, but the present invention is not limited thereto. For example, the intra predictor 122 may configure the reference pixels using pre-reconstructed surrounding pixels included in other rows or columns not immediately adjacent to the target block.
Fig. 9 is another diagram for explaining a method of configuring reference pixels for respective sub-blocks.
As shown in fig. 9, the intra predictor 122 may configure the reference pixels using not only the line immediately adjacent to the target block but also surrounding pixels included in at least one non-adjacent line. For example, the reference pixels may be constructed using surrounding pixels in a predetermined number of columns on the left side of the target block and a predetermined number of rows on its upper side. The intra predictor 122 may select which of the plurality of lines to use, calculate the positions where the selected line intersects the boundary line, and use those positions to configure the reference pixels of each sub-block differently. Index information on the selected line among the plurality of lines may be encoded and transmitted to the video decoding apparatus, which uses the received index information to select the line for constructing the reference pixels. An index may be assigned to each line (row or column) based on its distance from the target block; for example, index 0 may be assigned to the first line immediately adjacent to the target block, index 1 to the second line, and so on. According to an embodiment, a plurality of rows and a plurality of columns may be used instead of one row and one column, and information about them may be transmitted through respective indexes. Alternatively, an index may be transmitted that selects one or more candidates from a list of candidates, each of which pairs two or more lines. Also, the indexes of the left lines and the upper lines may be configured differently; for example, a first line may be selected from the left columns and a second line from the upper rows.
As shown in fig. 3, the intra prediction mode for predicting each sub-block may include a directional mode and a non-directional mode (e.g., a DC mode and a PLANAR mode). The intra prediction mode of each sub-block may be selected based on at least one of an angle (or direction) of the boundary line and a position of the sub-block within the target block.
In some embodiments, the directional modes applicable to each sub-block may be limited based on at least one of the angle (or direction) of the boundary line and the position of the sub-block within the target block relative to the boundary line. In other words, only a subset of all intra prediction modes may be selectable for each sub-block, based on at least one of the angle (or direction) of the boundary line and the position of the sub-block in the target block. As described above, each sub-block is predicted using different reference samples. Thus, the intra predictor 122 limits the directional modes applicable to each sub-block so that only the reference samples allowed for that sub-block are used. This limitation reduces the number of bits required to encode the intra prediction mode of each sub-block.
For example, when the angle (or direction) of the boundary line is denoted boundary_angle, the directional modes may be divided into a set whose angles range from boundary_angle to boundary_angle+180 degrees, measured in the counterclockwise direction, and a set whose angles fall outside that range. The intra predictor 122 applies directional modes belonging to different sets to each sub-block. Referring to the example of fig. 10, the target block is divided into sub-block A and sub-block B by a boundary line connecting the upper right corner and the lower left corner of the target block at an angle of 45 degrees. Sub-block A is predicted using the above-left corner reference pixels 1002 of the target block and the reference pixels 1004 located above or to the left of the target block, while sub-block B is predicted using the above-right and below-left pixels 1006. Thus, the directional modes applicable to sub-block A are limited to the intra prediction modes with angles ranging from 45 degrees to 225 degrees in the counterclockwise direction, and the directional modes applicable to sub-block B are limited to the intra prediction modes with angles ranging from 45 degrees to -135 degrees in the clockwise direction. For the various geometric division modes shown in fig. 8, the directional modes allowed for sub-blocks A and B may be changed adaptively according to the angle of the boundary line and the position of the sub-block relative to the boundary line.
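For illustration only, the following Python sketch filters a set of directional modes down to those whose prediction angle falls in the half-plane from boundary_angle to boundary_angle+180 degrees, measured counterclockwise. The mode-to-angle mapping is a placeholder, not an actual standard's mode table.

```python
# Hypothetical sketch: keep, for one sub-block, only the directional
# modes whose prediction angle lies in one half-plane of the boundary.

def allowed_modes(mode_angles, boundary_angle, above_boundary):
    """mode_angles: {mode_index: angle_deg}. Keep modes whose angle lies
    in [boundary_angle, boundary_angle + 180) counterclockwise for the
    sub-block on one side, and the complementary range for the other."""
    def in_ccw_range(angle):
        return (angle - boundary_angle) % 360.0 < 180.0
    return {m: a for m, a in mode_angles.items()
            if in_ccw_range(a) == above_boundary}

# With a 45-degree boundary, a mode at 90 degrees is available only to
# the sub-block covering the 45..225-degree half-plane.
modes = {0: 45.0, 1: 90.0, 2: 225.0, 3: 300.0}
assert 1 in allowed_modes(modes, 45.0, True)
assert 1 not in allowed_modes(modes, 45.0, False)
```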
Moreover, the directional modes may be limited according to the range of available reference pixels. In the embodiment of fig. 10, the number of reference pixels 1006 of sub-block B may be limited by the horizontal and vertical lengths of the target block containing sub-block B; accordingly, directional modes that cannot reach the reference pixels 1006 are not used as intra prediction modes for sub-block B. For example, when the target block in fig. 10 is of size N×N, the available reference pixels may be limited to the 2N surrounding samples continuing in the horizontal (row) direction from the sample immediately above the top-left sample of the target block, and the 2N surrounding pixels continuing in the vertical (column) direction from the sample immediately to the left of the top-left sample. In this case, the intra prediction modes allowed for sub-block B are limited to the ranges -130 degrees to -112.5 degrees and 22.5 degrees to 45 degrees.
In the present embodiment, the intra predictor 122 selects the set of intra prediction modes applicable to each sub-block based on the boundary line, and then determines the intra prediction mode of the sub-block from among the modes selected for it. Information representing the determined intra prediction mode is delivered to the video decoding apparatus. In the same manner as the intra predictor 122 of the video encoding apparatus, the intra predictor 542 of the video decoding apparatus selects the intra prediction modes applicable to each sub-block based on the boundary line and then determines the intra prediction mode of the sub-block from among them.
In another embodiment, the intra prediction mode of each sub-block may be limited only under certain conditions. The intra predictor 122 may check the intra prediction mode of at least one surrounding block containing at least a portion of the reference pixels configured for a sub-block, and determine whether to limit the intra prediction modes of that sub-block by comparing the angle indicated by the surrounding block's intra prediction mode with the angle of the boundary line. For example, the intra predictor 122 checks the intra prediction modes of one or more surrounding blocks containing at least one of the reference pixels 1002, 1004 of sub-block A. When a surrounding block's intra prediction mode has an angle outside the range of 45 degrees to 225 degrees, it is assumed that the texture features of the two regions divided by the boundary line may not be clearly distinguishable from each other. In that case, the intra predictor 122 does not limit the intra prediction modes of sub-block A, and the reference pixels of sub-block A need not be limited to the surrounding pixels indicated by 1002 and 1004 in fig. 10.
For each of the sub-blocks, the intra predictor 122 determines an intra prediction mode of the sub-block among intra prediction modes applicable to the sub-block, and generates a predicted sub-block using the determined intra prediction mode.
The intra prediction modes of the respective sub-blocks are encoded and transmitted to a video decoding apparatus. The video encoding device may encode information directly indicating the intra prediction mode of each sub-block, but may also predictively encode the information.
In some embodiments, the video encoding apparatus constructs a candidate list using the intra prediction modes of surrounding blocks of the target block. The candidate list is shared by all sub-blocks. The video encoding apparatus signals index information identifying the intra prediction mode of each sub-block from the candidate list. As one example, the video encoding apparatus selects, among the prediction modes in the candidate list, the one whose mode number is closest to the intra prediction mode of the sub-block and encodes the index corresponding to the selected mode. Then, the difference between the actual intra prediction mode of the sub-block and the prediction mode corresponding to the encoded index is encoded. As another example, the difference value may be encoded only when the prediction mode of the sub-block is absent from the candidate list, i.e., when the actual intra prediction mode of the sub-block differs from the prediction mode corresponding to the encoded index. To this end, a flag indicating whether the prediction mode of the sub-block exists in the candidate list may be encoded before the difference value.
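For illustration only, the following Python sketch shows a shared candidate list built from neighboring modes, with each sub-block's mode coded as the index of the closest candidate plus a signed difference, as in the first example above. The mode numbers and the list size cap are assumptions.

```python
# Hypothetical sketch: a shared candidate list from neighbours' intra
# modes; each sub-block's mode is coded as (closest index, difference).

def build_candidate_list(neighbor_modes, max_size=6):
    """Deduplicated neighbour modes, in scan order, capped at max_size."""
    candidates = []
    for mode in neighbor_modes:
        if mode not in candidates:
            candidates.append(mode)
        if len(candidates) == max_size:
            break
    return candidates

def encode_mode(mode, candidates):
    """Return (index of closest candidate, signed difference to it)."""
    idx = min(range(len(candidates)), key=lambda i: abs(candidates[i] - mode))
    return idx, mode - candidates[idx]

def decode_mode(idx, diff, candidates):
    return candidates[idx] + diff

cands = build_candidate_list([18, 50, 18, 2])   # e.g. neighbours' modes
idx, diff = encode_mode(21, cands)
assert decode_mode(idx, diff, cands) == 21
```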
In another embodiment, the candidate list may be configured differently for each sub-block based on at least one of the direction (or angle) of the boundary line and the position of each sub-block divided by the boundary line. Referring to fig. 10, as described above, sub-block A is likely to have texture characteristics similar to those of the surrounding blocks containing at least a portion of the surrounding pixels 1004 located above or to the left of the target block and the surrounding pixels 1002 located above and to the left of the target block. Thus, the video encoding apparatus may generate the candidate list of sub-block A from the intra prediction modes of the surrounding blocks containing at least a portion of the surrounding pixels 1002, 1004. Sub-block B is likely to have texture characteristics similar to those of the surrounding blocks containing at least a portion of the surrounding pixels 1006 located above-right or below-left of the target block. Thus, the video encoding apparatus may generate the candidate list for sub-block B from the intra prediction modes of the surrounding blocks containing at least a portion of the surrounding pixels 1006.
In the present embodiment, the method of encoding the information identifying the intra prediction mode of each sub-block from its candidate list may be the same as in the embodiment described above. As another example, the video encoding apparatus encodes a flag indicating whether the intra prediction mode of the sub-block exists in the corresponding candidate list. When it does, the index corresponding to the intra prediction mode of the sub-block in the candidate list is encoded. Otherwise, identification information is encoded that identifies the intra prediction mode of the sub-block among the remaining intra prediction modes applicable to the sub-block, excluding the prediction modes included in the candidate list.
The video encoding apparatus may encode the intra prediction modes of the respective sub-blocks independently and individually. Alternatively, for some sub-blocks, the prediction mode may be encoded in the manner described above, and for the remaining sub-blocks, the difference of the intra prediction mode from that of those sub-blocks may be encoded. For example, the video encoding apparatus classifies the sub-blocks into groups whose prediction modes are similar to each other. Then, for each group, the video encoding apparatus encodes the intra prediction mode of one sub-block using the method of the above embodiment, and encodes the intra prediction modes of the remaining sub-blocks in the group as differences from that encoded mode. Referring to fig. 11, when the boundary lines are considered, sub-blocks A and B, and sub-blocks C and D, are likely to have intra prediction modes similar to each other. Accordingly, the video encoding apparatus encodes the intra prediction modes of sub-block A and sub-block C according to the above-described embodiments, and encodes the prediction modes of sub-block B and sub-block D as the prediction mode difference between sub-block B and sub-block A and between sub-block D and sub-block C, respectively.
The intra predictor 542 of the video decoding apparatus intra predicts each sub-block using an intra prediction mode of each sub-block received from the video encoding apparatus. Since the operation of the intra predictor 542 of the video decoding apparatus is the same as that of the intra predictor 122 of the video encoding apparatus, a further description will be omitted.
On the other hand, matrix-based intra prediction (MIP) may be applied to predict each sub-block. To predict sub-blocks geometrically partitioned from a target block in the MIP mode, the intra predictor 122 configures the reference samples for each sub-block in the same manner as described above, performs a matrix operation using the different reference samples of each sub-block, and generates a predicted sub-block for each sub-block.
Referring to fig. 10, the intra predictor 122 configures reference samples 1002, 1004 for the sub-block a and generates an input vector for matrix operation according to the reference samples 1002, 1004. The input vector may be generated by downsampling a reference sample. For example, the input vector may be generated by averaging a predefined number of consecutive reference samples. The intra predictor 122 multiplies the input vector by a matrix to generate predicted pixels constituting the sub-block a. The prediction pixels are rearranged according to the shape of the sub-block a. The intra predictor 122 generates a final predicted sub-block corresponding to the shape of the sub-block a by interpolation using the generated predicted pixel and the reference pixels 1002, 1004 of the sub-block a.
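For illustration only, the following Python sketch reproduces the matrix step just described: the reference samples are downsampled by pairwise averaging into an input vector, which is multiplied by a prediction matrix to yield the predicted pixels. The matrix and the sizes are random placeholders, not trained MIP matrices.

```python
import numpy as np

# Hypothetical sketch of the matrix-based step: pairwise-average the
# reference samples into an input vector, multiply by a prediction
# matrix, and take the products as the predicted pixels.

def mip_predict(ref_samples, matrix):
    """ref_samples: 1-D array of a sub-block's reference pixels.
    matrix: (num_predicted_pixels, len(input_vector)) matrix."""
    ref = np.asarray(ref_samples, dtype=np.float64)
    input_vec = ref.reshape(-1, 2).mean(axis=1)   # downsample by pairs
    assert matrix.shape[1] == input_vec.size
    return matrix @ input_vec                     # one value per pixel

rng = np.random.default_rng(0)
refs = rng.integers(0, 256, size=8)   # 8 reference samples -> 4 inputs
m = rng.random((16, 4))               # predicts 16 pixels (e.g. 4x4)
pred = mip_predict(refs, m)           # rearranged to the sub-block shape
```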
The intra predictor 122 constructs an input vector from the reference samples 1006 of sub-block B, and a predicted sub-block corresponding to sub-block B is then generated in the same manner as for sub-block A.
On the other hand, multiple matrices may be available for MIP. The sub-blocks may use the same matrix or different matrices. Information identifying the matrix used is encoded and transmitted to the video decoding apparatus.
To predict each sub-block, intra block copy (IBC) may also be applied. The intra predictor 122 determines, for each sub-block, a block vector indicating the most similar block within the pre-reconstructed area of the current picture containing the target block, and then generates a predicted sub-block for each sub-block using the pre-reconstructed pixels of the region indicated by the corresponding block vector.
When a predicted sub-block has been generated for each sub-block according to the above-described intra prediction methods, the prediction block of the target block is generated by combining the predicted sub-blocks. To remove artifacts at the boundaries between sub-blocks, filtering may be performed on the predicted pixels near those boundaries. Alternatively, or in addition to the filtering, an offset corresponding to each sample position may be added to the predicted samples in the prediction block of the target block. Here, the offset may be generated by position-based intra prediction, which may be the PLANAR mode or the position dependent intra prediction combination (PDPC) defined by the VVC standard.
3. Transformation
The transformer 140 of the video encoding apparatus transforms a residual block, which is a difference between a target block and a prediction block of the target block, into a transform coefficient block in the frequency domain. The transform coefficients in the block of transform coefficients are encoded and signaled to the video decoding device and inverse transformed by an inverse transformer 530 of the video decoding device into a residual block in the spatial domain.
In some embodiments of the present disclosure, the transformer 140 of the video encoding apparatus may transform the target block using a transform unit of the same size as the target block, or may transform each sub-block independently. For example, the transformer 140 may rearrange the residual signals in each sub-block into a rectangular or square residual block and transform each such residual block independently or individually.
In another embodiment of the present disclosure, as shown in fig. 12, the transformer 140 classifies the residual signals in the target block into a region near the boundary line and the remaining regions, and generates rectangular or square residual sub-blocks by rearranging the residual signals in each region. The transformer 140 transforms and quantizes the residual sub-blocks individually to generate transform coefficient blocks corresponding to the respective sub-blocks. The transform coefficients in each transform coefficient block are encoded and transmitted to the video decoding apparatus. The inverse transformer 530 of the video decoding apparatus generates a plurality of residual sub-blocks by inverse quantizing and inverse transforming each of the transform coefficient blocks. Then, the inverse transformer 530 generates the residual block of the target block by returning the residual signals in each residual sub-block to their original positions in the corresponding region of the target block. A method of encoding the residual signal in the target block is described in detail below.
Fig. 13 is a flowchart illustrating a method of encoding a residual signal in a target block geometrically divided into a plurality of sub-blocks according to one embodiment of the present disclosure.
The transformer 140 of the video encoding apparatus determines a plurality of transform units for transforming the target block (S1110). A transform unit is a one-dimensional or two-dimensional block divided from the target block so as not to overlap the other transform units, and represents the unit in which a transform is performed. Also, the transformer 140 may determine, among a plurality of transform kernels, the transform kernel for transforming the residual signals corresponding to each transform unit.
As shown in fig. 14, the transformer 140 divides the target block into a plurality of regions in consideration of the boundary line between the geometrically divided sub-blocks (S1220), and generates a plurality of rectangular or square residual sub-blocks, each of the size of the corresponding transform unit, by rearranging the residual signals in each region (S1230). The plurality of regions includes a boundary region containing the boundary line and one or more non-boundary regions not containing it. The number of regions divided from the target block may be equal to the number of transform units, and the number of samples in each region may equal the number of samples in the corresponding transform unit. For example, when the horizontal length of the transform unit is 2^m and its vertical length is 2^n, the number of samples in the region corresponding to the transform unit is 2^m × 2^n.
For example, as shown in fig. 14, the plurality of regions may include a first region (boundary region) containing the boundary line and a second region (non-boundary region) obtained by excluding the first region from the target block, which does not contain the boundary line. The first region may be a set of rows, each consisting of a predetermined number of consecutive samples in the horizontal direction. Fig. 14 shows an example in which each row of the first region contains four samples in the horizontal direction. However, the present invention is not limited to this specific example; the first region may instead be a set of columns, each consisting of a predetermined number of consecutive samples in the vertical direction. The number of samples in the horizontal or vertical direction constituting the first region may be determined in consideration of the size of the transform unit determined for the first region, i.e., the number of samples in the transform unit. The first region is configured such that the number of samples in it equals the number of samples in the corresponding transform unit. The type of transform kernel to be applied to the residual signals in the first region may additionally be considered in determining the number of samples.
The target block may be divided into three or more regions instead of two. For example, as shown in fig. 15, the target block may be divided into three regions (first to third regions) according to the distance from the boundary line. The residual signals in each region are rearranged to generate rectangular residual sub-blocks. The size of each region is determined according to the size of the transform unit corresponding to that region, and the number of samples in each region is set equal to the number of samples of the corresponding transform unit.
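For illustration only, the following Python sketch classifies the pixels of a residual block by their distance from the boundary line, as in the two-region case above, and packs each region row-major into a rectangle for transformation; three or more regions follow by using multiple distance thresholds. The band half-width and rectangle width are assumptions, and the sketch zero-pads the tail for simplicity, whereas the text above sizes each region to fill its transform unit exactly.

```python
import numpy as np

# Hypothetical sketch: split residuals into a boundary band around the
# line through p and q and a non-boundary remainder, then pack each
# region row-major into a rectangle.

def region_masks(h, w, p, q, band=2.0):
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = q[0] - p[0], q[1] - p[1]
    # Signed distance of each pixel from the line through p and q.
    dist = ((xs - p[0]) * dy - (ys - p[1]) * dx) / np.hypot(dx, dy)
    in_band = np.abs(dist) < band
    return in_band, ~in_band

def pack_region(values, width):
    """Row-major packing of one region's residuals into a rectangle."""
    height = -(-len(values) // width)               # ceiling division
    rect = np.zeros(height * width, dtype=values.dtype)
    rect[:len(values)] = values
    return rect.reshape(height, width)

residual = np.arange(64, dtype=np.int32).reshape(8, 8)
boundary, nonboundary = region_masks(8, 8, p=(0, 8), q=(8, 0), band=1.5)
boundary_rsb = pack_region(residual[boundary], width=8)
nonboundary_rsb = pack_region(residual[nonboundary], width=8)
```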
The transformer 140 transforms and quantizes each rectangular residual sub-block to generate a plurality of transform coefficient blocks (S1240). The transform information including information on the transform coefficients of the transform coefficient block and information on the transform units is encoded by the encoder 155 and transmitted to the video decoding apparatus. The transformation information may further include information about the transformation core.
On the other hand, although the above embodiments transform, quantize, and encode all residual sub-blocks corresponding to the respective regions, only some of the residual sub-blocks may be transformed, quantized, and encoded, with the remaining residual sub-blocks filled with zeros. For example, for a target block predicted with geometric division, only the boundary region containing the boundary line among the plurality of regions may be encoded. Alternatively, one or more regions to be transformed, quantized, and encoded may be selected among the plurality of regions, and information indicating the selected regions may be signaled to the video decoding apparatus.
Fig. 16 is a flowchart showing a decoding method performed by the video decoding apparatus corresponding to the encoding method of fig. 13.
The decoder 510 of the video decoding apparatus decodes the transform information from the bitstream. The decoder 510 determines the rectangular or square transform units based on the transform information and generates transform coefficient blocks by decoding the transform coefficients corresponding to the respective transform units (S1610). The inverse quantizer 520 of the video decoding apparatus inversely quantizes each transform coefficient block, and the inverse transformer 530 inversely transforms the inversely quantized transform coefficient blocks to generate residual sub-blocks (S1620).
The inverse transformer 530 determines a plurality of regions within the target block by considering the boundary line specified by the information on the boundary line, the number of transform units, and the size of each transform unit. In other words, the inverse transformer 530 determines a boundary region including a boundary line and one or more non-boundary regions not including the boundary line within the target block (S1630). In order to identify the plurality of regions, information about the transform kernel included in the transform information may be further considered in addition to the size of the transform unit. The inverse transformer 530 generates a residual block of the target block by arranging or rearranging the residual signal of each residual sub-block into a corresponding region (S1640).
As described above, the video encoding apparatus may encode and transmit only some of the plurality of regions comprising the boundary region and one or more non-boundary regions. In this case, the video decoding apparatus decodes, inverse quantizes, and inverse transforms only the transform coefficient blocks corresponding to the regions encoded by the video encoding apparatus to generate residual sub-blocks, and then rearranges the residual signals within those residual sub-blocks into the corresponding regions. The remaining regions are filled with zeros.
Which region among a plurality of regions is encoded may be predetermined between the video encoding apparatus and the video decoding apparatus, or the video encoding apparatus may signal information indicating the encoded region to the video decoding apparatus.
In the embodiment of fig. 12, the video encoding apparatus first determines rectangular or square transform units, divides the target block into a plurality of regions according to the transform units, and rearranges the pixels within each region according to the size of the transform units to generate rectangular residual sub-blocks. Fig. 16 illustrates the decoding operation corresponding to the encoding operation of fig. 12.
As another embodiment, the transformer 140 may determine a transformation unit after first dividing the target block into a plurality of regions.
The transformer 140 divides the target block into a boundary region containing the boundary line and one or more non-boundary regions not containing it. Here, how the boundary region and the non-boundary regions are configured may be predefined between the encoding device and the decoding device according to the type of the boundary line (for example, the intersection points at which the boundary line crosses the sides of the target block, or the angle and position of the boundary line). Alternatively, information related to the region division may be determined at a level higher than the block level, such as the sequence, picture, slice, or tile level, and signaled to the video decoding device.
After determining the plurality of regions, the transformer 140 determines a rectangular or square transform unit corresponding to each region in consideration of the total number of pixels within the region or the numbers of pixels in the horizontal and vertical directions. Then, the transformer 140 rearranges the residual signals within each region according to the size of the transform unit to generate rectangular or square residual sub-blocks.
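For illustration only, the following Python sketch derives a rectangular transform unit from a region's pixel count, preferring the squarest power-of-two shape whose area matches. The preference rule is an assumption for the example.

```python
# Hypothetical sketch: choose a rectangular transform unit whose sample
# count equals the region's pixel count, as square as possible.

def transform_unit_for(num_pixels):
    """Return (width, height), both powers of two with
    width * height == num_pixels; None if no such shape exists."""
    if num_pixels <= 0 or num_pixels & (num_pixels - 1):
        return None                       # not a power of two
    exp = num_pixels.bit_length() - 1
    w_exp = (exp + 1) // 2                # squarest split of the exponent
    return (1 << w_exp, 1 << (exp - w_exp))

assert transform_unit_for(64) == (8, 8)
assert transform_unit_for(32) == (8, 4)
assert transform_unit_for(48) is None     # would need padding or resizing
```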
In this embodiment, additional signaling of transform information on the size of the transform units may not be required, because the video decoding apparatus can determine the type of the boundary line from the information about the boundary line and can determine the transform units from the boundary line type.
The video decoding apparatus recognizes the type of the boundary line based on the information on the boundary line and determines a rectangular or square transform unit according to the boundary line type. Then, the video decoding apparatus decodes the transform coefficients corresponding to the respective transform units to generate a plurality of transform coefficient blocks, and generates a residual sub-block by inversely quantizing and inversely transforming the transform coefficient blocks. The video decoding apparatus identifies a boundary region including a boundary line and one or more non-boundary regions excluding the boundary line within the target block by considering the boundary line and the size of the transform unit. Then, the video decoding apparatus generates a residual block of the target block by rearranging residual signals within each residual sub-block in the corresponding region.
In another embodiment, the video encoding apparatus may set a virtual area outside the target block based on the boundary line and configure the boundary region to include not only an area inside the target block but also the virtual area. The residual signals in the virtual area are filled with a specific fixed value or with residual signal values from within the target block. Referring to fig. 17, a virtual line parallel to the boundary line is placed in each of the two areas divided by the boundary line, and the area between the two virtual lines is set as the boundary region. In this case, the triangular areas formed at the lower left corner and the upper right corner of the target block may be set as the virtual area. The residual signals in the boundary region, including the virtual area, are rearranged into a rectangular or square first residual sub-block. Since the target block is rectangular or square while the boundary region includes a virtual area outside the target block, the residual signals in the non-boundary region inside the target block may not by themselves rearrange into a rectangle or square. Accordingly, when rearranging the residual signals of the non-boundary region into the rectangular or square second residual sub-block, the video encoding apparatus excludes from the non-boundary region as many residual signals as there are residual signals in the virtual area. The positions of the excluded residual signals in the non-boundary region may be predefined between the encoding device and the decoding device, or information about those positions may be signaled from the encoding device to the decoding device.
In the same manner as in the other embodiments, the video decoding apparatus reconstructs the first residual sub-block and rearranges its residual signals into the boundary region, excluding the filled virtual area. The residual signals in the second residual sub-block are rearranged into the non-boundary region. The residual signals excluded from the non-boundary region by the encoding device are filled in with predicted values; for example, they may be filled with a specific value, or with the average of neighboring residual signals. Alternatively, the area whose residual signals were excluded may be filled at the pixel level after reconstruction, rather than at the residual level, using the reconstructed pixel values around it. The video decoding apparatus first reconstructs the pixel values of the target block outside the excluded area, and then predicts the pixel values of the excluded area from the surrounding reconstructed pixel values. In other words, the video decoding apparatus predicts the pixel values of the excluded area such that the sum of the residual signal of the excluded area and the corresponding predicted pixel equals a weighted average of the reconstructed pixel values around the excluded area.
Next, a method of quantizing or inverse-quantizing the target block will be described.
The multiple regions (boundary region and one or more non-boundary regions) in the target block may be quantized or inverse quantized using the same quantization parameter. For example, one quantization parameter is determined for the target block and shared by every region. The video encoding apparatus encodes a delta quantization parameter, which is the difference between the quantization parameter of the target block and a predicted quantization parameter. Here, the predicted quantization parameter is derived from the quantization parameters of pre-reconstructed blocks adjacent to the target block. Referring to fig. 18, the predicted quantization parameter is derived from the quantization parameters of at least some of the pre-reconstructed blocks among the surrounding blocks "a" to "g". For example, the video encoding apparatus may set the average of the quantization parameters of the upper block "b" and the left block "c" as the predicted quantization parameter. The video decoding apparatus derives the predicted quantization parameter in the same manner as the video encoding apparatus and calculates the quantization parameter of the target block by adding the delta quantization parameter received from the video encoding apparatus to the predicted quantization parameter.
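For illustration only, the following Python sketch follows the example just given: the predicted quantization parameter is the rounded average of the available above and left neighbors, and the decoder adds the signalled delta to it. The fallback to a slice-level parameter when no neighbor is available is an assumption.

```python
# Hypothetical sketch: predict the target block's quantization parameter
# from reconstructed neighbours, then apply the signalled delta.

def predict_qp(qp_above, qp_left, slice_qp):
    available = [q for q in (qp_above, qp_left) if q is not None]
    if not available:
        return slice_qp                   # no reconstructed neighbour yet
    # Rounded average of the available neighbours.
    return (sum(available) + len(available) // 2) // len(available)

def reconstruct_qp(delta_qp, qp_above, qp_left, slice_qp=32):
    return predict_qp(qp_above, qp_left, slice_qp) + delta_qp

assert reconstruct_qp(delta_qp=2, qp_above=30, qp_left=34) == 34
```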
Alternatively, a quantization parameter may be determined independently for each region in the target block. The video encoding apparatus signals a delta quantization parameter for the residual signals of each region. The predicted quantization parameter of each region is derived from the quantization parameters of blocks adjacent to that region. Referring to fig. 18, the predicted quantization parameter of region A (the boundary region) is derived from the quantization parameters of at least some of the surrounding blocks "d", "e", and "f". According to an embodiment, the blocks used for predicting the quantization parameter may be limited according to the width of the boundary region A or the area ratio between regions A and B. For example, as shown in fig. 17, when the boundary region A is a band one to two pixels wide around the boundary line, the quantization parameter is predicted from "d" and "e". In another embodiment, if the boundary region A extends further from the boundary line, the prediction may use not only "e" and "d" but also "b", "c", "g", and "f". The predicted quantization parameter may be the average of the quantization parameters of the pre-reconstructed blocks among these surrounding blocks. Meanwhile, the predicted quantization parameter of region B (the non-boundary region) is derived from the quantization parameters of the pre-reconstructed blocks among the surrounding blocks "a" to "g"; for example, it may be their average.
Alternatively, the video encoding apparatus determines a quantization parameter for one region (e.g., the boundary region) among the plurality of regions and derives the quantization parameters of the remaining regions by adding an offset to it. For example, the video encoding apparatus may transmit the delta quantization parameter of the boundary region and omit transmission of a delta quantization parameter for the non-boundary region. In the example of fig. 18, the non-boundary region B is predicted relatively well compared to the boundary region A, so the absolute magnitude of its residual signals is likely to be small. Therefore, for encoding/decoding efficiency, the non-boundary region B may always be quantized using the quantization parameter obtained by adding an offset to the quantization parameter of the boundary region A, and transmission of a delta quantization parameter for the non-boundary region B may be omitted. The offset, which is added to the quantization parameter of the boundary region A, may be a positive number and may be determined adaptively according to the prediction mode. Different offsets may be applied depending on whether the target block is coded by inter prediction or intra prediction; within inter prediction, different offsets may be applied depending on whether the target block is in merge mode or AMVP mode. For example, the offset may take its maximum value in merge mode and its minimum value in intra prediction mode. The video decoding apparatus calculates the predicted quantization parameter in the same manner as the video encoding apparatus and determines the quantization parameter of the boundary region by adding the delta quantization parameter to it. The quantization parameters of the remaining regions are derived by adding the same offset as used by the video encoding apparatus to the quantization parameter of the boundary region. The offset may be a fixed value predefined between the video encoding apparatus and the video decoding apparatus, or may be signaled at a higher level.
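For illustration only, the following Python sketch derives the non-boundary region's quantization parameter by adding a prediction-mode-dependent offset to the boundary region's. The text above fixes only the ordering (largest offset in merge mode, smallest in intra mode); the concrete offset values are invented for the example.

```python
# Hypothetical sketch: mode-dependent offset between the boundary and
# non-boundary regions' quantization parameters. The values are invented;
# only the merge > amvp > intra ordering follows the text above.

QP_OFFSETS = {"merge": 4, "amvp": 2, "intra": 1}

def nonboundary_qp(boundary_qp, prediction_mode):
    return boundary_qp + QP_OFFSETS[prediction_mode]

assert nonboundary_qp(30, "merge") > nonboundary_qp(30, "intra")
```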
In the above description, it should be understood that the exemplary embodiments may be implemented in various other ways. The functions described in one or more examples may be implemented in hardware, software, firmware, or any combination thereof. It should be appreciated that the functional components described in this disclosure are labeled by adding a "unit" to the component name to specifically emphasize their implementation independence.
In another aspect, the various functions or methods of the present disclosure may be implemented as instructions stored in a non-transitory recording medium, which may be read and executed by one or more processors. A non-transitory recording medium includes, for example, all types of recording apparatuses in which data is stored in a form readable by a computer system. For example, non-transitory recording media include storage media such as erasable programmable read-only memory (EPROM), flash drives, optical drives, hard disk drives, and Solid State Drives (SSDs).
The above description merely illustrates the technical idea of the present embodiments, and those skilled in the art can make various modifications and variations without departing from their essential characteristics. Therefore, the present embodiments are intended to describe, not to limit, the technical idea, and the scope of the technical idea is not limited by these embodiments. The scope of protection should be construed according to the appended claims, and all technical ideas within their equivalent scope should be construed as included therein.
Cross Reference to Related Applications
The present application claims priority from Korean Patent Application No. 10-2020-0097611 filed on August 4, 2020, Korean Patent Application No. 10-2020-0099240 filed on August 7, 2020, and Korean Patent Application No. 10-2021-0102494 filed on August 4, 2021, the entire contents of which are incorporated herein by reference.

Claims (15)

1. A video decoding method for decoding a target block using intra prediction, the method comprising:
decoding, from a bitstream, boundary line information specifying at least one boundary line for dividing the target block, wherein the boundary line information allows the target block to be divided into a plurality of non-rectangular blocks;
determining an intra prediction mode of each of the non-rectangular blocks based on the boundary line information;
generating a prediction block of the target block by performing intra prediction on each of the non-rectangular blocks using the intra prediction mode;
reconstructing a residual block of the target block from the bitstream; and
the target block is reconstructed by adding the prediction block and the residual block.
2. The method of claim 1, wherein determining the intra prediction mode for each of the non-rectangular blocks comprises:
selecting an intra prediction mode to be applied to the non-rectangular block based on a direction of the boundary line specified by the boundary line information and a position of the non-rectangular block with respect to the boundary line; and
determining an intra prediction mode of the non-rectangular block among the selected intra prediction modes using prediction mode information decoded from the bitstream.
3. The method of claim 2, wherein selecting an intra prediction mode to be applied to the non-rectangular block comprises:
determining one or more reference positions adjacent to the target block based on the direction and position of the boundary line specified by the boundary line information and the position of the non-rectangular block relative to the boundary line; and
constructing a candidate list including intra prediction modes to be applied to the non-rectangular block using intra prediction modes of surrounding blocks including the reference positions.
4. The method of claim 3, wherein the prediction mode information includes index information for selecting one intra prediction mode among the intra prediction modes included in the candidate list and a difference between the intra prediction mode of the non-rectangular block and the intra prediction mode indicated by the index information.
5. The method of claim 1, wherein for each of the non-rectangular blocks, generating the prediction block of the target block comprises:
configuring pre-reconstructed reference pixels to be used for predicting the non-rectangular block based on a direction and a position of the boundary line specified by the boundary line information and a position of the non-rectangular block with respect to the boundary line; and
performing intra prediction on the non-rectangular block using the reference pixels.
6. The method of claim 1, wherein reconstructing the residual block comprises:
reconstructing a plurality of rectangular transform coefficient blocks from the bitstream;
generating a residual sub-block by inversely transforming each of the transform coefficient blocks; and
reconstructing the residual block by dividing the target block into a plurality of regions based on boundary lines specified by the boundary line information, the number of transform coefficient blocks and the size of each of the transform coefficient blocks and rearranging residual signals in each of the residual sub-blocks into corresponding regions,
wherein the plurality of regions includes a boundary region including the boundary line and formed near the boundary line, and one or more non-boundary regions not including the boundary line.
7. A video encoding method of encoding a target block using intra prediction, the method comprising:
dividing the target block using at least one boundary line, wherein the boundary line allows the target block to be divided into a plurality of non-rectangular blocks;
determining an intra prediction mode of each of the non-rectangular blocks based on the boundary line;
generating a prediction block of the target block by performing intra prediction on each of the non-rectangular blocks;
generating a residual block of the target block by subtracting the prediction block from the target block; and
encoding boundary line information specifying the boundary line and a residual signal in the residual block.
8. The method of claim 7, wherein determining the intra-prediction mode for each of the non-rectangular blocks comprises:
selecting an intra prediction mode to be applied to the non-rectangular block based on a direction of the boundary line and a position of the non-rectangular block with respect to the boundary line; and
determining an intra prediction mode of the non-rectangular block among the selected intra prediction modes,
wherein prediction mode information indicating the intra prediction mode of the non-rectangular block is encoded.
9. The method of claim 8, wherein selecting the intra-prediction mode to be applied to the non-rectangular block comprises:
determining one or more reference positions adjacent to the target block based on the direction and position of the boundary line and the position of the non-rectangular block relative to the boundary line; and
constructing a candidate list including intra prediction modes to be applied to the non-rectangular block using intra prediction modes of surrounding blocks including the reference positions.
10. The method of claim 9, wherein the prediction mode information includes index information for selecting one intra prediction mode among intra prediction modes included in the candidate list and a difference between the intra prediction mode of the non-rectangular block and the intra prediction mode indicated by the index information.
11. The method of claim 7, wherein for each of the non-rectangular blocks, generating the prediction block of the target block comprises:
configuring pre-reconstructed reference pixels to be used for predicting the non-rectangular block based on the direction and position of the boundary line and the position of the non-rectangular block relative to the boundary line; and
performing intra prediction on the non-rectangular block using the reference pixels.
12. The method of claim 7, wherein encoding a residual signal in the residual block comprises:
dividing the target block into a plurality of regions, wherein the plurality of regions include a boundary region including the boundary line and formed near the boundary line, and one or more non-boundary regions not including the boundary line; and
generating, for each of the plurality of regions, a rectangular residual sub-block by rearranging residual signals in the region, and transforming the residual sub-block.
13. A video decoding apparatus that decodes a target block using intra prediction, the apparatus comprising:
a decoder configured to decode, from a bitstream, boundary line information specifying at least one boundary line for dividing the target block and transform coefficients, wherein the boundary line information allows the target block to be divided into a plurality of non-rectangular blocks;
a predictor configured to determine an intra prediction mode of the non-rectangular blocks based on the boundary line information, and generate a prediction block of the target block by performing intra prediction on each of the non-rectangular blocks using the intra prediction mode;
An inverse transformer configured to reconstruct a residual block of the target block by inversely transforming the transform coefficients; and
an adder configured to reconstruct the target block by adding the prediction block and the residual block.
14. The apparatus of claim 13, wherein for each of the non-rectangular blocks, the predictor is configured to:
select an intra prediction mode to be applied to the non-rectangular block based on a direction of the boundary line specified by the boundary line information and a position of the non-rectangular block with respect to the boundary line; and
determine an intra prediction mode of the non-rectangular block among the selected intra prediction modes using prediction mode information decoded from the bitstream.
15. The apparatus of claim 13, wherein the decoder is configured to reconstruct a plurality of rectangular transform coefficient blocks from the bitstream, each rectangular transform coefficient block containing transform coefficients, and
wherein the inverse transformer is configured to:
generate a residual sub-block by inversely transforming each of the transform coefficient blocks; and
reconstruct the residual block by dividing the target block into a plurality of regions based on the boundary line specified by the boundary line information, the number of transform coefficient blocks, and the size of each of the transform coefficient blocks, and rearranging residual signals in each of the residual sub-blocks into corresponding regions,
wherein the plurality of regions includes a boundary region including the boundary line and formed near the boundary line, and one or more non-boundary regions not including the boundary line.
CN202180057609.0A 2020-08-04 2021-08-04 Video encoding and decoding using arbitrary block partitioning Pending CN116134812A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
KR20200097611 2020-08-04
KR10-2020-0097611 2020-08-04
KR10-2020-0099240 2020-08-07
KR20200099240 2020-08-07
PCT/KR2021/010248 WO2022031018A1 (en) 2020-08-04 2021-08-04 Video encoding and decoding using random block partition

Publications (1)

Publication Number Publication Date
CN116134812A true CN116134812A (en) 2023-05-16

Family

ID=80266377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180057609.0A Pending CN116134812A (en) 2020-08-04 2021-08-04 Video encoding and decoding using arbitrary block partitioning

Country Status (2)

Country Link
KR (1) KR20220017380A (en)
CN (1) CN116134812A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022191554A1 (en) * 2021-03-08 2022-09-15 현대자동차주식회사 Video coding method and device using random block division

Also Published As

Publication number Publication date
KR20220017380A (en) 2022-02-11

Similar Documents

Publication Publication Date Title
KR20220071939A (en) Method and Apparatus For Video Encoding and Decoding
KR20220077095A (en) Video Coding Method and Apparatus Using Intra Prediction
CN113892268A (en) Intra-frame prediction device and method based on prediction mode estimation
CN116134812A (en) Video encoding and decoding using arbitrary block partitioning
KR20230105646A (en) Method for Template-based Intra Mode Derivation for Chroma Component
KR20230049568A (en) Method and apparatus for video encoding and decoding applying merge mode using geometric partitioning mode
CN116941241A (en) Video encoding and decoding method and apparatus using matrix-based cross component prediction
KR20220118351A (en) Method for generating prediction block using weighted sum of intra prediction signal and inter prediction signal and apparatus using the same
CN116636211A (en) Method and apparatus for encoding video using block merging
KR20210038377A (en) Method and Apparatus for Inter-predicting Pictures Having Different Resolutions
US20230179762A1 (en) Video encoding and decoding using arbitrary block partitioning
EP4090027A1 (en) Image encoding and decoding based on reference picture having different resolution
US20230396795A1 (en) Inter prediction-based video encoding and decoding
US20220417552A1 (en) Method and apparatus for inter-prediction of pictures with different resolutions
US20230283768A1 (en) Method for predicting quantization parameter used in a video encoding/decoding apparatus
US20230055497A1 (en) Image encoding and decoding based on reference picture having different resolution
KR20230036967A (en) Video Coding Method And Apparatus Using Subblcok Coding Order Change And Intra Prediction According Thereto
KR20230160172A (en) Method And Apparatus for Video Coding Using Virtual Reference Line
KR20230059136A (en) Method And Apparatus for Video Coding Using Geometric Intra Prediction Mode
CN118140476A (en) Method and apparatus for video encoding and decoding using geometric intra prediction modes
KR20220017372A (en) Method for Predicting Quantization Parameter Used in Image Encoding/Decoding Device
KR20230105648A (en) Method for Decoder Side Motion Vector Derivation Using Spatial Correlation
CN117917071A (en) Method and apparatus for video coding using sub-block coding order change and intra prediction according to sub-block coding order change
KR20230085063A (en) Method And Apparatus for Video Coding Using Intra Prediction Based on Derivation of Reference Sample Line
KR20240074642A (en) Method and Apparatus for Video Coding Intra Predicting Chroma Block Based on Geometric Partitioning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination