WO2020008324A1 - Shape dependent intra coding - Google Patents

Shape dependent intra coding

Info

Publication number
WO2020008324A1
WO2020008324A1 · PCT/IB2019/055565
Authority
WO
WIPO (PCT)
Prior art keywords
video block
list
block
intra
candidates
Prior art date
Application number
PCT/IB2019/055565
Other languages
French (fr)
Inventor
Hongbin Liu
Li Zhang
Kai Zhang
Yue Wang
Original Assignee
Beijing Bytedance Network Technology Co., Ltd.
Bytedance Inc.
Priority date
Filing date
Publication date
Application filed by Beijing Bytedance Network Technology Co., Ltd. and Bytedance Inc.
Publication of WO2020008324A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N 19/593: Predictive coding involving spatial prediction techniques
    • H04N 19/103: Selection of coding mode or of prediction mode
    • H04N 19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N 19/109: Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N 19/11: Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N 19/176: Adaptive coding characterised by the coding unit, the unit being an image region, e.g. a block or a macroblock
    • H04N 19/184: Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N 19/186: Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N 19/42: Coding characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/436: Implementation using parallelised computational arrangements
    • H04N 19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N 19/503: Predictive coding involving temporal prediction
    • H04N 19/513: Processing of motion vectors
    • H04N 19/52: Processing of motion vectors by predictive encoding
    • H04N 19/96: Tree coding, e.g. quad-tree coding

Definitions

  • the disclosed techniques may be used by video decoder or encoder embodiments in which the performance of intra coding of video blocks is improved using a block-shape dependent coding technique.
  • a method of video bitstream processing includes generating, for a video block that is at least partly intra-coded, a list of intra mode candidates according to a first shape dependency rule that depends on a shape of the video block, and using the list of intra mode candidates to reconstruct a decoded representation of the video block.
  • the above-described method may be implemented by a video decoder apparatus that comprises a processor.
  • the above-described method may be implemented by a video encoder apparatus comprising a processor for decoding encoded video during a video encoding process.
  • these methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
  • FIG. 1 is an illustration of a quadtree binary tree (QTBT) structure.
  • FIG. 2 shows an example derivation process for merge candidates list construction.
  • FIG. 3 shows example positions of spatial merge candidates.
  • FIG. 4 shows an example of candidate pairs considered for redundancy check of spatial merge candidates.
  • FIG. 5 shows examples of positions for the second prediction unit (PU) of Nx2N and 2NxN partitions.
  • FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
  • FIG. 7 shows example candidate positions for temporal merge candidate, C0 and C1.
  • FIG. 8 shows an example of combined bi-predictive merge candidate.
  • FIG. 9 shows an example of a derivation process for motion vector prediction candidates.
  • FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
  • FIG. 11 shows an example of advanced temporal motion vector prediction (ATMVP) motion prediction for a coding unit (CU).
  • ATMVP advanced temporal motion vector prediction
  • FIG. 12 shows an example of one CU with four sub-blocks (A-D) and its neighbouring blocks (a-d).
  • FIG. 13 illustrates proposed non-adjacent merge candidates in J0021.
  • FIG. 14 illustrates proposed non-adjacent merge candidates in J0058.
  • FIG. 15 illustrates proposed non-adjacent merge candidates in J0059.
  • FIG. 16 illustrates proposed 67 intra prediction modes.
  • FIG. 17 shows examples of neighbouring blocks for most probable mode (MPM) derivation.
  • FIG. 18 shows examples of corresponding sub-blocks for a chroma CB in I slice.
  • FIG. 19A and FIG. 19B show examples of additional blocks used for MPM list.
  • FIG. 20 is a block diagram of an example of a video processing apparatus.
  • FIG. 21 shows a block diagram of an example implementation of a video encoder.
  • FIG. 22 is a flowchart for an example of a video bitstream processing method.
  • the present document provides various techniques that can be used by a decoder of video bitstreams to improve the quality of decompressed or decoded digital video. Furthermore, a video encoder may also implement these techniques during the process of encoding in order to reconstruct decoded frames used for further encoding.
  • the term video block is used to represent a logical grouping of pixels and different embodiments may work with video blocks of different sizes. Furthermore, a video block may correspond to one chroma or luma component or may include another component representation such as RGB representation.
  • Section headings are used in the present document for ease of understanding and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments from one section can be combined with embodiments from other sections.
  • the techniques described in this patent document relate to video coding technologies. Specifically, they relate to intra/inter mode coding in video coding. They may be applied to an existing video coding standard such as High Efficiency Video Coding (HEVC), or to the Versatile Video Coding (VVC) standard to be finalized. They may also be applicable to future video coding standards or video codecs.
  • HEVC High Efficiency Video Coding
  • VVC Versatile Video Coding
  • Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards.
  • the ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video and H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards.
  • AVC H.264/MPEG-4 Advanced Video Coding
  • H.265/HEVC High Efficiency Video Coding
  • the video coding standards are based on the hybrid video coding structure wherein temporal prediction plus transform coding are utilized.
  • the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015.
  • JEM Joint Exploration Model
  • FIG. 21 is a block diagram of an example implementation of a video encoder.
  • a CTU is split into coding units (CUs) by using a quadtree structure denoted as coding tree to adapt to various local characteristics.
  • the decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level.
  • Each CU can be further split into one, two or four prediction units (PUs) according to the PU splitting type. Inside one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis.
  • a CU can be partitioned into transform units (TUs) according to another quadtree structure similar to the coding tree for the CU.
  • TUs transform units
  • the QTBT structure removes the concepts of multiple partition types, i.e. it removes the separation of the CU, PU and TU concepts, and supports more flexibility for CU partition shapes.
  • a CU can have either a square or rectangular shape.
  • a CTU is first partitioned by a quadtree structure.
  • the quadtree leaf nodes are further partitioned by a binary tree structure.
  • the binary tree leaf nodes are called coding units (CUs), and that segmentation is used for prediction and transform processing without any further partitioning.
  • a CU sometimes consists of coding blocks (CBs) of different colour components, e.g. one CU contains one luma CB and two chroma CBs in the case of P and B slices of the 4:2:0 chroma format and sometimes consists of a CB of a single component, e.g., one CU contains only one luma CB or just two chroma CBs in the case of I slices.
  • CBs coding blocks
  • MinQTSize the minimum allowed quadtree leaf node size
  • MaxBTSize the maximum allowed binary tree root node size
  • MinBTSize the minimum allowed binary tree leaf node size.
  • the CTU size is set as 128x128 luma samples with two corresponding 64x64 blocks of chroma samples
  • the MinQTSize is set as 16x16
  • the MaxBTSize is set as 64x64
  • the MinBTSize (for both width and height) is set as 4
  • the MaxBTDepth is set as 4.
  • the quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes.
  • the quadtree leaf nodes may have a size from 16x16 (i.e., the MinQTSize) to 128x128 (i.e., the CTU size).
  • the quadtree leaf node is also the root node for the binary tree and it has the binary tree depth as 0.
  • when the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered.
  • when the binary tree node has width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered.
  • when the binary tree node has height equal to MinBTSize, no further vertical splitting is considered.
  • the leaf nodes of the binary tree are further processed by prediction and transform processing without any further partitioning. In the JEM, the maximum CTU size is 256x256 luma samples.
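To make the interaction of these parameters concrete, the following Python sketch checks which split types remain allowed for a node under the constraints just described. It is a minimal illustration assuming the example settings above (CTU 128x128, MinQTSize 16, MaxBTSize 64, MinBTSize 4, MaxBTDepth 4); the function and constant names are not taken from any reference software.

```python
# Illustrative sketch of the QTBT split constraints described above.
MIN_QT_SIZE = 16
MAX_BT_SIZE = 64
MIN_BT_SIZE = 4
MAX_BT_DEPTH = 4

def allowed_splits(width, height, bt_depth, is_qt_node):
    """Return the set of splits still permitted for a node of the given size."""
    splits = set()
    # Quadtree splitting only applies while still in the quadtree stage and
    # while the resulting leaves would not fall below MinQTSize.
    if is_qt_node and width == height and width // 2 >= MIN_QT_SIZE:
        splits.add("QT")
    # Binary splitting starts from a node no larger than MaxBTSize and stops at
    # MaxBTDepth; the width/height limits follow the wording of the passage above.
    if max(width, height) <= MAX_BT_SIZE and bt_depth < MAX_BT_DEPTH:
        if width > MIN_BT_SIZE:
            splits.add("BT_HOR")   # width above MinBTSize: horizontal split still considered
        if height > MIN_BT_SIZE:
            splits.add("BT_VER")   # height above MinBTSize: vertical split still considered
    return splits

if __name__ == "__main__":
    print(allowed_splits(128, 128, bt_depth=0, is_qt_node=True))   # {'QT'}
    print(allowed_splits(64, 64, bt_depth=0, is_qt_node=False))    # both binary splits
    print(allowed_splits(8, 4, bt_depth=2, is_qt_node=False))      # {'BT_HOR'}
```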
  • FIG. 1 illustrates an example of block partitioning by using QTBT
  • FIG. 1 (right) illustrates the corresponding tree representation.
  • the solid lines indicate quadtree splitting and dotted lines indicate binary tree splitting.
  • for each splitting (i.e., non-leaf) node of the binary tree, one flag is signalled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting.
  • for quadtree splitting, there is no need to indicate the splitting type, since quadtree splitting always splits a block both horizontally and vertically to produce 4 sub-blocks with an equal size.
  • the QTBT scheme supports the ability for the luma and chroma to have a separate QTBT structure.
  • the luma and chroma CTBs in one CTU share the same QTBT structure.
  • the luma CTB is partitioned into CUs by a QTBT structure
  • the chroma CTBs are partitioned into chroma CUs by another QTBT structure. This means that a CU in an I slice consists of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice consists of coding blocks of all three colour components.
  • inter prediction for small blocks is restricted to reduce the memory access of motion compensation, such that bi-prediction is not supported for 4x8 and 8x4 blocks, and inter prediction is not supported for 4x4 blocks. In the QTBT of the JEM, these restrictions are removed.
  • Each inter-predicted PU has motion parameters for one or two reference picture lists.
  • Motion parameters include a motion vector and a reference picture index. Usage of one of the two reference picture lists may also be signalled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
  • a merge mode is specified whereby the motion parameters for the current PU are obtained from neighbouring PUs, including spatial and temporal candidates.
  • the merge mode can be applied to any inter-predicted PU, not only for skip mode.
  • the alternative to merge mode is the explicit transmission of motion parameters, where motion vector (to be more precise, motion vector difference compared to a motion vector predictor), corresponding reference picture index for each reference picture list and reference picture list usage are signalled explicitly per each PU.
  • Such mode is named Advanced motion vector prediction (AMVP) in this disclosure.
  • when signalling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as 'uni-prediction'. Uni-prediction is available both for P-slices and B-slices. When signalling indicates that both of the reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as 'bi-prediction'. Bi-prediction is available for B-slices only.
  • Step 1.2 Redundancy check for spatial candidates
  • FIG. 5 depicts the second PU for the case of Nx2N and 2NxN, respectively.
  • when the current PU is partitioned as Nx2N, the candidate at position A1 is not considered for list construction.
  • similarly, position B1 is not considered when the current PU is partitioned as 2NxN.
  • a scaled motion vector is derived based on co-located PU belonging to the picture which has the smallest POC difference with current picture within the given reference picture list.
  • the reference picture list to be used for derivation of the co-located PU is explicitly signalled in the slice header.
  • the scaled motion vector for temporal merge candidate is obtained as illustrated by the dashed line in FIG. 6.
  • tb is defined to be the POC difference between the reference picture of the current picture and the current picture
  • td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture.
  • the reference picture index of temporal merge candidate is set equal to zero.
  • FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
  • the position for the temporal candidate is selected between candidates C0 and C1, as depicted in FIG. 7. If the PU at position C0 is not available, is intra coded, or is outside of the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
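The POC-distance scaling of the temporal merge candidate described a few bullets above can be sketched as follows. The fixed-point arithmetic mirrors an HEVC-style formulation, but the constants and helper names here are illustrative assumptions, and the sketch assumes td is positive and non-zero for simplicity.

```python
# Simplified sketch of temporal motion vector scaling based on POC distances.
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def scale_mv(mv, tb, td):
    """Scale a motion vector (mvx, mvy) by the ratio of POC distances tb/td."""
    tx = (16384 + (abs(td) >> 1)) // td
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)

    def scale(component):
        s = dist_scale * component
        mag = (abs(s) + 127) >> 8
        return clip3(-32768, 32767, mag if s >= 0 else -mag)

    return scale(mv[0]), scale(mv[1])

if __name__ == "__main__":
    # current picture at POC 8, its reference at POC 4   -> tb = 4
    # co-located picture at POC 16, its reference at POC 0 -> td = 16
    print(scale_mv((64, -32), tb=4, td=16))   # roughly a quarter of the input MV: (16, -8)
```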
  • besides spatial and temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidate and zero merge candidate.
  • Combined bi- predictive merge candidates are generated by utilizing spatial and temporal merge candidates.
  • Combined bi-predictive merge candidate is used for B-Slice only.
  • the combined bi-predictive candidates are generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate. As an example, FIG. 8 shows the case when two candidates in the original merge candidate list are used to create a combined bi-predictive merge candidate that is added to the final list.
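A minimal sketch of how combined bi-predictive candidates could be formed from an existing merge list is shown below. The candidate representation (a dictionary with 'L0'/'L1' motion tuples) and the pairing order are assumptions made for illustration, not the normative HEVC procedure.

```python
# Illustrative sketch of building combined bi-predictive merge candidates
# from candidates already in the list (B-slices only).
def combined_bi_predictive(merge_list, max_candidates):
    out = list(merge_list)
    pairs = [(i, j) for i in range(len(merge_list))
             for j in range(len(merge_list)) if i != j]
    for i, j in pairs:
        if len(out) >= max_candidates:
            break
        l0 = merge_list[i].get("L0")   # (mv, ref_idx) for reference picture list 0
        l1 = merge_list[j].get("L1")   # (mv, ref_idx) for reference picture list 1
        # Only combine if the two tuples give different motion hypotheses.
        if l0 is not None and l1 is not None and l0 != l1:
            out.append({"L0": l0, "L1": l1})
    return out

if __name__ == "__main__":
    cands = [{"L0": ((4, 0), 0), "L1": None},
             {"L0": None, "L1": ((-2, 6), 1)}]
    print(combined_bi_predictive(cands, max_candidates=5))
```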
  • motion estimation can be performed in parallel whereby the motion vectors for all prediction units inside a given region are derived
  • HEVC defines the motion estimation region (MER) whose size is signalled in the picture parameter set using the "log2_parallel_merge_level_minus2" syntax element.
  • AMVP exploits spatio-temporal correlation of motion vector with neighbouring PUs, which is used for explicit transmission of motion parameters.
  • a motion vector candidate list is constructed by firstly checking the availability of left, above, and temporally neighbouring PU positions, removing redundant candidates and adding zero vectors to make the candidate list a constant length. Then, the encoder can select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. As with merge index signalling, the index of the best motion vector candidate is encoded using a truncated unary code; the maximum value to be encoded in this case is 2 (see FIG. 9).
  • FIG. 9 summarizes derivation process for motion vector prediction candidate.
  • motion vector candidate two types are considered: spatial motion vector candidate and temporal motion vector candidate.
  • for spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on the motion vectors of PUs located in five different positions as depicted in FIG. 3.
  • for temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
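The AMVP list construction summarised above can be sketched as follows: gather spatial and temporal predictors, prune duplicates, and pad with zero motion vectors so the list always contains two entries. The candidate representation and names are illustrative assumptions.

```python
# Illustrative sketch of AMVP candidate list construction.
AMVP_LIST_SIZE = 2

def build_amvp_list(spatial_candidates, temporal_candidates):
    candidates = []
    # Up to two spatial candidates (left group, above group) are considered first.
    candidates.extend(spatial_candidates[:2])
    # Then at most one temporal candidate.
    candidates.extend(temporal_candidates[:1])
    # Remove duplicated motion vector candidates while keeping order.
    pruned = []
    for mv in candidates:
        if mv not in pruned:
            pruned.append(mv)
    # Pad with zero motion vectors so the list has a constant length.
    while len(pruned) < AMVP_LIST_SIZE:
        pruned.append((0, 0))
    return pruned[:AMVP_LIST_SIZE]

if __name__ == "__main__":
    print(build_amvp_list(spatial_candidates=[(3, -1), (3, -1)],
                          temporal_candidates=[(0, 5)]))   # [(3, -1), (0, 5)]
```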
  • FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
  • the motion vector of the neighbouring PU is scaled in a similar manner as for temporal scaling, as depicted in FIG. 10.
  • the main difference is that the reference picture list and index of current PU is given as input; the actual scaling process is the same as that of temporal scaling.
  • each CU can have at most one set of motion parameters for each prediction direction.
  • Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub- CUs of the large CU.
  • Alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture.
  • STMVP spatial-temporal motion vector prediction
  • in the ATMVP method, the temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU.
  • the sub-CUs are square NxN blocks (N is set to 4 by default).
  • ATMVP predicts the motion vectors of the sub-CUs within a CU in two steps.
  • the first step is to identify the corresponding block in a reference picture with a so-called temporal vector.
  • the reference picture is called the motion source picture.
  • the second step is to split the current CU into sub-CUs and obtain the motion vectors as well as the reference indices of each sub-CU from the block corresponding to each sub-CU, as shown in FIG. 11.
  • a reference picture and the corresponding block is determined by the motion information of the spatial neighbouring blocks of the current CU.
  • the first merge candidate in the merge candidate list of the current CU is used.
  • the first available motion vector as well as its associated reference index are set to be the temporal vector and the index to the motion source picture.
  • this way, the corresponding block may be more accurately identified, compared with TMVP, wherein the corresponding block (sometimes called collocated block) is always in a bottom-right or center position relative to the current CU.
  • a corresponding block of the sub-CU is identified by the temporal vector in the motion source picture, by adding to the coordinate of the current CU the temporal vector.
  • the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU.
  • after the motion information of a corresponding NxN block is identified, it is converted to the motion vectors and reference indices of the current sub-CU, in the same way as TMVP of HEVC, wherein motion scaling and other procedures apply.
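A rough sketch of the two-step ATMVP derivation described above is given below: a temporal vector displaces the current CU into the motion source picture, and each NxN sub-CU reads the motion of the co-located grid cell covering its displaced centre sample. The motion-field accessor, the toy motion field and all names are assumptions made for illustration.

```python
# Rough sketch of ATMVP sub-CU motion derivation.
N = 4  # default sub-CU size

def atmvp_sub_cu_motion(cu_x, cu_y, cu_w, cu_h, temporal_vector, motion_source):
    """motion_source(x, y) -> motion info of the smallest grid covering (x, y)."""
    sub_cu_motion = {}
    for dy in range(0, cu_h, N):
        for dx in range(0, cu_w, N):
            # Centre sample of this sub-CU, displaced by the temporal vector.
            cx = cu_x + dx + N // 2 + temporal_vector[0]
            cy = cu_y + dy + N // 2 + temporal_vector[1]
            sub_cu_motion[(dx // N, dy // N)] = motion_source(cx, cy)
    return sub_cu_motion

if __name__ == "__main__":
    # Toy motion field: every 4x4 grid cell stores a vector derived from its position.
    def toy_motion_field(x, y):
        return (x // N, y // N)

    print(atmvp_sub_cu_motion(16, 16, 8, 8, temporal_vector=(4, -4),
                              motion_source=toy_motion_field))
```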
  • the decoder checks whether the low-delay condition (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) is fulfilled and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy (with X being equal to 0 or 1 and Y being equal to 1-X) for each sub-CU.
  • FIG. 12 illustrates this concept. Let us consider an 8x8 CU which contains four 4x4 sub-CUs A, B, C, and D. The neighbouring 4x4 blocks in the current frame are labelled as a, b, c, and d.
  • the motion derivation for sub-CU A starts by identifying its two spatial neighbours.
  • the first neighbour is the NxN block above sub-CU A (block c). If this block c is not available or is intra coded, the other NxN blocks above sub-CU A are checked (from left to right, starting at block c).
  • the second neighbour is a block to the left of sub-CU A (block b). If block b is not available or is intra coded, other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b).
  • the motion information obtained from the neighbouring blocks for each list is scaled to the first reference frame for a given list.
  • temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure of TMVP derivation as specified in HEVC.
  • the motion information of the collocated block at location D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
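The per-sub-CU averaging step of STMVP can be sketched as below; availability handling and the motion representation are simplified assumptions for illustration only.

```python
# Illustrative sketch of STMVP for one sub-CU: average the available motion
# vectors from the above neighbour, the left neighbour and the TMVP candidate.
def stmvp_for_sub_cu(above_mv, left_mv, tmvp_mv):
    available = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
    if not available:
        return None
    avg_x = sum(mv[0] for mv in available) // len(available)
    avg_y = sum(mv[1] for mv in available) // len(available)
    return (avg_x, avg_y)

if __name__ == "__main__":
    # Above neighbour available, left neighbour intra coded (None), TMVP available.
    print(stmvp_for_sub_cu((8, 4), None, (2, -2)))   # (5, 1)
```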
  • the sub-CU modes are enabled as additional merge candidates and there is no additional syntax element required to signal the modes.
  • Two additional merge candidates are added to merge candidates list of each CU to represent the ATMVP mode and STMVP mode.
  • Tencent proposes to derive additional spatial merge candidates from positions in an outer reference area which has an offset of (-96, -96) to the current block.
  • each candidate B (i, j) or C (i, j) has an offset of 16 in the vertical direction compared to its previous B or C candidates.
  • Each candidate A (i, j) or D (i, j) has an offset of 16 in the horizontal direction compared to its previous A or D candidates.
  • Each E (i, j) has an offset of 16 in both horizontal direction and vertical direction compared to its previous E candidates. The candidates are checked from inside to the outside.
  • the order of the candidates is A (i, j), B (i, j), C (i, j), D (i, j), and E (i, j).
  • the candidates are added after TMVP candidates in the merge candidate list.
  • in J0059, the extended spatial positions from 6 to 27, as shown in FIG. 15, are checked according to their numerical order after the temporal candidate.
  • all the spatial candidates are restricted within two CTU lines.
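The 16-sample offset pattern described above for non-adjacent candidates can be illustrated with the following sketch. The exact positions and checking order in J0058/J0059 differ in detail; this only shows the inside-to-outside pattern, and all names are illustrative assumptions.

```python
# Illustrative generation of non-adjacent candidate positions with a 16-sample step.
STEP = 16

def non_adjacent_positions(block_x, block_y, rings):
    positions = []
    for i in range(1, rings + 1):
        off = i * STEP
        positions.append(("A", block_x - off, block_y))        # extends to the left
        positions.append(("B", block_x, block_y - off))        # extends above
        positions.append(("E", block_x - off, block_y - off))  # extends diagonally
    return positions  # checked from the innermost ring outwards

if __name__ == "__main__":
    for name, x, y in non_adjacent_positions(64, 64, rings=2):
        print(name, (x, y))
```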
  • the number of directional intra modes is extended from 33, as used in HEVC, to 65.
  • the additional directional modes are depicted as red dotted arrows in FIG. 16, and the planar and DC modes remain the same.
  • These denser directional intra prediction modes apply for all block sizes and for both luma and chroma intra predictions.
  • MPMs Most Probable Modes
  • Two major technical aspects are involved: 1) the derivation of 6 MPMs, and 2) entropy coding of 6 MPMs and non-MPM modes.
  • in the JEM, the modes included in the MPM list are classified into three groups:
  • Five neighbouring intra prediction modes are used to form the MPM list. Those locations of the 5 neighbouring blocks are the same as those used in the merge mode, i.e., left (L), above (A), below-left (BL), above-right (AR), and above-left (AL) as shown in FIG. 17.
  • An initial MPM list is formed by inserting 5 neighbour intra modes and the planar and DC modes into the MPM list.
  • a pruning process is used to remove duplicated modes so that only unique modes can be included into the MPM list. The order in which the initial modes are included is: left, above, planar, DC, below-left, above-right, and then above-left.
  • FIG. 17 shows examples of neighbouring blocks for MPM derivation.
  • derived modes are added; these intra modes are obtained by adding -1 or +1 to the angular modes that are already included in the MPM list. Such additional derived modes are not generated from the non-angular modes (DC or planar).
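The 6-entry MPM construction summarised above (five neighbouring modes plus planar and DC, pruning, then +/-1 derived angular modes and defaults) can be sketched as follows. Mode numbering assumes the 67-mode scheme (0 planar, 1 DC, 2 to 66 angular), and the default modes used as fillers are an assumption for illustration.

```python
# Sketch of 6-entry MPM list construction with pruning and derived modes.
PLANAR, DC = 0, 1
MPM_SIZE = 6

def build_mpm_list(left, above, below_left=None, above_right=None, above_left=None):
    mpm = []

    def push(mode):
        if mode is not None and mode not in mpm and len(mpm) < MPM_SIZE:
            mpm.append(mode)

    # Initial modes in the order: left, above, planar, DC, below-left,
    # above-right, above-left (duplicates pruned).
    for m in (left, above, PLANAR, DC, below_left, above_right, above_left):
        push(m)
    # Derived modes: angular modes already in the list, offset by -1 / +1.
    for m in list(mpm):
        if m > DC:
            push(2 + (m - 2 - 1) % 65)   # m - 1 with wrap-around in 2..66
            push(2 + (m - 2 + 1) % 65)   # m + 1 with wrap-around
    # Default modes if the list is still not full (assumed VER, HOR, 2, diagonal).
    for m in (50, 18, 2, 34):
        push(m)
    return mpm

if __name__ == "__main__":
    print(build_mpm_list(left=18, above=50, below_left=18, above_right=66, above_left=1))
```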
  • a truncated unary binarization is used for entropy coding of the selected mode using the 6 MPMs.
  • the first three bins are coded with contexts that depend on the MPM mode related to the bin currently being signalled.
  • the MPM mode is classified into one of three categories: (a) modes that are predominantly horizontal (i.e., the MPM mode number is less than or equal to the mode number for the diagonal direction), (b) modes that are predominantly vertical (i.e., the MPM mode is greater than the mode number for the diagonal direction), and (c) the non-angular (DC and planar) class. Accordingly, three contexts are used to signal the MPM index based on this classification.
  • the coding for selection of the remaining 61 non-MPMs is done as follows.
  • the 61 non-MPMs are first divided into two sets: a selected mode set and a non-selected mode set.
  • the selected modes set contains 16 modes and the rest (45 modes) are assigned to the non-selected modes set.
  • the mode set that the current mode belongs to is indicated in the bitstream with a flag. If the mode to be indicated is within the selected modes set, the selected mode is signalled with a 4-bit fixed-length code, and if the mode to be indicated is from the non-selected set, the selected mode is signalled with a truncated binary code.
  • Non-selected modes set = {1, 2, 3, 5, 6, 7, 9, 10, ..., 59}
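The split between MPM, selected-set and non-selected-set signalling described above can be sketched as below. Bit strings are returned as text for readability; the selected-set membership used in the example is an arbitrary illustration, not the actual JEM set, and the binarisation helpers are simplified.

```python
# Sketch of intra mode signalling: truncated unary for the MPM index,
# a 4-bit fixed-length code for selected non-MPM modes, and a truncated
# binary code for non-selected modes.
def truncated_unary(idx, max_idx):
    return "1" * idx + ("" if idx == max_idx else "0")

def fixed_length(idx, bits=4):
    return format(idx, "0{}b".format(bits))

def truncated_binary(idx, n):
    k = n.bit_length() - 1          # floor(log2(n))
    u = (1 << (k + 1)) - n          # number of shorter codewords
    if idx < u:
        return format(idx, "0{}b".format(k))
    return format(idx + u, "0{}b".format(k + 1))

def code_intra_mode(mode, mpm_list, selected_set, non_selected_list):
    if mode in mpm_list:
        return "mpm_flag=1 " + truncated_unary(mpm_list.index(mode), len(mpm_list) - 1)
    if mode in selected_set:
        return "mpm_flag=0 set_flag=1 " + fixed_length(selected_set.index(mode))
    return "mpm_flag=0 set_flag=0 " + truncated_binary(
        non_selected_list.index(mode), len(non_selected_list))

if __name__ == "__main__":
    mpms = [0, 1, 50, 18, 49, 51]
    selected = [4 * i for i in range(1, 17)]            # 16 modes, illustrative only
    non_selected = [m for m in range(67) if m not in mpms and m not in selected]
    print(code_intra_mode(50, mpms, selected, non_selected))   # MPM index 2
    print(code_intra_mode(8, mpms, selected, non_selected))    # selected set, 4-bit FLC
    print(code_intra_mode(3, mpms, selected, non_selected))    # non-selected, truncated binary
```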
  • a two-stage intra mode decision process similar to that of the HM is used.
  • in the intra mode pre-selection stage, a lower-complexity Sum of Absolute Transform Difference (SATD) cost is used to pre-select N intra prediction modes from all the available intra modes.
  • SATD Sum of Absolute Transform Difference
  • R-D cost selection is further applied to select one intra prediction mode from the N candidates.
  • with 67 intra prediction modes, since the total number of available modes is roughly doubled, the complexity of the intra mode pre-selection stage will also be increased if the same encoder mode decision process of the HM is directly used.
  • a two-step intra mode pre-selection process is performed.
  • N depends on the intra prediction block size
  • SATD Sum of Absolute Transform Difference
  • the direct neighbours (additional intra prediction directions as indicated by dashed arrows in FIG. 16) of the selected N modes are further examined by SATD, and the list of selected N modes are updated.
  • the first M MPMs are added to the N modes if not already included, and the final list of candidate intra prediction modes is generated for the second stage R-D cost examination, which is done in the same way as in the HM.
  • the value of M is increased by one based on the original setting in the HM, and N is decreased somewhat as shown below in Table 1.
  • Table 1 Number of mode candidates at the intra mode pre-selection step
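The two-stage encoder decision described above can be sketched as follows, with placeholder SATD and R-D cost functions; N, M and the toy cost models are assumptions for illustration only.

```python
# Sketch of the two-stage intra mode decision: SATD pre-selection of N modes,
# refinement with direct neighbours and the first M MPMs, then R-D selection.
def choose_intra_mode(all_modes, mpm_list, satd_cost, rd_cost, n, m):
    # Stage 1a: keep the N best modes by SATD among the original modes.
    shortlist = sorted(all_modes, key=satd_cost)[:n]
    # Stage 1b: examine the direct neighbours of the shortlisted angular modes.
    for mode in list(shortlist):
        for neighbour in (mode - 1, mode + 1):
            if 2 <= neighbour <= 66 and neighbour not in shortlist:
                if satd_cost(neighbour) < max(satd_cost(x) for x in shortlist):
                    shortlist.append(neighbour)
    shortlist = sorted(shortlist, key=satd_cost)[:n]
    # Stage 1c: add the first M MPMs if not already included.
    for mode in mpm_list[:m]:
        if mode not in shortlist:
            shortlist.append(mode)
    # Stage 2: a full R-D cost decides among the remaining candidates.
    return min(shortlist, key=rd_cost)

if __name__ == "__main__":
    satd = lambda mode: abs(mode - 30)   # toy cost favouring mode 30
    rd = lambda mode: abs(mode - 32)     # toy R-D cost favouring mode 32
    print(choose_intra_mode(list(range(0, 67, 4)), mpm_list=[0, 1, 32],
                            satd_cost=satd, rd_cost=rd, n=3, m=3))   # prints 32
```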
  • the list of chroma mode candidates includes the following three parts:
  • the five positions to be checked in order are: center (CR), top-left (TL), top-right (TR), bottom-left (BL) and bottom-right (BR) 4x4 block within the corresponding luma block of current chroma block for I slices.
  • An example of five collocated luma positions is shown in FIG. 18.
  • o 5 chroma prediction modes from left, above, below-left, above-right, and above- left spatially neighbouring blocks
  • o Derived modes are added, these intra modes are obtained by adding -1 or +1 to the angular modes which are already included into the list
  • a pruning process is applied whenever a new chroma intra mode is added to the candidate list.
  • the non-CCLM chroma intra mode candidates list size is then trimmed to 5.
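The non-CCLM chroma candidate list construction summarised above can be sketched as below; the inputs (collocated luma modes in CR, TL, TR, BL, BR order and the five spatial neighbour modes) and the pruning details are simplified assumptions.

```python
# Sketch of non-CCLM chroma candidate list construction with pruning,
# +/-1 derived modes and trimming to five entries.
CHROMA_LIST_SIZE = 5

def build_chroma_candidates(collocated_luma_modes, neighbour_chroma_modes):
    cands = []

    def push(mode):
        if mode is not None and mode not in cands:
            cands.append(mode)

    # DM modes from the collocated luma positions (CR, TL, TR, BL, BR).
    for m in collocated_luma_modes:
        push(m)
    # Chroma modes from left, above, below-left, above-right, above-left neighbours.
    for m in neighbour_chroma_modes:
        push(m)
    # Derived modes: angular modes already in the list offset by -1 / +1.
    for m in list(cands):
        if m > 1:
            if m - 1 >= 2:
                push(m - 1)
            if m + 1 <= 66:
                push(m + 1)
    return cands[:CHROMA_LIST_SIZE]

if __name__ == "__main__":
    print(build_chroma_candidates([50, 50, 18, None, 1], [50, 0, None, 66, 18]))
```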
  • a flag is first signalled to indicate whether one of the CCLM modes or one of the traditional chroma intra prediction mode is used. Then a few more flags may follow to specify the exact chroma prediction mode used for the current chroma CBs.
  • the default intra modes used for MPM list construction are always vertical (VER), horizontal (HOR), mode 2, and diagonal mode (DIG), which is not reasonable.
  • the insertion of intra mode candidates in the MPM list depends on the current coding block shape (e.g., the coding block is a CU).
  • in one example, for a CU shape with width > N * height, an intra prediction mode fetched from the above neighbouring block is inserted before that fetched from the left neighbouring block, wherein N is equal to 1, 2, 3 or other values.
  • an intra prediction mode fetched from the above right neighbouring block is inserted before that fetched from the below-left neighbouring block.
  • an intra prediction mode fetched from the above- left neighbouring block is inserted before that fetched from the below-left neighbouring block.
  • intra prediction modes fetched from neighbouring blocks above the current block are all inserted before those fetched from neighbouring blocks left to the current block.
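As a concrete illustration of the shape-dependent insertion order proposed above, the following sketch chooses the neighbour checking order from the block shape before filling the MPM list. The value of N, the position names and the default filler modes are assumptions, not a normative description of the proposal.

```python
# Illustrative sketch of shape-dependent neighbour insertion order: wide blocks
# (width > N * height) insert above-side modes first; tall blocks insert
# left-side modes first.
N = 1

def neighbour_check_order(width, height):
    above_side = ["above", "above-right", "above-left"]
    left_side = ["left", "below-left"]
    if width > N * height:
        return above_side + left_side
    if height > N * width:
        return left_side + above_side
    return ["left", "above", "below-left", "above-right", "above-left"]

def shape_dependent_mpm(width, height, neighbour_modes, list_size=6):
    mpm = []
    for pos in neighbour_check_order(width, height):
        mode = neighbour_modes.get(pos)
        if mode is not None and mode not in mpm and len(mpm) < list_size:
            mpm.append(mode)
    for default in (0, 1, 50, 18):       # planar, DC, VER, HOR as fillers (assumed)
        if default not in mpm and len(mpm) < list_size:
            mpm.append(default)
    return mpm

if __name__ == "__main__":
    modes = {"left": 18, "above": 50, "below-left": 19,
             "above-right": 49, "above-left": 51}
    print(shape_dependent_mpm(32, 8, modes))   # wide block: above-side modes first
    print(shape_dependent_mpm(8, 32, modes))   # tall block: left-side modes first
```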
  • in one example, for a CU shape with width > N * height, it is proposed to insert more intra prediction modes fetched from above blocks, like the above-middle block shown in FIG. 19A.
  • the remaining intra prediction modes out of the MPM list may be re-ordered based on the block shape. That is to say, the codeword length or coding context to code the remaining intra prediction modes may depend on the block shape.
  • VDIG vertical diagonal
  • modes HOR -/+ k are inserted instead of mode 2 and/or diagonal mode, wherein k is equal to 1, 2, 3, ..., 8.
  • HOR mode is inserted before VER mode.
  • mode VER -/+ k are inserted instead of mode 2 or/and diagonal mode.
  • the MPM list is further reordered depending on the current CU shape.
  • intra prediction modes closer to the horizontal direction are preferred over those closer to the vertical direction.
  • the MPM list is scanned from the beginning; when an intra prediction mode closer to the vertical direction is encountered, its following modes are checked, and if a mode closer to the horizontal direction is found, the two modes are swapped. This procedure is repeated until the whole list is processed.
  • intra prediction modes closer to the vertical direction are preferred over those closer to the horizontal direction.
  • the MPM list is scanned from the beginning; when an intra prediction mode closer to the horizontal direction is encountered, its following modes are checked, and if a mode closer to the vertical direction is found, the two modes are swapped. This procedure is repeated until the whole list is processed.
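The shape-dependent reordering procedure described in the preceding bullets can be sketched as below. Mode 34 is used as the boundary between predominantly horizontal and predominantly vertical angular modes, which is an assumption about the 67-mode numbering; N defaults to 1 for illustration.

```python
# Illustrative sketch of shape-dependent MPM reordering: scan the list and swap
# a "wrong direction" mode with the next mode of the preferred direction.
DIAGONAL = 34

def is_horizontal(mode):
    return 2 <= mode <= DIAGONAL

def is_vertical(mode):
    return DIAGONAL < mode <= 66

def reorder_mpm_for_shape(mpm, width, height, n=1):
    mpm = list(mpm)
    if width > n * height:
        preferred, deprecated = is_horizontal, is_vertical
    elif height > n * width:
        preferred, deprecated = is_vertical, is_horizontal
    else:
        return mpm
    for i in range(len(mpm)):
        if deprecated(mpm[i]):
            # Look ahead for the first preferred-direction mode and swap the two.
            for j in range(i + 1, len(mpm)):
                if preferred(mpm[j]):
                    mpm[i], mpm[j] = mpm[j], mpm[i]
                    break
    return mpm

if __name__ == "__main__":
    mpm = [50, 18, 0, 1, 49, 19]                              # mixed directions
    print(reorder_mpm_for_shape(mpm, width=32, height=8))     # horizontal modes move up
    print(reorder_mpm_for_shape(mpm, width=8, height=32))     # vertical modes move up
```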
  • in one example, the shape is defined by the width and height of the block.
  • the proposed methods may be applied to certain modes, block sizes/shapes, and/or certain sub-block sizes.
  • the proposed methods may be applied to certain modes, such as conventional translational motion (i.e., affine mode is disabled).
  • the proposed methods may be applied to certain block sizes.
  • the proposed methods may be applied on all colour components. Alternatively, they may be applied only to some colour components. For example, they may be only applied on the luma component.
  • FIG. 20 is a block diagram of a video processing apparatus 2000.
  • the apparatus 2000 may be used to implement one or more of the methods described herein.
  • the apparatus 2000 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and so on.
  • the apparatus 2000 may include one or more processors 2002, one or more memories 2004 and video processing hardware 2006.
  • the processor(s) 2002 may be configured to implement one or more methods described in the present document, such as the methods described with reference to method 2200.
  • the memory (memories) 2004 may be used for storing data and code used for implementing the methods and techniques described herein, such as the methods described with reference to method 2200.
  • the video processing hardware 2006 may be used to implement, in hardware circuitry, some techniques described in the present document.
  • in various embodiments, the memory (memories) 2004 and/or the video processing hardware 2006 may be partially or fully incorporated into the processor 2002 itself.
  • FIG. 22 is a flowchart for a method 2200 of video bitstream processing.
  • the method 2200 includes generating (2202), for a video block that is at least partly intra-coded, a list of intra mode candidates according to a first shape dependency rule that depends on a shape of the video block, and reconstructing (2204) a decoded representation of the video block using the list of intra mode candidates.
  • the list of intra mode candidates is a most-probable-mode (MPM) candidate list.
  • the first shape dependency rule specifies an order in which neighboring blocks are checked for insertion into the list of intra mode candidates.
  • the first shape dependency rule specifies that in case that the video block has a width that is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates is generated by first using intra prediction modes from above neighboring blocks relative to the video block before intra prediction modes from left neighboring blocks relative to the video block.
  • an intra prediction mode from an above-right neighboring block relative to the video block is added to the list of intra mode candidates before an intra prediction mode from a below-left neighboring block relative to the video block, or an intra prediction mode from an above-left neighboring block relative to the video block is added to the list of intra mode candidates before an intra prediction mode from a below-left neighboring block relative to the video block.
  • the first shape dependency rule specifies that in case that the video block has a width that is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates includes intra prediction modes from an above neighboring block relative to the video block.
  • the above neighboring block is a middle block.
  • the first shape dependency rule specifies that in case that the video block has a height that is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1 , then the list of intra mode candidates includes intra prediction modes from a left neighboring block relative to the video block.
  • the left neighboring block is a middle block.
  • the video bitstream processing comprises a compressed representation of the video block that is encoded using codewords that are allocated using a second shape dependency rule.
  • the first shape dependency rule specifies a default intra mode used for constructing the list of intra mode candidates.
  • the first shape dependency rule specifies that in case that the video block has a width that is greater than M multiples of a height of the video block, where M is an integer greater than or equal to 1, then the default intra mode corresponds to vertical diagonal mode.
  • the first shape dependency rule specifies that in case that the video block has a height that is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates includes a HOR mode before a VER mode. With reference to method 2200, in some embodiments, the first shape dependency rule specifies an ordering of the list of intra mode candidates that depends on the shape of the video block.
  • the first dependency rule specifies to prefer intra prediction modes closer to a horizontal direction over others closer to a vertical direction in case that a width of the video block is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1.
  • the method further includes reordering the list of intra mode candidates by: scanning the list of intra mode candidates from a beginning thereof; and swapping, in case an intra prediction mode entry closer to a vertical direction is found, that entry with a subsequent entry that is closer to a horizontal direction.
  • the first dependency rule specifies to prefer intra prediction modes closer to a vertical direction over others closer to the horizontal direction in case that a height of the video block is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1.
  • the method further includes reordering the list of intra mode candidates by: scanning the list of intra mode candidates from a beginning thereof; and swapping, in case an intra prediction mode entry closer to a horizontal direction is found, that entry with a subsequent entry that is closer to a vertical direction.
  • the video block includes a coding unit (CU).
  • the shape of the video block is one of a square or a rectangle.
  • the shape of the video block corresponds to a ratio of the width and the height.
  • the first shape dependency rule selectively applies two different dependency rules based on a coding condition of the video block.
  • the coding condition includes whether a number of pixels in the video block or a height of the video block or a width of the video block is greater than or equal to a threshold value.
  • the method is applied to one or more of a luma component or a chroma component of the video block.
  • a video decoding apparatus comprising a processor configured to implement a method recited with reference to method 2200.
  • a video encoding apparatus comprising a processor configured to implement a method recited with reference to method 2200.
  • a computer program product having computer code stored thereon, the code, when executed by a processor, causes the processor to implement a method recited with reference to method 2200.
  • the video block may represent a CU of a compressed video bitstream.
  • the shape of the video block may depend on height to width ratio, or actual values of height and width, or relative values of height and widths.
  • the various lists of candidates may be generated implicitly or explicitly (e.g., by storing a list in memory).
  • neighboring blocks and their use are described in Section 4 of the present document.
  • a preference may be given to either the top neighboring blocks or the left neighboring blocks.
  • the central or middle block (or sub block) of the top or left side may be the preferred block from which candidates to add to the list are used.
  • the video block may be encoded in the video bitstream using a codeword based technique (e.g., context adaptive binary arithmetic coding or variable length coding) in which bit efficiency may be achieved by using a bitstream generation rule that also depends on the shape of the video block.
  • the shape of the encoded video block may be used to either decide which blocks to use for candidates, or decide the order in which to place the candidates in the list of candidates, or both.
  • the disclosed and other solutions, examples, embodiments, modules and the functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them.
  • the disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • the computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
  • data processing apparatus encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read only memory or a random-access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
  • a computer need not have such devices.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method of video bitstream processing includes generating, for a video block that is at least partly intra-coded, a list of intra mode candidates according to a first shape dependency rule that depends on a shape of the video block, and using the list of intra mode candidates to reconstruct a decoded representation of the video block. The shape dependency rule may also be extended to inter coding cases for merge candidate list or advanced motion vector prediction candidate list.

Description

SHAPE DEPENDENT INTRA CODING
CROSS REFERENCE TO RELATED APPLICATIONS
[001] Under the applicable patent law and/or rules pursuant to the Paris Convention, this application is made to timely claim the priority to and benefit of U.S. Provisional Patent Application No. 62/692,805, filed on July 1, 2018. For all purposes under the U.S. law, the entire disclosure of U.S. Provisional Patent Application No. 62/692,805 is incorporated by reference as part of the disclosure of this application.
TECHNICAL FIELD
[002] This document is related to video coding technologies.
BACKGROUND
[003] Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.
SUMMARY
[004] The disclosed techniques may be used by video decoder or encoder embodiments in which the performance of intra coding of video blocks is improved using a block-shape dependent coding technique.
[005] In one example aspect, a method of video bitstream processing is disclosed. The method includes generating, for a video block that is at least partly intra-coded, a list of intra mode candidates according to a first shape dependency rule that depends on a shape of the video block, and using the list of intra mode candidates to reconstruct a decoded representation of the video block.
[006] In another example aspect, the above-described method may be implemented by a video decoder apparatus that comprises a processor. [007] In another example aspect, the above-described method may be implemented by a video encoder apparatus comprising a processor for decoding encoded video during a video encoding process.
[008] In yet another example aspect, these methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
[009] These, and other, aspects are further described in the present document.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1 is an illustration of a quadtree binary tree (QTBT) structure.
[0011] FIG. 2 shows an example derivation process for merge candidates list construction.
[0012] FIG. 3 shows example positions of spatial merge candidates.
[0013] FIG. 4 shows an example of candidate pairs considered for redundancy check of spatial merge candidates.
[0014] FIG. 5 shows examples of positions for the second prediction unit (PU) of Nx2N and 2NxN partitions.
[0015] FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
[0016] FIG. 7 shows example candidate positions for temporal merge candidate, C0 and C1.
[0017] FIG. 8 shows an example of combined bi-predictive merge candidate.
[0018] FIG. 9 shows an example of a derivation process for motion vector prediction candidates.
[0019] FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
[0020] FIG. 11 shows an example of advanced temporal motion vector prediction (ATMVP) motion prediction for a coding unit (CU).
[0021] FIG. 12 shows an example of one CU with four sub-blocks (A-D) and its neighbouring blocks (a-d).
[0022] FIG. 13 illustrates proposed non-adjacent merge candidates in J0021.
[0023] FIG. 14 illustrates proposed non-adjacent merge candidates in J0058.
[0024] FIG. 15 illustrates proposed non-adjacent merge candidates in J0059.
[0025] FIG. 16 illustrates the proposed 67 intra prediction modes.
[0026] FIG. 17 shows examples of neighbouring blocks for most probable mode (MPM) derivation.
[0027] FIG. 18 shows examples of corresponding sub-blocks for a chroma CB in I slice.
[0028] FIG. 19A and FIG. 19B show examples of additional blocks used for MPM list.
[0029] FIG. 20 is a block diagram of an example of a video processing apparatus.
[0030] FIG. 21 shows a block diagram of an example implementation of a video encoder.
[0031] FIG. 22 is a flowchart for an example of a video bitstream processing method.
DETAILED DESCRIPTION
[0032] The present document provides various techniques that can be used by a decoder of video bitstreams to improve the quality of decompressed or decoded digital video. Furthermore, a video encoder may also implement these techniques during the process of encoding in order to reconstruct decoded frames used for further encoding. In the following description, the term video block is used to represent a logical grouping of pixels and different embodiments may work with video blocks of different sizes. Furthermore, a video block may correspond to one chroma or luma component or may include another component representation such as RGB representation.
[0033] Section headings are used in the present document for ease of understanding and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments from one section can be combined with embodiments from other sections.
[0034] 1. Summary
[0035] The techniques described in this patent document relate to video coding technologies. Specifically, they relate to intra/inter mode coding in video coding. They may be applied to an existing video coding standard such as High Efficiency Video Coding (HEVC), or to the Versatile Video Coding (VVC) standard to be finalized. They may also be applicable to future video coding standards or video codecs.
[0036] 2. Background
[0037] Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on the hybrid video coding structure, wherein temporal prediction plus transform coding are utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Since then, many new methods have been adopted by JVET and put into the reference software named Joint Exploration Model (JEM). In April 2018, the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
[0038] FIG. 21 is a block diagram of an example implementation of a video encoder.
[0039] 2.1 Quadtree plus binary tree (QTBT) block structure with larger Coding Tree Units (CTUs)
[0040] In HEVC, a CTU is split into coding units (CUs) by using a quadtree structure denoted as coding tree to adapt to various local characteristics. The decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU can be further split into one, two or four prediction units (PUs) according to the PU splitting type. Inside one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU splitting type, a CU can be partitioned into transform units (TUs) according to another quadtree structure similar to the coding tree for the CU. One key feature of the HEVC structure is that it has multiple partition conceptions, including CU, PU, and TU.
[0041] The QTBT structure removes the concepts of multiple partition types, i.e. it removes the separation of the CU, PU and TU concepts, and supports more flexibility for CU partition shapes. In the QTBT block structure, a CU can have either a square or rectangular shape. As shown in FIG. 1, a CTU is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure. There are two splitting types, symmetric horizontal splitting and symmetric vertical splitting, in the binary tree splitting. The binary tree leaf nodes are called coding units (CUs), and that segmentation is used for prediction and transform processing without any further partitioning. This means that the CU, PU and TU have the same block size in the QTBT coding block structure. In the JEM, a CU sometimes consists of coding blocks (CBs) of different colour components, e.g. one CU contains one luma CB and two chroma CBs in the case of P and B slices of the 4:2:0 chroma format and sometimes consists of a CB of a single component, e.g., one CU contains only one luma CB or just two chroma CBs in the case of I slices.
[0042] The following parameters are defined for the QTBT partitioning scheme:
- CTU size: the root node size of a quadtree, the same concept as in HEVC;
- MinQTSize : the minimum allowed quadtree leaf node size;
- MaxBTSize : the maximum allowed binary tree root node size;
- MaxBTDepth: the maximum allowed binary tree depth;
- MinBTSize : the minimum allowed binary tree leaf node size.
[0043] In one example of the QTBT partitioning structure, the CTU size is set as 128x128 luma samples with two corresponding 64x64 blocks of chroma samples, the MinQTSize is set as 16x16, the MaxBTSize is set as 64x64, the MinBTSize (for both width and height) is set as 4, and the MaxBTDepth is set as 4. The quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes may have a size from 16x16 (i.e., the MinQTSize) to 128x128 (i.e., the CTU size). If the leaf quadtree node is 128x128, it will not be further split by the binary tree since the size exceeds the MaxBTSize (i.e., 64x64). Otherwise, the leaf quadtree node could be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and it has the binary tree depth as 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered. When the binary tree node has width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered. Similarly, when the binary tree node has height equal to MinBTSize, no further vertical splitting is considered. The leaf nodes of the binary tree are further processed by prediction and transform processing without any further partitioning. In the JEM, the maximum CTU size is 256x256 luma samples.
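By way of a non-limiting illustration, the following sketch shows how the parameters above could gate further binary tree splitting of a block; the function and parameter names are illustrative assumptions of this sketch and do not correspond to any normative syntax.

```python
# Illustrative sketch only: how MinBTSize, MaxBTSize and MaxBTDepth could gate
# further binary tree splitting of a block, following the example values above.
def allowed_bt_splits(width, height, bt_depth,
                      max_bt_size=64, max_bt_depth=4, min_bt_size=4):
    """Return the binary tree split types still considered for a block."""
    if bt_depth >= max_bt_depth:
        return []                       # MaxBTDepth reached, no further splitting
    if width > max_bt_size or height > max_bt_size:
        return []                       # block exceeds MaxBTSize
    splits = []
    if width > min_bt_size:
        splits.append("horizontal")     # width == MinBTSize stops horizontal splitting
    if height > min_bt_size:
        splits.append("vertical")       # height == MinBTSize stops vertical splitting
    return splits

# A 64x64 quadtree leaf at binary tree depth 0 may still be split either way.
print(allowed_bt_splits(64, 64, 0))     # ['horizontal', 'vertical']
```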
[0044] FIG. 1 (left) illustrates an example of block partitioning by using QTBT, and FIG. 1 (right) illustrates the corresponding tree representation. The solid lines indicate quadtree splitting and dotted lines indicate binary tree splitting. In each splitting (i.e., non-leaf) node of the binary tree, one flag is signalled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting. For the quadtree splitting, there is no need to indicate the splitting type since quadtree splitting always splits a block both horizontally and vertically to produce 4 sub-blocks with an equal size.
[0045] In addition, the QTBT scheme supports the ability for the luma and chroma to have a separate QTBT structure. Currently, for P and B slices, the luma and chroma CTBs in one CTU share the same QTBT structure. However, for I slices, the luma CTB is partitioned into CUs by a QTBT structure, and the chroma CTBs are partitioned into chroma CUs by another QTBT structure. This means that a CU in an I slice consists of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice consists of coding blocks of all three colour components.
[0046] In HEVC, inter prediction for small blocks is restricted to reduce the memory access of motion compensation, such that bi-prediction is not supported for 4x8 and 8x4 blocks, and inter prediction is not supported for 4x4 blocks. In the QTBT of the JEM, these restrictions are removed.
[0047] 2.2 Inter prediction in HEVC/H.265
[0048] Each inter-predicted PU has motion parameters for one or two reference picture lists. Motion parameters include a motion vector and a reference picture index. Usage of one of the two reference picture lists may also be signalled using inter _predjdc. Motion vectors may be explicitly coded as deltas relative to predictors.
[0049] When a CU is coded with skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta and no reference picture index. A merge mode is specified whereby the motion parameters for the current PU are obtained from neighbouring PUs, including spatial and temporal candidates. The merge mode can be applied to any inter-predicted PU, not only for skip mode. The alternative to merge mode is the explicit transmission of motion parameters, where the motion vector (to be more precise, the motion vector difference compared to a motion vector predictor), the corresponding reference picture index for each reference picture list and the reference picture list usage are signalled explicitly for each PU. Such a mode is named advanced motion vector prediction (AMVP) in this disclosure.
[0050] When signalling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as 'uni-prediction'. Uni-prediction is available both for P-slices and B-slices.
[0051] When signalling indicates that both of the reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as 'bi-prediction'. Bi-prediction is available for B-slices only.
[0052] The following text provides the details on the inter prediction modes specified in HEVC. The description will start with the merge mode.
[0053] 2.2.1 Merge Mode
[0054] 2.2.1.1 Derivation of candidates for merge mode
[0055] When a PU is predicted using merge mode, an index pointing to an entry in the merge candidates list is parsed from the bitstream and used to retrieve the motion information. The construction of this list is specified in the HEVC standard and can be summarized according to the following sequence of steps:
• Step 1 : Initial candidates derivation
o Step 1.1 : Spatial candidates derivation
o Step 1.2: Redundancy check for spatial candidates
o Step 1.3: Temporal candidates derivation
• Step 2: Additional candidates insertion
o Step 2.1 : Creation of bi-predictive candidates
o Step 2.2: Insertion of zero motion candidates
[0056] These steps are also schematically depicted in FIG. 2. For spatial merge candidate derivation, a maximum of four merge candidates are selected among candidates that are located in five different positions. For temporal merge candidate derivation, a maximum of one merge candidate is selected among two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of merge candidates (MaxNumMergeCand), which is signalled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all the PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2Nx2N prediction unit.
[0057] In the following, the operations associated with the aforementioned steps are detailed.
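As a non-normative illustration of the above sequence of steps, the following sketch assembles a merge list from already-derived spatial and temporal candidates; the dict-based candidate representation is an assumption of the sketch, and the combined bi-predictive step 2.1 is omitted for brevity.

```python
# Illustrative sketch of the merge list construction sequence (steps 1 and 2 above).
def build_merge_list(spatial_cands, temporal_cand, max_num_merge_cand):
    merge_list = []
    # Step 1.1/1.2: spatial candidates in checking order, duplicates removed,
    # at most four spatial candidates kept.
    for cand in spatial_cands:
        if cand is None or cand in merge_list or len(merge_list) == 4:
            continue
        merge_list.append(cand)
    # Step 1.3: at most one temporal candidate.
    if temporal_cand is not None and len(merge_list) < max_num_merge_cand:
        merge_list.append(temporal_cand)
    # Step 2.2: zero motion candidates fill the list up to MaxNumMergeCand,
    # with an increasing reference picture index.
    ref_idx = 0
    while len(merge_list) < max_num_merge_cand:
        merge_list.append({"mv": (0, 0), "ref_idx": ref_idx})
        ref_idx += 1
    return merge_list[:max_num_merge_cand]

# Example: two distinct spatial candidates, no temporal candidate, list size 5.
a1 = {"mv": (3, 1), "ref_idx": 0}
b1 = {"mv": (-2, 0), "ref_idx": 0}
print(build_merge_list([a1, b1, a1, None, None], None, 5))
```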
[0058] 2.2.1.2 Spatial candidates derivation
[0059] In the derivation of spatial merge candidates, a maximum of four merge candidates are selected among candidates located in the positions depicted in FIG. 3. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU of position A1, B1, B0, A0 is not available (e.g. because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in FIG. 4 are considered, and a candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions different from 2Nx2N. As an example, FIG. 5 depicts the second PU for the cases of Nx2N and 2NxN, respectively. When the current PU is partitioned as Nx2N, the candidate at position A1 is not considered for list construction, because adding this candidate would lead to two prediction units having the same motion information, which is redundant when only one PU is present in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2NxN.
[0060] 2.2.1.3 Temporal candidates derivation
[0061] In this step, only one candidate is added to the list. Particularly, in the derivation of this temporal merge candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture which has the smallest POC difference with the current picture within the given reference picture list. The reference picture list to be used for derivation of the co-located PU is explicitly signalled in the slice header. The scaled motion vector for the temporal merge candidate is obtained as illustrated by the dashed line in FIG. 6, which is scaled from the motion vector of the co-located PU using the POC distances, tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture and td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal merge candidate is set equal to zero. A practical realization of the scaling process is described in the HEVC specification [1]. For a B-slice, two motion vectors, one for reference picture list 0 and the other for reference picture list 1, are obtained and combined to make the bi-predictive merge candidate.
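A conceptual sketch of the tb/td scaling described above is given below; the normative fixed-point realization is the one in the HEVC specification, and the simple floating-point form here is only illustrative.

```python
# Conceptual sketch of POC-distance-based scaling of the co-located motion vector.
def scale_temporal_mv(col_mv, cur_poc, cur_ref_poc, col_poc, col_ref_poc):
    tb = cur_poc - cur_ref_poc      # POC distance for the current picture
    td = col_poc - col_ref_poc      # POC distance for the co-located picture
    if td == 0:
        return col_mv               # degenerate case: no scaling
    scale = tb / td
    return (round(col_mv[0] * scale), round(col_mv[1] * scale))

# Example: tb = 2, td = 4, so the co-located MV (8, -4) becomes (4, -2).
print(scale_temporal_mv((8, -4), cur_poc=10, cur_ref_poc=8, col_poc=6, col_ref_poc=2))
```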
[0062] FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
[0063] In the co-located PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as depicted in FIG. 7. If the PU at position C0 is not available, is intra coded, or is outside of the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
[0064] 2.2.1.4 Additional candidates insertion
[0065] Besides spatial and temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidate and zero merge candidate. Combined bi-predictive merge candidates are generated by utilizing spatial and temporal merge candidates. The combined bi-predictive merge candidate is used for B-slices only. The combined bi-predictive candidates are generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate. As an example, FIG. 8 depicts the case when two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive merge candidate added to the final list (on the right). There are numerous rules regarding the combinations which are considered to generate these additional merge candidates, defined in [1].
[0066] Zero motion candidates are inserted to fill the remaining entries in the merge candidates list and thereby reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index which starts from zero and increases every time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni-directional and bi-directional prediction, respectively. Finally, no redundancy check is performed on these candidates.
[0067] 2.2.1.5 Motion estimation regions for parallel processing
[0068] To speed up the encoding process, motion estimation can be performed in parallel whereby the motion vectors for all prediction units inside a given region are derived
simultaneously. The derivation of merge candidates from spatial neighbourhood may interfere with parallel processing as one prediction unit cannot derive the motion parameters from an adjacent PU until its associated motion estimation is completed. To mitigate the trade-off between coding efficiency and processing latency, HEVC defines the motion estimation region (MER) whose size is signalled in the picture parameter set using the
“log2_parallel_merge_level_minus2” syntax element. When a MER is defined, merge candidates falling in the same region are marked as unavailable and therefore not considered in the list construction.
[0069] 2.2.2 AMVP
[0070] AMVP exploits the spatio-temporal correlation of a motion vector with neighbouring PUs, which is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by firstly checking the availability of left, above, and temporally neighbouring PU positions, removing redundant candidates, and adding zero vectors to make the candidate list a constant length. Then, the encoder can select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to merge index signalling, the index of the best motion vector candidate is encoded using truncated unary. The maximum value to be encoded in this case is 2 (see FIG. 9). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
[0071] 2.2.2.1 Derivation of AMVP candidates
[0072] FIG. 9 summarizes derivation process for motion vector prediction candidate.
[0073] In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidate and temporal motion vector candidate. For spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on motion vectors of each PU located in five different positions as depicted in FIG. 3.
[0074] For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
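The list trimming just described can be illustrated schematically as follows; the two-entry final list length and the dict-based candidate representation are assumptions of this sketch, not normative definitions.

```python
# Illustrative sketch of the AMVP candidate list trimming described above.
def finalize_amvp_list(spatio_temporal_cands):
    cand_list = []
    for cand in spatio_temporal_cands:          # remove duplicated candidates
        if cand not in cand_list:
            cand_list.append(cand)
    if len(cand_list) > 2:                      # drop candidates with reference index > 1
        cand_list = [c for c in cand_list if c["ref_idx"] <= 1]
    while len(cand_list) < 2:                   # pad with zero motion vector candidates
        cand_list.append({"mv": (0, 0), "ref_idx": 0})
    return cand_list[:2]

# Example: a duplicate is removed and a zero candidate pads the list to length two.
c = {"mv": (5, -1), "ref_idx": 0}
print(finalize_amvp_list([c, c]))
```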
[0075] 2.2.2.2 Spatial motion vector candidates
[0076] In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located in positions as depicted in FIG. 3, those positions being the same as those of motion merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as motion vector candidate, with two cases not required to use spatial scaling, and two cases where spatial scaling is used. The four different cases are summarized as follows.
• No spatial scaling
- (1) Same reference picture list, and same reference picture index (same POC)
- (2) Different reference picture list, but same reference picture (same POC)
• Spatial scaling
- (3) Same reference picture list, but different reference picture (different POC)
- (4) Different reference picture list, and different reference picture (different POC)
[0077] The no-spatial-scaling cases are checked first, followed by the spatial scaling cases. Spatial scaling is considered when the POC is different between the reference picture of the neighbouring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help parallel derivation of left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
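The four cases above can be illustrated with a simplified sketch for the left side; the reference-list bookkeeping and the normative fixed-point scaling are omitted, and the structure of the neighbour records is an assumption of the sketch.

```python
# Simplified sketch of the left-side checking order: the no-scaling cases (1)/(2)
# are tried for A0 and A1 first, and only then the scaled cases (3)/(4).
def derive_left_mv_candidate(neighbours, cur_poc, cur_ref_poc):
    """neighbours: motion info of A0 then A1; each entry is None or a dict with
    'mv' and 'ref_poc' (the POC of the picture that the neighbour's MV points to)."""
    # Pass 1: the neighbour already references a picture with the same POC.
    for nbr in neighbours:
        if nbr is not None and nbr["ref_poc"] == cur_ref_poc:
            return nbr["mv"]
    # Pass 2: different reference picture, so the MV is scaled by POC distances
    # (conceptually as in FIG. 10; the normative integer scaling is omitted).
    for nbr in neighbours:
        if nbr is None or nbr["ref_poc"] == cur_poc:
            continue
        scale = (cur_poc - cur_ref_poc) / (cur_poc - nbr["ref_poc"])
        return (round(nbr["mv"][0] * scale), round(nbr["mv"][1] * scale))
    return None

# Example: A0 references a different picture, so its MV (6, 2) is scaled by 1/2.
a0 = {"mv": (6, 2), "ref_poc": 4}
print(derive_left_mv_candidate([a0, None], cur_poc=8, cur_ref_poc=6))   # -> (3, 1)
```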
[0078] FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
[0079] In a spatial scaling process, the motion vector of the neighbouring PU is scaled in a similar manner as for temporal scaling, as depicted in FIG. 10. The main difference is that the reference picture list and index of the current PU are given as input; the actual scaling process is the same as that of temporal scaling.
[0080] 2.2.2.3 Temporal motion vector candidates
[0081] Apart from the reference picture index derivation, all processes for the derivation of temporal merge candidates are the same as for the derivation of spatial motion vector candidates (see FIG. 7). The reference picture index is signalled to the decoder.
[0082] 2.3 New inter merge candidates in JEM
[0083] 2.3.1 Sub-CU based motion vector prediction
[0084] In the JEM with QTBT, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and spatial neighbouring motion vectors.
[0085] To preserve more accurate motion field for sub-CU motion prediction, the motion compression for the reference frames is currently disabled.
[0086] 2.3.1.1 Alternative temporal motion vector prediction
[0087] In the alternative temporal motion vector prediction (ATMVP) method, the temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. As shown in FIG. 11, the sub-CUs are square NxN blocks (N is set to 4 by default).
[0088] ATMVP predicts the motion vectors of the sub-CUs within a CU in two steps. The first step is to identify the corresponding block in a reference picture with a so-called temporal vector. The reference picture is called the motion source picture. The second step is to split the current CU into sub-CUs and obtain the motion vectors as well as the reference indices of each sub-CU from the block corresponding to each sub-CU, as shown in FIG. 11.
[0089] In the first step, a reference picture and the corresponding block is determined by the motion information of the spatial neighbouring blocks of the current CU. To avoid the repetitive scanning process of neighbouring blocks, the first merge candidate in the merge candidate list of the current CU is used. The first available motion vector as well as its associated reference index are set to be the temporal vector and the index to the motion source picture. This way, in
ATMVP, the corresponding block may be more accurately identified, compared with TMVP, wherein the corresponding block (sometimes called collocated block) is always in a bottom-right or center position relative to the current CU.
[0090] In the second step, a corresponding block of the sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinate of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of a corresponding NxN block is identified, it is converted to the motion vectors and reference indices of the current sub-CU, in the same way as TMVP of HEVC, wherein motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition (i.e. the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) is fulfilled and possibly uses motion vector MVX (the motion vector corresponding to reference picture list X) to predict motion vector MVY (with X being equal to 0 or 1 and Y being equal to 1-X) for each sub-CU.
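The second step above can be sketched schematically as follows; motion_field stands for a lookup into the motion source picture on a 4x4 grid and, like the omission of the TMVP-style scaling, is an assumption of this sketch.

```python
# Schematic sketch of ATMVP sub-CU motion fetching (the second step above).
def atmvp_sub_cu_motion(cu_x, cu_y, cu_w, cu_h, temporal_vector, motion_field, n=4):
    """motion_field(x, y) returns the motion info stored for the 4x4-aligned
    position (x, y) in the motion source picture."""
    sub_cu_motion = {}
    for y in range(cu_y, cu_y + cu_h, n):
        for x in range(cu_x, cu_x + cu_w, n):
            # Centre sample of the sub-CU, displaced by the temporal vector.
            cx = x + n // 2 + temporal_vector[0]
            cy = y + n // 2 + temporal_vector[1]
            # Motion of the smallest grid covering that sample; TMVP-style
            # scaling of the fetched motion vector is omitted here.
            sub_cu_motion[(x, y)] = motion_field((cx // n) * n, (cy // n) * n)
    return sub_cu_motion

# Example with a dummy motion field that returns a zero MV everywhere.
dummy_field = lambda x, y: {"mv": (0, 0), "ref_idx": 0}
print(len(atmvp_sub_cu_motion(0, 0, 16, 8, (4, -4), dummy_field)))   # 8 sub-CUs
```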
[0091] 2.3.1.2 Spatial-temporal motion vector prediction
[0092] In this method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. FIG. 12 illustrates this concept. Let us consider an 8x8 CU which contains four 4x4 sub-CUs A, B, C, and D. The neighbouring 4x4 blocks in the current frame are labelled as a, b, c, and d.
[0093] The motion derivation for sub-CU A starts by identifying its two spatial neighbours.
The first neighbour is the NxN block above sub-CU A (block c). If this block c is not available or is intra coded, the other NxN blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbour is a block to the left of the sub-CU A (block b). If block b is not available or is intra coded, other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighbouring blocks for each list is scaled to the first reference frame for a given list. Next, the temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure of TMVP derivation as specified in HEVC. The motion information of the collocated block at location D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
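The final averaging step can be illustrated with the small sketch below; the preceding availability checks and scaling steps are not shown, and the per-reference-list handling is collapsed into a single call.

```python
# Sketch of the averaging step for one sub-CU: up to three motion vectors (above
# neighbour, left neighbour, TMVP), each possibly unavailable, are averaged.
def stmvp_average(above_mv, left_mv, tmvp_mv):
    available = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
    if not available:
        return None
    sx = sum(mv[0] for mv in available)
    sy = sum(mv[1] for mv in available)
    return (round(sx / len(available)), round(sy / len(available)))

print(stmvp_average((4, 0), (2, 2), None))   # -> (3, 1)
```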
[0094] 2.3.1.3 Sub-CU motion prediction mode signalling
[0095] The sub-CU modes are enabled as additional merge candidates and there is no additional syntax element required to signal the modes. Two additional merge candidates are added to merge candidates list of each CU to represent the ATMVP mode and STMVP mode.
Up to seven merge candidates are used, if the sequence parameter set indicates that ATMVP and STMVP are enabled. The encoding logic of the additional merge candidates is the same as for the merge candidates in the HM, which means that, for each CU in a P or B slice, two more RD checks are needed for the two additional merge candidates.
[0096] In the JEM, all bins of the merge index are context coded by context-adaptive binary arithmetic coding (CABAC), while in HEVC, only the first bin is context coded and the remaining bins are bypass coded.
[0097] 2.3.2 Non-adjacent merge candidates
[0098] In J0021, Qualcomm proposes to derive additional spatial merge candidates from non- adjacent neighboring positions which are marked as 6 to 49 as in FIG. 13. The derived candidates are added after TMVP candidates in the merge candidate list.
[0099] In J0058, Tencent proposes to derive additional spatial merge candidates from positions in an outer reference area which has an offset of (-96, -96) to the current block.
[00100] As shown in FIG. 14, the positions are marked as A(i,j), B(i,j), C(i,j), D(i,j) and E(i,j). Each candidate B(i, j) or C(i, j) has an offset of 16 in the vertical direction compared to its previous B or C candidates. Each candidate A(i, j) or D(i, j) has an offset of 16 in the horizontal direction compared to its previous A or D candidates. Each E(i, j) has an offset of 16 in both the horizontal direction and the vertical direction compared to its previous E candidates. The candidates are checked from the inside to the outside, and the order of the candidates is A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). Whether the number of merge candidates can be further reduced is left for further study. The candidates are added after the TMVP candidates in the merge candidate list.
[00101] In J0059, the extended spatial positions from 6 to 27 as in FIG. 15 are checked according to their numerical order after the temporal candidate. To save the MV line buffer, all the spatial candidates are restricted within two CTU lines.
[00102] 2.4 Intra prediction in JEM
[00103] 2.4.1 Intra mode coding with 67 intra prediction modes
[00104] To capture the arbitrary edge directions presented in natural video, the number of directional intra modes is extended from 33, as used in HEVC, to 65. The additional directional modes are depicted as red dotted arrows in FIG. 16, and the planar and DC modes remain the same. These denser directional intra prediction modes apply for all block sizes and for both luma and chroma intra predictions.
[00105] 2.4.2 Luma intra mode coding
[00106] To accommodate the increased number of directional intra modes, an intra mode coding method with 6 Most Probable Modes (MPMs) is used. Two major technical aspects are involved: 1) the derivation of 6 MPMs, and 2) entropy coding of 6 MPMs and non-MPM modes.
[00107] In the JEM, the modes included into the MPM lists are classified into three groups:
• Neighbour intra modes
• Derived intra modes
• Default intra modes
[00108] Five neighbouring intra prediction modes are used to form the MPM list. Those locations of the 5 neighbouring blocks are the same as those used in the merge mode, i.e., left (L), above (A), below-left (BL), above-right (AR), and above-left (AL) as shown in FIG. 17. An initial MPM list is formed by inserting 5 neighbour intra modes and the planar and DC modes into the MPM list. A pruning process is used to remove duplicated modes so that only unique modes can be included into the MPM list. The order in which the initial modes are included is: left, above, planar, DC, below-left, above-right, and then above-left.
[00109] FIG. 17 shows examples of neighbouring blocks for MPM derivation.
[00110] If the MPM list is not full (i.e., there are fewer than 6 MPM candidates in the list), derived modes are added; these intra modes are obtained by adding -1 or +1 to the angular modes that are already included in the MPM list. Such additional derived modes are not generated from the non-angular modes (DC or planar).
[00111] Finally, if the MPM list is still not complete, the default modes are added in the following order: vertical, horizontal, mode 2, and diagonal mode. As a result of this process, a unique list of 6 MPM modes is generated.
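A non-limiting sketch of this construction order is given below. The numeric mode indices (PLANAR = 0, DC = 1, angular modes 2..66 with HOR = 18, DIA = 34, VER = 50) follow the usual 67-mode numbering and are assumptions of this sketch, as is the simplified wrap-around used for the derived +/-1 modes.

```python
# Sketch of the 6-MPM construction order described above.
PLANAR, DC = 0, 1
HOR, DIA, VER = 18, 34, 50

def build_mpm_list(left, above, below_left, above_right, above_left):
    mpm = []

    def push(mode):
        if mode is not None and mode not in mpm and len(mpm) < 6:
            mpm.append(mode)

    # 1) Neighbour intra modes plus planar and DC, in the prescribed order.
    for m in (left, above, PLANAR, DC, below_left, above_right, above_left):
        push(m)
    # 2) Derived modes: -1 / +1 of the angular modes already in the list.
    for m in list(mpm):
        if m > DC:
            push(2 + (m - 1 - 2) % 65)   # m - 1, kept inside 2..66
            push(2 + (m + 1 - 2) % 65)   # m + 1, kept inside 2..66
    # 3) Default modes, until six unique modes have been collected.
    for m in (VER, HOR, 2, DIA):
        push(m)
    return mpm

print(build_mpm_list(left=HOR, above=VER, below_left=None, above_right=None, above_left=None))
# -> [18, 50, 0, 1, 17, 19]
```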
[00112] For entropy coding of the selected mode using the 6 MPMs, a truncated unary binarization is used. The first three bins are coded with contexts that depend on the MPM mode related to the bin currently being signalled. The MPM mode is classified into one of three categories: (a) modes that are predominantly horizontal (i.e., the MPM mode number is less than or equal to the mode number for the diagonal direction), (b) modes that are predominantly vertical (i.e., the MPM mode is greater than the mode number for the diagonal direction), and (c) the non-angular (DC and planar) class. Accordingly, three contexts are used to signal the MPM index based on this classification.
[00113] The coding for selection of the remaining 61 non-MPMs is done as follows. The 61 non-MPMs are first divided into two sets: a selected mode set and a non-selected mode set. The selected modes set contains 16 modes and the rest (45 modes) are assigned to the non-selected modes set. The mode set that the current mode belongs to is indicated in the bitstream with a flag. If the mode to be indicated is within the selected modes set, the selected mode is signalled with a 4-bit fixed-length code, and if the mode to be indicated is from the non-selected set, the selected mode is signalled with a truncated binary code. The selected modes set is generated by sub-sampling the 61 non-MPM modes as follows: Selected modes set = {0, 4, 8, 12, 16, 20 ... 60}
Non-selected modes set = {1, 2, 3, 5, 6, 7, 9, 10 ... 59}
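A small illustration of this sub-sampling follows, assuming the 61 non-MPM modes have been re-indexed to 0..60 after the 6 MPMs are removed.

```python
# Every fourth re-indexed non-MPM mode goes into the selected set (16 modes),
# and the remaining 45 modes form the non-selected set, as described above.
selected_set = [m for m in range(61) if m % 4 == 0]       # {0, 4, 8, ..., 60}
non_selected_set = [m for m in range(61) if m % 4 != 0]
assert len(selected_set) == 16 and len(non_selected_set) == 45
```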
[00114] At the encoder side, a similar two-stage intra mode decision process to that of the HM is used. In the first stage, i.e., the intra mode pre-selection stage, a lower complexity Sum of Absolute Transform Difference (SATD) cost is used to pre-select N intra prediction modes from all the available intra modes. In the second stage, a higher complexity R-D cost selection is further applied to select one intra prediction mode from the N candidates. However, when the 67 intra prediction modes are applied, since the total number of available modes is roughly doubled, the complexity of the intra mode pre-selection stage will also be increased if the same encoder mode decision process of the HM is directly used. To minimize the encoder complexity increase, a two-step intra mode pre-selection process is performed. At the first step, N (N depends on the intra prediction block size) modes are selected from the original 35 intra prediction modes (indicated by black solid arrows in FIG. 16) based on the Sum of Absolute Transform Difference (SATD) measure; at the second step, the direct neighbours (additional intra prediction directions as indicated by dashed arrows in FIG. 16) of the selected N modes are further examined by SATD, and the list of selected N modes is updated. Finally, the first M MPMs are added to the N modes if not already included, and the final list of candidate intra prediction modes is generated for the second stage R-D cost examination, which is done in the same way as in the HM. The value of M is increased by one based on the original setting in the HM, and N is decreased somewhat as shown below in Table 1.
Table 1: Number of mode candidates at the intra mode pre-selection step
[00115] 2.4.3 Chroma intra mode coding
[00116] In the JEM, a total of 11 intra modes are allowed for chroma CB coding. Those modes include 5 traditional intra modes and 6 cross-component linear model modes. The list of chroma mode candidates includes the following three parts:
• CCLM modes
• DM modes, intra prediction modes derived from luma CBs covering the collocated five positions of the current chroma block
o The five positions to be checked in order are: center (CR), top-left (TL), top-right (TR), bottom-left (BL) and bottom-right (BR) 4x4 block within the corresponding luma block of current chroma block for I slices. For P and B slices, only one of these five sub-blocks is checked since they have the same mode index. An example of five collocated luma positions is shown in FIG. 18.
• Chroma prediction modes from spatial neighbouring blocks:
o 5 chroma prediction modes: from left, above, below-left, above-right, and above- left spatially neighbouring blocks
o Planar and DC modes
o Derived modes are added; these intra modes are obtained by adding -1 or +1 to the angular modes which are already included in the list
o Vertical, horizontal, mode 2
[00117] A pruning process is applied whenever a new chroma intra mode is added to the candidate list. The non-CCLM chroma intra mode candidate list size is then trimmed to 5. For the mode signalling, a flag is first signalled to indicate whether one of the CCLM modes or one of the traditional chroma intra prediction modes is used. Then a few more flags may follow to specify the exact chroma prediction mode used for the current chroma CBs.
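The pruning and trimming just described can be sketched generically as follows; the candidate ordering and mode values in the example are illustrative only.

```python
# Sketch of the pruning and trimming of the non-CCLM part of the chroma candidate
# list: each new mode is appended only if it is not already present, and the list
# is then trimmed to five entries.
def build_non_cclm_chroma_list(candidate_modes, list_size=5):
    chroma_list = []
    for mode in candidate_modes:
        if mode is not None and mode not in chroma_list:   # pruning of duplicates
            chroma_list.append(mode)
    return chroma_list[:list_size]                          # trimmed to 5 modes

# Example: duplicates of the DM mode are pruned before trimming.
print(build_non_cclm_chroma_list([50, 50, 0, 1, 18, 2, 34]))   # -> [50, 0, 1, 18, 2]
```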
[00118] 3. Examples of Problems solved by embodiments
[00119] With QTBT, there are quite different CU shapes, such as 4x32 and 32x4. Different CU shapes may have different correlations with neighboring blocks. However, in the intra mode and inter mode coding, the merge list, AMVP list or MPM list are constructed in the same way for all CU shapes, which is not reasonable.
[00120] Meanwhile, the default intra modes used for MPM list construction are always vertical (VER), horizontal (HOR), mode 2, and diagonal mode (DIG), which is not reasonable.
[00121] 4. Examples of embodiments
[00122] To tackle the technical problems described in this patent document, and provide other benefits, a shape dependent intra/inter mode coding is proposed, in which different merge lists, AMVP lists or MPM lists may be constructed.
[00123] The detailed examples below should be considered as examples to explain general concepts. These exemplary features should not be interpreted in a narrow way. Furthermore, these exemplary features can be combined in any manner.
1. It is proposed that the insertion of intra mode candidates in the MPM list depends on the current coding block shape (e.g., the coding block is a CU).
a. In one example, for CU shape with width > N * height, an intra prediction mode fetched from the above neighbouring block is inserted before that fetched from the left neighbouring block, wherein N is equal to 1, 2, 3 or other values.
i. Alternatively, in addition, an intra prediction mode fetched from the above-right neighbouring block is inserted before that fetched from the below-left neighbouring block.
ii. Alternatively, in addition, an intra prediction mode fetched from the above-left neighbouring block is inserted before that fetched from the below-left neighbouring block.
iii. Alternatively, in addition, intra prediction modes fetched from neighbouring blocks above the current block are all inserted before those fetched from neighbouring blocks left to the current block.
b. In one example, for CU shape with width > N * height, it is proposed to insert more intra-prediction modes fetched from above blocks, like the above-middle block shown in FIG. 19A.
c. In one example, for CU shape with height > N * width, it is proposed to insert more intra-prediction modes fetched from left blocks, like the left-middle block shown in FIG. 19B.
d. Alternatively, furthermore, the remaining intra prediction modes out of the MPM list may be re-ordered based on the block shape. That is to say, the codeword length or coding context to code the remaining intra prediction modes may depend on the block shape.
2. It is proposed that the default intra modes used for constructing the MPM list depends on the current CU shape.
a. In one example, for CU shape with width > M * height, vertical diagonal (VDIG) mode is used instead of mode 2 (horizontal diagonal), wherein M is equal to 1, 2 or other values.
b. In one example, for CU shape with width > N * height, modes HOR -/+ k are inserted instead of mode 2 or/and diagonal mode, wherein k is equal to 1, 2, 3, ..., or 8.
c. In one example, for CU shape with width > N * height, HOR mode is inserted before VER mode.
d. In one example, for CU shape with height > N * width, modes VER -/+ k are inserted instead of mode 2 or/and diagonal mode.
3. Alternatively, in addition, it is proposed that after constructing the MPM list, the MPM list is further reordered depending on the current CU shape.
a. In one example, for a CU with width > N * height, intra prediction modes closer to the horizontal direction are preferred over the others closer to the vertical direction.
i. The MPM list is scanned from the beginning; when an intra prediction mode closer to the vertical direction is encountered, its following modes are checked, and if a mode closer to the horizontal direction is found, the two modes are swapped. This procedure is repeated until the whole list is processed.
ii. Alternatively, such swap is not applied to modes VER -/+ k even though they are closer to the vertical direction, wherein k is equal to 1, 2, 3 or other values.
b. In one example, for a CU with height > N * width, intra prediction modes closer to the vertical direction are preferred over the others closer to the horizontal direction.
i. The MPM list is scanned from the beginning; when an intra prediction mode closer to the horizontal direction is encountered, its following modes are checked, and if a mode closer to the vertical direction is found, the two modes are swapped. This procedure is repeated until the whole list is processed.
ii. Alternatively, such swap is not applied to modes HOR -/+ k even though they are closer to the horizontal direction.
4. The term 'block shape' in the above bullets may denote:
a. Square block or non-square blocks
b. ratio of width and height of the current coding block
c. A shape defined by the width and height of the block.
5. The proposed methods may be applied to certain modes, block sizes/shapes, and/or certain sub-block sizes.
a. The proposed methods may be applied to certain modes, such as conventional translational motion (i.e., affine mode is disabled).
b. The proposed methods may be applied to certain block sizes.
i. In one example, it is only applied to a block with wxh>=T, where w and h are the width and height of the current block.
ii. In another example, it is only applied to a block with w >=T && h >=T.
6. The proposed methods may be applied on all colour components. Alternatively, they may be applied only to some colour components. For example, they may be only applied on the luma component.
[00124] FIG. 20 is a block diagram of a video processing apparatus 2000. The apparatus 2000 may be used to implement one or more of the methods described herein. The apparatus 2000 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and so on. The apparatus 2000 may include one or more processors 2002, one or more memories 2004 and video processing hardware 2006. The processor(s) 2002 may be configured to implement one or more methods described in the present document, such as the methods described with reference to method 2200. The memory (memories) 2004 may be used for storing data and code used for implementing the methods and techniques described herein, such as the methods described with reference to method 2200. The video processing hardware 2006 may be used to implement, in hardware circuitry, some techniques described in the present document. In various
implementations, the memory (memories) 2004 and/or the video processing hardware 2006 may be partially or fully incorporated into the processor 2002 itself.
[00125] FIG. 22 is a flowchart for a method 2200 of video bitstream processing. The method 2200 includes generating (2202), for a video block that is at least partly intra-coded, a list of intra mode candidates according to a first shape dependency rule that depends on a shape of the video block, and reconstructing (2204) a decoded representation of the video block using the list of intra mode candidates.
[00126] With reference to method 2200, in some embodiments, the list of intra mode candidates is a most-probable-mode (MPM) candidate list. With reference to method 2200, in some embodiments, the first shape dependency rule specifies an order in which neighboring blocks are checked for insertion into the list of intra mode candidates. With reference to method 2200, in some embodiments, the first shape dependency rule specifies that in case that the video block has a width that is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates is generated by first using intra prediction modes from above neighboring blocks relative to the video block before intra prediction modes from left neighboring blocks relative to the video block.
[00127] With reference to method 2200, in some embodiments, an intra prediction mode from an above-right neighboring block relative to the video block is added to the list of intra mode candidates before an intra prediction mode from a below-left neighboring block relative to the video block, or an intra prediction mode from an above-left neighboring block relative to the video block is added to the list of intra mode candidates before an intra prediction mode from a below-left neighboring block relative to the video block. [00128] With reference to method 2200, in some embodiments, the first shape dependency rule specifies that in case that the video block has a width that is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates includes intra prediction modes from an above neighboring block relative to the video block. With reference to method 2200, in some embodiments, the above neighboring block is a middle block.
[00129] With reference to method 2200, in some embodiments, the first shape dependency rule specifies that in case that the video block has a height that is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1 , then the list of intra mode candidates includes intra prediction modes from a left neighboring block relative to the video block. With reference to method 2200, in some embodiments, the left neighboring block is a middle block. With reference to method 2200, in some embodiments, the video bitstream processing comprises a compressed representation of the video block that is encoded using codewords that are allocated using a second shape dependency rule.
[00130] With reference to method 2200, in some embodiments, the first shape dependency rule specifies a default intra mode used for constructing the list of intra mode candidates. With reference to method 2200, in some embodiments, the first shape dependency rule specifies that in case that the video block has a width that is greater than M multiples of a height of the video block, where M is an integer greater than or equal to 1, then the default intra mode corresponds to vertical diagonal mode. With reference to method 2200, in some embodiments, the first shape dependency rule specifies that in case that the video block has a width that is greater than M multiples of a height of the video block, where M is an integer greater than or equal to 1, then modes HOR -/+ k are used as the default intra mode, where k = 1, 2, 3, ..., or 8. With reference to method 2200, in some embodiments, the first shape dependency rule specifies that in case that the video block has a height that is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1, then modes VER -/+ k are inserted in the list of intra mode candidates, where k = 1, 2, 3, ..., or 8. With reference to method 2200, in some embodiments, the first shape dependency rule specifies that in case that the video block has a height that is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates includes a HOR mode before a VER mode.
[00131] With reference to method 2200, in some embodiments, the first shape dependency rule specifies an ordering of the list of intra mode candidates that depends on the shape of the video block. With reference to method 2200, in some embodiments, the first dependency rule specifies to use intra prediction modes closer to a horizontal direction over others closer to a vertical direction in case that a width of the video block is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1. With reference to method 2200, in some embodiments, the method further includes reordering the list of intra mode candidates by: scanning the list of intra mode candidates from a beginning thereof; and swapping, in case an intra prediction mode entry closer to a vertical direction is found, that entry with a subsequent entry that is closer to a horizontal direction.
[00132] With reference to method 2200, in some embodiments, the first dependency rule specifies to use intra prediction modes closer to a vertical direction over others closer to the horizontal direction in case that a height of the video block is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1. With reference to method 2200, in some embodiments, the method further includes reordering the list of intra mode candidates by: scanning the list of intra mode candidates from a beginning thereof; and swapping, in case an intra prediction mode entry closer to a horizontal direction is found, that entry with a subsequent entry that is closer to a vertical direction.
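By way of a non-limiting illustration, one possible realization of the shape-dependent insertion order and the swap-based reordering discussed above and in Section 4 is sketched below. The threshold N = 2, the 67-mode numbering (HOR = 18, VER = 50) and the closeness test are assumptions of this sketch, not a normative definition of the claimed techniques.

```python
# Non-limiting sketch combining a shape-dependent insertion order with the
# swap-based reordering described above.
HOR, VER = 18, 50

def shape_dependent_mpm(width, height, left, above, below_left, above_right,
                        above_left, n=2):
    # Shape-dependent insertion order: for wide blocks the above neighbour is
    # checked before the left neighbour (and above-right before below-left);
    # other shapes keep the conventional order in this sketch.
    if width > n * height:
        order = (above, left, 0, 1, above_right, below_left, above_left)
    else:
        order = (left, above, 0, 1, below_left, above_right, above_left)

    mpm = []
    for m in order:
        if m is not None and m not in mpm and len(mpm) < 6:
            mpm.append(m)

    def closer_to_horizontal(m):
        return m > 1 and abs(m - HOR) <= abs(m - VER)

    # Shape-dependent reordering: for wide blocks, a vertical-leaning entry is
    # swapped with the first later horizontal-leaning entry. The symmetric rule
    # for tall blocks and the VER -/+ k exception are omitted here.
    if width > n * height:
        for i in range(len(mpm)):
            if mpm[i] > 1 and not closer_to_horizontal(mpm[i]):
                for j in range(i + 1, len(mpm)):
                    if closer_to_horizontal(mpm[j]):
                        mpm[i], mpm[j] = mpm[j], mpm[i]
                        break
    return mpm

# Example: for a 32x4 block, the above neighbour (VER) is inserted first, and the
# reordering then moves the horizontal-leaning mode ahead of it.
print(shape_dependent_mpm(32, 4, left=HOR, above=VER, below_left=None,
                          above_right=None, above_left=None))   # -> [18, 50, 0, 1]
```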
[00133] With reference to method 2200, in some embodiments, the video block includes a coding unit (CU). With reference to method 2200, in some embodiments, the shape of the video block is one of a square or a rectangle. With reference to method 2200, in some embodiments, the shape of the video block corresponds to a ratio of the width and the height. With reference to method 2200, in some embodiments, the first shape dependency rule selectively applies two different dependency rules based on a coding condition of the video block. With reference to method 2200, in some embodiments, the coding condition includes whether a number of pixels in the video block or a height of the video block or a width of the video block is greater than or equal to a threshold value. With reference to method 2200, in some embodiments, the method is applied to one or more of a luma component or a chroma component of the video block.
[00134] A video decoding apparatus comprising a processor configured to implement a method recited with reference to method 2200. A video encoding apparatus comprising a processor configured to implement a method recited with reference to method 2200. A computer program product having computer code stored thereon, the code, when executed by a processor, causes the processor to implement a method recited with reference to method 2200.
[00135] With reference to method 2200, the video block may represent a CU of a compressed video bitstream. The shape of the video block may depend on height to width ratio, or actual values of height and width, or relative values of height and widths. In various embodiments, the various lists of candidates may be generated implicitly or explicitly (e.g., by storing a list in memory).
[00136] With reference to method 2200, some examples of neighboring blocks and their use are described in Section 4 of the present document. For example, as described in Section 4, under different shapes of the video block, a preference may be given to either the top neighboring blocks or the left neighboring blocks. In some embodiments, the central or middle block (or sub block) of the top or left side may be the preferred block from which candidates to add to the list are used.
[00137] With reference to method 2200, the video block may be encoded in the video bitstream using a codeword based technique (e.g., a context adaptive binary arithmetic coding or a variable length coding) technique in which bit efficiency may be achieved by using a bitstream generation rule that also depends on the shape of the video block.
[00138] With reference to method 2200, the shape of the encoded video block may be used to either decide which blocks to use for candidates, or decide the order in which to place the candidates in the list of candidates, or both.
[00139] It will be appreciated that the disclosed techniques may be embodied in video encoders or decoders to improve compression efficiency when the coding units being
compressed have shaped that are significantly different than the traditional square shaped blocks or rectangular blocks that are half-square shaped. For example, new coding tools that use long or tall coding units such as 4x32 or 32x4 sized units may benefit from the disclosed techniques.
[00140] The disclosed and other solutions, examples, embodiments, modules and the functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more them. The term“data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
[00141] A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
[00142] The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
[00143] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random-access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
[00144] While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
[00145] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
[00146] Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims

1. A method of video bitstream processing, comprising:
generating, for a video block that is at least partly intra-coded, a list of intra mode candidates according to a first shape dependency rule that depends on a shape of the video block; and
using the list of intra mode candidates to reconstruct a decoded representation of the video block.
2. The method of claim 1, wherein the list of intra mode candidates is a most-probable-mode (MPM) candidate list.
3. The method according to claim 1 or 2, wherein the first shape dependency rule specifies an order in which neighboring blocks are checked for insertion into the list of intra mode candidates.
4. The method of claim 3, wherein the first shape dependency rule specifies that in case that the video block has a width that is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates is generated by first using intra prediction modes from above neighboring blocks relative to the video block before intra prediction modes from left neighboring blocks relative to the video block.
5. The method of claim 4, wherein
an intra prediction mode from an above-right neighboring block relative to the video block is added to the list of intra mode candidates before an intra prediction mode from a below-left neighboring block relative to the video block, or
an intra prediction mode from an above-left neighboring block relative to the video block is added to the list of intra mode candidates before an intra prediction mode from a below-left neighboring block relative to the video block.
6. The method of claim 1, wherein the first shape dependency rule specifies that in case that the video block has a width that is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates includes intra prediction modes from an above neighboring block relative to the video block.
7. The method of claim 6, wherein the above neighboring block is a middle block.
8. The method of claim 1, wherein the first shape dependency rule specifies that in case that the video block has a height that is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates includes intra prediction modes from a left neighboring block relative to the video block.
9. The method of claim 8, wherein the left neighboring block is a middle block.
10. The method of any of claims 1 to 9, wherein the video bitstream comprises a compressed representation of the video block that is encoded using codewords that are allocated using a second shape dependency rule.
11. The method of claim 1, wherein the first shape dependency rule specifies a default intra mode used for constructing the list of intra mode candidates.
12. The method of claim 11, wherein the first shape dependency rule specifies that in case that the video block has a width that is greater than M multiples of a height of the video block, where M is an integer greater than or equal to 1, then the default intra mode corresponds to vertical diagonal mode.
13. The method of claim 11, wherein the first shape dependency rule specifies that in case that the video block has a width that is greater than M multiples of a height of the video block, where M is an integer greater than or equal to 1, then modes HOR -/+ k are used as the default intra mode, where k = 1, 2, 3, 4, 5, 6, 7, or 8.
14. The method of claim 11, wherein the first shape dependency rule specifies that in case that the video block has a height that is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1, then modes VER -/+ k are inserted in the list of intra mode candidates, where k = 1, 2, 3, 4, 5, 6, 7, or 8.
15. The method of claim 11, wherein the first shape dependency rule specifies that in case that the video block has a height that is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1, then the list of intra mode candidates includes a HOR mode before a VER mode.
16. The method of claim 1, wherein the first shape dependency rule specifies an ordering of the list of intra mode candidates that depends on the shape of the video block.
17. The method of claim 1, wherein the first dependency rule specifies to use intra prediction modes closer to a horizontal direction over others closer to a vertical direction in case that a width of the video block is greater than N multiples of a height of the video block, where N is an integer greater than or equal to 1.
18. The method of claim 1, wherein the method further includes reordering the list of intra mode candidates by:
scanning the list of intra mode candidates from a beginning thereof; and
swapping, in case an intra prediction mode entry closer to a vertical direction is found, that entry with a subsequent entry that is closer to a horizontal direction.
19. The method of claim 1, wherein the first dependency rule specifies to use intra prediction modes closer to a vertical direction over others closer to a horizontal direction in case that a height of the video block is greater than N multiples of a width of the video block, where N is an integer greater than or equal to 1.
20. The method of claim 1, wherein the method further includes reordering the list of intra mode candidates by:
scanning the list of intra mode candidates from a beginning thereof; and
swapping, in case an intra prediction mode entry closer to a horizontal direction is found, that entry with a subsequent entry that is closer to a vertical direction.
21. The method of any of claims 1 to 20, wherein the video block includes a coding unit (CU).
22. The method of any of claims 1 to 21, wherein the shape of the video block is one of a square or a rectangle.
23. The method of any of claims 1 to 21, wherein the shape of the video block corresponds to a ratio of the width and the height.
24. The method of any of claims 1 to 23, wherein the first shape dependency rule selectively applies two different dependency rules based on a coding condition of the video block.
25. The method of claim 24, wherein the coding condition includes whether a number of pixels in the video block or a height of the video block or a width of the video block is greater than or equal to a threshold value.
26. The method of any of claims 1 to 25, wherein the method is applied to one or more of a luma component or a chroma component of the video block.
27. A video decoding apparatus comprising a processor configured to implement a method recited in one or more of claims 1 to 26.
28. A video encoding apparatus comprising a processor configured to implement a method recited in one or more of claims 1 to 26.
29. A computer program product having computer code stored thereon, wherein the code, when executed by a processor, causes the processor to implement a method recited in one or more of claims 1 to 26.
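As a purely illustrative aid, and not as part of the claimed subject matter, the following Python sketch shows one possible realization of the shape-dependent neighbor-check order of claims 3 to 5 and the scan-and-swap reordering of claim 18. The mode numbering (HOR = 18, VER = 50), the neighbor sets, the threshold N = 1, the list length, and the function names are assumptions chosen for readability; they are not values taken from this document.

```python
# Hypothetical sketch: shape-dependent candidate list construction and reordering.
# HOR and VER indices follow a common 67-mode numbering and are assumptions here.
HOR = 18
VER = 50

def build_candidate_list(width, height, left_modes, above_modes, n=1, max_len=6):
    """Collect intra modes from neighboring blocks; when the block is wide
    (width > n * height), the above neighbors are checked before the left
    neighbors, otherwise the left neighbors are checked first."""
    if width > n * height:
        ordered_sources = above_modes + left_modes
    else:
        ordered_sources = left_modes + above_modes
    candidates = []
    for mode in ordered_sources:
        if mode not in candidates:          # pruning of duplicate modes
            candidates.append(mode)
        if len(candidates) == max_len:
            break
    return candidates

def reorder_for_wide_block(candidates):
    """Scan the list from the beginning; when an entry closer to the vertical
    direction is found, swap it with a later entry closer to the horizontal
    direction, so horizontal-like modes come first for wide blocks."""
    def is_vertical_like(mode):
        return abs(mode - VER) < abs(mode - HOR)
    result = list(candidates)
    for i in range(len(result)):
        if is_vertical_like(result[i]):
            for j in range(i + 1, len(result)):
                if not is_vertical_like(result[j]):
                    result[i], result[j] = result[j], result[i]
                    break
    return result

# Example: a 32x8 block is wide, so the above-neighbor modes are inserted first,
# and the reordering then favors the horizontal-like modes.
mpm = build_candidate_list(32, 8, left_modes=[VER, 2], above_modes=[HOR, 34])
print(mpm)                          # [18, 34, 50, 2]
print(reorder_for_wide_block(mpm))  # [18, 34, 2, 50]
```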
PCT/IB2019/055565 2018-07-01 2019-07-01 Shape dependent intra coding WO2020008324A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862692805P 2018-07-01 2018-07-01
US62/692,805 2018-07-01

Publications (1)

Publication Number Publication Date
WO2020008324A1 true WO2020008324A1 (en) 2020-01-09

Family

ID=67253941

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/IB2019/055565 WO2020008324A1 (en) 2018-07-01 2019-07-01 Shape dependent intra coding
PCT/IB2019/055569 WO2020008328A1 (en) 2018-07-01 2019-07-01 Shape dependent merge mode and amvp mode coding

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/055569 WO2020008328A1 (en) 2018-07-01 2019-07-01 Shape dependent merge mode and amvp mode coding

Country Status (3)

Country Link
CN (2) CN110677679B (en)
TW (2) TWI731361B (en)
WO (2) WO2020008324A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7372443B2 (en) 2019-08-10 2023-10-31 北京字節跳動網絡技術有限公司 Signaling within the video bitstream that depends on subpictures
MX2022003765A (en) 2019-10-02 2022-04-20 Beijing Bytedance Network Tech Co Ltd Syntax for subpicture signaling in a video bitstream.
EP4032290A4 (en) 2019-10-18 2022-11-30 Beijing Bytedance Network Technology Co., Ltd. Syntax constraints in parameter set signaling of subpictures
WO2021139806A1 (en) * 2020-01-12 2021-07-15 Beijing Bytedance Network Technology Co., Ltd. Constraints for video coding and decoding
WO2024022145A1 (en) * 2022-07-28 2024-02-01 Mediatek Inc. Method and apparatus of amvp with merge mode for video coding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018037896A1 (en) * 2016-08-26 2018-03-01 シャープ株式会社 Image decoding apparatus, image encoding apparatus, image decoding method, and image encoding method
US20180098064A1 (en) * 2016-10-04 2018-04-05 Qualcomm Incorporated Variable number of intra modes for video coding

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101365570B1 (en) * 2007-01-18 2014-02-21 삼성전자주식회사 Method and apparatus for encoding and decoding based on intra prediction
CA3011221C (en) * 2010-07-20 2019-09-03 Ntt Docomo, Inc. Video prediction encoding and decoding for partitioned regions while determining whether or not to use motion information from neighboring regions
US9247266B2 (en) * 2011-04-18 2016-01-26 Texas Instruments Incorporated Temporal motion data candidate derivation in video coding
EP2745519B1 (en) * 2011-08-17 2017-09-27 MediaTek Singapore Pte Ltd. Method and apparatus for intra prediction using non-square blocks
US9787982B2 (en) * 2011-09-12 2017-10-10 Qualcomm Incorporated Non-square transform units and prediction units in video coding
CA2863208C (en) * 2012-01-13 2021-03-09 Sharp Kabushiki Kaisha Image decoding device, image encoding device, and data structure of encoded data
BR112017011890A2 (en) * 2014-12-09 2018-07-03 Mediatek Inc motion vector predictor or fusion candidate derivation method in video coding
US10728571B2 (en) * 2015-08-04 2020-07-28 Lg Electronics Inc. Inter prediction method and device in video coding system
WO2017043786A1 (en) * 2015-09-10 2017-03-16 엘지전자 주식회사 Intra prediction method and device in video coding system
US10547854B2 (en) * 2016-05-13 2020-01-28 Qualcomm Incorporated Neighbor based signaling of intra prediction modes

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018037896A1 (en) * 2016-08-26 2018-03-01 シャープ株式会社 Image decoding apparatus, image encoding apparatus, image decoding method, and image encoding method
EP3506638A1 (en) * 2016-08-26 2019-07-03 Sharp Kabushiki Kaisha Image decoding apparatus, image encoding apparatus, image decoding method, and image encoding method
US20180098064A1 (en) * 2016-10-04 2018-04-05 Qualcomm Incorporated Variable number of intra modes for video coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SEREGIN V ET AL: "Block shape dependent intra mode coding", 7. JVET MEETING; 13-7-2017 - 21-7-2017; TORINO; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/, no. JVET-G0159, 16 July 2017 (2017-07-16), XP030150970 *

Also Published As

Publication number Publication date
TWI731361B (en) 2021-06-21
TW202007153A (en) 2020-02-01
CN110677678A (en) 2020-01-10
CN110677679A (en) 2020-01-10
TW202021344A (en) 2020-06-01
CN110677678B (en) 2022-09-23
CN110677679B (en) 2022-07-26
WO2020008328A1 (en) 2020-01-09

Similar Documents

Publication Publication Date Title
US11146785B2 (en) Selection of coded motion information for LUT updating
KR102147614B1 (en) Motion Vector Prediction for Affine Motion Models in Video Coding
WO2020114404A1 (en) Pruning method in different prediction mode
WO2020003279A1 (en) Concept of using one or multiple look up tables to store motion information of previously coded in order and use them to code following blocks
WO2020003282A1 (en) Managing motion vector predictors for video coding
WO2020008347A1 (en) Hmvp + non-adjacent motion
WO2020003266A1 (en) Resetting of look up table per slice/tile/lcu row
EP3794824A1 (en) Conditions for updating luts
KR20210024502A (en) Partial/full pruning when adding HMVP candidates to merge/AMVP
WO2018119431A1 (en) Determining neighboring samples for bilateral filtering in video decoding
WO2018119429A1 (en) Deriving bilateral filter information based on a prediction mode in video decoding
WO2020044196A1 (en) Combined history-based motion vector predictor and multi-motion model decoding
CN110677679B (en) Shape dependent intra coding
KR20160041841A (en) Selection of pictures for disparity vector derivation
WO2020008329A1 (en) Spatial motion compression
WO2020125628A1 (en) Shape dependent interpolation filter
WO2020143837A1 (en) Mmvd improvement
WO2020012449A1 (en) Shape dependent interpolation order
WO2020008322A1 (en) Complexity reduction of non-adjacent merge design

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19739398

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

122 Ep: pct application non-entry in european phase

Ref document number: 19739398

Country of ref document: EP

Kind code of ref document: A1