WO2020012449A1 - Shape dependent interpolation order - Google Patents

Shape dependent interpolation order

Info

Publication number
WO2020012449A1
Authority
WO
WIPO (PCT)
Prior art keywords
interpolation
video block
block
video
prediction
Prior art date
Application number
PCT/IB2019/056000
Other languages
French (fr)
Inventor
Hongbin Liu
Li Zhang
Kai Zhang
Yue Wang
Original Assignee
Beijing Bytedance Network Technology Co., Ltd.
Bytedance Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Bytedance Network Technology Co., Ltd. and Bytedance Inc.
Publication of WO2020012449A1


Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/186 Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N19/513 Predictive coding involving temporal prediction; motion estimation or motion compensation; processing of motion vectors
    • H04N19/82 Details of filtering operations specially adapted for video compression, involving filtering within a prediction loop

Definitions

  • This patent document relates to video coding techniques, devices and systems.
  • the disclosed techniques may be used by video decoder or encoder embodiments in which interpolation is improved using a block-shape interpolation order technique.
  • a method of video bitstream processing is disclosed.
  • the method includes determining a shape of a video block, determining an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video block.
  • a method of video bitstream processing includes determining characteristics of a motion vector related to a video block, determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video block.
  • the method includes determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order, to construct an encoded representation of the video block.
  • the method includes determining characteristics of a motion vector related to a video block; determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to construct an encoded representation of the video block.
  • a video processing method includes: determining a first prediction mode applied to a first video block; performing a first conversion between the first video block and a coded representation of the first video block by applying a horizontal interpolation and/or a vertical interpolation to the first video block; determining a second prediction mode applied to a second video block; performing a second conversion between the second video block and a coded representation of the second video block by applying a horizontal interpolation and/or a vertical interpolation to the second video block, wherein, based on the determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation for the first video block use a shorter tap filter compared to that used for the second video block.
  • a video decoding apparatus that implements a video processing method described herein is disclosed.
  • a video encoding apparatus that implements a video processing method described herein is disclosed.
  • the various techniques described herein may be embodied as a computer program product stored on a non-transitory computer readable media.
  • the computer program product includes program code for carrying out the methods described herein.
  • an apparatus in a video system comprises a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to implement the above-described method.
  • FIG. 1 is an illustration of a QUAD TREE BINARY TREE (QTBT) structure
  • FIG. 2 shows an example derivation process for merge candidates list construction.
  • FIG. 3 shows example positions of spatial merge candidates.
  • FIG. 4 shows an example of candidate pairs considered for redundancy check of spatial merge candidates.
  • FIG. 5 shows examples of positions for the second prediction unit (PU) of Nx2N and 2NxN partitions.
  • FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
  • FIG. 7 shows example candidate positions for the temporal merge candidate, C0 and C1.
  • FIG. 8 shows an example of combined bi-predictive merge candidate.
  • FIG. 9 shows an example of a derivation process for motion vector prediction candidates
  • FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
  • FIG. 11 shows an example of advanced temporal motion vector prediction (ATMVP) motion prediction for a coding unit (CU).
  • FIG. 12 shows an example of one CU with four sub-blocks (A-D) and its neighbouring blocks (a-d).
  • FIG. 13 illustrates proposed non-adjacent merge candidates in J0021.
  • FIG. 14 illustrates proposed non-adjacent merge candidates in J0058.
  • FIG. 15 illustrates proposed non-adjacent merge candidates in J0059.
  • FIG. 16 shows an example of integer samples and fractional sample positions for quarter sample luma interpolation.
  • FIG. 17 is a block diagram of an example of a video processing apparatus.
  • FIG. 18 shows a block diagram of an example implementation of a video encoder.
  • FIG. 19 is a flowchart for an example of a video bitstream processing method.
  • FIG. 20 is a flowchart for an example of a video bitstream processing method.
  • FIG. 21 is a flowchart for an example of a video processing method.
  • FIG. 22 is a flowchart for an example of a video bitstream processing method.
  • FIG. 23 is a flowchart for an example of a video bitstream processing method.
  • the present document provides various techniques that can be used by a decoder of video bitstreams to improve the quality of decompressed or decoded digital video. Furthermore, a video encoder may also implement these techniques during the process of encoding in order to reconstruct decoded frames used for further encoding.
  • Section headings are used in the present document for ease of understanding and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments from one section can be combined with embodiments from other sections.
  • This invention is related to video coding technologies. Specifically, it is related to interpolation in video coding. It may be applied to the existing video coding standard like HEVC, or to the standard to be finalized (Versatile Video Coding). It may also be applicable to future video coding standards or video codecs.
  • Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards.
  • the ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video and H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards.
  • AVC refers to H.264/MPEG-4 Advanced Video Coding; HEVC refers to H.265 High Efficiency Video Coding.
  • the video coding standards are based on the hybrid video coding structure wherein temporal prediction plus transform coding are utilized.
  • The Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015.
  • Since then, many new methods have been adopted by JVET and put into the reference software named Joint Exploration Model (JEM).
  • FIG. 18 is a block diagram of an example implementation of a video encoder.
  • Quadtree plus binary tree (QTBT) block structure with larger CTUs
  • a CTU is split into CUs by using a quadtree structure denoted as coding tree to adapt to various local characteristics.
  • the decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level.
  • Each CU can be further split into one, two or four PUs according to the PU splitting type. Inside one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis.
  • a CU can be partitioned into transform units (TUs) according to another quadtree structure similar to the coding tree for the CU.
  • the QTBT structure removes the concepts of multiple partition types, i.e. it removes the separation of the CU, PU and TU concepts, and supports more flexibility for CU partition shapes.
  • a CU can have either a square or rectangular shape.
  • a coding tree unit (CTU) is first partitioned by a quadtree structure.
  • the quadtree leaf nodes are further partitioned by a binary tree structure.
  • the binary tree leaf nodes are called coding units (CUs), and that segmentation is used for prediction and transform processing without any further partitioning.
  • a CU sometimes consists of coding blocks (CBs) of different colour components, e.g. one CU contains one luma CB and two chroma CBs in the case of P and B slices of the 4:2:0 chroma format and sometimes consists of a CB of a single component, e.g., one CU contains only one luma CB or just two chroma CBs in the case of I slices.
  • CTU size: the root node size of a quadtree, the same concept as in HEVC.
  • In one example of the QTBT partitioning, the CTU size is set as 128x128 luma samples.
  • the quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes.
  • the quadtree leaf nodes may have a size from 16x16 (i.e., the MinQTSize) to 128x128 (i.e., the CTU size). If the leaf quadtree node is 128x128, it will not be further split by the binary tree since the size exceeds the MaxBTSize (i.e., 64x64).
  • the quadtree leaf node is also the root node for the binary tree and it has the binary tree depth as 0.
  • When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered.
  • When the binary tree node has width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered; similarly, when the binary tree node has height equal to MinBTSize, no further vertical splitting is considered.
  • the leaf nodes of the binary tree are further processed by prediction and transform processing without any further partitioning. In the JEM, the maximum CTU size is 256x256 luma samples.
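  • As an illustration of how these size and depth constraints interact, the following Python sketch checks which further splits remain allowed for a node; the helper name and the exact rule encoding are assumptions made for illustration, with parameter values taken from the JEM defaults quoted above.

```python
# Hypothetical sketch of the QTBT split constraints described above.
CTU_SIZE = 128      # root quadtree node size (luma samples)
MIN_QT_SIZE = 16    # minimum allowed quadtree leaf node size
MAX_BT_SIZE = 64    # maximum size of a node that may start binary splitting
MAX_BT_DEPTH = 4    # maximum binary tree depth
MIN_BT_SIZE = 4     # minimum binary tree leaf node width/height

def allowed_splits(width, height, bt_depth, is_qt_node=True):
    """Return the set of splits permitted for a node under the QTBT rules above."""
    splits = set()
    # Quadtree splitting: only square quadtree nodes larger than MinQTSize.
    if is_qt_node and width == height and width > MIN_QT_SIZE:
        splits.add("QT")
    # Binary splitting: the node must not exceed MaxBTSize and the depth limit.
    if max(width, height) <= MAX_BT_SIZE and bt_depth < MAX_BT_DEPTH:
        if width > MIN_BT_SIZE:     # width == MinBTSize: no further horizontal splitting
            splits.add("BT_HOR")
        if height > MIN_BT_SIZE:    # height == MinBTSize: no further vertical splitting
            splits.add("BT_VER")
    return splits

# A 128x128 quadtree leaf cannot start binary splitting (it exceeds MaxBTSize).
print(allowed_splits(128, 128, bt_depth=0))                 # {'QT'}
print(allowed_splits(64, 32, bt_depth=1, is_qt_node=False)) # both binary splits allowed
```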
  • FIG. 1 illustrates an example of block partitioning by using QTBT
  • For each splitting (i.e., non-leaf) node of the binary tree, one flag is signalled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting.
  • For quadtree splitting, there is no need to indicate the splitting type since quadtree splitting always splits a block both horizontally and vertically to produce 4 sub-blocks of equal size.
  • the QTBT scheme supports the ability for the luma and chroma to have a separate QTBT structure.
  • For P and B slices, the luma and chroma CTBs in one CTU share the same QTBT structure.
  • For I slices, the luma CTB is partitioned into CUs by a QTBT structure, and the chroma CTBs are partitioned into chroma CUs by another QTBT structure.
  • a CU in an I slice consists of a coding block of the luma component or coding blocks of two chroma components
  • a CU in a P or B slice consists of coding blocks of all three colour components.
  • inter prediction for small blocks is restricted to reduce the memory access of motion compensation, such that bi-prediction is not supported for 4x8 and 8x4 blocks, and inter prediction is not supported for 4x4 blocks. In the QTBT of the JEM, these restrictions are removed.
  • Each inter-predicted PU has motion parameters for one or two reference picture lists.
  • Motion parameters include a motion vector and a reference picture index. Usage of one of the two reference picture lists may also be signalled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
  • a merge mode is specified whereby the motion parameters for the current PU are obtained from neighbouring PUs, including spatial and temporal candidates.
  • the merge mode can be applied to any inter-predicted PU, not only for skip mode.
  • the alternative to merge mode is the explicit transmission of motion parameters, where motion vector (to be more precise, motion vector difference compared to a motion vector predictor), corresponding reference picture index for each reference picture list and reference picture list usage are signalled explicitly per each PU.
  • Such a mode is named advanced motion vector prediction (AMVP) in this disclosure.
  • When one of the two reference picture lists is used, the PU is produced from one block of samples. This is referred to as 'uni-prediction'. Uni-prediction is available both for P-slices and B-slices.
  • When both reference picture lists are used, the PU is produced from two blocks of samples. This is referred to as 'bi-prediction'. Bi-prediction is available for B-slices only.
  • Step 1.2 Redundancy check for spatial candidates
  • FIG. 5 depicts the second PU for the case of Nx2N and 2NxN, respectively.
  • For example, when the current PU is partitioned as Nx2N, the candidate at position A1 is not considered for list construction; adding this candidate would lead to two prediction units having the same motion information, which is redundant to just having one PU in the coding unit.
  • Similarly, position B1 is not considered when the current PU is partitioned as 2NxN.
  • a scaled motion vector is derived based on co-located PU belonging to the picture which has the smallest POC difference with current picture within the given reference picture list.
  • the reference picture list to be used for derivation of the co-located PU is explicitly signalled in the slice header.
  • the scaled motion vector for the temporal merge candidate is obtained as illustrated by the dashed line in FIG. 6, scaled from the motion vector of the co-located PU using the POC distances tb and td, where:
  • tb is defined to be the POC difference between the reference picture of the current picture and the current picture
  • td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture.
  • the reference picture index of temporal merge candidate is set equal to zero.
  • FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
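  • The scaling illustrated in FIG. 6 follows the POC-distance ratio tb/td defined above. The Python sketch below mirrors an HEVC-style fixed-point scaling; the exact rounding and clipping constants are assumptions quoted for illustration only.

```python
def clip3(lo, hi, x):
    """Clip x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))

def scale_mv(mv, tb, td):
    """Scale a co-located motion vector by the POC-distance ratio tb/td.

    tb: POC distance between the current picture and its reference picture.
    td: POC distance between the co-located picture and its reference picture
        (assumed positive here for brevity).
    """
    tx = (16384 + (abs(td) >> 1)) // td
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)

    def scale_component(c):
        v = dist_scale * c
        mag = (abs(v) + 127) >> 8
        return clip3(-32768, 32767, mag if v >= 0 else -mag)

    return (scale_component(mv[0]), scale_component(mv[1]))

# The current picture is twice as far from its reference (tb=4) as the co-located
# picture is from its reference (td=2), so the vector roughly doubles.
print(scale_mv((8, -4), tb=4, td=2))   # (16, -8)
```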
  • the position for the temporal candidate is selected between candidates C0 and C1, as depicted in FIG. 7. If the PU at position C0 is not available, is intra coded, or is outside of the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
  • merge candidates Besides spatial and temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidate and zero merge candidate.
  • Combined bi-predictive merge candidates are generated by utilizing spatial and temporal merge candidates.
  • Combined bi-predictive merge candidates are used for B-slices only.
  • The combined bi-predictive candidates are generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate. As an example, FIG. 8 depicts this case.
  • Zero motion candidates are inserted to fill the remaining entries in the merge candidate list and thereby reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index which starts from zero and increases every time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni- and bi-directional prediction, respectively. Finally, no redundancy check is performed on these candidates.
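  • The padding of the merge list with combined bi-predictive and zero candidates can be sketched as below; the candidate representation and the pairing order are assumptions of this example, not the normative HEVC ordering.

```python
def fill_merge_list(candidates, max_num_merge_cand, num_ref, is_b_slice):
    """Pad a merge candidate list with combined bi-predictive and zero candidates.

    Each candidate is a dict with optional 'L0'/'L1' entries of the form
    (mv, ref_idx); this layout is illustrative only.
    """
    # 1) Combined bi-predictive candidates (B slices only): pair the list-0 motion
    #    of one existing candidate with the list-1 motion of another.
    if is_b_slice:
        originals = list(candidates)
        for a in originals:
            for b in originals:
                if len(candidates) >= max_num_merge_cand:
                    return candidates
                if a is b or 'L0' not in a or 'L1' not in b:
                    continue
                if a['L0'] != b['L1']:                  # different motion hypotheses
                    candidates.append({'L0': a['L0'], 'L1': b['L1']})
    # 2) Zero motion candidates: zero displacement, increasing reference index,
    #    no redundancy check.
    ref_idx = 0
    while len(candidates) < max_num_merge_cand:
        zero = {'L0': ((0, 0), min(ref_idx, num_ref - 1))}
        if is_b_slice:
            zero['L1'] = ((0, 0), min(ref_idx, num_ref - 1))
        candidates.append(zero)
        ref_idx += 1
    return candidates
```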
  • HEVC defines the motion estimation region (MER) whose size is signalled in the picture parameter set using the “log2_parallel_merge_level_minus2” syntax element. When a MER is defined, merge candidates falling in the same region are marked as unavailable and therefore not considered in the list construction.
  • AMVP exploits the spatio-temporal correlation of a motion vector with neighbouring PUs, which is used for explicit transmission of motion parameters.
  • a motion vector candidate list is constructed by firstly checking the availability of left and above temporally neighbouring PU positions, removing redundant candidates and adding zero vectors to make the candidate list a constant length. Then, the encoder can select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to merge index signalling, the index of the best motion vector candidate is encoded using truncated unary. The maximum value to be encoded in this case is 2 (see FIG. 9).
  • FIG. 9 summarizes derivation process for motion vector prediction candidate.
  • For motion vector candidates, two types are considered: spatial motion vector candidates and temporal motion vector candidates.
  • For spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on motion vectors of each PU located in five different positions as depicted in FIG. 3.
  • For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
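  • A minimal sketch of this AMVP list construction, assuming the spatial and temporal candidates have already been derived (names and data layout are illustrative):

```python
def build_amvp_list(spatial_cands, temporal_cands, max_cands=2):
    """Assemble an AMVP motion vector predictor list as outlined above.

    spatial_cands / temporal_cands are lists of motion vectors (tuples); duplicates
    are removed and zero vectors pad the list to a constant length of two.
    """
    cands = []
    for mv in spatial_cands + temporal_cands[:1]:
        if mv not in cands:                 # remove redundant candidates
            cands.append(mv)
    while len(cands) < max_cands:           # pad with zero motion vectors
        cands.append((0, 0))
    return cands[:max_cands]

# The encoder then signals the index (0 or 1) of the chosen predictor.
print(build_amvp_list([(3, -1), (3, -1)], [(5, 2)]))   # [(3, -1), (5, 2)]
```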
  • Spatial scaling is considered when the POC is different between the reference picture of the neighbouring PU and that of the current PU regardless of reference picture list. If all PUs of left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help parallel derivation of left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
  • FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
  • the motion vector of the neighbouring PU is scaled in a similar manner as for temporal scaling, as depicted as FIG. 10.
  • the main difference is that the reference picture list and index of current PU is given as input; the actual scaling process is the same as that of temporal scaling.
  • each CU can have at most one set of motion parameters for each prediction direction.
  • Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU.
  • Alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture.
  • In the spatial-temporal motion vector prediction (STMVP) method, motion vectors of the sub-CUs are derived recursively using the temporal motion vector predictor and the spatially neighbouring motion vectors.
  • In the ATMVP method, the temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU.
  • the sub-CUs are square NxN blocks (N is set to 4 by default).
  • ATMVP predicts the motion vectors of the sub-CUs within a CU in two steps.
  • the first step is to identify the corresponding block in a reference picture with a so-called temporal vector.
  • the reference picture is called the motion source picture.
  • the second step is to split the current CU into sub-CUs and obtain the motion vectors as well as the reference indices of each sub-CU from the block corresponding to each sub-CU, as shown in FIG. 11.
  • a reference picture and the corresponding block is determined by the motion information of the spatial neighbouring blocks of the current CU.
  • the first merge candidate in the merge candidate list of the current CU is used.
  • the first available motion vector as well as its associated reference index are set to be the temporal vector and the index to the motion source picture. This way, in ATMVP, the corresponding block may be more accurately identified, compared with TMVP, wherein the corresponding block (sometimes called collocated block) is always in a bottom-right or center position relative to the current CU.
  • a corresponding block of the sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinate of the current CU.
  • the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU.
  • Once the motion information of a corresponding NxN block is identified, it is converted to the motion vectors and reference indices of the current sub-CU, in the same way as TMVP of HEVC, wherein motion scaling and other procedures apply.
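  • A simplified sketch of the two-step ATMVP derivation described above; the motion_source accessor and the omission of motion scaling are assumptions made to keep the example short.

```python
def atmvp_sub_cu_motion(cu_x, cu_y, cu_w, cu_h, temporal_vec, motion_source, sub=4):
    """Derive per-sub-CU motion from the motion source picture (illustrative sketch).

    motion_source(x, y) is assumed to return the motion information stored in the
    motion source picture at luma position (x, y), i.e. the smallest motion grid
    that covers that sample, or None if that block is intra coded.
    """
    tvx, tvy = temporal_vec
    sub_cu_motion = {}
    for sy in range(0, cu_h, sub):
        for sx in range(0, cu_w, sub):
            # Centre sample of the corresponding NxN block in the motion source picture.
            cx = cu_x + sx + sub // 2 + tvx
            cy = cu_y + sy + sub // 2 + tvy
            info = motion_source(cx, cy)
            if info is not None:
                # Would then be converted to the current sub-CU's reference pictures
                # via TMVP-style motion scaling (omitted here).
                sub_cu_motion[(sx, sy)] = info
    return sub_cu_motion
```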
  • the decoder checks whether the low-delay condition (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) is fulfilled and possibly uses the motion vector MVx (the motion vector corresponding to reference picture list X) to predict the motion vector MVy (with X being equal to 0 or 1 and Y being equal to 1-X) for each sub-CU.
  • FIG. 12 illustrates this concept. Let us consider an 8x8 CU which contains four 4x4 sub-CUs A, B, C, and D. The neighbouring 4x4 blocks in the current frame are labelled as a, b, c, and d.
  • the motion derivation for sub-CU A starts by identifying its two spatial neighbours.
  • the first neighbour is the NxN block above sub-CU A (block c). If this block c is not available or is intra coded, the other NxN blocks above sub-CU A are checked (from left to right, starting at block c).
  • the second neighbour is a block to the left of sub-CU A (block b). If block b is not available or is intra coded, other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b).
  • the motion information obtained from the neighbouring blocks for each list is scaled to the first reference frame for a given list.
  • temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure of TMVP derivation as specified in HEVC.
  • the motion information of the collocated block at location D is fetched and scaled accordingly.
  • all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub- CU.
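  • The averaging step for one sub-CU and one reference list can be sketched as follows (inputs are assumed to be already scaled to the first reference picture of the list):

```python
def stmvp_motion(neighbour_above, neighbour_left, tmvp):
    """Average the available motion vectors (up to 3) for one sub-CU and one list.

    Inputs are motion vectors as (mvx, mvy) tuples, or None when the corresponding
    source is unavailable or intra coded. Purely illustrative of the averaging
    described above.
    """
    available = [mv for mv in (neighbour_above, neighbour_left, tmvp) if mv is not None]
    if not available:
        return None
    n = len(available)
    return (sum(mv[0] for mv in available) // n,
            sum(mv[1] for mv in available) // n)

print(stmvp_motion((4, 0), None, (2, 2)))   # (3, 1)
```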
  • the sub-CU modes are enabled as additional merge candidates and there is no additional syntax element required to signal the modes.
  • Two additional merge candidates are added to merge candidates list of each CU to represent the ATMVP mode and STMVP mode. Up to seven merge candidates are used, if the sequence parameter set indicates that ATMVP and STMVP are enabled.
  • the encoding logic of the additional merge candidates is the same as for the merge candidates in the HM, which means, for each CU in a P or B slice, two more RD checks are needed for the two additional merge candidates.
  • Tencent proposes to derive additional spatial merge candidates from positions in an outer reference area which has an offset of (-96, -96) to the current block.
  • each candidate B (i, j) or C (i, j) has an offset of 16 in the vertical direction compared to its previous B or C candidates.
  • Each candidate A (i, j) or D (i, j) has an offset of 16 in the horizontal direction compared to its previous A or D candidates.
  • Each E (i, j) has an offset of 16 in both horizontal direction and vertical direction compared to its previous E candidates. The candidates are checked from inside to the outside.
  • the order of the candidates is A (i, j), B (i, j), C (i, j), D (i, j), and E (i, j).
  • the candidates are added after TMVP candidates in the merge candidate list.
  • In J0059, the extended spatial positions from 6 to 27 as in FIG. 15 are checked according to their numerical order after the temporal candidate.
  • all the spatial candidates are restricted within two CTU lines.
  • an 8-tap separable DCT-based interpolation filter is used for 2/4 precision samples and a 7-tap separable DCT-based interpolation filter is used for 1/4 precision samples, as shown in Table 1.
  • Table 1: 8-tap DCT-IF coefficients for 1/4th luma interpolation.
  • a 4-tap separable DCT-based interpolation filter is used for the chroma interpolation filter, as shown in Table 2.
  • Table 2 4-tap DCT-IF coefficients for 1/ 8th chroma interpolation.
  • The bit depth of the output of the interpolation filter is maintained at 14-bit accuracy, regardless of the source bit depth, before the averaging of the two prediction signals.
  • the actual averaging process is done implicitly with the bit-depth reduction process as:
  • When horizontal interpolation is performed first, the half-pel sample j0,0 is obtained from the horizontally interpolated samples b by vertical filtering:
    j0,0 = ( -b0,-3 + 4*b0,-2 - 11*b0,-1 + 40*b0,0 + 40*b0,1 - 11*b0,2 + 4*b0,3 - b0,4 ) >> shift2
  • When vertical interpolation is performed first, the intermediate samples h are obtained from the integer samples A, and j0,0 is then obtained by horizontal filtering:
    hk,0 = ( -Ak,-3 + 4*Ak,-2 - 11*Ak,-1 + 40*Ak,0 + 40*Ak,1 - 11*Ak,2 + 4*Ak,3 - Ak,4 ) >> shift1    (2-3)
    j0,0 = ( -h-3,0 + 4*h-2,0 - 11*h-1,0 + 40*h0,0 + 40*h1,0 - 11*h2,0 + 4*h3,0 - h4,0 ) >> shift2    (2-4)
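  • The two orders can be written as one separable filtering routine. The sketch below applies the half-pel 8-tap filter of Table 1 in either order; the shift1/shift2 staging follows the 14-bit intermediate accuracy mentioned above, and the exact shift values are an assumption for illustration.

```python
# Half-pel 8-tap DCT-IF coefficients quoted in Table 1 above.
HALF_PEL_FILTER = [-1, 4, -11, 40, 40, -11, 4, -1]

def filter_1d(samples, idx, coeffs):
    """Apply an 8-tap filter centred so that coeffs[3] multiplies samples[idx]."""
    return sum(c * samples[idx + k - 3] for k, c in enumerate(coeffs))

def interpolate_half_pel(ref, x, y, horizontal_first, bit_depth=10):
    """Compute the 2D half-pel sample j0,0 at integer position (x, y) of `ref`,
    performing the two 1D filters in the requested order.

    `ref` is a 2D list of integer samples with enough margin around (x, y).
    """
    shift1 = max(0, bit_depth - 8)
    shift2 = 6
    if horizontal_first:
        # Horizontal filtering on 8 rows first, then one vertical filtering (b then j).
        inter = [filter_1d(ref[y + r], x, HALF_PEL_FILTER) >> shift1 for r in range(-3, 5)]
        return filter_1d(inter, 3, HALF_PEL_FILTER) >> shift2
    else:
        # Vertical filtering on 8 columns first, then one horizontal filtering
        # (cf. equations (2-3) and (2-4): hk,0 then j0,0).
        col = lambda c: [ref[y + r][x + c] for r in range(-3, 5)]
        inter = [filter_1d(col(c), 3, HALF_PEL_FILTER) >> shift1 for c in range(-3, 5)]
        return filter_1d(inter, 3, HALF_PEL_FILTER) >> shift2
```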
  • Table 4 interpolation required for WxH luma component when the interpolation order is reversed.
  • Different interpolation orders can lead to different interpolation results when the bit depth of the input video is greater than 8. Therefore, the interpolation order shall be defined implicitly in both encoder and decoder.
  • the interpolation order depends on the current coding block shape (e.g., the coding block is a CU).
  • In one example, for a block (such as a CU, PU or sub-block used in sub-block based prediction like affine, ATMVP or BIO) with width greater than height, vertical interpolation is firstly performed, and then horizontal interpolation is performed; e.g., pixels dk,0, hk,0 and nk,0 are firstly interpolated and e0,0 to r0,0 are then interpolated.
  • An example of j0,0 is shown in equations (2-3) and (2-4).
  • In one example, for a block (such as a CU, PU or sub-block used in sub-block based prediction like affine, ATMVP or BIO) with height greater than width, horizontal interpolation is firstly performed, and then vertical interpolation is performed.
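  • A minimal sketch of the shape-based decision described in these examples (the handling of square blocks is an illustrative choice):

```python
def interpolation_order(width, height):
    """Decide the interpolation order from the block shape, per the rule above."""
    return "vertical_first" if width > height else "horizontal_first"

# For a wide 32x4 block, vertical-first needs (32+7)*4 intermediate samples instead
# of (4+7)*32 with horizontal-first; the tall 4x32 case is the mirror image.
print(interpolation_order(32, 4))   # vertical_first
print(interpolation_order(4, 32))   # horizontal_first
```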
  • both the luma component and the chroma components follow the same interpolation order.
  • When one chroma coding block corresponds to multiple luma coding blocks (e.g., for the 4:2:0 color format, one chroma 4x4 block may correspond to two 8x4 or 4x8 luma blocks), luma and chroma may use different interpolation orders.
  • The scaling factors in the multiple stages, i.e., shift1 and shift2, may be further changed accordingly.
  • the interpolation order of luma component can further depend on the MV.
  • a. In one example, if the vertical MV component points to a quarter-pel position and the horizontal MV component points to a half-pel position, horizontal interpolation is firstly performed, and then vertical interpolation is performed. b. In one example, if the vertical MV component points to a half-pel position and the horizontal MV component points to a quarter-pel position, vertical interpolation is firstly performed, and then horizontal interpolation is performed. c. In one example, the proposed methods are only applied to square coding blocks.
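  • A sketch of the MV-based decision covering the two examples above; the fallback for other fractional positions is an assumption of this sketch.

```python
def order_from_mv(mv_frac_x, mv_frac_y):
    """Choose the interpolation order from the fractional MV components
    (given in quarter-pel units)."""
    is_half = lambda frac: frac == 2          # 2/4 = half-pel position
    is_quarter = lambda frac: frac in (1, 3)  # 1/4 or 3/4 positions
    if is_quarter(mv_frac_y) and is_half(mv_frac_x):
        return "horizontal_first"   # vertical quarter-pel, horizontal half-pel
    if is_half(mv_frac_y) and is_quarter(mv_frac_x):
        return "vertical_first"     # vertical half-pel, horizontal quarter-pel
    return "horizontal_first"       # default for other combinations (assumption)

print(order_from_mv(mv_frac_x=2, mv_frac_y=1))   # horizontal_first
print(order_from_mv(mv_frac_x=3, mv_frac_y=2))   # vertical_first
```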
  • the proposed methods may be applied to certain modes, block sizes/shapes, and/or certain sub-block sizes.
  • the proposed methods may be applied to certain modes, such as bi-predicted mode.
  • the proposed methods may be applied to certain block sizes.
  • the proposed methods may be applied to certain color component (such as only luma component).
  • bilinear filter may be used.
  • A short-tap filter or a second interpolation filter may be applied to a reference picture list which involves multiple reference blocks, while for another reference picture list with only one reference block, the same filter as that used for the normal prediction mode may be applied.
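  • The filter selection can be sketched as below; the bilinear coefficients and the exact trigger condition are assumptions used only to illustrate switching to a shorter-tap filter for multi-hypothesis prediction.

```python
# Illustrative filter sets: the 8-tap set matches Table 1, the bilinear set is assumed.
FILTER_8TAP_HALF = [-1, 4, -11, 40, 40, -11, 4, -1]
FILTER_BILINEAR_HALF = [32, 32]

def pick_luma_filter(is_multi_hypothesis, list_has_multiple_ref_blocks):
    """Select the luma interpolation filter: lists that contribute multiple reference
    blocks under multi-hypothesis prediction use a shorter-tap filter, otherwise the
    normal prediction filter is kept."""
    if is_multi_hypothesis and list_has_multiple_ref_blocks:
        return FILTER_BILINEAR_HALF      # shorter tap, lower memory bandwidth
    return FILTER_8TAP_HALF              # normal prediction mode filter

print(len(pick_luma_filter(True, True)))    # 2
print(len(pick_luma_filter(False, False)))  # 8
```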
  • the proposed method may be applied under certain conditions, such as certain temporal layer(s), or when the quantization parameters of a block/tile/slice/picture containing the block are within a range (such as larger than a threshold).
  • FIG. 17 is a block diagram of a video processing apparatus 1700.
  • the apparatus 1700 may be used to implement one or more of the methods described herein.
  • the apparatus 1700 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and so on.
  • the apparatus 1700 may include one or more processors 1702, one or more memories 1704 and video processing hardware 1706.
  • The processor(s) 1702 may be configured to implement one or more methods described in the present document.
  • the memory (memories) 1704 may be used for storing data and code used for implementing the methods and techniques described herein.
  • the video processing hardware 1706 may be used to implement, in hardware circuitry, some techniques described in the present document.
  • FIG. 19 is a flowchart for a method 1900 of video bitstream processing.
  • the method 1900 includes determining (1905) a shape of a video block, determining (1910) an interpolation order based on the shape of the video block, the interpolation order being indicative of a sequence of performing horizontal interpolation and vertical interpolation, and performing the horizontal interpolation and the vertical interpolation in accordance with the interpolation order for the video block to reconstruct (1915) a decoded representation of the video block.
  • FIG. 20 is a flowchart for a method 2000 of video bitstream processing.
  • the method 2000 includes determining (2005) characteristics of a motion vector related to a video block, determining (2010) an interpolation order of the video block based on the characteristics of the motion vector, the interpolation order being indicative of a sequence of performing horizontal interpolation and vertical interpolation, and performing the horizontal interpolation and the vertical interpolation in accordance with the interpolation order for the video block to reconstruct (2015) a decoded representation of the video block.
  • the video block may be encoded in the video bitstream in which bit efficiency may be achieved by using a bitstream generation rule related to interpolation orders that also depends on the shape of the video block.
  • the disclosed techniques may be embodied in video encoders or decoders to improve compression efficiency when the coding units being compressed have shapes that are significantly different from the traditional square-shaped blocks or rectangular blocks that are half-square shaped.
  • new coding tools that use long or tall coding units such as 4x32 or 32x4 sized units may benefit from the disclosed techniques.
  • FIG. 21 is a flowchart for an example of a video processing method 2100.
  • the method 2100 includes determining (2102) a first prediction mode applied to a first video block; performing (2104) a first conversion between the first video block and a coded representation of the first video block by applying a horizontal interpolation and/or a vertical interpolation to the first video block; determining (2106) a second prediction mode applied to a second video block; performing (2108) a second conversion between the second video block and a coded representation of the second video block by applying a horizontal interpolation and/or a vertical interpolation to the second video block, wherein, based on the determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation for first video block use a shorter tap filter compared to that used for the second video block.
  • FIG. 22 is a flowchart for a method 2200 of video bitstream processing.
  • the method includes: determining (2205) a shape of a video block, determining (2210) an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order, to construct (2215) an encoded representation of the video block.
  • FIG. 23 is a flowchart for a method 2300 of video bitstream processing.
  • the method includes: determining (2305) characteristics of a motion vector related to a video block, determining (2310) an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to construct (2315) an encoded representation of the video block.
  • a video processing method comprising: determining a first prediction mode applied to a first video block; performing a first conversion between the first video block and a coded representation of the first video block by applying a horizontal interpolation and/or a vertical interpolation to the first video block; determining a second prediction mode applied to a second video block; performing a second conversion between the second video block and a coded representation of the second video block by applying a horizontal interpolation and/or a vertical interpolation to the second video block, wherein, based on the determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation for the first video block use a shorter tap filter compared to that used for the second video block.
  • quantization parameters being within a threshold range comprises the quantization parameters being larger than a threshold.
  • the normal prediction mode comprises a uni-prediction using inter prediction with at most one motion vector and one reference index to predict sample values of a sample in a block or a bi-prediction inter mode using inter prediction with at most two motion vectors and reference indices to predict sample values of a sample in a block.
  • a video decoding apparatus comprising a processor configured to implement a method recited in one or more of examples 1 to 9.
  • a video encoding apparatus comprising a processor configured to implement a method recited in one or more of examples 1 to 9.
  • a computer-readable program medium having code stored thereupon, the code comprising instructions that, when executed by a processor, cause the processor to implement a method recited in one or more of examples 1 to 9.
  • a method for video bitstream processing comprising: determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order, to reconstruct a decoded representation of the video block.
  • step of determining the interpolation order further comprising: determining that the vertical interpolation is to be performed before the horizontal interpolation as the interpolation order, when the width of the video block is larger than or equal to the height of the video block.
  • step of determining the interpolation order further comprising: determining that the horizontal interpolation is to be performed before the vertical interpolation as the interpolation order, when the height of the video block is larger than the width of the video block.
  • a method for video bitstream processing comprising: determining characteristics of a motion vector related to a video block; determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video block.
  • determining the interpolation order includes: determining that the horizontal interpolation is to be performed before the vertical interpolation as the interpolation order, when the vertical component points to the quarter-pel position and the horizontal component points to the half-pel position.
  • determining the interpolation order includes: determining that the vertical interpolation is to be performed before the horizontal interpolation when the vertical component points to the half-pel position and the horizontal component points to the quarter-pel position.
  • The method of any of examples 13-25, wherein the method is applied when a height of the video block multiplied by a width of the video block is less than or equal to T1, T1 being a first threshold.
  • a method for video bitstream processing comprising: determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to construct an encoded representation of the video block.
  • a method for video bitstream processing comprising: determining characteristics of a motion vector related to a video block; determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to construct an encoded representation of the video block.
  • a video decoding apparatus comprising a processor configured to implement a method recited in one or more of examples 13 to 28.
  • a video encoding apparatus comprising a processor configured to implement a method recited in example 29 or 30.
  • a computer program product having computer code stored thereon, the code, when executed by a processor, causes the processor to implement a method recited in any of examples 13 to 30.
  • Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • the computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
  • The term "data processing unit" or "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Abstract

The application provides a video processing method, comprising: determining a first prediction mode applied to a first video block; performing a first conversion between the first video block and a coded representation of the first video block by applying a horizontal interpolation and/or a vertical interpolation to the first video block; determining a second prediction mode applied to a second video block; performing a second conversion between the second video block and a coded representation of the second video block by applying a horizontal interpolation and/or a vertical interpolation to the second video block, wherein, based on the determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation for the first video block use a shorter tap filter compared to that used for the second video block.

Description

SHAPE DEPENDENT INTERPOLATION ORDER
CROSS-REFERENCE TO RELATED APPLICATION
[0001] Under the applicable patent law and/or rules pursuant to the Paris Convention, this application is made to timely claim the priority to and benefits of International Patent Application No. PCT/CN2018/095576, filed on July 13, 2018. The entire disclosure of International Patent Application No. PCT/CN2018/095576 is incorporated by reference as part of the disclosure of this application.
TECHNICAL FIELD
[0002] This patent document relates to video coding techniques, devices and systems.
BACKGROUND
[0003] In spite of the advances in video compression, digital video still accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.
SUMMARY
[0004] The disclosed techniques may be used by video decoder or encoder embodiments in which interpolation is improved using a block-shape interpolation order technique.
[0005] In one example aspect, a method of video bitstream processing is disclosed.
The method includes determining a shape of a video block, determining an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video block.
[0006] In another example aspect, a method of video bitstream processing includes determining characteristics of a motion vector related to a video block, determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video block.
[0007] In another example aspect, a method for video bitstream processing is disclosed.
The method includes determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order, to construct an encoded representation of the video block.
[0008] In another example aspect, a method for video bitstream processing is disclosed.
The method includes determining characteristics of a motion vector related to a video block; determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to construct an encoded representation of the video block.
[0009] In one example aspect, a video processing method is disclosed. The method includes: determining a first prediction mode applied to a first video block; performing a first conversion between the first video block and a coded representation of the first video block by applying a horizontal interpolation and/or a vertical interpolation to the first video block; determining a second prediction mode applied to a second video block; performing a second conversion between the second video block and a coded representation of the second video block by applying a horizontal interpolation and/or a vertical interpolation to the second video block, wherein, based on the determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation for the first video block use a shorter tap filter compared to that used for the second video block.
[0010] In another example aspect, a video decoding apparatus that implements a video processing method described herein is disclosed.
[0011] In yet another example aspect, a video encoding apparatus that implements a video processing method described herein is disclosed.
[0012] In yet another representative aspect, the various techniques described herein may be embodied as a computer program product stored on a non-transitory computer readable media. The computer program product includes program code for carrying out the methods described herein.
[0013] In yet another example aspect, an apparatus in a video system is disclosed. The apparatus comprises a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to implement the above-described method.
[0014] The details of one or more implementations are set forth in the accompanying attachments, the drawings, and the description below. Other features will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIG. 1 is an illustration of a QUAD TREE BINARY TREE (QTBT) structure
[0016] FIG. 2 shows an example derivation process for merge candidates list construction.
[0017] FIG. 3 shows example positions of spatial merge candidates.
[0018] FIG. 4 shows an example of candidate pairs considered for redundancy check of spatial merge candidates.
[0019] FIG. 5 shows examples of positions for the second prediction unit (PU) of Nx2N and 2NxN partitions.
[0020] FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
[0021] FIG. 7 shows example candidate positions for the temporal merge candidate, C0 and C1.
[0022] FIG. 8 shows an example of combined bi-predictive merge candidate.
[0023] FIG. 9 shows an example of a derivation process for motion vector prediction candidates.
[0024] FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
[0025] FIG. 11 shows an example of advanced temporal motion vector prediction
(ATMVP) motion prediction for a coding unit (CU).
[0026] FIG. 12 shows an example of one CU with four sub-blocks (A-D) and its neighbouring blocks (a-d).
[0027] FIG. 13 illustrates proposed non-adjacent merge candidates in J0021.
[0028] FIG. 14 illustrates proposed non-adjacent merge candidates in J0058.
[0029] FIG. 15 illustrates proposed non-adjacent merge candidates in J0059.
[0030] FIG. 16 shows an example of integer samples and fractional sample positions for quarter sample luma interpolation.
[0031] FIG. 17 is a block diagram of an example of a video processing apparatus.
[0032] FIG. 18 shows a block diagram of an example implementation of a video encoder.
[0033] FIG. 19 is a flowchart for an example of a video bitstream processing method.
[0034] FIG. 20 is a flowchart for an example of a video bitstream processing method.
[0035] FIG. 21 is a flowchart for an example of a video processing method.
[0036] FIG. 22 is a flowchart for an example of a video bitstream processing method.
[0037] FIG. 23 is a flowchart for an example of a video bitstream processing method.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0038] The present document provides various techniques that can be used by a decoder of video bitstreams to improve the quality of decompressed or decoded digital video. Furthermore, a video encoder may also implement these techniques during the process of encoding in order to reconstruct decoded frames used for further encoding.
[0039] Section headings are used in the present document for ease of understanding and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments from one section can be combined with embodiments from other sections.
[0040] 1. Summary
[0041] This invention is related to video coding technologies. Specifically, it is related to interpolation in video coding. It may be applied to an existing video coding standard such as HEVC, or to the Versatile Video Coding (VVC) standard to be finalized. It may also be applicable to future video coding standards or video codecs.
[0042] 2. Background
[0043] Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards. Since H.262, video coding standards have been based on the hybrid video coding structure, wherein temporal prediction plus transform coding are utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Since then, many new methods have been adopted by JVET and put into the reference software named Joint Exploration Model (JEM). In April 2018, the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard targeting a 50% bitrate reduction compared to HEVC.
[0044] FIG. 18 is a block diagram of an example implementation of a video encoder.
[0045] 2.1 Quadtree plus binary tree (QTBT) block structure with larger CTUs
[0046] In HEVC, a CTU is split into CUs by using a quadtree structure denoted as a coding tree to adapt to various local characteristics. The decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU can be further split into one, two or four PUs according to the PU splitting type. Inside one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU splitting type, a CU can be partitioned into transform units (TUs) according to another quadtree structure similar to the coding tree for the CU. One key feature of the HEVC structure is that it has multiple partition conceptions including CU, PU, and TU.
[0047] The QTBT structure removes the concepts of multiple partition types, i.e. it removes the separation of the CU, PU and TU concepts, and supports more flexibility for CU partition shapes. In the QTBT block structure, a CU can have either a square or rectangular shape. As shown in FIG. 1, a coding tree unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure. There are two splitting types, symmetric horizontal splitting and symmetric vertical splitting, in the binary tree splitting. The binary tree leaf nodes are called coding units (CUs), and that segmentation is used for prediction and transform processing without any further partitioning. This means that the CU, PU and TU have the same block size in the QTBT coding block structure. In the JEM, a CU sometimes consists of coding blocks (CBs) of different colour components, e.g. one CU contains one luma CB and two chroma CBs in the case of P and B slices of the 4:2:0 chroma format and sometimes consists of a CB of a single component, e.g., one CU contains only one luma CB or just two chroma CBs in the case of I slices.
[0048] The following parameters are defined for the QTBT partitioning scheme.
-CTU size: the root node size of a quadtree, the same concept as in HEVC
-MinQTSize: the minimum allowed quadtree leaf node size
-MaxBTSize: the maximum allowed binary tree root node size
-MaxBTDepth: the maximum allowed binary tree depth
-MinBTSize: the minimum allowed binary tree leaf node size
[0049] In one example of the QTBT partitioning structure, the CTU size is set as
128x128 luma samples with two corresponding 64x64 blocks of chroma samples, the MinQTSize is set as 16x16, the MaxBTSize is set as 64x64, the MinBTSize (for both width and height) is set as 4, and the MaxBTDepth is set as 4. The quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes may have a size from 16x16 (i.e., the MinQTSize) to 128x128 (i.e., the CTU size). If the leaf quadtree node is 128x128, it will not be further split by the binary tree since the size exceeds the MaxBTSize (i.e., 64x64). Otherwise, the leaf quadtree node could be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and it has the binary tree depth as 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered. When the binary tree node has width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered. Similarly, when the binary tree node has height equal to MinBTSize, no further vertical splitting is considered. The leaf nodes of the binary tree are further processed by prediction and transform processing without any further partitioning. In the JEM, the maximum CTU size is 256x256 luma samples.
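The interplay of these parameters can be illustrated with a small non-normative sketch. The following Python fragment assumes the example parameter values given above; the function name and the split labels are hypothetical and do not come from any standard or reference software.

    # Non-normative sketch: which splits remain allowed for a node under the
    # example QTBT settings above. Labels and helper are illustrative only.
    MIN_QT_SIZE, MAX_BT_SIZE, MAX_BT_DEPTH, MIN_BT_SIZE = 16, 64, 4, 4

    def allowed_splits(width, height, bt_depth):
        """bt_depth == 0 means the node is still a quadtree node or quadtree leaf."""
        splits = set()
        if bt_depth == 0 and width == height and width > MIN_QT_SIZE:
            splits.add("QT")  # quadtree split into four equal sub-blocks
        if max(width, height) <= MAX_BT_SIZE and bt_depth < MAX_BT_DEPTH:
            if width > MIN_BT_SIZE:
                splits.add("BT_HALVE_WIDTH")   # symmetric binary split of the width
            if height > MIN_BT_SIZE:
                splits.add("BT_HALVE_HEIGHT")  # symmetric binary split of the height
        return splits

    # A 128x128 quadtree leaf exceeds MaxBTSize, so only a quadtree split is allowed.
    assert allowed_splits(128, 128, 0) == {"QT"}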
[0050] FIG. 1 (left) illustrates an example of block partitioning by using QTBT, and FIG. 1 (right) illustrates the corresponding tree representation. The solid lines indicate quadtree splitting and dotted lines indicate binary tree splitting. In each splitting (i.e., non-leaf) node of the binary tree, one flag is signalled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting. For the quadtree splitting, there is no need to indicate the splitting type since quadtree splitting always splits a block both horizontally and vertically to produce 4 sub-blocks with an equal size.
[0051] In addition, the QTBT scheme supports the ability for the luma and chroma to have a separate QTBT structure. Currently, for P and B slices, the luma and chroma CTBs in one CTU share the same QTBT structure. However, for I slices, the luma CTB is partitioned into CUs by a QTBT structure, and the chroma CTBs are partitioned into chroma CUs by another QTBT structure. This means that a CU in an I slice consists of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice consists of coding blocks of all three colour components.
[0052] In HEVC, inter prediction for small blocks is restricted to reduce the memory access of motion compensation, such that bi-prediction is not supported for 4x8 and 8x4 blocks, and inter prediction is not supported for 4x4 blocks. In the QTBT of the JEM, these restrictions are removed.
[0053] 2.2 Inter prediction in HEVC/H.265
[0054] Each inter-predicted PU has motion parameters for one or two reference picture lists. Motion parameters include a motion vector and a reference picture index. Usage of one of the two reference picture lists may also be signalled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
[0055] When a CU is coded with skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta or reference picture index. A merge mode is specified whereby the motion parameters for the current PU are obtained from neighbouring PUs, including spatial and temporal candidates. The merge mode can be applied to any inter-predicted PU, not only for skip mode. The alternative to merge mode is the explicit transmission of motion parameters, where motion vector (to be more precise, motion vector difference compared to a motion vector predictor), corresponding reference picture index for each reference picture list and reference picture list usage are signalled explicitly per each PU. Such a mode is named advanced motion vector prediction (AMVP) in this disclosure.
[0056] When signalling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as 'uni-prediction'. Uni-prediction is available both for P-slices and B-slices.
[0057] When signalling indicates that both of the reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as 'bi-prediction'. Bi-prediction is available for B-slices only.
[0058] The following text provides the details on the inter prediction modes specified in HEVC. The description will start with the merge mode.
[0059] 2.2.1 Merge Mode
[0060] 2.2.1.1 Derivation of candidates for merge mode
[0061] When a PU is predicted using merge mode, an index pointing to an entry in the merge candidates list is parsed from the bitstream and used to retrieve the motion information. The construction of this list is specified in the HEVC standard and can be summarized according to the following sequence of steps:
• Step 1: Initial candidates derivation
o Step 1.1: Spatial candidates derivation
o Step 1.2: Redundancy check for spatial candidates
o Step 1.3: Temporal candidates derivation
• Step 2: Additional candidates insertion
o Step 2.1: Creation of bi-predictive candidates
o Step 2.2: Insertion of zero motion candidates
[0062] These steps are also schematically depicted in FIG. 2. For spatial merge candidate derivation, a maximum of four merge candidates are selected among candidates that are located in five different positions. For temporal merge candidate derivation, a maximum of one merge candidate is selected among two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of merge candidates (MaxNumMergeCand), which is signalled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary binarization (TU). If the size of a CU is equal to 8, all the PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2Nx2N prediction unit.
[0063] In the following, the operations associated with the aforementioned steps are detailed.
[0064] 2.2.1.2 Spatial candidates derivation
[0065] In the derivation of spatial merge candidates, a maximum of four merge candidates are selected among candidates located in the positions depicted in FIG. 3. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU of position A1, B1, B0, A0 is not available (e.g. because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead only the pairs linked with an arrow in FIG. 4 are considered and a candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions different from 2Nx2N. As an example, FIG. 5 depicts the second PU for the case of Nx2N and 2NxN, respectively. When the current PU is partitioned as Nx2N, the candidate at position A1 is not considered for list construction. In fact, adding this candidate would lead to two prediction units having the same motion information, which is redundant to having just one PU in a coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2NxN.
[0066] 2.2.1.3 Temporal candidates derivation
[0067] In this step, only one candidate is added to the list. Particularly, in the derivation of this temporal merge candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture which has the smallest POC difference with the current picture within the given reference picture list. The reference picture list to be used for derivation of the co-located PU is explicitly signalled in the slice header. The scaled motion vector for the temporal merge candidate is obtained as illustrated by the dashed line in FIG. 6, which is scaled from the motion vector of the co-located PU using the POC distances, tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture and td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal merge candidate is set equal to zero. A practical realization of the scaling process is described in the HEVC specification. For a B-slice, two motion vectors, one for reference picture list 0 and the other for reference picture list 1, are obtained and combined to make the bi-predictive merge candidate.
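The POC-distance scaling described above can be pictured with the following conceptual sketch. It shows only the tb/td ratio in floating point; the HEVC specification uses a clipped fixed-point realization, and the function name and sign convention here are illustrative assumptions.

    # Conceptual sketch of temporal MV scaling for the merge candidate (FIG. 6).
    def scale_temporal_mv(mv_col, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
        tb = poc_cur_ref - poc_cur   # POC difference: current picture's reference vs. current picture
        td = poc_col_ref - poc_col   # POC difference: co-located picture's reference vs. co-located picture
        if td == 0:
            return mv_col
        return (mv_col[0] * tb / td, mv_col[1] * tb / td)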
[0068] FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
[0069] In the co-located PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as depicted in FIG. 7. If the PU at position C0 is not available, is intra coded, or is outside of the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
[0070] 2.2.1.4 Additional candidates insertion
[0071] Besides spatial and temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidate and zero merge candidate. Combined bi-predictive merge candidates are generated by utilizing spatial and temporal merge candidates. The combined bi-predictive merge candidate is used for B-slices only. The combined bi-predictive candidates are generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate. As an example, FIG. 8 depicts the case when two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive merge candidate added to the final list (on the right). There are numerous rules regarding the combinations which are considered to generate these additional merge candidates, defined in the related art.
[0072] Zero motion candidates are inserted to fill the remaining entries in the merge candidates list and therefore hit the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index which starts from zero and increases every time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni and bi-directional prediction, respectively. Finally, no redundancy check is performed on these candidates.
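The two additional candidate types can be sketched as follows. The dictionary layout, the simplified hypothesis check, and the zero-fill helper are illustrative assumptions only; the exact combination rules are defined in the standard.

    # Simplified, non-normative sketch of the additional merge candidate types.
    def combined_bi_predictive(cand_a, cand_b):
        """Pair list-0 motion of one candidate with list-1 motion of another (cf. FIG. 8)."""
        if cand_a.get("mvL0") is None or cand_b.get("mvL1") is None:
            return None
        new_cand = {"mvL0": cand_a["mvL0"], "refIdxL0": cand_a["refIdxL0"],
                    "mvL1": cand_b["mvL1"], "refIdxL1": cand_b["refIdxL1"]}
        # Keep the pair only if the two halves form different motion hypotheses
        # (simplified check).
        different = (new_cand["refIdxL0"] != new_cand["refIdxL1"]
                     or new_cand["mvL0"] != new_cand["mvL1"])
        return new_cand if different else None

    def zero_fill(current_len, max_num_merge_cand, num_ref):
        """Zero-MV candidates with increasing reference index; no redundancy check."""
        return [{"mv": (0, 0), "refIdx": min(i, num_ref - 1)}
                for i in range(max_num_merge_cand - current_len)]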
[0073] 2.2.1.5 Motion estimation regions for parallel processing
[0074] To speed up the encoding process, motion estimation can be performed in parallel whereby the motion vectors for all prediction units inside a given region are derived simultaneously. The derivation of merge candidates from spatial neighbourhood may interfere with parallel processing as one prediction unit cannot derive the motion parameters from an adjacent PU until its associated motion estimation is completed. To mitigate the trade-off between coding efficiency and processing latency, HEVC defines the motion estimation region (MER) whose size is signalled in the picture parameter set using the “log2_parallel_merge_level_minus2” syntax element. When a MER is defined, merge candidates falling in the same region are marked as unavailable and therefore not considered in the list construction.
[0075] 2.2.2 AMVP
[0076] AMVP exploits the spatio-temporal correlation of a motion vector with neighbouring PUs, which is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by firstly checking the availability of left, above, and temporally neighbouring PU positions, removing redundant candidates and adding a zero vector to make the candidate list a constant length. Then, the encoder can select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to merge index signalling, the index of the best motion vector candidate is encoded using truncated unary. The maximum value to be encoded in this case is 2 (see FIG. 9). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
[0077] 2.2.2.1 Derivation of AMVP candidates
[0078] FIG. 9 summarizes derivation process for motion vector prediction candidate.
[0079] In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidate and temporal motion vector candidate. For spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on motion vectors of each PU located in five different positions as depicted in FIG. 3.
[0080] For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
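The overall construction can be outlined as below. The pruning is simplified relative to the description above (for instance, the removal of candidates with reference picture index larger than 1 is omitted), and candidate values are placeholders.

    # Non-normative outline of AMVP candidate list construction for one reference list.
    def build_amvp_list(spatial_cands, temporal_cand, list_size=2):
        cands = [c for c in spatial_cands if c is not None]   # up to two spatial candidates
        if temporal_cand is not None:
            cands.append(temporal_cand)
        deduped = []
        for c in cands:                                       # remove duplicated candidates
            if c not in deduped:
                deduped.append(c)
        while len(deduped) < list_size:                       # pad with zero motion vectors
            deduped.append((0, 0))
        return deduped[:list_size]                            # fixed-length list of two entries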
[0081] 2.2.2.2 Spatial motion vector candidates
[0082] In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located in positions as depicted in FIG. 3, those positions being the same as those of motion merge. The order of derivation for the left side of the current PU is defined as A0, A1, and scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as motion vector candidates, with two cases not required to use spatial scaling, and two cases where spatial scaling is used. The four different cases are summarized as follows.
• No spatial scaling
— (1) Same reference picture list, and same reference picture index (same POC)
— (2) Different reference picture list, but same reference picture (same POC)
• Spatial scaling
— (3) Same reference picture list, but different reference picture (different POC)
— (4) Different reference picture list, and different reference picture (different POC)
[0083] The no-spatial-scaling cases are checked first followed by the spatial scaling.
Spatial scaling is considered when the POC is different between the reference picture of the neighbouring PU and that of the current PU regardless of reference picture list. If all PUs of left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help parallel derivation of left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
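The classification into the four cases can be condensed into a small helper; the function below is purely illustrative and uses hypothetical boolean inputs.

    # Illustrative mapping of a neighbouring PU's reference to the four cases above.
    def classify_spatial_case(same_ref_list, same_poc):
        if same_poc:
            return 1 if same_ref_list else 2   # cases (1)/(2): MV reused without scaling
        return 3 if same_ref_list else 4       # cases (3)/(4): MV scaled by the POC-distance ratio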
[0084] FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
[0085] In a spatial scaling process, the motion vector of the neighbouring PU is scaled in a similar manner as for temporal scaling, as depicted in FIG. 10. The main difference is that the reference picture list and index of the current PU are given as input; the actual scaling process is the same as that of temporal scaling.
[0086] 2.2.2.3 Temporal motion vector candidates
[0087] Apart from the reference picture index derivation, all processes for the derivation of temporal merge candidates are the same as for the derivation of spatial motion vector candidates (see FIG. 7). The reference picture index is signalled to the decoder.
[0088] 2.3 New inter merge candidates in JEM
[0089] 2.3.1 Sub-CU based motion vector prediction
[0090] In the JEM with QTBT, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and spatial neighbouring motion vectors.
[0091] To preserve more accurate motion field for sub-CU motion prediction, the motion compression for the reference frames is currently disabled.
[0092] 2.3.1.1 Alternative temporal motion vector prediction
[0093] In the alternative temporal motion vector prediction (ATMVP) method, the temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. As shown in FIG. 11, the sub-CUs are square NxN blocks (N is set to 4 by default).
[0094] ATMVP predicts the motion vectors of the sub-CUs within a CU in two steps.
The first step is to identify the corresponding block in a reference picture with a so-called temporal vector. The reference picture is called the motion source picture. The second step is to split the current CU into sub-CUs and obtain the motion vectors as well as the reference indices of each sub-CU from the block corresponding to each sub-CU, as shown in FIG. 11.
[0095] In the first step, a reference picture and the corresponding block is determined by the motion information of the spatial neighbouring blocks of the current CU. To avoid the repetitive scanning process of neighbouring blocks, the first merge candidate in the merge candidate list of the current CU is used. The first available motion vector as well as its associated reference index are set to be the temporal vector and the index to the motion source picture. This way, in ATMVP, the corresponding block may be more accurately identified, compared with TMVP, wherein the corresponding block (sometimes called collocated block) is always in a bottom-right or center position relative to the current CU.
[0096] In the second step, a corresponding block of the sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinate of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of a corresponding NxN block is identified, it is converted to the motion vectors and reference indices of the current sub-CU, in the same way as TMVP of HEVC, wherein motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition (i.e. the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) is fulfilled and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy (with X being equal to 0 or 1 and Y being equal to 1-X) for each sub-CU.
[0097] 2.3.1.2 Spatial-temporal motion vector prediction
[0098] In this method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. FIG. 12 illustrates this concept. Let us consider an 8x8 CU which contains four 4x4 sub-CUs A, B, C, and D. The neighbouring 4x4 blocks in the current frame are labelled as a, b, c, and d.
[0099] The motion derivation for sub-CU A starts by identifying its two spatial neighbours. The first neighbour is the NxN block above sub-CU A (block c). If this block c is not available or is intra coded, the other NxN blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbour is a block to the left of sub-CU A (block b). If block b is not available or is intra coded, other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighbouring blocks for each list is scaled to the first reference frame for a given list. Next, the temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure of TMVP derivation as specified in HEVC. The motion information of the collocated block at location D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
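The final averaging step of this procedure can be sketched as follows; the neighbour fetching and the scaling to the first reference frame are assumed to have been performed already, and the tuple representation of motion vectors is an illustrative choice.

    # Conceptual sketch of the last STMVP step for one sub-CU (e.g. sub-CU A in FIG. 12):
    # up to three already-scaled motion vectors are averaged per reference list.
    def stmvp_average(above_mv, left_mv, tmvp_mv):
        available = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
        if not available:
            return None
        return (sum(mv[0] for mv in available) / len(available),
                sum(mv[1] for mv in available) / len(available))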
[00100] 2.3.1.3 Sub-CU motion prediction mode signalling
[00101] The sub-CU modes are enabled as additional merge candidates and there is no additional syntax element required to signal the modes. Two additional merge candidates are added to the merge candidate list of each CU to represent the ATMVP mode and STMVP mode. Up to seven merge candidates are used if the sequence parameter set indicates that ATMVP and STMVP are enabled. The encoding logic of the additional merge candidates is the same as for the merge candidates in the HM, which means, for each CU in a P or B slice, two more RD checks are needed for the two additional merge candidates.
[00102] In the JEM, all bins of the merge index are context coded by CABAC, while in HEVC, only the first bin is context coded and the remaining bins are bypass coded.
[00103] 2.3.2 Non-adjacent merge candidates
[00104] In J0021, Qualcomm proposes to derive additional spatial merge candidates from non-adjacent neighboring positions which are marked as 6 to 49 as in FIG. 13. The derived candidates are added after TMVP candidates in the merge candidate list.
[00105] In J0058, Tencent proposes to derive additional spatial merge candidates from positions in an outer reference area which has an offset of (-96, -96) to the current block.
[00106] As shown in FIG. 14, the positions are marked as A(i,j), B(i,j), C(i,j), D(i,j) and E(i,j). Each candidate B(i,j) or C(i,j) has an offset of 16 in the vertical direction compared to its previous B or C candidates. Each candidate A(i,j) or D(i,j) has an offset of 16 in the horizontal direction compared to its previous A or D candidates. Each E(i,j) has an offset of 16 in both the horizontal direction and the vertical direction compared to its previous E candidates. The candidates are checked from the inside to the outside, and the order of the candidates is A(i,j), B(i,j), C(i,j), D(i,j), and E(i,j). Whether the number of merge candidates can be further reduced is to be further studied. The candidates are added after TMVP candidates in the merge candidate list.
[00107] In J0059, the extended spatial positions from 6 to 27 as in FIG. 15 are checked according to their numerical order after the temporal candidate. To save the MV line buffer, all the spatial candidates are restricted within two CTU lines.
[00108] 2.4 Intra prediction in JEM
[00109] 2.4.1 Intra mode coding with 67 intra prediction modes
[00110] For the luma interpolation filtering, an 8-tap separable DCT-based interpolation filter is used for 2/4 precision samples and a 7-tap separable DCT-based interpolation filter is used for 1/4 precision samples, as shown in Table 1.
Table 1: 8-tap DCT-IF coefficients for 1/4th luma interpolation.
[00111] Similarly, a 4-tap separable DCT-based interpolation filter is used for the chroma interpolation, as shown in Table 2.
Table 2: 4-tap DCT-IF coefficients for 1/8th chroma interpolation.
[00112] For the vertical interpolation for 4:2:2 and the horizontal and vertical interpolation for 4:4:4 chroma channels, the odd positions in Table 2 are not used, resulting in 1/4th chroma interpolation.
[00113] For the bi-directional prediction, the bit-depth of the output of the interpolation filter is maintained to 14-bit accuracy, regardless of the source bit-depth, before the averaging of the two prediction signals. The actual averaging process is done implicitly with the bit-depth reduction process as:
predSamples[ x, y ] = ( predSamplesL0[ x, y ] + predSamplesL1[ x, y ] + offset ) >> shift, where shift = ( 15 - BitDepth ) and offset = 1 << ( shift - 1 )
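Written out for a single sample, the averaging amounts to the following; the function name is illustrative, and the offset implements rounding to the nearest integer before the right shift.

    # One-sample transcription of the bi-prediction averaging formula above.
    def bi_pred_average(pred_l0, pred_l1, bit_depth):
        shift = 15 - bit_depth
        offset = 1 << (shift - 1)
        return (pred_l0 + pred_l1 + offset) >> shift

For example, with BitDepth = 8 the shift is 7, so the two 14-bit intermediate prediction values are averaged and scaled back to the output bit depth.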
[00114] If both the horizontal component and the vertical component of a motion vector point to sub-pixel positions, horizontal interpolation is always performed first, and then the vertical interpolation is performed. For example, to interpolate the subpixel j0,0 shown in FIG. 16, first, b0,k (k = -3, -2, ..., 4) is interpolated according to equation 2-1, then j0,0 is interpolated according to equation 2-2. Here, shift1 = Min( 4, BitDepthY - 8 ), and shift2 = 6.
[00115] b0,k = ( -A-3,k + 4 * A-2,k - 11 * A-1,k + 40 * A0,k + 40 * A1,k - 11 * A2,k + 4 * A3,k - A4,k ) >> shift1 (2-1)
[00116] j0,0 = ( -b0,-3 + 4 * b0,-2 - 11 * b0,-1 + 40 * b0,0 + 40 * b0,1 - 11 * b0,2 + 4 * b0,3 - b0,4 ) >> shift2 (2-2)
[00117] Alternatively, we can first perform vertical interpolation and then perform horizontal interpolation. In this case, to interpolate j0,0, first, hk,0 (k = -3, -2, ..., 4) is interpolated according to equation 2-3, then j0,0 is interpolated according to equation 2-4. When BitDepthY is smaller than or equal to 8, shift1 is 0 and nothing is lost in the first interpolation stage; therefore, the final interpolation result is not changed by the interpolation order. However, when BitDepthY is greater than 8, shift1 is greater than 0. In this case, the final interpolation result can be different when different interpolation orders are applied.
[00118] hk,0 = ( -Ak,-3 + 4 * Ak,-2 - 11 * Ak,-1 + 40 * Ak,0 + 40 * Ak,1 - 11 * Ak,2 + 4 * Ak,3 - Ak,4 ) >> shift1 (2-3)
[00119] j0,0 = ( -h-3,0 + 4 * h-2,0 - 11 * h-1,0 + 40 * h0,0 + 40 * h1,0 - 11 * h2,0 + 4 * h3,0 - h4,0 ) >> shift2 (2-4)
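The order dependence discussed above can be reproduced with a small numerical sketch. The coefficients below are the 8-tap half-pel filter of equations 2-1 to 2-4; the 8x8 sample block is arbitrary test data, so the printed values are only illustrative.

    # Non-normative sketch of the interpolation-order dependence for shift1 > 0.
    import random

    HALF_PEL = [-1, 4, -11, 40, 40, -11, 4, -1]

    def filt(samples, shift):
        return sum(c * s for c, s in zip(HALF_PEL, samples)) >> shift

    def half_pel_2d(block, shift1, shift2, horizontal_first):
        """block is an 8x8 list of integer samples around the target position."""
        if horizontal_first:
            inter = [filt(row, shift1) for row in block]                         # first pass along rows
        else:
            inter = [filt([row[c] for row in block], shift1) for c in range(8)]  # first pass along columns
        return filt(inter, shift2)                                               # second pass

    # With 10-bit video, shift1 = Min(4, 10 - 8) = 2 and shift2 = 6; the two orders can
    # then round differently, whereas with shift1 = 0 they always give the same result.
    random.seed(0)
    blk = [[random.randrange(1024) for _ in range(8)] for _ in range(8)]
    print(half_pel_2d(blk, 2, 6, True), half_pel_2d(blk, 2, 6, False))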
[00120] 3. Examples of Problems solved by embodiments
[00121] For a luma block of size WxH, if we always perform horizontal interpolation first, the required interpolation (per pixel) is shown in Table 3.
Table 3: interpolation required for WxH luma component by HEVC/JEM
[00122] On the other hand, if we perform vertical interpolation first, the required interpolation is shown in Table 4. Apparently, the optimal interpolation order is the one that requires the smaller number of interpolation operations between Table 3 and Table 4.
Table 4: interpolation required for WxH luma component when the interpolation order is reversed.
[00123] For the chroma component, if we always perform horizontal interpolation first, the required interpolation (per pixel) is ((H + 3) x W + W x H) / (W x H) = 2 + 3 / H. If we always perform vertical interpolation first, the required interpolation is ((W + 3) x H + W x H) / (W x H) = 2 + 3 / W.
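As a worked example of these per-pixel counts, consider a hypothetical 4x8 chroma block (W = 4, H = 8); the snippet simply evaluates the formulas above.

    # Per-pixel chroma interpolation counts from the formulas above, for W = 4, H = 8.
    W, H = 4, 8
    horizontal_first = ((H + 3) * W + W * H) / (W * H)   # = 2 + 3/H = 2.375
    vertical_first   = ((W + 3) * H + W * H) / (W * H)   # = 2 + 3/W = 2.75
    # Horizontal-first is cheaper here because H > W; for a wide block (W > H) the
    # opposite holds, which is what motivates a shape-dependent interpolation order.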
[00124] As mentioned above, different interpolation orders can lead to different interpolation results when the bit-depth of the input video is greater than 8. Therefore, the interpolation order shall be defined implicitly in both the encoder and the decoder.
[00125] 4. Examples of embodiments
[00126] To tackle these problems, and to provide other benefits, we propose a shape-dependent interpolation order.
[00127] The detailed examples below should be considered as examples to explain general concepts. These inventions should not be interpreted in a narrow way. Furthermore, these inventions can be combined in any manner.
1. It is proposed that the interpolation order depends on the current coding block shape (e.g., the coding block is a CU). A sketch of the resulting decision rule is given after this list.
a. In one example, for a block (such as a CU, PU or sub-block used in sub-block based prediction like affine, ATMVP or BIO) with width > height, vertical interpolation is performed first, and then horizontal interpolation is performed, e.g., pixels dk,0, hk,0 and nk,0 are first interpolated and e0,0 to r0,0 are then interpolated. An example of j0,0 is shown in equations 2-3 and 2-4.
i. Alternatively, for a block (such as a CU, PU or sub-block used in sub-block based prediction like affine, ATMVP or BIO) with width >= height, vertical interpolation is performed first, and then horizontal interpolation is performed.
b. In one example, for a block (such as a CU, PU or sub-block used in sub-block based prediction like affine, ATMVP or BIO) with width <= height, horizontal interpolation is performed first, and then vertical interpolation is performed.
i. Alternatively, for a block (such as a CU, PU or sub-block used in sub-block based prediction like affine, ATMVP or BIO) with width < height, horizontal interpolation is performed first, and then vertical interpolation is performed.
c. In one example, both the luma component and the chroma components follow the same interpolation order.
d. Alternatively, when one chroma coding block corresponds to multiple luma coding blocks (e.g., for the 4:2:0 color format, one chroma 4x4 block may correspond to two 8x4 or 4x8 luma blocks), luma and chroma may use different interpolation orders.
e. In one example, when different interpolation orders are utilized, the scaling factors in the multiple stages (i.e., shift1 and shift2) may be further changed accordingly.
2. Alternatively, in addition, it is proposed that the interpolation order of the luma component can further depend on the MV.
a. In one example, if the vertical MV component points to a quarter-pel position and the horizontal MV component points to a half-pel position, horizontal interpolation is performed first, and then vertical interpolation is performed.
b. In one example, if the vertical MV component points to a half-pel position and the horizontal MV component points to a quarter-pel position, vertical interpolation is performed first, and then horizontal interpolation is performed.
c. In one example, the proposed methods are only applied to square coding blocks.
3. The proposed methods may be applied to certain modes, block sizes/shapes, and/or certain sub-block sizes.
a. The proposed methods may be applied to certain modes, such as bi-predicted mode.
b. The proposed methods may be applied to certain block sizes.
i. In one example, it is only applied to a block with w x h <= T, where w and h are the width and height of the current block.
ii. In one example, it is only applied to a block with h <= T.
c. The proposed methods may be applied to certain color component (such as only luma component).
4. It is proposed that, when multi-hypothesis prediction is applied to one block, short-tap or different interpolation filters may be applied compared to those filters applied to normal prediction mode.
a. In one example, a bilinear filter may be used.
b. A short-tap or a second interpolation filter may be applied to a reference picture list which involves multiple reference blocks, while for another reference picture list with only one reference block, the same filter as that used for normal prediction mode may be applied.
c. The proposed method may be applied under certain conditions, such as certain temporal layer(s), or the quantization parameters of a block/a tile/a slice/a picture containing the block being within a range (such as larger than a threshold).
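The decision rule of items 1 and 2 above can be summarized in the following non-normative sketch. The string labels, the fractional-position arguments, and the restriction of the MV-dependent rule to square blocks (item 2.c) reflect one possible reading of the examples above and are not the only embodiment.

    # Hedged sketch of the shape- and MV-dependent interpolation order (items 1 and 2).
    def interpolation_order(width, height, vert_frac=None, horiz_frac=None):
        """Return 'vertical_first' or 'horizontal_first' for a block or sub-block."""
        # Item 2 (examples 2.a/2.b), restricted to square blocks per example 2.c.
        if width == height and vert_frac is not None and horiz_frac is not None:
            if vert_frac == "quarter" and horiz_frac == "half":
                return "horizontal_first"
            if vert_frac == "half" and horiz_frac == "quarter":
                return "vertical_first"
        # Item 1 (examples 1.a/1.b): shape-dependent rule.
        return "vertical_first" if width > height else "horizontal_first"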
[00128] FIG. 17 is a block diagram of a video processing apparatus 1700. The apparatus
1700 may be used to implement one or more of the methods described herein. The apparatus
1700 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and so on. The apparatus 1700 may include one or more processors 1702, one or more memories 1704 and video processing hardware 1706. The processor(s) 1702 may be configured to implement one or more methods described in the present document. The memory (memories) 1704 may be used for storing data and code used for implementing the methods and techniques described herein. The video processing hardware 1706 may be used to implement, in hardware circuitry, some techniques described in the present document.
[00129] FIG. 19 is a flowchart for a method 1900 of video bitstream processing. The method 1900 includes determining (1905) a shape of a video block, determining (1910) an interpolation order based on the video block, the interpolation order being indicative of a sequence of performing horizontal interpolation and vertical interpolation, and performing the horizontal interpolation and the vertical interpolation in accordance with the interpolation order for the video block to reconstruct (1915) a decoded representation of the video block.
[00130] FIG. 20 is a flowchart for a method 2000 of video bitstream processing. The method 2000 includes determining (2005) characteristics of a motion vector related to a video block, determining (2010) an interpolation order of the video block based on the characteristics of the motion vector, the interpolation order being indicative of a sequence of performing horizontal interpolation and vertical interpolation, and performing the horizontal interpolation and the vertical interpolation in accordance with the interpolation order for the video block to reconstruct (2015) a decoded representation of the video block.
[00131] With reference to methods 1900 and 2000, some examples of sequences of performing horizontal interpolation and vertical interpolation and their use are described in Section 4 of the present document. For example, as described in Section 4, under different shapes of the video block, a preference may be given to performing one of the horizontal interpolation or vertical interpolation first. In some embodiments, the horizontal interpolation is performed before the vertical interpolation, and in some embodiments the vertical interpolation is performed before the horizontal interpolation.
[00132] With reference to methods 1900 and 2000, the video block may be encoded in the video bitstream in which bit efficiency may be achieved by using a bitstream generation rule related to interpolation orders that also depends on the shape of the video block.
[00133] It will be appreciated that the disclosed techniques may be embodied in video encoders or decoders to improve compression efficiency when the coding units being compressed have shapes that are significantly different than the traditional square shaped blocks or rectangular blocks that are half-square shaped. For example, new coding tools that use long or tall coding units such as 4x32 or 32x4 sized units may benefit from the disclosed techniques.
[00134] FIG. 21 is a flowchart for an example of a video processing method 2100. The method 2100 includes determining (2102) a first prediction mode applied to a first video block; performing (2104) a first conversion between the first video block and a coded representation of the first video block by applying a horizontal interpolation and/or a vertical interpolation to the first video block; determining (2106) a second prediction mode applied to a second video block; and performing (2108) a second conversion between the second video block and a coded representation of the second video block by applying a horizontal interpolation and/or a vertical interpolation to the second video block, wherein, based on the determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation for the first video block use a shorter tap filter compared to that used for the second video block.
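One way to picture the filter selection in method 2100 is the sketch below. The coefficient sets are placeholders chosen for illustration (a bilinear filter is one of the shorter-tap options mentioned in Section 4); the selection function is hypothetical.

    # Illustrative, non-normative filter selection for method 2100.
    EIGHT_TAP_HALF = [-1, 4, -11, 40, 40, -11, 4, -1]   # normal-mode luma half-pel filter
    BILINEAR_HALF  = [32, 32]                           # one possible shorter-tap filter

    def pick_interp_filter(is_multi_hypothesis):
        """Multi-hypothesis blocks use the shorter filter; other blocks keep the normal one."""
        return BILINEAR_HALF if is_multi_hypothesis else EIGHT_TAP_HALF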
[00135] FIG. 22 is a flowchart for a method 2200 of video bitstream processing. The method includes: determining (2205) a shape of a video block, determining (2210) an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order, to construct (2215) an encoded representation of the video block.
[00136] FIG. 23 is a flowchart for a method 2300 of video bitstream processing. The method includes: determining (2305) characteristics of a motion vector related to a video block, determining (2310) an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to construct (2315) an encoded representation of the video block.
[00137] Various embodiments and techniques disclosed in the present document can be described in the following listing of examples.
[00138] 1. A video processing method, comprising: determining a first prediction mode applied to a first video block; performing a first conversion between the first video block and a coded representation of the first video block by applying a horizontal interpolation and/or a vertical interpolation to the first video block; determining a second prediction mode applied to a second video block; performing a second conversion between the second video block and a coded representation of the second video block by applying a horizontal interpolation and/or a vertical interpolation to the second video block, wherein, based on the determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation for the first video block use a shorter tap filter compared to that used for the second video block.
[00139] 2. The method of example 1, wherein the first video block is converted with more than 2 reference blocks for bi-prediction and, for at least one reference picture list, at least 2 reference blocks are used.
[00140] 3. The method of example 1, wherein the first video block is converted with more than 1 reference block for uni-prediction.
[00141] 4. The method of any one of examples 1-3, wherein the shorter tap filter is a bilinear filter.
[00142] 5. The method of any one of examples 1-3, wherein the one or both of the horizontal interpolation and the vertical interpolation apply the shorter tap filter to a reference picture list related to multiple reference blocks.
[00143] 6. The method of any one of examples 1-5, wherein the one or both of the horizontal interpolation or the vertical interpolation use the same filter as used for a normal prediction mode when a reference picture list is related to a single reference block.
[00144] 7. The method of any one of examples 1-6, wherein the method is applied based on a determination of one or more of: use of temporal layers, or quantization parameters of one or more of a block, a tile, a slice, or a picture containing the video block being within a threshold range.
[00145] 8. The method of example 7, wherein the quantization parameters being within a threshold range comprises the quantization parameters being larger than a threshold.
[00146] 9. The method of example 6, wherein the normal prediction mode comprises a uni-prediction using inter prediction with at most one motion vector and one reference index to predict sample values of a sample in a block or a bi-prediction inter mode using inter prediction with at most two motion vectors and reference indices to predict sample values of a sample in a block.
[00147] 10. A video decoding apparatus comprising a processor configured to implement a method recited in one or more of examples 1 to 9.
[00148] 11. A video encoding apparatus comprising a processor configured to implement a method recited in one or more of examples 1 to 9.
[00149] 12. A computer-readable program medium having code stored thereupon, the code comprising instructions that, when executed by a processor, cause the processor to implement a method recited in one or more of examples 1 to 9.
[00150] 13. A method for video bitstream processing, comprising: determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order, to reconstruct a decoded representation of the video block.
[00151] 14. The method of example 13, wherein the shape of the video block is represented by a width and a height of the video block, and the step of determining the interpolation order further comprising: determining that the vertical interpolation is to be performed before the horizontal interpolation as the interpolation order, when the width of the video block is larger than the height of the video block.
[00152] 15. The method of example 13, wherein the shape of the video block is represented by a width and a height, and the step of determining the interpolation order further comprising: determining that the vertical interpolation is to be performed before the horizontal interpolation as the interpolation order, when the width of the video block is larger than or equal to the height of the video block.
[00153] 16. The method of example 13, wherein the shape of the video block is represented by a width and a height, and the step of determining the interpolation order further comprising: determining that the horizontal interpolation is to be performed before the vertical interpolation as the interpolation order, when the height of the video block is larger than or equal to the width of the video block.
[00154] 17. The method of example 13, wherein the shape of the video block is represented by a width and a height, and the step of determining the interpolation order further comprising: determining that the horizontal interpolation is to be performed before the vertical interpolation as the interpolation order, when the height of the video block is larger than the width of the video block.
[00155] 18. The method of example 13, wherein a luminance component and a chrominance component of the video block are both interpolated, based on the interpolation order or based on different interpolation orders.
[00156] 19. The method of example 13, wherein a luminance component and a chrominance component of the video block are interpolated using different interpolation orders, when each chrominance block for the chrominance component corresponds to multiple luminance blocks for the luminance component.
[00157] 20. The method of example 13, wherein a luminance component and a chrominance component of the video block are interpolated using different interpolation orders, and wherein scaling factors used in the horizontal interpolation and vertical interpolation are different for the luminance component and the chrominance component.
[00158] 21. A method for video bitstream processing, comprising: determining characteristics of a motion vector related to a video block; determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video block.
[00159] 22. The method of example 21, wherein the characteristics of the motion vector are represented by a quarter-pel position and a half-pel position to which the motion vector points, the motion vector includes a vertical component and a horizontal component, and determining the interpolation order includes: determining that the horizontal interpolation is to be performed before the vertical interpolation as the interpolation order, when the vertical component points to the quarter-pel position and the horizontal component points to the half-pel position.
[00160] 23. The method of example 21, wherein the characteristics of the motion vector are represented by a quarter-pel position and a half-pel position to which the motion vector points, the motion vector includes a vertical component and a horizontal component, and determining the interpolation order includes: determining that the vertical interpolation is to be performed before the horizontal interpolation when the vertical component points to the half-pel position and the horizontal component points to the quarter-pel position.
[00161] 24. The method of any of examples 21-23, wherein a shape of the video block is a square.
[00162] 25. The method of any of examples 13-24, wherein the method is applied to a bi-predicted mode.
[00163] 26. The method of any of examples 13-25, wherein the method is applied when a height of the video block multiplied by a width of the video block is less than or equal to Tl, Tl being a first threshold.
[00164] 27. The method of any of examples 13-25, wherein the method is applied when the video block has a height that is less than or equal to T2, T2 being a second threshold.
[00165] 28. The method of any of examples 13-25, wherein the method is applied to a luminance component of the video block.
[00166] 29. A method for video bitstream processing, comprising:
[00167] determining a shape of a video block;
[00168] determining an interpolation order based on the shape of the video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and
[00169] performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order, to construct an encoded representation of the video block.
[00170] 30. A method for video bitstream processing, comprising: determining characteristics of a motion vector related to a video block; determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation; and performing the horizontal interpolation and the vertical interpolation for the video block in the sequence indicated by the interpolation order to construct an encoded representation of the video block.
[00171] 31. A video decoding apparatus comprising a processor configured to implement a method recited in one or more of examples 13 to 28.
[00172] 32. A video encoding apparatus comprising a processor configured to implement a method recited in example 29 or 30.
[00173] 33. A computer program product having computer code stored thereon, the code, when executed by a processor, causes the processor to implement a method recited in any of examples 13 to 30.
[00174] 34. An apparatus in a video system comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to implement the method recited in any one of examples 13 to 30.
[00175] From the foregoing, it will be appreciated that specific embodiments of the presently disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the presently disclosed technology is not limited except as by the appended claims.
[00176] Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
[00177] A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
[00178] The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
[00179] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
[00180] It is intended that the specification, together with the drawings, be considered exemplary only, where exemplary means an example. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Additionally, the use of "or" is intended to include "and/or", unless the context clearly indicates otherwise.
[00181] While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
[00182] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
[00183] Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims

WHAT IS CLAIMED IS:
1. A video processing method, comprising:
determining a first prediction mode applied to a first video block;
performing a first conversion between the first video block and a coded representation of the first video block by applying a horizontal interpolation and/or a vertical interpolation to the first video block;
determining a second prediction mode applied to a second video block;
performing a second conversion between the second video block and a coded representation of the second video block by applying a horizontal interpolation and/or a vertical interpolation to the second video block,
wherein, based on the determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation for the first video block use a shorter tap filter compared to that used for the second video block.
2. The method of claim 1, wherein the first video block is converted with more than two reference blocks for bi-prediction, and at least one reference picture list uses at least two reference blocks.
3. The method of claim 1, wherein the first video block is converted with more than one reference block for uni-prediction.
4. The method of any one of claims 1-3, wherein the shorter tap filter is a bilinear filter.
5. The method of any one of claims 1-3, wherein the one or both of the horizontal interpolation and the vertical interpolation apply the shorter tap filter to a reference picture list related to multiple reference blocks.
6. The method of any one of claims 1-5, wherein the one or both of the horizontal interpolation or the vertical interpolation use the same filter as used for a normal prediction mode when a reference picture list is related to a single reference block.
7. The method of any one of claims 1-6, wherein the method is applied based on a determination of one or more of: use of temporal layers, or quantization parameters of one or more of a block, a tile, a slice, or a picture containing the video block being within a threshold range.
8. The method of claim 7, wherein the quantization parameters being within a threshold range comprises the quantization parameters being larger than a threshold.
9. The method of claim 6, wherein the normal prediction mode comprises a uni-prediction using inter prediction with at most one motion vector and one reference index to predict sample values of a sample in a block, or a bi-prediction inter mode using inter prediction with at most two motion vectors and reference indices to predict sample values of a sample in a block.
10. A video decoding apparatus comprising a processor configured to implement a method recited in one or more of claims 1 to 9.
11. A video encoding apparatus comprising a processor configured to implement a method recited in one or more of claims 1 to 9.
12. A computer-readable program medium having code stored thereupon, the code comprising instructions that, when executed by a processor, cause the processor to implement a method recited in one or more of claims 1 to 9.
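
The following is an illustrative, non-normative sketch (not part of the claims or the described embodiments) of the kind of filter-length selection claims 1-6 describe: a block coded with a multi-hypothesis prediction mode and more than one reference block in a reference picture list falls back to a short 2-tap bilinear interpolation filter, while other blocks keep the longer filter of the normal prediction mode. The names PredictionMode, pickFilterTaps and interpolateBilinear, the 8-tap default length, and the 1/16 sub-pel precision are assumptions made only for this example.

#include <cstdint>
#include <iostream>

enum class PredictionMode { Normal, MultiHypothesis };

// Choose the interpolation filter length for one reference picture list.
// A multi-hypothesis block with more than one reference block in the list
// uses a short 2-tap (bilinear) filter; otherwise the conventional long
// (here assumed 8-tap) filter of the normal prediction mode is kept.
int pickFilterTaps(PredictionMode mode, int refBlocksInList) {
    if (mode == PredictionMode::MultiHypothesis && refBlocksInList > 1) {
        return 2;  // shorter tap filter (bilinear), cf. claim 4
    }
    return 8;      // same filter as the normal prediction mode, cf. claim 6
}

// Minimal 1-D bilinear interpolation between two integer samples s0 and s1
// at fractional position frac/16 (1/16 sub-pel precision assumed).
int16_t interpolateBilinear(int16_t s0, int16_t s1, int frac) {
    return static_cast<int16_t>(((16 - frac) * s0 + frac * s1 + 8) >> 4);
}

int main() {
    std::cout << pickFilterTaps(PredictionMode::Normal, 1) << '\n';          // 8
    std::cout << pickFilterTaps(PredictionMode::MultiHypothesis, 2) << '\n'; // 2
    std::cout << interpolateBilinear(100, 120, 8) << '\n';                   // 110
    return 0;
}

In such a sketch the decision is made per reference picture list, so a list related to only a single reference block can keep the normal-mode filter even when the block as a whole is coded with multi-hypothesis prediction.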
PCT/IB2019/056000 2018-07-13 2019-07-15 Shape dependent interpolation order WO2020012449A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018095576 2018-07-13
CNPCT/CN2018/095576 2018-07-13

Publications (1)

Publication Number Publication Date
WO2020012449A1 true WO2020012449A1 (en) 2020-01-16

Family

ID=67989031

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/IB2019/055999 WO2020012448A2 (en) 2018-07-13 2019-07-15 Shape dependent interpolation order
PCT/IB2019/056000 WO2020012449A1 (en) 2018-07-13 2019-07-15 Shape dependent interpolation order

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/055999 WO2020012448A2 (en) 2018-07-13 2019-07-15 Shape dependent interpolation order

Country Status (3)

Country Link
CN (2) CN110719475B (en)
TW (2) TWI722486B (en)
WO (2) WO2020012448A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023198120A1 (en) * 2022-04-13 2023-10-19 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004006558A2 (en) * 2002-07-09 2004-01-15 Nokia Corporation Method and system for selecting interpolation filter type in video coding
US6807231B1 (en) * 1997-09-12 2004-10-19 8×8, Inc. Multi-hypothesis motion-compensated video image predictor
US20120230393A1 (en) * 2011-03-08 2012-09-13 Sue Mon Thet Naing Methods and apparatuses for encoding and decoding video using adaptive interpolation filter length
US20150382005A1 (en) * 2014-06-27 2015-12-31 Samsung Electronics Co., Ltd. System and method for motion compensation in video coding

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008084378A2 (en) * 2007-01-09 2008-07-17 Nokia Corporation Adaptive interpolation filters for video coding
CN101527847B (en) * 2009-01-04 2012-01-04 炬力集成电路设计有限公司 Motion compensation interpolation device and method
US20120008686A1 (en) * 2010-07-06 2012-01-12 Apple Inc. Motion compensation using vector quantized interpolation filters
WO2012100085A1 (en) * 2011-01-19 2012-07-26 General Instrument Corporation High efficiency low complexity interpolation filters
US20120230407A1 (en) * 2011-03-11 2012-09-13 General Instrument Corporation Interpolation Filter Selection Using Prediction Index
CN102665080B (en) * 2012-05-08 2015-05-13 开曼群岛威睿电通股份有限公司 Electronic device for motion compensation and motion compensation method
CN104881843A (en) * 2015-06-10 2015-09-02 京东方科技集团股份有限公司 Image interpolation method and image interpolation apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6807231B1 (en) * 1997-09-12 2004-10-19 8×8, Inc. Multi-hypothesis motion-compensated video image predictor
WO2004006558A2 (en) * 2002-07-09 2004-01-15 Nokia Corporation Method and system for selecting interpolation filter type in video coding
US20120230393A1 (en) * 2011-03-08 2012-09-13 Sue Mon Thet Naing Methods and apparatuses for encoding and decoding video using adaptive interpolation filter length
US20150382005A1 (en) * 2014-06-27 2015-12-31 Samsung Electronics Co., Ltd. System and method for motion compensation in video coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MARKUS FLIERL ET AL: "Multihypothesis Motion Pictures for H.26L", 12. VCEG MEETING; 09-01-2001 - 12-01-2001; EIBSEE, DE; (VIDEO CODING EXPERTS GROUP OF ITU-T SG.16), no. VCEG-L24, 5 January 2001 (2001-01-05), XP030003169 *

Also Published As

Publication number Publication date
TW202023276A (en) 2020-06-16
WO2020012448A2 (en) 2020-01-16
CN110719475A (en) 2020-01-21
WO2020012448A3 (en) 2020-04-16
TW202013960A (en) 2020-04-01
TWI722486B (en) 2021-03-21
CN110719466A (en) 2020-01-21
CN110719466B (en) 2022-12-23
CN110719475B (en) 2022-12-09
TWI704799B (en) 2020-09-11

Similar Documents

Publication Publication Date Title
US11159787B2 (en) Conditions for starting checking HMVP candidates depend on total number minus K
US11140383B2 (en) Interaction between look up table and shared merge list
US11146785B2 (en) Selection of coded motion information for LUT updating
US11589071B2 (en) Invoke of LUT updating
US11528500B2 (en) Partial/full pruning when adding a HMVP candidate to merge/AMVP
US11134267B2 (en) Update of look up table: FIFO, constrained FIFO
US11641483B2 (en) Interaction between merge list construction and other tools
WO2020065517A1 (en) Simplified history based motion vector prediction
WO2020003266A1 (en) Resetting of look up table per slice/tile/lcu row
KR20150065706A (en) Inter-view predicted motion vector for 3d video
WO2020008329A1 (en) Spatial motion compression
WO2020008324A1 (en) Shape dependent intra coding
WO2020125628A1 (en) Shape dependent interpolation filter
WO2020143837A1 (en) Mmvd improvement
WO2020143830A1 (en) Integer mv motion compensation
WO2020012449A1 (en) Shape dependent interpolation order
WO2020143832A1 (en) Bi-prediction constraints

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19769869

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10/05/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19769869

Country of ref document: EP

Kind code of ref document: A1