CN113302938A - Integer MV motion compensation - Google Patents
Integer MV motion compensation
- Publication number
- CN113302938A (Application CN202080008723.XA)
- Authority
- CN
- China
- Prior art keywords
- block
- prediction
- equals
- precision
- video
- Prior art date
- Legal status
- Granted
Classifications
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Integer MV motion compensation is described. One example method includes: determining a characteristic of a first block of the video for a conversion between the first block and a bitstream representation of the first block; performing a rounding process on a Motion Vector (MV) of the first block based on the characteristic of the first block; and performing the conversion by using the rounded MV.
Description
Cross Reference to Related Applications
Under the applicable patent laws and/or rules pursuant to the Paris Convention, this application is made to timely claim the priority and benefit of International Patent Application No. PCT/CN2019/071396, filed on January 11, 2019, International Patent Application No. PCT/CN2019/071503, filed on January 12, 2019, and International Patent Application No. PCT/CN2019/077171, filed on March 6, 2019. The entire disclosures of International Patent Applications No. PCT/CN2019/071396, No. PCT/CN2019/071503 and No. PCT/CN2019/077171 are incorporated by reference as part of the disclosure of this application.
Technical Field
This document relates to video coding and decoding techniques.
Background
Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video grows, it is expected that the bandwidth demand for digital video usage will continue to grow.
Disclosure of Invention
The disclosed techniques may be used by video decoder or encoder embodiments in which block-shape-dependent interpolation order techniques are used to improve interpolation.
In one example aspect, a method of video bitstream processing is disclosed. The method comprises the following steps: determining a shape of a first video block; determining an interpolation order based on a shape of the first video block, the interpolation order indicating an order in which horizontal interpolation and vertical interpolation are performed; and performing horizontal interpolation and vertical interpolation on the first video block in order according to the interpolation order to reconstruct a decoded representation of the first video block.
In another example aspect, a method of video bitstream processing, comprises: determining a characteristic of a motion vector associated with the first video block; determining an interpolation order indicating an order in which horizontal interpolation and vertical interpolation are performed based on a characteristic of the motion vector; and performing horizontal interpolation and vertical interpolation on the first video block in order according to the interpolation order to reconstruct a decoded representation of the first video block.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, by a processor, a size characteristic of a first video block; determining, by the processor, based on the determination of the size characteristic, that a first interpolation filter is to be applied to the first video block; and performing further processing of the first video block using the first interpolation filter.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, by a processor, a first characteristic of a first video block; determining, by the processor, based on the first characteristic, that a first interpolation filter is to be applied to the first video block; performing further processing of the first video block using a first interpolation filter; determining, by the processor, a second characteristic of the second video block; determining, by the processor, based on the second characteristic, that a second interpolation filter is to be applied to the second video block, the first interpolation filter and the second interpolation filter being different short-tap filters; and performing further processing of the second video block using the second interpolation filter.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, by a processor, a characteristic of the first video block, the characteristic comprising one or more of: size information of the first video block, a prediction direction of the first video block, or motion information of the first video block; rounding a Motion Vector (MV) associated with the first video block to integer-pixel precision or half-pixel precision based on the determination of the characteristic of the first video block; and performing further processing of the first video block using the rounded motion vector.
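As a rough illustration of the rounding step described in this aspect, the following C++ sketch rounds an MV to integer-pel or half-pel precision; the quarter-pel storage unit, the size threshold and all names are illustrative assumptions rather than details taken from this disclosure.

```cpp
#include <cstdlib>

struct MotionVector { int x; int y; };   // assumed stored in quarter-pel units

// Round a quarter-pel value to the nearest multiple of 'step'
// (step = 4 -> integer-pel, step = 2 -> half-pel); ties round away from zero.
static int roundToStep(int v, int step) {
    int sign = (v >= 0) ? 1 : -1;
    return sign * ((std::abs(v) + step / 2) / step) * step;
}

// Hypothetical size rule: small blocks get integer-pel MVs, others half-pel.
MotionVector roundMvForBlock(MotionVector mv, int width, int height) {
    const int step = (width * height <= 64) ? 4 : 2;   // threshold is an assumption
    return { roundToStep(mv.x, step), roundToStep(mv.y, step) };
}
```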
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, by a processor, that a first video block is coded in Merge mode; rounding motion information associated with the first video block to integer precision to generate modified motion information based on a determination that the first video block is coded in Merge mode; and performing a motion compensation process on the first video block using the modified motion information.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic being one or both of: a size of the first video block or a shape of the first video block; modifying a motion vector associated with the first video block to integer-pixel precision or half-pixel precision to generate a modified motion vector; and performing further processing of the first video block using the modified motion vector.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic being one or both of: the size of the first video block or the prediction direction of the first video block; determining Merge with Motion Vector Difference (MMVD) side information based on the determination of the characteristic of the first video block; and performing further processing of the first video block using the MMVD side information.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic being one or both of: a size of the first video block or a shape of the first video block; determining a threshold number of half-pixel Motion Vector (MV) components or quarter-pixel MV components to be constrained based on a determination of a characteristic of the first video block; and performing further processing of the first video block using the threshold number.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic comprising a size of the first video block; modifying a Motion Vector (MV) associated with the first video block from fractional precision to integer precision based on the determination of the characteristic of the first video block; and performing motion compensation on the first video block using the modified MV.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a first size of a first video block; determining a first precision of a Motion Vector (MV) associated with the first video block based on the determination of the first size; determining a second size of a second video block, the first size and the second size being different sizes; determining a second precision of an MV associated with the second video block based on the determination of the second size, the first precision and the second precision being different precisions; and performing further processing of the first video block using the first precision and further processing of the second video block using the second precision.
In another example aspect, a method of video processing is disclosed. The method comprises the following steps: determining a characteristic of a first block of the video for a conversion between the first block and a bitstream representation of the first block; determining a filter having interpolation filter parameters for interpolation of the first block based on the characteristic of the first block; and performing the conversion by using the filter having the interpolation filter parameters.
In another example aspect, a method of video processing is disclosed. The method comprises the following steps: extracting, for a conversion between a first block of video and a bitstream representation of the first block, reference pixels of a first reference block from a reference picture, wherein the first reference block is smaller than a second reference block required for motion compensation of the first block; padding the first reference block with padding pixels to generate a second reference block; and performing the conversion by using the generated second reference block.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of a first block of the video for a conversion between the first block and a bitstream representation of the first block; performing a rounding process on a Motion Vector (MV) of the first block based on the characteristic of the first block; and performing the conversion by using the rounded MV.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of a first block of the video for a conversion between the first block and a bitstream representation of the first block; performing motion compensation on the first block using an MV having a first precision; and storing an MV with a second precision for the first block; wherein the first precision is different from the second precision.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a codec mode of a first block of the video for a conversion between the first block and a bitstream representation of the first block; performing a rounding process on a Motion Vector (MV) of the first block if the codec mode of the first block satisfies a predetermined rule; and performing motion compensation of the first block by using the rounded MV.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: generating a first Motion Vector (MV) candidate list for a first block for a conversion between the first block and a bitstream representation of the first block; performing a rounding process on the MV of the at least one candidate before adding the at least one candidate to the first MV candidate list; and performing the conversion by using the first MV candidate list.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of a first block of the video for a conversion between the first block and a bitstream representation of the first block; determining a constraint parameter to be applied to the first block based on the characteristic of the first block, wherein the constraint parameter constrains the maximum number of fractional Motion Vector (MV) components of the first block; and performing the conversion by using the constraint parameter.
In another example aspect, a further method for video bitstream processing is disclosed.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: obtaining an indication that signaling of at least one of bi-directional prediction and uni-directional prediction is not allowed when a characteristic of a block satisfies a predetermined rule; determining a characteristic of a first block of the video for a conversion between the first block and a bitstream representation of the first block; and performing the conversion by using the indication when the characteristic of the first block satisfies the predetermined rule.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: signaling an indication that at least one of bi-directional prediction and uni-directional prediction is not allowed when a characteristic of a block satisfies a predetermined rule; determining a characteristic of a first block of the video for a conversion between the first block and a bitstream representation of the first block; and performing the conversion based on the characteristic of the first block, wherein, during the conversion, at least one of bi-directional prediction and uni-directional prediction is disabled when the characteristic of the first block satisfies the predetermined rule.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, for a conversion between a first block of video and a bitstream representation of the first block, whether fractional Motion Vector (MV) or Motion Vector Difference (MVD) precision is allowed for the first block; signaling an Adaptive Motion Vector Resolution (AMVR) parameter for the first block based on the determination; and performing the conversion by using the AMVR parameter.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, for a conversion between a first block of video and a bitstream representation of the first block, whether fractional Motion Vector (MV) or Motion Vector Difference (MVD) precision is allowed for the first block; obtaining an Adaptive Motion Vector Resolution (AMVR) parameter for the first block based on the determination; and performing the conversion by using the AMVR parameter.
In another example aspect, the above method may be implemented by a video decoder apparatus comprising a processor.
In another example aspect, the above-described method may be implemented by a video encoder apparatus comprising a processor for decoding encoded video during a video encoding process.
In yet another example aspect, the methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
These and other aspects are further described in this document.
Drawings
FIG. 1 is a diagram of a quadtree plus binary tree (QTBT) structure.
Figure 2 shows an example derivation process for the Merge candidate list construction.
Fig. 3 shows example positions of spatial domain Merge candidates.
Fig. 4 shows an example of a candidate pair considering redundancy check for spatial domain Merge candidates.
Fig. 5A and 5B show examples of the positions of the second Prediction Units (PUs) of the N×2N and 2N×N partitions.
Fig. 6 is a diagram of motion vector scaling of temporal Merge candidates.
Fig. 7 shows example candidate positions C0 and C1 for the time domain Merge candidate.
Fig. 8 shows an example of combined bidirectional predictive Merge candidates.
Fig. 9 shows an example of a derivation process of a motion vector prediction candidate.
Fig. 10 is a diagram of motion vector scaling of spatial motion vector candidates.
Fig. 11 shows an example of Alternative Temporal Motion Vector Prediction (ATMVP) for a Coding Unit (CU).
Fig. 12 shows an example of one CU with four sub-blocks (a-D) and its neighboring blocks (a-D).
Figure 13 shows proposed non-neighboring Merge candidates in one example.
Figure 14 shows proposed non-neighboring Merge candidates in one example.
Figure 15 shows proposed non-neighboring Merge candidates in one example.
Fig. 16 shows an example of integer sample and fractional sample positions for quarter-sample luminance interpolation.
Fig. 17 is a block diagram of an example of a video processing apparatus.
Fig. 18 shows a block diagram of an example implementation of a video encoder.
Fig. 19 is a flowchart of an example of a video bitstream processing method.
Fig. 20 is a flowchart of an example of a video bitstream processing method.
Fig. 21 shows an example of repeated boundary pixels of the reference block before interpolation.
Fig. 22 is a flowchart of an example of a video bitstream processing method.
Fig. 23 is a flowchart of an example of a video bitstream processing method.
Fig. 24 is a flowchart of an example of a video bitstream processing method.
Fig. 25 is a flowchart of an example of a video bitstream processing method.
Fig. 26 is a flowchart of an example of a video bitstream processing method.
Fig. 27 is a flowchart of an example of a video bitstream processing method.
Fig. 28 is a flowchart of an example of a video bitstream processing method.
Fig. 29 is a flowchart of an example of a video bitstream processing method.
Fig. 30 is a flowchart of an example of a video bitstream processing method.
Fig. 31 is a flowchart of an example of a video bitstream processing method.
Fig. 32 is a flowchart of an example of a video bitstream processing method.
Fig. 33 is a flowchart of an example of a video bitstream processing method.
Fig. 34 is a flowchart of an example of a video bitstream processing method.
Detailed Description
This document provides various techniques that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video. In addition, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
The section headings are used in this document for ease of understanding and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments of one section may be combined with embodiments of other sections.
1. Overview
This patent document relates to video encoding and decoding techniques. In particular, it relates to interpolation in video codecs. It can be applied to existing video codec standards, such as HEVC, or to the standard to be finalized (Versatile Video Coding, VVC). It may also be applicable to future video codec standards or video codecs.
2. Background of the invention
Video codec standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards. Since H.262, video codec standards have been based on a hybrid video codec structure, in which temporal prediction plus transform coding is utilized. To explore future video codec technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Since then, many new methods have been adopted by JVET and put into a reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
Fig. 18 is a block diagram of an example implementation of a video encoder.
2.1 quad Tree plus binary Tree (QTBT) Block Structure with larger CTU
In HEVC, CTUs are partitioned into CUs by using a quadtree structure, denoted as a coding tree, to accommodate various local features. At the CU level, it is decided whether to encode or decode a picture region using inter-picture (temporal) or intra-picture (spatial) prediction. Each CU may be further divided into one, two, or four PUs according to PU division types. Within a PU, the same prediction process is applied and the relevant information is sent to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU partition type, the CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to a coding tree used for the CU. One of the key features of the HEVC structure is that it has multiple partition concepts, including CU, PU and TU.
The QTBT structure removes the concept of multiple partition types, i.e. it removes the separation of CU, PU and TU concepts and supports greater flexibility of CU partition shapes. In a QTBT block structure, a CU may have a square or rectangular shape. As shown in fig. 1, a coding and decoding tree unit (CTU) is first divided by a quadtree structure. The leaf nodes of the quadtree are further partitioned by a binary tree structure. In binary tree partitioning, there are two partition types, symmetric horizontal partitioning and symmetric vertical partitioning. The binary tree leaf nodes are called Codec Units (CUs), and the segments are used for prediction and transform processing without any further partitioning. This means that in a QTBT codec block structure, a CU, PU and TU have the same block size. In JEM, a CU sometimes consists of coding and decoding blocks (CBs) of different color components, e.g., in the case of P and B slices of the 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs, and sometimes consists of CBs of a single component, e.g., in the case of I slices, one CU contains only one luma CB or only two chroma CBs.
The following parameters are defined for the QTBT segmentation scheme.
-CTU size: root node size of quadtree, same concept as in HEVC
-MinQTSize: minimum allowed quadtree leaf node size
-MaxBTSize: maximum allowed binary tree root node size
-MaxBTDepth: maximum allowed binary tree depth
-MinBTSize: minimum allowed binary tree leaf node size
In one example of the QTBT partitioning structure, the CTU size is set to 128 × 128 luma samples with two corresponding 64 × 64 blocks of chroma samples, MinQTSize is set to 16 × 16, MaxBTSize is set to 64 × 64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. The quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes may have a size from 16 × 16 (i.e., MinQTSize) to 128 × 128 (i.e., the CTU size). If the leaf quadtree node is 128 × 128, it will not be further split by the binary tree, since the size exceeds MaxBTSize (i.e., 64 × 64). Otherwise, the leaf quadtree node may be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree, and its binary tree depth is 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered. When the binary tree node has a width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered. Similarly, when the binary tree node has a height equal to MinBTSize, no further vertical splitting is considered. The leaf nodes of the binary tree are further processed by the prediction and transform processing without any further partitioning. In JEM, the maximum CTU size is 256 × 256 luma samples.
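The splitting rules of this paragraph can be summarized in a small helper; the sketch below is a non-normative reading of the constraints above, with hypothetical names.

```cpp
struct QtbtParams {
    int maxBTSize  = 64;   // example values from the paragraph above
    int maxBTDepth = 4;
    int minBTSize  = 4;
};

// Returns true if a further binary-tree split is allowed for a node.
// 'halvesWidth' selects the split direction: true if the split would halve
// the width of the node, false if it would halve the height.
bool btSplitAllowed(int width, int height, int btDepth, bool halvesWidth,
                    const QtbtParams& p) {
    if (btDepth == 0 && (width > p.maxBTSize || height > p.maxBTSize))
        return false;   // quadtree leaf too large to enter the binary tree
    if (btDepth >= p.maxBTDepth)
        return false;   // maximum binary-tree depth reached
    if (halvesWidth && width <= p.minBTSize)
        return false;   // width already at the minimum leaf size
    if (!halvesWidth && height <= p.minBTSize)
        return false;   // height already at the minimum leaf size
    return true;
}
```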
Fig. 1 (left) illustrates an example of block partitioning by using QTBT, and Fig. 1 (right) illustrates the corresponding tree representation. The solid lines indicate quadtree splitting and the dotted lines indicate binary tree splitting. In each splitting (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting. For the quadtree splitting, there is no need to indicate the splitting type, since quadtree splitting always splits a block both horizontally and vertically to produce 4 sub-blocks of equal size.
Furthermore, the QTBT scheme supports the ability for luminance and chrominance to have separate QTBT structures. Currently, luminance and chrominance CTBs in one CTU share the same QTBT structure for P and B stripes. However, for an I-slice, luma CTB is partitioned into CUs by a QTBT structure and chroma CTB is partitioned into chroma CUs by another QTBT structure. This means that a CU in an I-slice comprises either the coding blocks of the luma component or the coding blocks of the two chroma components, and a CU in a P-or B-slice comprises the coding blocks of all three color components.
In HEVC, inter prediction for small blocks is restricted to reduce memory access for motion compensation, such that bi-prediction is not supported for 4 × 8 and 8 × 4 blocks, and inter prediction is not supported for 4 × 4 blocks. In the QTBT of JEM, these restrictions are removed.
2.2HEVC/H.265 inter prediction
Each inter-predicted PU has motion parameters for one or two reference picture lists. The motion parameters include a motion vector and a reference picture index. The use of one of the two reference picture lists can also be signaled using inter _ pred _ idc. Motion vectors can be explicitly coded as deltas relative to the predictor.
When a CU is coded in skip mode, one PU is associated with the CU and there are no significant residual coefficients, no motion vector delta coded or reference picture indices. A Merge mode is specified whereby the motion parameters of the current PU are obtained from neighboring PUs that include spatial and temporal candidates. The Merge mode may be applied to any inter-predicted PU, not just for the skip mode. An alternative to the Merge mode is the explicit transmission of motion parameters, where each PU explicitly signals the motion vector (more precisely, the motion vector difference compared to the motion vector prediction), the corresponding reference picture index for each reference picture list, and the use of reference picture lists. In this disclosure, such a mode is referred to as Advanced Motion Vector Prediction (AMVP).
When the signaling indicates that one of the two reference picture lists is to be used, the PU is generated from one sample block. This is called "one-way prediction". Unidirectional prediction may be used for P slices and B slices.
When the signaling indicates that both reference picture lists are to be used, the PU is generated from two sample blocks. This is called "bi-prediction". Bi-prediction can only be used for B slices.
The following text provides detailed information about inter prediction modes specified in HEVC. The description will start with the Merge mode.
2.2.1Merge mode
2.2.1.1 derivation of candidates for Merge mode
When predicting a PU using the Merge mode, the index pointing to an entry in the Merge candidate list is parsed from the bitstream and used to retrieve motion information. The construction of this list is specified in the HEVC standard and can be summarized in the following sequence of steps:
step 1: initial candidate derivation
Step 1.1: spatial domain candidate derivation
Step 1.2: redundancy check of spatial domain candidates
Step 1.3: time domain candidate derivation
Step 2: additional candidate insertions
Step 2.1: creating bi-directional prediction candidates
Step 2.2: inserting zero motion candidates
These steps are also schematically depicted in Fig. 2. For spatial Merge candidate derivation, a maximum of four Merge candidates are selected among candidates that are located in five different positions. For temporal Merge candidate derivation, a maximum of one Merge candidate is selected among two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best Merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all PUs of the current CU share a single Merge candidate list, which is identical to the Merge candidate list of the 2N×2N prediction unit.
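The two-step construction above can be outlined in code. The following sketch is only a structural skeleton with placeholder helpers; the actual candidate derivations and the pairing order for combined candidates are defined by the HEVC specification.

```cpp
#include <vector>

struct MergeCand { /* motion vectors, reference indices, prediction flags */ };

// Placeholders standing in for the sub-steps listed above.
std::vector<MergeCand> deriveSpatialCands()  { return {}; }  // steps 1.1 + 1.2
std::vector<MergeCand> deriveTemporalCands() { return {}; }  // step 1.3
MergeCand combineBiPred(const MergeCand&, const MergeCand&) { return {}; }
MergeCand zeroMotionCand(int /*refIdx*/) { return {}; }

std::vector<MergeCand> buildMergeList(int maxNumMergeCand, bool isBSlice) {
    std::vector<MergeCand> list = deriveSpatialCands();        // at most 4
    for (const MergeCand& t : deriveTemporalCands())           // at most 1
        if ((int)list.size() < maxNumMergeCand) list.push_back(t);
    // Step 2.1: combined bi-predictive candidates (B slices only).
    if (isBSlice) {
        const size_t numOriginal = list.size();
        for (size_t i = 0; i + 1 < numOriginal &&
                           (int)list.size() < maxNumMergeCand; ++i)
            list.push_back(combineBiPred(list[i], list[i + 1]));
    }
    // Step 2.2: pad with zero-motion candidates up to the signaled maximum.
    for (int refIdx = 0; (int)list.size() < maxNumMergeCand; ++refIdx)
        list.push_back(zeroMotionCand(refIdx));
    return list;
}
```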
Hereinafter, operations associated with the above steps will be described in detail.
2.2.1.2 spatial domain candidate derivation
In the derivation of spatial Merge candidates, a maximum of four Merge candidates are selected among candidates located at the positions depicted in Fig. 3. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU of positions A1, B1, B0, A0 is not available (e.g., because it belongs to another slice or tile) or is intra-coded. After the candidate at position A1 is added, a redundancy check is performed on the addition of the remaining candidates, which ensures that candidates with the same motion information are excluded from the list, thereby improving codec efficiency. Fig. 4 shows an example of the candidate pairs considered in the redundancy check for spatial Merge candidates. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with arrows in Fig. 4 are considered, and a candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, Figs. 5A and 5B depict the second PU for the N×2N and 2N×N cases, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. In fact, adding this candidate would result in two prediction units having the same motion information, which is redundant to having just one PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
2.2.1.3 time-domain candidate derivation
In this step, only one candidate is added to the list. Particularly, in the derivation of this temporal Merge candidate, a scaled motion vector is derived based on the collocated (co-located) PU belonging to the picture which has the smallest POC (Picture Order Count) difference with the current picture within the given reference picture list. The reference picture list to be used for derivation of the collocated PU is explicitly signaled in the slice header. As illustrated by the dashed line in Fig. 6, the scaled motion vector for the temporal Merge candidate is obtained, scaled from the motion vector of the collocated PU using the POC distances tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture, and td is defined to be the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal Merge candidate is set equal to zero. A practical realization of the scaling process is described in the HEVC specification. For a B slice, two motion vectors, one for reference picture list 0 and the other for reference picture list 1, are obtained and combined to make the bi-predictive Merge candidate.
Fig. 6 is a diagram of motion vector scaling for temporal domain Merge candidates.
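The scaling by the POC distances tb and td admits a compact sketch; the fixed-point form below follows the realization in the HEVC specification (tb and td assumed non-zero and clipped to [-128, 127]).

```cpp
#include <algorithm>
#include <cstdlib>

static int clip3(int lo, int hi, int v) { return std::min(hi, std::max(lo, v)); }

// Scale one MV component by the POC-distance ratio tb/td for the
// temporal Merge candidate.
int scaleMvComponent(int mv, int tb, int td) {
    int tx = (16384 + (std::abs(td) >> 1)) / td;
    int distScaleFactor = clip3(-4096, 4095, (tb * tx + 32) >> 6);
    int scaled = distScaleFactor * mv;
    int sign = (scaled >= 0) ? 1 : -1;
    return clip3(-32768, 32767, sign * ((std::abs(scaled) + 127) >> 8));
}
```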
In the collocated PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as depicted in Fig. 7. If the PU at position C0 is not available, is intra-coded, or is outside of the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
2.2.1.4 additional candidate insertions
In addition to spatial and temporal Merge candidates, there are two additional types of Merge candidates: a combined bi-directional predicted Merge candidate and zero Merge candidate. A combined bidirectional predictive Merge candidate is generated by using both spatial and temporal Merge candidates. The combined bi-directionally predicted Merge candidates are for B slices only. A combined bi-directional prediction candidate is generated by combining a first reference picture list motion parameter of an initial candidate with a second reference picture list motion parameter of another initial candidate. If these two tuples provide different motion hypotheses, they will form new bi-directional prediction candidates. As an example, fig. 8 depicts the case when two candidates in the original list (on the left) with mvL0 and refIdxL0 or mvL1 and refIdxL1 are used to create a combined bi-predictive Merge candidate that is added to the final list (on the right). There are many rules on combinations that are considered to generate these additional Merge candidates.
Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list and therefore hit the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index which starts from zero and increases every time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni-directional and bi-directional prediction, respectively. Finally, no redundancy check is performed on these candidates.
2.2.1.5 motion estimation regions for parallel processing
To speed up the encoding process, motion estimation can be performed in parallel, whereby the motion vectors for all prediction units inside a given region are derived simultaneously. The derivation of Merge candidates from the spatial neighborhood may interfere with parallel processing, as one prediction unit cannot derive the motion parameters from an adjacent PU until its associated motion estimation is completed. To mitigate the trade-off between codec efficiency and processing latency, HEVC defines a Motion Estimation Region (MER), whose size is signaled in the picture parameter set using the "log2_parallel_merge_level_minus2" syntax element. When a MER is defined, Merge candidates falling into the same region are marked as unavailable and are therefore not considered in the list construction.
2.2.2AMVP
AMVP exploits the spatio-temporal correlation of a motion vector with neighboring PUs, which is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by first checking the availability of the left, above, and temporally neighboring PU positions, removing redundant candidates and adding a zero vector to make the candidate list a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to Merge index signaling, the index of the best motion vector candidate is encoded using a truncated unary. The maximum value to be encoded in this case is 2 (see Fig. 9). In the following sections, details about the derivation process of the motion vector prediction candidates are provided.
2.2.2.1 derivation of AMVP candidates
Fig. 9 summarizes the derivation of motion vector prediction candidates.
In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For spatial motion vector candidate derivation, two motion vector candidates are finally derived based on the motion vectors of each PU located at five different positions as depicted in fig. 3.
For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates derived based on two different collocated positions. After generating the first spatio-temporal candidate list, duplicate motion vector candidates in the list are removed. If the number of potential candidates is greater than two, the motion vector candidate within the associated reference picture list whose reference picture index is greater than 1 is removed from the list. If the number of spatio-temporal motion vector candidates is less than two, additional zero motion vector candidates are added to the list.
2.2.2.2 spatial motion vector candidates
In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located at the positions depicted in Fig. 3, those positions being the same as those of the motion Merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as motion vector candidates, with two cases not requiring the use of spatial scaling and two cases where spatial scaling is used. The four different cases are summarized as follows:
without spatial scaling
- (1) identical reference picture list, and identical reference picture index (identical POC)
- (2) different reference picture lists, but the same reference picture (same POC)
Spatial scaling
- (3) same reference picture list, but different reference pictures (different POCs)
- (4) different reference picture lists, and different reference pictures (different POCs)
The no-spatial-scaling cases are checked first, followed by the cases that require spatial scaling. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are not available or are intra-coded, scaling for the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector. A sketch of the four cases appears below.
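The four cases can be phrased as a small classifier. This is a hedged sketch with hypothetical names; the POC comparison stands in for "same reference picture".

```cpp
enum class SpatialMvCase {
    SameListSamePoc,   // (1) no spatial scaling
    DiffListSamePoc,   // (2) no spatial scaling
    SameListDiffPoc,   // (3) spatial scaling
    DiffListDiffPoc    // (4) spatial scaling
};

// refListNeigh / refListCur: reference picture list of the neighboring / current PU.
// pocNeigh / pocCur: POC of the respective reference pictures.
SpatialMvCase classify(int refListNeigh, int refListCur, int pocNeigh, int pocCur) {
    const bool sameList = (refListNeigh == refListCur);
    if (pocNeigh == pocCur)
        return sameList ? SpatialMvCase::SameListSamePoc
                        : SpatialMvCase::DiffListSamePoc;
    return sameList ? SpatialMvCase::SameListDiffPoc
                    : SpatialMvCase::DiffListDiffPoc;
}

bool needsSpatialScaling(SpatialMvCase c) {
    return c == SpatialMvCase::SameListDiffPoc ||
           c == SpatialMvCase::DiffListDiffPoc;
}
```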
Fig. 10 is a diagram of motion vector scaling for spatial motion vector candidates.
As depicted in fig. 10, in the spatial scaling process, the motion vectors of neighboring PUs are scaled in a similar manner as the temporal scaling. The main difference is that the reference picture list and the index of the current PU are given as input; the actual scaling procedure is the same as the time domain scaling procedure.
2.2.2.3 temporal motion vector candidates
All processes for deriving the temporal domain Merge candidate are the same as those for deriving the spatial motion vector candidate, except for reference picture index derivation (see fig. 7). The reference picture index is signaled to the decoder.
2.3 New interframe Merge candidates in JEM
2.3.1 sub-CU-based motion vector prediction
In JEM with QTBT, there can be at most one motion parameter set per CU for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by dividing the large CU into sub-CUs and deriving motion information of all sub-CUs of the large CU. An Alternative Temporal Motion Vector Prediction (ATMVP) method allows each CU to extract multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In a Spatial-Temporal Motion Vector Prediction (STMVP) method, a Motion Vector of a sub-CU is recursively derived by using Temporal Motion Vector predictors and Spatial neighboring Motion vectors.
In order to maintain a more accurate motion field for sub-CU motion prediction, motion compression of the reference frame is currently disabled.
2.3.1.1 alternative temporal motion vector prediction
In the Alternative Temporal Motion Vector Prediction (ATMVP) method, temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. As shown in Fig. 11, the sub-CUs are square N × N blocks (N is set to 4 by default).
ATMVP predicts motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture with a so-called temporal vector. The reference picture is also referred to as a motion source picture. The second step is to divide the current CU into sub-CUs and obtain a motion vector and a reference index for each sub-CU from the block corresponding to each sub-CU, as shown in fig. 11.
In the first step, the reference picture and the corresponding block are determined from motion information of spatially neighboring blocks of the current CU. To avoid an iterative scanning process of neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU is used. The first available motion vector and its associated reference index are set as the indices of the temporal vector and the motion source picture. In this way, in ATMVP, the corresponding block can be identified more accurately than in TMVP, where the corresponding block (sometimes referred to as a collocated block) is always in a lower right or center position with respect to the current CU.
In the second step, a corresponding block of each sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of a corresponding N×N block is identified, it is converted to the motion vectors and reference indices of the current sub-CU in the same way as the TMVP of HEVC, wherein motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy for each sub-CU (with X being equal to 0 or 1 and Y being equal to 1−X).
2.3.1.2 spatio-temporal motion vector prediction (STMVP)
In this method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Fig. 12 illustrates this concept. Consider an 8 × 8 CU which contains four 4 × 4 sub-CUs A, B, C, and D. The neighboring 4 × 4 blocks in the current frame are labeled a, b, c, and d.
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A (block c). If this block c is not available or is intra-coded, the other N×N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU A (block b). If block b is not available or is intra-coded, the other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure of TMVP derivation as specified in HEVC. The motion information of the collocated block at position D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
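The final averaging step is simple enough to sketch directly; the types are hypothetical and the exact rounding rule of the average is not specified in this excerpt, so plain integer division is used here.

```cpp
#include <vector>

struct Mv { int x = 0; int y = 0; };

// Average the available motion vectors (at most 3: two spatial, one temporal),
// separately per reference list as described above. Returns {0,0} if none.
Mv averageMvs(const std::vector<Mv>& available) {
    Mv sum;
    for (const Mv& m : available) { sum.x += m.x; sum.y += m.y; }
    const int n = (int)available.size();
    if (n == 0) return sum;
    return { sum.x / n, sum.y / n };   // rounding rule is an assumption
}
```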
2.3.1.3 sub-CU motion prediction mode signaling
The sub-CU modes are enabled as additional Merge candidates and no additional syntax elements are needed to signal these modes. Two additional Merge candidates are added to the Merge candidate list of each CU to represent ATMVP mode and STMVP mode. Up to seven Merge candidates may be used if the sequence parameter set indicates ATMVP and STMVP are enabled. The coding logic of the additional Merge candidates is the same as the coding logic of the Merge candidates in the HM, which means that two RD checks may also be needed for two additional Merge candidates for each CU in a P-slice or a B-slice.
In JEM, all bins (bins) of the Merge index are context coded by CABAC. Whereas in HEVC, only the first bin is context coded and the remaining bins are context bypass coded.
2.3.2 non-neighboring Merge candidates
Qualcomm proposes to derive additional spatial Merge candidates from non-adjacent neighboring positions, labeled 6 to 49 in Fig. 13. The derived candidates are added to the Merge candidate list after the TMVP candidates.
Tencent proposes to derive additional spatial Merge candidates from positions in an outer reference area that has an offset of (-96, -96) relative to the current block.
As shown in Fig. 14, the positions are labeled A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). Each candidate B(i, j) or C(i, j) has an offset of 16 in the vertical direction compared to its previous B or C candidate. Each candidate A(i, j) or D(i, j) has an offset of 16 in the horizontal direction compared to its previous A or D candidate. Each E(i, j) has an offset of 16 in both the horizontal and vertical directions compared to its previous E candidate. The candidates are checked from inside to outside, and the order of the candidates is A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). It is further studied whether the number of Merge candidates can be reduced. The candidates are added to the Merge candidate list after the TMVP candidates.
In some examples, as in fig. 15, the extended spatial domain positions from 6 to 27 may be checked according to their numerical order after the time domain candidate. To preserve the MV line buffer, all spatial candidates are restricted to two CTU lines.
2.4 Interpolation in HEVC/JEM
For luminance interpolation filtering, an 8-tap separable DCT-based interpolation filter is used for 2/4 precision samples, and a 7-tap separable DCT-based interpolation filter is used for 1/4 precision samples, as shown in table 1.
Table 1: DCT-IF coefficients for 1/4th-pel luma interpolation

| Position | Filter coefficients |
|---|---|
| 1/4 | {-1, 4, -10, 58, 17, -5, 1} |
| 2/4 | {-1, 4, -11, 40, 40, -11, 4, -1} |
| 3/4 | {1, -5, 17, 58, -10, 4, -1} |
Similarly, a 4-tap separable DCT-based interpolation filter is used for the chrominance interpolation filter, as shown in table 2.
Table 2: 4-tap DCT-IF coefficients for 1/8th-pel chroma interpolation

| Position | Filter coefficients |
|---|---|
| 1/8 | {-2, 58, 10, -2} |
| 2/8 | {-4, 54, 16, -2} |
| 3/8 | {-6, 46, 28, -4} |
| 4/8 | {-4, 36, 36, -4} |
| 5/8 | {-4, 28, 46, -6} |
| 6/8 | {-2, 16, 54, -4} |
| 7/8 | {-2, 10, 58, -2} |
The odd positions in Table 2 are not used for vertical interpolation in the 4:2:2 case, nor for horizontal and vertical interpolation in the 4:4:4 chroma case, resulting in 1/4th-pel chroma interpolation.
For bi-directional prediction, the bit depth of the output of the interpolation filter is kept to 14-bit precision, regardless of the source bit depth, before averaging the two prediction signals. The actual averaging process is implicitly done using a bit depth reduction process:
predSamples[x,y]=(predSamplesL0[x,y]+predSamplesL1[x,y]+offset)>>shift
where shift = 15 - BitDepth and offset = 1 << (shift - 1).
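In code, the averaging above can be sketched as follows, assuming 14-bit intermediate predictions as stated; clipping of the result to the valid sample range is omitted, as in the formula itself.

```cpp
#include <cstdint>

// Average two intermediate prediction signals down to the output bit depth,
// matching predSamples = (predL0 + predL1 + offset) >> shift above.
void averageBiPred(const int16_t* predL0, const int16_t* predL1,
                   int16_t* predSamples, int numSamples, int bitDepth) {
    const int shift  = 15 - bitDepth;      // e.g., 7 for 8-bit output
    const int offset = 1 << (shift - 1);   // rounding offset
    for (int i = 0; i < numSamples; ++i)
        predSamples[i] = (int16_t)((predL0[i] + predL1[i] + offset) >> shift);
}
```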
If both the horizontal and the vertical components of the motion vector point to sub-pixel positions, horizontal interpolation is always performed first, followed by vertical interpolation. For example, to interpolate the sub-pixel j0,0 shown in Fig. 16, b0,k (k = -3, -2, ..., 4) is first interpolated according to equation (2-1), and then j0,0 is interpolated according to equation (2-2). Here, shift1 = Min(4, BitDepthY - 8) and shift2 = 6.
b0,k=(-A-3,k+4*A-2,k-11*A-1,k+40*A0,k+40*A1,k-11*A2,k+4*A3,k-A4,k)>>shift1 (2-1)
j0,0=(-b0,-3+4*b0,-2-11*b0,-1+40*b0,0+40*b0,1-11*b0,2+4*b0,3-b0,4)>>shift2 (2-2)
Alternatively, we can perform vertical interpolation first, followed by horizontal interpolation. In this case, to interpolate j0,0, hk,0 (k = -3, -2, ..., 4) is first interpolated according to equation (2-3), and then j0,0 is interpolated according to equation (2-4). When BitDepthY is smaller than or equal to 8, shift1 is equal to 0 and nothing is lost in the first interpolation stage, so the final interpolation result does not change with the interpolation order. However, when BitDepthY is larger than 8, shift1 is larger than 0. In this case, the final interpolation result may be different when a different interpolation order is applied.
hk,0=(-Ak,-3+4*Ak,-2-11*Ak,-1+40*Ak,0+40*Ak,1-11*Ak,2+4*Ak,3-Ak,4)>>shift1 (2-3)
j0,0=(-h-3,0+4*h-2,0-11*h-1,0+40*h0,0+40*h1,0-11*h2,0+4*h3,0-h4,0)>>shift2 (2-4)
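The order dependence can be made concrete with a small sketch of both two-stage interpolations for the half-pel position j0,0, using the 8-tap filter of equations (2-1) to (2-4). The 15 × 15 patch layout and function names are assumptions; with BitDepthY > 8 (shift1 > 0) the two functions can return different values because the first stage discards low-order bits.

```cpp
#include <algorithm>

// 8-tap half-pel taps from equations (2-1)-(2-4).
static const int kHalfPelTaps[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

static int filter8(const int* s, int shift) {
    int acc = 0;
    for (int k = 0; k < 8; ++k) acc += kHalfPelTaps[k] * s[k];
    return acc >> shift;
}

// A is a 15x15 patch of integer samples with the current sample at A[7][7].
int interpolateHorFirst(const int A[15][15], int bitDepthY) {
    const int shift1 = std::min(4, bitDepthY - 8);  // assumes bitDepthY >= 8
    const int shift2 = 6;
    int b[8];                                       // rows of b0,k
    for (int k = 0; k < 8; ++k)                     // rows 7-3 .. 7+4
        b[k] = filter8(&A[4 + k][4], shift1);       // columns 7-3 .. 7+4
    return filter8(b, shift2);                      // equation (2-2)
}

int interpolateVerFirst(const int A[15][15], int bitDepthY) {
    const int shift1 = std::min(4, bitDepthY - 8);
    const int shift2 = 6;
    int h[8];                                       // columns of hk,0
    for (int k = 0; k < 8; ++k) {
        int col[8];
        for (int r = 0; r < 8; ++r) col[r] = A[4 + r][4 + k];
        h[k] = filter8(col, shift1);                // equation (2-3)
    }
    return filter8(h, shift2);                      // equation (2-4)
}
```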
3. Examples of problems addressed by embodiments
For a luma block of size W × H, if horizontal interpolation is always performed first, the required number of interpolation operations (per pixel) is as shown in Table 3.
Table 3: Interpolation operations required for a W×H luma component in HEVC/JEM
On the other hand, if we first perform vertical interpolation, the required interpolation is as shown in table 4. Obviously, the optimal interpolation order is an order that requires a smaller interpolation time between tables 3 and 4.
Table 4: interpolation required for WXH luminance components when inverting the interpolation sequence
For the chroma components, if horizontal interpolation is always performed first, the required interpolation is ((H + 3) × W + W × H)/(W × H) = 2 + 3/H. If vertical interpolation is always performed first, the required interpolation is ((W + 3) × H + W × H)/(W × H) = 2 + 3/W.
As described above, when the bit depth of the input video is greater than 8, different interpolation orders may result in different interpolation results. Therefore, the interpolation order should be implicitly defined in both the encoder and the decoder.
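The per-pixel counts above can be captured in a small helper; the following sketch (names are illustrative) returns the per-pixel cost of each order so that encoder and decoder can derive the same implicit choice. For the 4-tap chroma filter (n = 4) it reduces to 2 + 3/H and 2 + 3/W, matching the expressions above.

```python
def interp_cost_per_pixel(w, h, n, horizontal_first):
    # First pass filters (rows-or-columns of intermediate samples) * extent,
    # second pass needs one interpolation per output pixel.
    if horizontal_first:
        ops = (h + n - 1) * w + w * h
    else:
        ops = (w + n - 1) * h + w * h
    return ops / (w * h)

def cheaper_order(w, h, n):
    if interp_cost_per_pixel(w, h, n, True) <= interp_cost_per_pixel(w, h, n, False):
        return ("horizontal", "vertical")
    return ("vertical", "horizontal")
```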
4. Examples of the embodiments
To address these problems and provide other benefits, we propose a shape-dependent interpolation order. Assume that the interpolation filter tap (in motion compensation) is N (e.g., 8, 6, 4, or 2) and the current block size is W × H.
Assume that the number of MVDs allowed in MMVD (such as the number of entries of the distance table) is M. Note that the triangle mode is regarded as a bidirectional prediction mode, and the following technique related to bidirectional prediction can also be applied to the triangle mode.
The following detailed examples should be considered as examples to explain the general concept. These examples should not be construed narrowly. Further, these examples may be combined in any manner.
1. It is proposed that the interpolation order depends on the current codec block shape (e.g. the codec block is a CU).
a. In one example, for a block with width > height (such as a CU, PU, or sub-block used in sub-block based prediction like affine, ATMVP, or BIO), vertical interpolation is performed first, followed by horizontal interpolation. For example, the pixels dk,0, hk,0 and nk,0 are interpolated first, and then e0,0 to r0,0 are interpolated. An example for j0,0 is shown in equations 2-3 and 2-4.
i. Alternatively, for a block with width >= height (such as a CU, PU, or sub-block used in sub-block based prediction like affine, ATMVP, or BIO), vertical interpolation is performed first, followed by horizontal interpolation.
b. In one example, for a block with width <= height (such as a CU, PU, or sub-block used in sub-block based prediction like affine, ATMVP, or BIO), horizontal interpolation is performed first, followed by vertical interpolation.
i. Alternatively, for a block with width < height (such as a CU, PU, or sub-block used in sub-block based prediction like affine, ATMVP, or BIO), horizontal interpolation is performed first, followed by vertical interpolation.
c. In one example, both the luma component and the chroma component follow the same interpolation order.
d. Alternatively, when one chroma codec block corresponds to multiple luma codec blocks (e.g., for the 4:2:0 color format, one 4 × 4 chroma block may correspond to two 8 × 4 or two 4 × 8 luma blocks), luma and chroma may use different interpolation orders.
e. In one example, when different interpolation orders are utilized, the scaling factors in the multiple stages (i.e., shift1 and shift2) may also be changed accordingly.
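A minimal sketch of the shape-dependent rule in items a and b of this bullet follows; it is only one of the listed variants (the square-block case could equally go the other way, per items a.i and b.i), and the function name is an assumption.

```python
def interpolation_order(width, height):
    """Return the pass order for a width x height codec block."""
    if width > height:                    # item a: wide block
        return ("vertical", "horizontal")
    return ("horizontal", "vertical")     # item b: width <= height
```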
2. Further alternatively, it is proposed that the interpolation order of the luminance components may also depend on the MVs.
a. In one example, if the vertical MV component points to a quarter-pixel position and the horizontal MV component points to a half-pixel position, then first horizontal interpolation is performed, and then vertical interpolation is performed.
b. In one example, if the vertical MV component points to a half-pixel position and the horizontal MV component points to a quarter-pixel position, then vertical interpolation is performed first, followed by horizontal interpolation.
c. In one example, the proposed method is only applied to square codec blocks.
3. It is proposed that for blocks coded in a Merge mode (e.g., a regular Merge list, a triangular Merge list, an affine Merge list, or other non-intra/non-AMVP mode), the associated motion information may be modified to integer precision (e.g., via rounding) before invoking the motion compensation process.
a. Alternatively, Merge candidates having fractional motion vectors may be excluded from the Merge list.
b. Alternatively, when a Merge candidate derived from spatial or temporal blocks or other means (such as HMVP, pairwise bi-predictive Merge candidates) is associated with a fractional motion vector, the fractional motion vector may first be modified to integer precision (e.g., via rounding) before being added to the Merge list.
c. In one example, a separate HMVP table may be dynamically (on-the-fly) maintained to store motion candidates with integer precision.
d. Alternatively, the above method may be applied only when the Merge candidate is a bidirectional prediction candidate.
e. In one example, the above method may be applied to certain block sizes, such as 4 × 16, 16 × 4, 4 × 8, 8 × 4, 4 × 4.
f. In one example, the above method may be applied to an AMVP codec block, where the Merge candidate may be replaced with an AMVP candidate.
g. In one example, the above approach may be applied to certain block modes, such as non-affine modes.
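As an illustration of this bullet, the sketch below rounds a Merge candidate's motion vector to integer precision before motion compensation. The 1/16-pel storage unit and round-to-nearest with ties away from zero are assumptions; the bullet allows other rounding rules.

```python
MV_UNIT = 16  # assumed 1/16-pel MV storage precision

def round_mv_to_integer(mv):
    def r(v):
        off = MV_UNIT // 2
        return ((v + off) // MV_UNIT) * MV_UNIT if v >= 0 \
            else -(((-v + off) // MV_UNIT) * MV_UNIT)
    return tuple(r(c) for c in mv)

# e.g. round_mv_to_integer((24, -9)) -> (32, -16), i.e. (2, -1) in full pels
```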
4. It is proposed that MMVD side information (such as distance table, direction) may depend on block size and/or prediction direction (e.g. unidirectional prediction or bi-directional prediction).
a. In one example, a distance table with all integer accuracies may be defined or signaled.
b. In one example, if a base Merge candidate is associated with a motion vector of fractional precision, it may first be modified (such as via rounding) to integer precision and then used to derive a final motion vector for motion compensation.
5. It is proposed that for certain block sizes or block shapes, the MVs in MMVD mode can be constrained to have integer-pixel precision or half-pixel precision.
a. In one example, if integer pixel precision is selected for the MMVD codec block, the basic Merge candidate used in MMVD may first be modified to integer pixel precision (such as via rounding).
b. In one example, if half-pixel precision is selected for MMVD codec blocks, the basic Merge candidate used in MMVD may be modified to half-pixel precision (such as via rounding).
i. In one example, rounding may be performed during the basic Merge list building process, and therefore rounded MVs are used in pruning.
ii. In one example, rounding may be performed after the basic Merge list building process, so unrounded MVs are used in pruning.
c. In one example, if integer-pel precision or half-pel precision is used for MMVD mode, only MVDs with the same or lower precision are allowed.
i. For example, if integer-pel precision is used for MMVD mode, only integer-pel precision, 2-pel precision, or N-pel precision (N >= 1) MVDs are allowed.
d. In one example, if K MVDs are not allowed in MMVD mode, binarization of the MVD index may be modified because the maximum MVD index is M-K-1 instead of M-1. Meanwhile, different contexts may be used in CABAC coding and decoding.
e. In one example, rounding may be performed after deriving the MV in MMVD mode.
f. The constraint may be different for bi-directional prediction and uni-directional prediction. For example, the constraint may not be applied in unidirectional prediction.
g. The constraint may be different for different block sizes or block shapes.
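The precision constraint of this bullet can be sketched as follows (the 1/16-pel MV unit and the rounding rule are assumptions): the base Merge candidate is snapped to the selected MMVD precision before the MVD is added.

```python
def round_to_step(v, step):
    off = step // 2
    return ((v + off) // step) * step if v >= 0 else -(((-v + off) // step) * step)

def constrain_mmvd_base(mv, use_integer_pel):
    step = 16 if use_integer_pel else 8  # integer-pel vs half-pel in 1/16-pel units
    return tuple(round_to_step(c, step) for c in mv)
```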
6. It is proposed that for certain block sizes or block shapes, the maximum number of half-pixel MV components or/and quarter-pixel MV components (e.g. horizontal MV or vertical MV) may be constrained.
a. In one example, the bitstream should comply with the constraint.
b. The constraint may be different for bi-directional prediction and uni-directional prediction. For example, the constraint may not be applied in unidirectional prediction.
i. For example, such a constraint may be applied to bi-directionally predicted 4 x 8 or/and 8 x 4 or/and 4 x 16 or/and 16 x 4 blocks, however, it may not be applied to uni-directionally predicted 4 x 8 or/and 8 x 4 or/and 4 x 16 or/and 16 x 4 blocks.
ii. For example, such constraints may be applied to 4 x 4 blocks for both bi-directional prediction and uni-directional prediction.
c. The constraint may be different for different block sizes or block shapes.
d. The constraint may be applied to the triangle mode.
i. For example, such constraints may be applied to 4 × 16 or/and 16 × 4 blocks that are coded in the triangle mode.
e. In one example, for a bi-predicted block, up to 3 quarter-pixel MV components may be allowed.
f. In one example, for a bi-predicted block, up to 2 quarter-pixel MV components may be allowed.
g. In one example, for a bi-predicted block, a maximum of 1 quarter-pixel MV component may be allowed.
h. In one example, for a bi-predicted block, up to 0 quarter-pixel MV components may be allowed.
i. In one example, for a uni-directional prediction block, up to 1 quarter-pixel MV component may be allowed.
j. In one example, for a uni-directional prediction block, up to 0 quarter-pixel MV components may be allowed.
k. In one example, for bi-prediction blocks, up to 3 fractional MV components may be allowed.
l. In one example, for a bi-prediction block, a maximum of 2 fractional MV components may be allowed.
m. In one example, for a bi-prediction block, a maximum of 1 fractional MV component may be allowed.
n. In one example, for a bi-prediction block, up to 0 fractional MV components may be allowed.
o. In one example, for a uni-directional prediction block, a maximum of 1 fractional MV component may be allowed.
p. In one example, for a uni-directional prediction block, up to 0 fractional MV components may be allowed.
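A conformance-style check for this bullet is sketched below, counting the fractional components over all MVs of a block (1/16-pel units assumed); the limit is one of the values listed above, e.g. 2 fractional components for a bi-prediction block.

```python
def satisfies_fractional_limit(mvs, max_fractional):
    # mvs: one (mvx, mvy) pair per prediction direction, so a bi-predicted
    # block contributes up to 4 components
    fractional = sum(1 for mv in mvs for comp in mv if comp % 16 != 0)
    return fractional <= max_fractional
```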
7. It is proposed that some components of the MV can be rounded to integer-pixel precision or half-pixel precision depending on the size (e.g. width and/or height, ratio of width and height) of the block or/and prediction direction or/and motion information.
a. In one example, the MV is rounded to the nearest integer-pel precision MV or/and half-pel precision MV.
b. In one example, different rounding methods may be used, for example, rounding down, rounding up, rounding toward zero, or rounding away from zero.
c. In one example, if the size (i.e., width × height) of the block is less than (or greater than) (and/or equal to) a threshold L (e.g., L = 16 or 64), MV rounding may be applied to the horizontal or/and vertical MV components.
d. In one example, MV rounding may be applied to the horizontal (or vertical) MV component if the width (or height) of the block is less than (and/or equal to) a threshold L1 (e.g., L1 = 4 or 8).
e. In one example, the thresholds L and L1 may be different for bi-directional and uni-directional prediction blocks. For example, a smaller threshold may be used for bi-prediction blocks.
f. In one example, MV rounding may be applied if the ratio between width and height is greater than a first threshold or less than a second threshold (such as for a narrow block such as 4 × 16 or 16 × 4).
g. In one example, MV rounding may be applied only when both the horizontal and vertical components of the MV are fractional (i.e., they point to fractional pixel positions rather than integer pixel positions).
h. Whether MV rounding is applied may depend on whether the current block is bi-directional predicted or uni-directional predicted.
i. For example, MV rounding may be applied only when the current block is bi-directionally predicted.
i. Whether MV rounding is applied may depend on the prediction direction (e.g., from list 0 or list 1) and/or the associated motion vector. In one example, for bi-prediction blocks, whether MV rounding is applied may be different for different prediction directions.
i. In one example, if the MV of prediction direction X (X = 0 or 1) has fractional components in both the horizontal and vertical directions, MV rounding may be applied to N MV components of prediction direction X; otherwise, MV rounding may not be applied. Here, N is 0, 1, or 2.
ii. In one example, if N (N >= 0) MV components have fractional precision, MV rounding may be applied to M (0 <= M <= N) of the N MV components.
1. N and M may be different for bi-directional and uni-directional prediction blocks.
2. N and M may be different for different block sizes (width or/and height or/and width × height).
3. For example, for a bi-prediction block, N equals 4 and M equals 4.
4. For example, for a bi-prediction block, N equals 4 and M equals 3.
5. For example, for a bi-prediction block, N equals 4 and M equals 2.
6. For example, for a bi-prediction block, N equals 4 and M equals 1.
7. For example, for a bi-prediction block, N equals 3 and M equals 3.
8. For example, for a bi-prediction block, N equals 3 and M equals 2.
9. For example, for a bi-prediction block, N equals 3 and M equals 1.
10. For example, for a bi-prediction block, N equals 2 and M equals 2.
11. For example, for a bi-prediction block, N equals 2 and M equals 1.
12. For example, for a bi-prediction block, N equals 1 and M equals 1.
13. For example, for a uni-directional prediction block, N equals 2 and M equals 2.
14. For example, for a uni-directional prediction block, N equals 2 and M equals 1.
15. For example, for a uni-directional prediction block, N equals 1 and M equals 1.
iii. In one example, K MV components of the M MV components are rounded to integer-pel precision and M-K MV components are rounded to half-pel precision, where K = 0, 1, ..., M-1.
j. Whether MV rounding is applied may be different for different color components, such as Y, Cb and Cr.
i. For example, whether and how MV rounding is applied may depend on the color format, such as 4:2:0, 4:2:2, or 4:4: 4.
k. Whether and/or how MV rounding is applied may depend on block size (or width, height), block shape, prediction direction, etc.
i. In one example, some MV components of a 4 × 16 or/and 16 × 4 bi-directional predicted luma block or/and a uni-directional predicted luma block may be rounded to half-pixel precision.
in one example, some MV components of a 4 x 16 or/and 16 x 4 bi-directional predicted luma block or/and a uni-directional predicted luma block may be rounded to integer-pixel precision.
in one example, some MV components of a 4 x 4 uni-directional predicted luma block or/and a bi-directional predicted luma block may be rounded to integer-pixel precision.
in one example, some MV components of a 4 x 8 or/and 8 x 4 bi-directionally predicted luma block or/and a uni-directionally predicted luma block may be rounded to integer-pixel precision.
l. In one example, MV rounding may not be applied to sub-block prediction, such as affine prediction.
i. In an alternative example, MV rounding may be applied to sub-block prediction, such as ATMVP prediction. In this case, each sub-block is treated as a codec block to determine whether and how to apply MV rounding.
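The size-dependent triggers in items c and d of this bullet can be sketched as below; the thresholds, the 1/16-pel unit, and the choice of integer-pel rounding are illustrative assumptions.

```python
L_AREA, L_DIM = 64, 8  # assumed thresholds L and L1

def _round_int_pel(v):
    return ((v + 8) // 16) * 16 if v >= 0 else -(((-v + 8) // 16) * 16)

def maybe_round_mv(mv, width, height):
    mvx, mvy = mv
    if width * height < L_AREA:      # item c: small block, round both components
        return (_round_int_pel(mvx), _round_int_pel(mvy))
    if width < L_DIM:                # item d: narrow block, horizontal component
        mvx = _round_int_pel(mvx)
    if height < L_DIM:               # item d: short block, vertical component
        mvy = _round_int_pel(mvy)
    return (mvx, mvy)
```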
8. It is proposed that for some block sizes, the motion vectors of a block should be modified to integer precision before being used for motion compensation (e.g. if they are fractional precision).
9. In one example, the stored motion vectors and the motion vectors used for motion compensation may be of different precision for certain block sizes.
a. In one example, sub-pixel precision (also referred to as fractional precision, such as 1/4 pixels, 1/16 pixels) may be stored for blocks with certain block sizes, but the motion compensation process is based on integer versions of these motion vectors (such as via rounding).
10. It is proposed that an indication that bi-prediction is not allowed for certain block sizes may be signaled in a sequence parameter set/picture parameter set/sequence header/picture header/slice group header/CTU row/region/other high level syntax.
a. Alternatively, an indication that uni-directional prediction is not allowed for certain block sizes may be signaled in the sequence parameter set/picture parameter set/sequence header/picture header/slice group header/CTU row/region/other high level syntax.
b. Alternatively, an indication that bi-directional prediction and/or uni-directional prediction is not allowed for certain block sizes may be signaled in the sequence parameter set/picture parameter set/sequence header/picture header/slice group header/CTU row/region/other high level syntax.
c. Further, such an indication may alternatively be applied only to certain patterns, such as non-affine patterns.
d. Further alternatively, when uni-directional/bi-directional prediction is not allowed for a block, the signaling of the AMVR index may be modified accordingly, such as allowing only integer-pel precision, or, conversely, utilizing different MV precisions.
e. Further, alternatively, the above methods (such as the bullets 3-9) may also be applicable.
11. It is proposed that a consistent bitstream should follow the rule that for certain block sizes, only integer-pixel motion vectors are allowed for bi-predictive coded blocks.
12. The signaling of the AMVR flag may depend on whether fractional motion vectors are allowed for the block.
a. In one example, if fractional (i.e., 1/4 pixels) MV/MVD precision is not allowed for a block, a flag indicating whether the MV/MVD precision of the current block is 1/4 pixels may be skipped and implicitly derived as false.
13. In one example, the block sizes are, for example, 4 × 16, 16 × 4, 4 × 8, 8 × 4, 4 × 4.
14. It is proposed that different interpolation filters (e.g., with different numbers of filter taps and/or different filter coefficients) can be used in the interpolation, depending on the size of the block (e.g., width and/or height, or the ratio of width to height).
a. Different filters may be used for vertical interpolation and horizontal interpolation. For example, a shorter tap filter may be applied to vertical interpolation than a filter used for horizontal interpolation.
b. In one example, interpolation filters with fewer taps than the interpolation filter in VTM-3.0 may be applied in some cases. These interpolation filters with fewer taps are also referred to as "short tap filters".
c. In one example, if the size (i.e., width × height) of the block is less than (or greater than) (and/or equal to) a threshold L (e.g., L = 16 or 64), then a different filter (e.g., a short tap filter) may be used for horizontal or/and vertical interpolation.
d. In one example, if the width (or height) of the block is less than (and/or equal to) a threshold L1 (e.g., L1 = 4 or 8), then a different filter (e.g., a short tap filter) may be used for horizontal (or vertical) interpolation.
e. In one example, if the ratio between the width and the height is greater than a first threshold or less than a second threshold (such as for a narrow block such as 4 x 16 or 16 x 4), then a different filter (e.g., a short tap filter) may be selected than for other kinds of blocks.
f. In one example, the short tap filter may be used only when both the horizontal and vertical components of the MV are fractional (i.e., they point to fractional pixel positions rather than integer pixel positions).
g. Which filter to use (e.g., a short tap filter may or may not be used) may depend on whether the current block is bi-directional predicted or uni-directional predicted.
i. For example, a short tap filter may be used only when the current block is bi-predicted.
h. Which filter to use (e.g., short tap filter may or may not be used) may depend on the prediction direction (e.g., from list 0 or list 1) and/or the associated motion vector. In one example, whether to use a short tap filter may be different for different prediction directions for a bi-predicted block.
i. In one example, if the MV of prediction direction X (X = 0 or 1) has fractional components in both the horizontal and vertical directions, then a short tap filter is used for prediction direction X; otherwise, the short tap filter is not used.
ii. In one example, if N (N >= 0) MV components have fractional precision, then a short tap filter may be applied to M (0 <= M <= N) of the N MV components.
1. N and M may be different for bi-directional and uni-directional prediction blocks.
2. N and M may be different for different block sizes (width or/and height or/and width × height).
3. For example, for a bi-prediction block, N equals 4 and M equals 4.
4. For example, for a bi-prediction block, N equals 4 and M equals 3.
5. For example, for a bi-prediction block, N equals 4 and M equals 2.
6. For example, for a bi-prediction block, N equals 4 and M equals 1.
7. For example, for a bi-prediction block, N equals 3 and M equals 3.
8. For example, for a bi-prediction block, N equals 3 and M equals 2.
9. For example, for a bi-prediction block, N equals 3 and M equals 1.
10. For example, for a bi-prediction block, N equals 2 and M equals 2.
11. For example, for a bi-prediction block, N equals 2 and M equals 1.
12. For example, for a bi-prediction block, N equals 1 and M equals 1.
13. For example, for a uni-directional prediction block, N equals 2 and M equals 2.
14. For example, for a uni-directional prediction block, N equals 2 and M equals 1.
15. For example, for a uni-directional prediction block, N equals 1 and M equals 1.
iii. Different short tap filters may be used for the M MV components.
1. In one example, K MV components of the M MV components use an S1-tap filter and M-K MV components use an S2-tap filter, where K = 0, 1, ..., M-1. For example, S1 equals 6 and S2 equals 4.
i. In one example, different filters (e.g., short tap filters) may be used for only some pixels. For example, they are only used for the boundary pixels of the block.
i. For example, they are used only for the N1 right columns or/and the N2 left columns or/and the N3 top rows or/and the N4 bottom rows of the block.
j. Whether a short tap filter is used may be different for the uni-directional prediction block and the bi-directional prediction block.
k. Whether or not short tap filters are used may be different for different color components, such as Y, Cb and Cr.
i. For example, whether and how the short tap filter is applied may depend on the color format, such as 4:2:0, 4:2:2, or 4:4: 4.
l. Different short tap filters may be used for different blocks. The short tap filter selected may depend on block size (or width, height), block shape, prediction direction, etc.
i. In one example, a 7 tap filter is used for horizontal and vertical interpolation of 4 × 16 or/and 16 × 4 bi-directional predicted luma blocks or/and uni-directional predicted luma blocks.
ii. In one example, a 7-tap filter is used for horizontal (or vertical) interpolation of a 4 x 4 uni-directional predicted luma block or/and a bi-directional predicted luma block.
iii. In one example, a 6-tap filter is used for horizontal and vertical interpolation of 4 x 8 or/and 8 x 4 bi-directional predicted luma blocks or/and uni-directional predicted luma blocks.
1. Alternatively, a 6-tap filter and a 5-tap filter (or a 5-tap filter and a 6-tap filter) are used in the horizontal interpolation and the vertical interpolation for the 4 × 8 or/and 8 × 4 bi-directional predicted luminance block or/and uni-directional predicted luminance block, respectively.
m. Different short tap filters can be used for different kinds of motion vectors.
i. In one example, a longer tap length filter may be used for motion vectors having fractional components in only one direction (i.e., horizontal or vertical), and a shorter tap length filter may be used for motion vectors having fractional components in both the horizontal and vertical directions.
ii. For example, an 8-tap filter is used for a 4 × 16 or/and 16 × 4 or/and 4 × 8 or/and 8 × 4 or/and 4 × 4 bi-prediction block or/and uni-directional prediction block with fractional MV components in only one direction, and a short tap filter described in clause 3.h is used for a 4 × 16 or/and 16 × 4 or/and 4 × 8 or/and 8 × 4 or/and 4 × 4 bi-prediction block or/and uni-directional prediction block with fractional MV components in both directions.
iii. In one example, the interpolation filter for affine motion may be different from the interpolation filter for translational motion vectors.
iv. In one example, a shorter tap interpolation filter may be used for affine motion, compared to the interpolation filter used for translational motion vectors.
n. In one example, the short tap filter may not be applied to sub-block prediction, such as affine prediction.
i. In an alternative example, a short tap filter may be applied to the sub-block prediction, such as ATMVP prediction. In this case, each sub-block is treated as a codec block to determine whether and how to apply the short tap filter.
o. In one example, whether and/or how the short tap filter is applied may depend on block size, codec information, and the like.
i. In one example, a short tap filter may be applied when a certain mode is enabled for a block (such as OBMC, interleaved affine prediction mode).
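The size-based selection among the short tap filters described in this bullet can be sketched as a simple lookup; the mapping below merely restates the examples given above (7-tap filters for 4 × 16, 16 × 4, and 4 × 4 luma blocks, 6-tap for 4 × 8 and 8 × 4) and is not the only possibility.

```python
def luma_filter_taps(width, height):
    size = (width, height)
    if size in ((4, 16), (16, 4), (4, 4)):
        return (7, 7)   # 7-tap horizontal and vertical filters
    if size in ((4, 8), (8, 4)):
        return (6, 6)   # 6-tap filters
    return (8, 8)       # default 8-tap luma interpolation filter
```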
15. It is proposed that (W + N - 1 - PW) × (H + N - 1 - PH) reference pixels (instead of (W + N - 1) × (H + N - 1) reference pixels) may be extracted for the motion compensation of a W × H block, where PW and PH cannot both be equal to 0.
a. Furthermore, in one example, for the remaining reference pixels (not extracted, but required for motion compensation), filling or derivation from the extracted reference samples may be applied.
b. Further alternatively, pixels at the reference block boundaries (top, left, bottom, and right) are repeated to generate a (W + N-1) × (H + N-1) block, which is used for final interpolation. For example, as shown in fig. 21, W is 8, H is 4, N is 7, PW is 2, and PH is 3.
c. The extracted reference pixels may be identified by (x + MVXInt - N/2 + offSet1, y + MVYInt - N/2 + offSet2), where (x, y) is the top-left position of the current block, (MVXInt, MVYInt) is the integer part of the MV, and offSet1 and offSet2 are integers such as -2, -1, 0, 1, 2, etc.
d. In one example, PH is zero and only the left boundary or/and the right boundary is repeated.
e. In one example, PW is zero, and only the top boundary or/and the bottom boundary is repeated.
f. In one example, PW and PH are both greater than zero, and first the left boundary or/and the right boundary is repeated, then the top boundary or/and the bottom boundary is repeated.
g. In one example, PW and PH are both greater than zero, and first the top boundary or/and the bottom boundary is repeated, then the left boundary or/and the right boundary is repeated.
h. In one example, the left boundary is repeated M1 times and the right boundary is repeated PW-M1 times, where M1 is an integer and M1 >= 0.
i. Alternatively, if M1 (or PW-M1) is greater than 1, instead of repeating the first left (or right) column M1 times, multiple columns may be utilized, such as M1 left columns (or PW-M1 right columns) may be repeated.
i. In one example, the top boundary is repeated M2 times and the bottom boundary is repeated PH-M2 times, where M2 is an integer and M2 >= 0.
i. Alternatively, if M2 (or PH-M2) is greater than 1, instead of repeating the first top (or bottom) row M2 times, multiple rows may be utilized, such as M2 top rows (or PH-M2 bottom rows) may be repeated.
j. In one example, some default values may be used for boundary padding.
k. In one example, such a boundary pixel repetition method may be used only when both the horizontal and vertical components of the MV are fractional (i.e., they point to fractional pixel positions rather than integer pixel positions).
In one example, such a boundary pixel repetition method may be applied to some or all of the reference blocks.
i. In one example, if the MV of prediction direction X (X = 0 or 1) has fractional components in both the horizontal and vertical directions, such a boundary pixel repetition method is used for prediction direction X; otherwise, the method is not used.
ii. In one example, if N (N >= 0) MV components have fractional precision, the boundary pixel repetition method may be applied to M (0 <= M <= N) of the N MV components.
1. N and M may be different for bi-directional and uni-directional prediction blocks.
2. N and M may be different for different block sizes (width or/and height or/and width × height).
3. For example, for a bi-prediction block, N equals 4 and M equals 4.
4. For example, for a bi-prediction block, N equals 4 and M equals 3.
5. For example, for a bi-prediction block, N equals 4 and M equals 2.
6. For example, for a bi-prediction block, N equals 4 and M equals 1.
7. For example, for a bi-prediction block, N equals 3 and M equals 3.
8. For example, for a bi-prediction block, N equals 3 and M equals 2.
9. For example, for a bi-prediction block, N equals 3 and M equals 1.
10. For example, for a bi-prediction block, N equals 2 and M equals 2.
11. For example, for a bi-prediction block, N equals 2 and M equals 1.
12. For example, for a bi-prediction block, N equals 1 and M equals 1.
13. For example, for a uni-directional prediction block, N equals 2 and M equals 2.
14. For example, for a uni-directional prediction block, N equals 2 and M equals 1.
15. For example, for a uni-directional prediction block, N equals 1 and M equals 1.
iii. Different boundary pixel repetition methods may be used for the M MV components.
m. PW and/or PH may be different for different color components (such as Y, Cb and Cr).
i. For example, whether and how border pixel repetition is applied may depend on the color format, such as 4:2:0, 4:2:2, or 4:4: 4.
n. In one example, PW and/or PH may be different for different block sizes or shapes.
i. In one example, PW and PH are set equal to 1 for 4 × 16 or/and 16 × 4 bi-prediction blocks or/and uni-prediction blocks.
ii. In one example, PW and PH are set equal to 0 and 1 (or 1 and 0), respectively, for 4 × 4 bi-directional prediction or/and uni-directional prediction blocks.
iii. In one example, PW and PH are set equal to 2 for 4 × 8 or/and 8 × 4 bi-prediction blocks or/and uni-prediction blocks.
1. Alternatively, PW and PH are set equal to 2 and 3 (or 3 and 2) for 4 × 8 or/and 8 × 4 bi-prediction blocks or/and uni-prediction blocks, respectively.
o. In one example, PW and PH may be different for uni-directional prediction and bi-directional prediction.
p. PW and PH may be different for different kinds of motion vectors.
i. In one example, PW and PH may be smaller (even zero) for motion vectors having fractional components in only one direction (i.e., horizontal or vertical), and PW and PH may be larger for motion vectors having fractional components in both the horizontal and vertical directions.
ii. For example, PW and PH are set equal to 0 for 4 × 16 or/and 16 × 4 or/and 4 × 8 or/and 8 × 4 or/and 4 × 4 bi-prediction blocks or/and uni-prediction blocks with fractional MV components in only one direction, and the PW and PH described in clause 4.i are used for 4 × 16 or/and 16 × 4 or/and 4 × 8 or/and 8 × 4 or/and 4 × 4 bi-prediction blocks or/and uni-prediction blocks with fractional MV components in both directions.
Fig. 21 shows an example of repeating the boundary pixels of the reference block before interpolation.
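A sketch of the reduced fetch plus boundary repetition follows, assuming the fetched block is a list of rows and that the right column is repeated PW times and the bottom row PH times (i.e., M1 = M2 = 0); other splits between left/right and top/bottom repetitions, as in items h and i, work analogously. For the fig. 21 example (W = 8, H = 4, N = 7, PW = 2, PH = 3), this grows a 12 × 7 fetch into the required 14 × 10 block.

```python
def pad_reference(block, pw, ph):
    # block: fetched (W+N-1-PW) x (H+N-1-PH) reference samples, as rows
    padded = [row + [row[-1]] * pw for row in block]  # repeat right column pw times
    padded += [list(padded[-1]) for _ in range(ph)]   # repeat bottom row ph times
    return padded
```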
16. The proposed method may be applied to certain modes, block sizes/shapes, and/or certain sub-block sizes.
a. The proposed method may be applied to certain modes, such as bi-predictive mode.
b. The proposed method can be applied to certain block sizes.
i. In one example, it is only applied to blocks with w × h <= T, where w and h are the width and height of the current block.
ii. In one example, it is only applied to blocks with h <= T.
c. The proposed method may be applied to a certain color component (such as only the luminance component).
17. The above rounding operation can be defined as:
a. Shift(x, s) is defined as
Shift(x,s)=(x+off)>>s
b. SignShift(x, s) is defined as
SignShift(x, s) = (x + off) >> s, when x >= 0; and SignShift(x, s) = -((-x + off) >> s), when x < 0
where off is an integer, such as 0 or 2^(s-1).
c. The rounding operations may be defined as those used for motion vector rounding in the AMVR process, the affine process, or other processes.
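A direct Python transcription of these rounding operations follows, with off = 2^(s-1) chosen (off = 0 is the other listed option); the SignShift branch for negative x mirrors the rounding around zero.

```python
def shift(x, s):
    off = 1 << (s - 1)
    return (x + off) >> s

def sign_shift(x, s):
    off = 1 << (s - 1)
    return (x + off) >> s if x >= 0 else -((-x + off) >> s)
```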
18. In one example, how the MV is rounded may depend on the MV component.
a. For example, the y component of the MV is rounded to integer pixels, but the x component of the MV is not rounded.
b. In one example, the MV may be rounded to integer pixels before motion compensation for the luminance component, but rounded to 2 pixels before motion compensation for the chrominance component when the color format is 4:2: 0.
19. Bilinear filters are proposed for interpolation filtering in one or more specific cases, such as:
a. 4 × 4 uni-directional prediction;
b. 4 × 8 bi-directional prediction;
c. 8 × 4 bi-directional prediction;
d. 4 × 16 bi-directional prediction;
e. 16 × 4 bi-directional prediction;
f. 8 × 8 bi-directional prediction;
g. 8 × 4 uni-directional prediction;
h. 4 × 8 uni-directional prediction.
20. It is proposed that when multi-hypothesis prediction is applied to one block, short tap or different interpolation filters may be applied compared to those applied to the normal prediction mode.
a. In one example, a bilinear filter may be used.
b. A short tap or second interpolation filter may be applied to a reference picture list that involves multiple reference blocks, while for another reference picture list with only one reference block, the same filter as used for the normal prediction mode may be applied.
c. The proposed method may be applied under certain conditions, such as for certain temporal layer(s), or when the quantization parameter of the block/slice/picture containing the block is within a range (such as greater than a threshold).
Fig. 17 is a block diagram of the video processing apparatus 1700. Apparatus 1700 may be used to implement one or more of the methods described herein. The apparatus 1700 may be embodied in a smartphone, tablet, computer, internet of things (IoT) receiver, and/or the like. The apparatus 1700 may include one or more processors 1702, one or more memories 1704, and video processing hardware 1706. The processor(s) 1702 may be configured to implement one or more of the methods described in this document. Memory (es) 1704 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 1706 may be used to implement some of the techniques described in this document in hardware circuits.
Fig. 19 is a flow chart of a method 1900 of video bitstream processing. The method 1900 includes: determining (1905) a shape of the video block; determining (1910) an interpolation order based on the video blocks, the interpolation order indicating an order in which horizontal interpolation and vertical interpolation are performed; and performing horizontal interpolation and vertical interpolation according to the interpolation order of the video blocks to reconstruct (1915) a decoded representation of the video block.
Fig. 20 is a flow chart of a method 2000 of video bitstream processing. The method 2000 includes: determining (2005) a characteristic of a motion vector associated with the video block; determining (2010) an interpolation order of the video blocks based on the characteristics of the motion vectors, the interpolation order indicating an order in which horizontal interpolation and vertical interpolation are performed; and performing horizontal interpolation and vertical interpolation according to the interpolation order of the video blocks to reconstruct (2015) a decoded representation of the video block.
Fig. 22 is a flow chart of a method 2200 of video bitstream processing. The method 2200 comprises: determining (2205) a size characteristic of the first video block; determining (2210) that a first interpolation filter is to be applied to the first video block based on the determination of the size characteristic; and performing (2215) further processing of the first video block using the first interpolation filter.
Fig. 23 is a flow chart of a method 2300 of video bitstream processing. The method 2300 comprises: determining (2305) a first characteristic of a first video block; determining (2310) that a first interpolation filter is to be applied to the first video block based on the determination of the first characteristic; performing (2315) further processing of the first video block using the first interpolation filter; determining (2320) a second characteristic of the second video block; determining (2325) that a second interpolation filter is to be applied to the first video block based on the second characteristic, the first and second interpolation filters being different short-tap filters; and performing (2330) further processing of the second video block using the second interpolation filter.
Some examples of the order in which horizontal interpolation and vertical interpolation are performed and uses thereof are described in section 4 of this document with reference to methods 1900, 2000, 2200, and 2300. For example, as described in section 4, in different shapes of video blocks, one of horizontal interpolation or vertical interpolation may be preferentially performed first. In some embodiments, horizontal interpolation is performed before vertical interpolation, and in some embodiments, vertical interpolation is performed before horizontal interpolation.
Referring to methods 1900, 2000, 2200, and 2300, a video block may be encoded in a video bitstream in which bit efficiency may be achieved by using a bitstream generation rule related to an interpolation order, where the interpolation order also depends on the shape of the video block.
The method may include wherein rounding the motion vector comprises one or more of: rounded to the nearest integer-pel precision MV or rounded to half-pel precision MV.
The method may include wherein rounding the MV comprises one or more of: rounding down, rounding up, rounding to zero, or rounding far from zero.
The method may include wherein the size information indicates that the size of the first video block is less than a threshold, and rounding the MV is applied to one or both of the horizontal MV component or the vertical MV component based on the size information indicating that the size of the first video block is less than the threshold.
The method may include wherein the size information indicates that a width or height of the first video block is less than a threshold, and rounding the MV is applied to one or both of the horizontal MV component or the vertical MV component based on the size information indicating that the width or height of the first video block is less than the threshold.
The method may include wherein the threshold is different for bi-directional prediction blocks and uni-directional prediction blocks.
The method may include wherein the size information indicates that a ratio between a width and a height of the first video block is greater than a first threshold or less than a second threshold, and wherein rounding the MV is based on the determination of the size information.
The method may include wherein rounding the MV is further based on both the horizontal component and the vertical component of the MV being fractional.
The method may include wherein rounding the MV is further based on whether the first video block is bi-predictive or uni-predictive.
The method may include wherein rounding the MV is further based on a prediction direction associated with the first video block.
The method may include wherein rounding the MV is further based on a color component of the first video block.
The method may include wherein rounding the MV is further based on a size of the first video block, a shape of the first video block, or a prediction direction of the first video block.
The method may include wherein rounding the MV is applied to the sub-block prediction.
The method may include wherein the short tap filter is applied to the MV component based on the MV component having fractional precision.
The method may include wherein the short tap filter is applied based on a size of the first video block or codec information of the first video block.
The method may include wherein the short tap filter is applied based on a mode of the first video block.
The method may include wherein the default value is for boundary fill associated with the first video block.
The method may include wherein the Merge mode is one or more of: a regular Merge list, a triangular Merge list, an affine Merge list, or other non-intra or non-AMVP mode.
The method may include wherein Merge candidates having fractional motion vectors are excluded from the Merge list.
The method may include wherein rounding motion information comprises rounding Merge candidates associated with fractional motion vectors to integer precision and the modified motion information is inserted into the Merge list.
The method may include wherein the motion information is a bi-directional prediction candidate.
The method may include wherein MMVD denotes the Merge mode with motion vector differences.
The method may include wherein the motion vector is in MMVD mode.
The method may include wherein the first video block is an MMVD codec block to be associated with integer-pel precision, and wherein the base Merge candidate used in MMVD is modified to integer-pel precision via rounding.
The method may include wherein the first video block is an MMVD codec block to be associated with half-pixel precision, and wherein the base Merge candidate used in MMVD is modified to half-pixel precision via rounding.
The method may include wherein the threshold number is a maximum number of allowed half-pixel MV components or quarter-pixel MV components.
The method may include wherein the threshold number is different between bi-directional prediction and uni-directional prediction.
The method may include wherein the indication that bi-prediction is not allowed is signaled in a sequence parameter set, a picture parameter set, a sequence header, a picture header, a slice group header, a CTU row, a region, or other high level syntax.
The method may include wherein the method complies with a bitstream rule that only allows integer pixel motion vectors for bi-predictive coded blocks having a particular size.
The method may include, wherein the size of the first video block is: 4 × 16, 16 × 4, 4 × 8, 8 × 4, or 4 × 4.
The method may include wherein modifying or rounding the motion information comprises modifying different MV components differently.
The method may include wherein the y component of the first MV is modified or rounded to integer pixels and the x component of the first MV is not modified or rounded.
The method may include wherein the luma component of the first MV is rounded to integer pixels and the chroma component of the first MV is rounded to 2 pixels.
The method may include wherein the first MV is associated with a video block having a color format that is 4:2: 0.
The method may include wherein the bilinear filter is used for 4 x 4 uni-directional prediction, 4 x 8 bi-directional prediction, 8 x 4 bi-directional prediction, 4 x 16 bi-directional prediction, 16 x 4 bi-directional prediction, 8 x 8 bi-directional prediction, 8 x 4 uni-directional prediction, or 4 x 8 uni-directional prediction.
Fig. 24 is a flow chart of a method 2400 of video processing. The method 2400 includes: determining (2402) characteristics of a first block for a transition between the first block of video and a bitstream representation of the first block; determining (2404) a filter having interpolation filter parameters for interpolation of the first block based on the characteristic of the first block; and performing (2406) the conversion by using a filter having interpolation filter parameters.
In some examples, the interpolation filter parameters include filter taps and/or interpolation filter coefficients, and the interpolation includes at least one of vertical interpolation and horizontal interpolation.
In some examples, the filter comprises a short-tap filter having fewer taps than a conventional interpolation filter.
In some examples, a conventional interpolation filter has 8 taps.
In some examples, the characteristic of the first block comprises a size parameter, wherein the size parameter comprises at least one of a width, a height, a ratio of width and height, a dimension of width x height of the first block.
In some examples, the filter used for vertical interpolation differs from the filter used for horizontal interpolation in the number of taps.
In some examples, the filter used for vertical interpolation has fewer taps than the filter used for horizontal interpolation.
In some examples, the filter used for horizontal interpolation has fewer taps than the filter used for vertical interpolation.
In some examples, a short tap filter is used for horizontal interpolation or/and vertical interpolation when the size of the first block is less than and/or equal to a threshold.
In some examples, a short tap filter is used for horizontal interpolation or/and vertical interpolation when the size of the first block is greater than and/or equal to a threshold.
In some examples, a short tap filter is used for horizontal interpolation when the width of the first block is less than and/or equal to a threshold, or for vertical interpolation when the height of the first block is less than and/or equal to a threshold.
In some examples, short tap filters are used for vertical interpolation and/or horizontal interpolation when the ratio between the width and the height is greater than a first threshold or less than a second threshold.
In some examples, the characteristic of the first block includes at least one Motion Vector (MV) associated with the first block.
In some examples, short-tap filters are used for interpolation only when both the horizontal and vertical components of the MV are fractional.
In some examples, the characteristic of the first block includes a prediction parameter indicating whether the first block is bi-directionally predicted or uni-directionally predicted.
In some examples, whether a short tap filter is used depends on the prediction parameters.
In some examples, a short tap filter is used for interpolation only when the first block is bi-predictive.
In some examples, the characteristic of the first block includes an indication of a prediction direction and/or an associated Motion Vector (MV) from list 0 or list 1.
In some examples, whether a short tap filter is used depends on the prediction direction and/or MV of the first block.
In some examples, whether to use a short tap filter is different for different prediction directions in case the first block is a bi-prediction block.
In some examples, if the MV of the prediction direction X (X is 0 or 1) has fractional components in both the horizontal and vertical directions, then a short tap filter is used for the prediction direction X; otherwise, the short tap filter is not used.
In some examples, if N MV components have fractional precision, a short tap filter is used for M MV components of the N MV components, where N and M are integers and 0 <= M <= N.
In some examples, N and M are different for bi-directional prediction blocks and uni-directional prediction blocks.
In some examples, for a bi-prediction block, N equals 4 and M equals 4, or N equals 4 and M equals 3, or N equals 4 and M equals 2, or N equals 4 and M equals 1, or N equals 3 and M equals 3, or N equals 3 and M equals 2, or N equals 3 and M equals 1, or N equals 2 and M equals 2, or N equals 2 and M equals 1, or N equals 1 and M equals 1.
In some examples, for a uni-directional prediction block, N equals 2 and M equals 2, or N equals 2 and M equals 1, or N equals 1 and M equals 1.
In some examples, the short-tap filters include a first short-tap filter having S1 taps and a second short-tap filter having S2 taps, and wherein K MV components of the M MV components use the first short-tap filter and (M-K) MV components of the M MV components use the second short-tap filter, where K is an integer ranging from 0 to M-1 and S1 and S2 are integers.
In some examples, N and M are different for different size parameters of the block, wherein the size parameters include a width or/and a height or/and a width x height of the block.
In some examples, the characteristic of the first block includes a location of a pixel of the first block.
In some examples, whether a short tap filter is used depends on the location of the pixel.
In some examples, the short tap filter is only used for boundary pixels of the first block.
In some examples, short tap filters are used only for the N1 right column or/and the N2 left column or/and the N3 top row or/and the N4 bottom row of the first block, N1, N2, N3, N4 being integers.
In some examples, the characteristic of the first block includes a color component of the first block.
In some examples, whether to use a short tap filter is different for different color components of the first block.
In some examples, the color components include Y, Cb and Cr.
In some examples, the characteristic of the first block includes a color format of the first block.
In some examples, whether and how the short tap filter is applied depends on the color format of the first block.
In some examples, the color format includes 4:2:0, 4:2:2, or 4:4: 4.
In some examples, the filter includes different short tap filters having different taps, and the selection of the different short tap filters is based on characteristics of the block.
In some examples, a 7 tap filter is selected for horizontal and vertical interpolation of 4 × 16 or/and 16 × 4 bi-directional predicted luma blocks or/and uni-directional predicted luma blocks.
In some examples, a 7 tap filter is selected for horizontal interpolation or vertical interpolation of a 4 x 4 uni-directional predicted luma block or/and a bi-directional predicted luma block.
In some examples, a 6 tap filter is selected for horizontal and vertical interpolation of 4 × 8 or/and 8 × 4 bi-directional predicted luma blocks or/and uni-directional predicted luma blocks.
In some examples, a 6-tap filter and a 5-tap filter or a 5-tap filter and a 6-tap filter are selected for horizontal interpolation and vertical interpolation for 4 × 8 or/and 8 × 4 bi-directional predicted luma blocks or/and uni-directional predicted luma blocks, respectively.
In some examples, the filters include different short tap filters with different taps, and the different short tap filters are for different kinds of Motion Vectors (MVs).
In some examples, longer tap length filters from different short tap filters are used for MVs that have fractional components in only one of the horizontal or vertical directions, and shorter tap length filters from different short tap filters are used for MVs that have fractional components in both the horizontal and vertical directions.
In some examples, an 8 tap filter is used for a 4 × 16 or/and 16 × 4 or/and 4 × 8 or/and 8 × 4 or/and 4 × 4 bi-prediction block or/and a uni-directional prediction block with fractional MV components in only one of the horizontal or vertical directions, and a short tap filter is used for a 4 × 16 or/and 16 × 4 or/and 4 × 8 or/and 8 × 4 or/and 4 × 4 bi-prediction block or/and a uni-directional prediction block with fractional MV components in both directions.
In some examples, the filter for affine motion is different from the filter for translation motion vectors.
In some examples, the filter for affine motion has fewer taps than the filter for translating motion vectors.
In some examples, the short tap filter is not applied to subblock-based prediction, including affine prediction.
In some examples, a short tap filter is applied to subblock-based prediction including Advanced Temporal Motion Vector Prediction (ATMVP) prediction.
In some examples, each sub-block is used as a codec block to determine whether and how to apply a short tap filter.
In some examples, the characteristics of the first block include a size parameter and codec information of the first block, and whether and how to apply the short tap filter depends on the block size and codec information of the first block.
In some examples, the short tap filter is applied when a certain mode is enabled for the first block that includes at least one of OBMC and interleaved affine prediction modes.
In some examples, the conversion generates the first/second block of video from the bitstream representation.
In some examples, the conversion generates a bitstream representation from the first/second block of video.
Fig. 25 is a flow chart of a method 2500 of video processing. The method 2500 includes: extracting (2502) reference pixels of a first reference block from a reference picture for a conversion between a first block of video and a bitstream representation of the first block, wherein the first reference block is smaller than a second reference block required for motion compensation of the first block; filling (2504) the first reference block with fill pixels to generate a second reference block required for motion compensation of the first block; and performing (2506) the conversion by using the generated second reference block.
In some examples, the first block has a size of W × H, the first reference block has a size of (W + N - 1 - PW) × (H + N - 1 - PH), and the second reference block has a size of (W + N - 1) × (H + N - 1), where W is the width of the first block, H is the height of the first block, N is the number of interpolation filter taps for the first block, and PW and PH are integers.
In some examples, the step of padding the first reference block with padding pixels to generate the second reference block comprises: pixels at one or more boundaries of the first reference block are repeated as filler pixels to generate a second reference block.
In some examples, the boundaries are a top boundary, a left boundary, a bottom boundary, and a right boundary of the first reference block.
In some examples, W = 8, H = 4, N = 7, PW = 2, and PH = 3.
In some examples, the pixels at the top, left, and right boundaries are repeated once, and the pixels at the bottom boundary are repeated twice.
In some examples, the extracted reference pixels are identified by (x + MVXInt-N/2 + offSet1, y + MVYInt-N/2 + offSet2), where (x, y) is the top left position of the first block, (MVXInt, MVYInt) is the integer portion of the Motion Vector (MV) of the first block, and offSet1 and offSet2 are integers.
In some examples, when PH is zero, only pixels at the left boundary or/and the right boundary of the first reference block are repeated.
In some examples, when PW is zero, only pixels at the top boundary or/and the bottom boundary of the first reference block are repeated.
In some examples, when PW and PH are both greater than zero, first pixels at a left boundary or/and a right boundary of the first reference block are repeated, then pixels at a top boundary or/and a bottom boundary of the first reference block are repeated, or first a top boundary or/and a bottom boundary of the first reference block are repeated, then a left boundary or/and a right boundary of the first reference block are repeated.
In some examples, pixels at the left boundary of the first reference block are repeated M1 times and pixels at the right boundary of the first reference block are repeated (PW-M1) times, where M1 is an integer and M1 >= 0.
In some examples, M1 left columns of pixels of the first reference block or (PW-M1) right columns of pixels of the first reference block are repeated, where M1>1 or PW-M1> 1.
In some examples, the pixels at the top boundary of the first reference block are repeated M2 times and the pixels at the bottom boundary of the first reference block are repeated (PH-M2) times, where M2 is an integer and M2 >= 0.
In some examples, the M2 top rows of pixels of the first reference block or the (PH-M2) bottom rows of pixels of the first reference block are repeated, where M2 > 1 or PH-M2 > 1.
In some examples, when both the horizontal and vertical components of the MV of the first block are fractional, pixels at one or more boundaries of the first reference block are repeated as filler pixels to generate the second reference block.
In some examples, when the MV in the prediction direction X (X is 0 or 1) has fractional components in both the horizontal and vertical directions, pixels at one or more boundaries of the first reference block are repeated as filler pixels to generate the second reference block.
In some examples, the first reference block is any one of part or all of the reference blocks of the first block.
In some examples, if the MV of the prediction direction X (X is 0 or 1) has fractional components in both the horizontal and vertical directions, pixels at one or more boundaries of the first reference block are repeated as filler pixels to generate a second reference block of the prediction direction X; otherwise, the pixel is not repeated.
In some examples, if the N2 MV components have fractional precision, pixels at one or more boundaries of the first reference block are repeated as filler pixels to generate a second reference block for M MV components of the N2 MV components, where N2 and M are integers and 0 <= M <= N2.
In some examples, N2 and M are different for bi-directional and uni-directional prediction blocks.
In some examples, N2 and M are different for different block sizes, the block size being associated with a width or/and height or/and width x height of the block.
In some examples, for bi-prediction blocks, N2 equals 4 and M equals 4, or N2 equals 4 and M equals 3, or N2 equals 4 and M equals 2, or N2 equals 4 and M equals 1, or N2 equals 3 and M equals 3, or N2 equals 3 and M equals 2, or N2 equals 3 and M equals 1, or N2 equals 2 and M equals 2, or N2 equals 2 and M equals 1, or N2 equals 1 and M equals 1.
In some examples, for a uni-directional prediction block, N2 is equal to 2 and M is equal to 2, or N2 is equal to 2 and M is equal to 1, or N2 is equal to 1 and M is equal to 1.
In some examples, pixels at different boundaries of the first reference block are repeated as filler pixels in different ways to generate a second reference block of M MV components.
In some examples, when pixel padding is not used for the horizontal MV component, PW is set equal to zero when extracting the first reference block using the MV.
In some examples, when pixel padding is not used for the vertical MV component, PH is set equal to zero when extracting the first reference block using the MV.
In some examples, PW and/or PH is different for different color components of the first block.
In some examples, the color components include Y, Cb and Cr.
In some examples, the PW and/or PH is different for different block sizes or shapes.
In some examples, PW and PH are set equal to 1 for 4 × 16 or/and 16 × 4 bi-prediction blocks or/and uni-prediction blocks.
In some examples, PW and PH are set equal to 0 and 1, or 1 and 0, respectively, for a 4 × 4 bi-prediction block or/and a uni-prediction block.
In some examples, PW and PH are set equal to 2 for 4 × 8 or/and 8 × 4 bi-prediction blocks or/and uni-prediction blocks.
In some examples, PW and PH are set equal to 2 and 3, or 3 and 2, respectively, for a 4 × 8 or/and 8 × 4 bi-prediction block or/and uni-prediction block.
In some examples, PW and PH are different for unidirectional prediction and bi-directional prediction.
In some examples, PW and PH are different for different kinds of motion vectors.
In some examples, PW and PH are set to a smaller value or equal to zero for Motion Vectors (MVs) that have fractional components in only one of the horizontal or vertical directions, and PW and PH are set to a larger value for MVs that have fractional components in both the horizontal and vertical directions.
In some examples, PW and PH are set equal to 0 for 4 × 16 or/and 16 × 4 or/and 4 × 8 or/and 8 × 4 or/and 4 × 4 bi-prediction blocks or/and uni-prediction blocks that have fractional MV components in only one of the horizontal or vertical directions.
In some examples, PW and PH are used for 4 × 16 or/and 16 × 4 or/and 4 × 8 or/and 8 × 4 or/and 4 × 4 bi-prediction blocks or/and uni-prediction blocks with fractional MV components in both horizontal and vertical directions.
In some examples, whether and how the pixels at the boundary are repeated depends on the color format of the first block.
In some examples, the color format includes 4:2:0, 4:2:2, or 4:4:4.
In some examples, the step of padding the first reference block with padding pixels to generate the second reference block comprises: the default value is filled as a filled pixel to generate a second reference block.
In some examples, the conversion generates a first block of the video from the bitstream representation.
In some examples, the conversion generates a bitstream representation from the first/second block of video.
Fig. 26 is a flow chart of a method 2600 of video processing. The method 2600 comprises: determining (2602) characteristics of a first block for a transition between the first block of the video and a bitstream representation of the first block; performing (2604) a rounding process on a Motion Vector (MV) of the first block based on the characteristic of the first block; and performing (2606) the conversion by using the rounded MV.
In some examples, performing the rounding process on the MV includes rounding the MV to integer-pixel precision or half-pixel precision.
In some examples, the MV is rounded to the nearest integer-pixel precision MV or half-pixel precision MV.
In some examples, performing the rounding process on the MV includes rounding the MV up, rounding it down, rounding it toward zero, or rounding it away from zero.
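For illustration, the rounding variants just listed can be sketched as follows, assuming MV components are stored in 1/16-pel units (the storage unit is an assumption; the text does not fix it):

```python
def round_mv_component(v, shift=4, mode="nearest"):
    """Round one MV component stored in 1/(1 << shift)-pel units.

    shift=4 assumes 1/16-pel storage and rounds to integer-pel precision;
    with the same storage, shift=3 rounds to half-pel precision instead.
    """
    step = 1 << shift
    if mode == "nearest":                 # nearest multiple; ties toward +inf
        return ((v + (step >> 1)) >> shift) << shift
    if mode == "down":                    # round toward -infinity
        return (v >> shift) << shift
    if mode == "up":                      # round toward +infinity
        return -(((-v) >> shift) << shift)
    sign = 1 if v >= 0 else -1
    mag = (abs(v) >> shift) << shift      # magnitude floored to the grid
    if mode == "toward_zero":
        return sign * mag
    if mode == "away_from_zero":
        return sign * (mag if abs(v) == mag else mag + step)
    raise ValueError(f"unknown mode: {mode}")
```

For example, round_mv_component(-9, mode="nearest") returns -16; that is, -9/16 pel snaps to the nearest integer position, -1 pel.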
In some examples, the characteristic of the first block comprises a size parameter, wherein the size parameter comprises at least one of a width, a height, a ratio of width and height, a dimension of width x height of the first block.
In some examples, when the size of the first block is less than and/or equal to the threshold L, a rounding process is performed on the horizontal or/and vertical components of the MV.
In some examples, when the size of the first block is greater than and/or equal to the threshold L, a rounding process is performed on the horizontal or/and vertical components of the MV.
In some examples, the rounding process is performed on the horizontal component of the MV when the width of the first block is less than and/or equal to the second threshold L1, or the rounding process is performed on the vertical component of the MV when the height of the first block is less than and/or equal to the second threshold L1.
In some examples, the thresholds L and L1 are different for bi-directional and uni-directional prediction blocks.
In some examples, the rounding process is performed on the MV when the ratio between the width and the height is greater than a third threshold L3 or less than a fourth threshold L4.
In some examples, the rounding process is performed on the MV when both the horizontal and vertical components of the MV are fractional.
In some examples, the characteristic of the first block includes a prediction parameter indicating whether the first block is bi-directionally predicted or uni-directionally predicted.
In some examples, whether to perform the rounding process on the MV depends on the prediction parameters.
In some examples, the rounding process is performed on the MV only when the first block is bi-directionally predicted.
In some examples, the characteristic of the first block includes a prediction direction and/or an associated MV from list 0 or list 1.
In some examples, whether to perform the rounding process on the MV depends on the prediction direction of the first block and/or the MV.
In some examples, whether to perform the rounding process on the MVs is different for different prediction directions in case the first block is a bi-prediction block.
In some examples, if the MV in the prediction direction X (X is 0 or 1) has fractional components in both the horizontal and vertical directions, a rounding process is performed on N MV components for the prediction direction X, N being an integer in the range from 0 to 2; otherwise, the rounding process is not executed.
In some examples, if N1 MV components have fractional precision, a rounding process is performed on M MV components of the N1 MV components, where N1 and M are integers and 0 <= M <= N1.
In some examples, N1 and M are different for bi-directional and uni-directional prediction blocks.
In some examples, for bi-prediction blocks,
N1 equals 4 and M equals 4, or
N1 equals 4 and M equals 3, or
N1 equal to 4 and M equal to 2, or
N1 equals 4 and M equals 1, or
N1 equals 3 and M equals 3, or
N1 equal to 3 and M equal to 2, or
N1 equals 3 and M equals 1, or
N1 equals 2 and M equals 2, or
N1 equals 2 and M equals 1, or
N1 equals 1 and M equals 1.
In some examples, for a uni-directional prediction block,
N1 equals 2 and M equals 2, or
N1 equals 2 and M equals 1, or
N1 equals 1 and M equals 1.
In some examples, N1 and M are different for different size parameters, wherein the size parameters include at least one of a width, a height, a ratio of width to height, a dimension of width x height of the first block.
In some examples, K MV components of the M MV components are rounded to integer-pixel precision and M-K MV components are rounded to half-pixel precision, where K is an integer ranging from 0 to M-1.
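This mixed-precision case can be sketched as follows; which K components go to integer precision is an illustrative choice, since the text fixes only the counts:

```python
def round_mixed(components, k, shift=4):
    """Round the first k components to integer-pel and the rest to half-pel.

    Components are assumed to be stored in 1/16-pel units (shift=4); the
    half-pel grid is then multiples of 1 << (shift - 1).
    """
    def to_multiple(v, s):  # round to the nearest multiple of 1 << s
        return ((v + (1 << (s - 1))) >> s) << s
    return [to_multiple(v, shift) if i < k else to_multiple(v, shift - 1)
            for i, v in enumerate(components)]
```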
In some examples, the characteristic of the first block includes a color component of the first block.
In some examples, whether to perform the rounding process on the MV is different for different color components of the first block.
In some examples, the color components include Y, Cb and Cr.
In some examples, the characteristic of the first block includes a color format of the first block.
In some examples, whether to perform the rounding process on the MV depends on the color format of the first block.
In some examples, the color format includes 4:2:0, 4:2:2, or 4:4:4.
In some examples, whether and/or how the rounding process is performed on the MVs depends on the characteristics of the block.
In some examples, one or more MV components of a 4 × 16 or/and 16 × 4 bi-directionally predicted luma block or/and a uni-directionally predicted luma block are rounded to half-pixel precision.
In some examples, one or more MV components of a 4 × 16 or/and 16 × 4 bi-directionally predicted luma block or/and a uni-directionally predicted luma block are rounded to integer-pixel precision.
In some examples, one or more MV components of a 4 x 4 uni-directionally predicted luma block or/and a bi-directionally predicted luma block are rounded to integer-pixel precision.
In some examples, one or more MV components of a 4 × 8 or/and 8 × 4 bi-directionally predicted luma block or/and a uni-directionally predicted luma block are rounded to integer-pixel precision.
In some examples, the characteristic of the first block includes whether the first block is coded using a sub-block based prediction method that includes an affine prediction mode and a sub-block based temporal motion vector prediction (SbTMVP) mode.
In some examples, if the first block is coded in affine prediction mode, the rounding process for the MV is not applied.
In some examples, if the first block is coded in SbTMVP mode, the rounding process for the MV is applied and is performed on each sub-block of the first block.
In some examples, performing a rounding process on a Motion Vector (MV) of a first block based on a characteristic of the first block includes: determining whether at least one MV of the first block has fractional precision when the size parameter of the first block satisfies a predetermined rule; and in response to determining that the at least one MV of the first block has fractional precision, performing a rounding process on the at least one MV to generate a rounded MV with integer precision.
In some examples, the bitstream representation of the first block follows a rule depending on a size parameter of the first block, wherein only integer-pixel MVs are allowed for the bi-predictive codec block.
In some examples, the size parameter of the first block is 4 × 16, 16 × 4, 4 × 8, 8 × 4, or 4 × 4.
In some examples, performing the conversion by using the rounded MV includes: motion compensation is performed on the first block by using the rounded MV.
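A decoder-side guard implementing this kind of size-dependent rule might look like the sketch below; the size list comes from the examples above, while the 1/16-pel storage and all names are assumptions:

```python
RESTRICTED_SIZES = {(4, 16), (16, 4), (4, 8), (8, 4), (4, 4)}

def must_round_to_integer(width, height, is_bi_pred, mv_components):
    """True if the block's MVs must be rounded to integer-pel precision.

    Mirrors the stated rule that only integer-pel MVs are allowed for
    bi-predicted blocks of the listed sizes; 0xF masks the fractional
    part under assumed 1/16-pel storage.
    """
    if not is_bi_pred or (width, height) not in RESTRICTED_SIZES:
        return False
    return any(v & 0xF for v in mv_components)
```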
Fig. 27 is a flow chart of a method 2700 of video processing. The method 2700 includes: determining (2702) characteristics of a first block for a transition between the first block of video and a bitstream representation of the first block; performing (2704) motion compensation on the first block using the MVs having the first precision; and storing (2706) for the first block the MVs with the second precision; wherein the first precision is different from the second precision.
In some examples, the characteristic of the first block comprises a size parameter, wherein the size parameter comprises at least one of a width, a height, a ratio of width and height, a dimension of width x height of the first block.
In some examples, the first precision is an integer precision and the second precision is a fractional precision.
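A sketch of this two-precision scheme, in which motion compensation uses a rounded MV while the original fractional MV is kept for later prediction; `interpolate` is a hypothetical helper and the 1/16-pel unit an assumption:

```python
def compensate_and_store(block, mv, interpolate):
    """Use integer-pel MVs for compensation but store fractional MVs.

    `mv` is an (mvx, mvy) pair in 1/16-pel units. The rounded copy drives
    interpolation (first precision); the unrounded MV is stored for
    spatial/temporal MV prediction (second precision).
    """
    mv_mc = tuple(((v + 8) >> 4) << 4 for v in mv)  # nearest integer-pel
    prediction = interpolate(block, mv_mc)
    block.stored_mv = mv                            # keep fractional MV
    return prediction
```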
Fig. 28 is a flow diagram of a method 2800 of video processing. The method 2800 includes: determining (2802) a codec mode for a first block of video for a transition between the first block and a bitstream representation of the first block; performing (2804) a rounding procedure on a Motion Vector (MV) of the first block if the codec mode of the first block satisfies a predetermined rule; and performing (2806) motion compensation of the first block by using the rounded MV.
In some examples, the predetermined rules include: the first block is coded in a Merge mode, a non-intra mode, or a non-Advanced Motion Vector Prediction (AMVP) mode.
Fig. 29 is a flow diagram of a method 2900 of video processing. The method 2900 includes: generating (2902) a first Motion Vector (MV) candidate list for a first block for a transition between the first block and a bitstream representation of the first block; performing (2904) a rounding procedure on the MVs of the at least one candidate before adding the at least one candidate to the first MV candidate list; and performing (2906) the conversion by using the first MV candidate list.
In some examples, the first block is coded in a Merge mode, a non-intra mode, or a non-Advanced Motion Vector Prediction (AMVP) mode, and the MV candidate list includes a Merge candidate list and a non-Merge candidate list.
In some examples, candidates with a fractional MV are excluded from the first MV candidate list.
In some examples, the at least one candidate includes: candidates derived from spatial domain blocks, candidates derived from temporal domain blocks, candidates derived from a Historical Motion Vector Prediction (HMVP) table, or pairwise bidirectional prediction Merge candidates.
In some examples, the method further comprises: a separate HMVP table is provided to store candidates for MVs with integer precision.
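The list-construction variants above (round candidate MVs before insertion, or exclude fractional candidates outright) could be sketched as follows; the candidate objects and the 1/16-pel unit are illustrative assumptions:

```python
def build_mv_candidate_list(raw_candidates, max_len, policy="round"):
    """Build a Merge/non-Merge candidate list under one of two policies.

    policy="round":   round each candidate MV to integer-pel on insertion.
    policy="exclude": skip candidates with any fractional MV component.
    Candidates may come from spatial blocks, temporal blocks, an HMVP
    table, or pairwise averaging, as listed above.
    """
    out = []
    for cand in raw_candidates:
        mv = cand["mv"]
        if policy == "exclude" and any(v & 0xF for v in mv):
            continue
        if policy == "round":
            cand = dict(cand, mv=tuple(((v + 8) >> 4) << 4 for v in mv))
        if cand not in out:        # simple duplicate pruning
            out.append(cand)
        if len(out) == max_len:
            break
    return out
```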
In some examples, the method further comprises: based on the characteristics of the first block, a rounding process is performed on the MVs or on the MV candidates in the candidate list.
In some examples, the characteristic of the first block comprises a size parameter, wherein the size parameter comprises at least one of a width, a height, a ratio of width and height, a dimension of width x height of the first block.
In some examples, the size parameter includes at least one of 4 × 16, 16 × 4, 4 × 8, 8 × 4, 4 × 4.
In some examples, the characteristic of the first block includes a prediction parameter indicating whether the first block is bi-directionally predicted or uni-directionally predicted, and performing the rounding process on the MV includes: performing the rounding process on the MVs, or on the candidate MVs in the candidate list, only if the candidate is a bi-directional prediction candidate.
In some examples, the first block is coded in AMVP mode and the candidate is an AMVP candidate.
In some examples, the first block is coded in a non-affine mode.
Fig. 30 is a flow chart of a method 3000 of video processing. The method 3000 includes: determining (3002) characteristics of a first block for a transition between the first block of video and a bitstream representation of the first block; determining (3004) a constraint parameter to be applied to the first block based on a characteristic of the first block, wherein the constraint parameter constrains a maximum number of fractional Motion Vector (MV) components of the first block; and performing (3006) the conversion by using the constraint parameter.
In some examples, the MV components include at least one of horizontal MV components and/or vertical MV components, and the fractional MV components include at least one of half-pixel MV components, quarter-pixel MV components, MV components having finer precision than quarter-pixels.
In some examples, the characteristic of the first block includes a prediction parameter indicating whether the first block is bi-directionally predicted or uni-directionally predicted.
In some examples, the constraint parameters are different for bi-directional prediction and unidirectional prediction.
In some examples, the constraint parameters are not applied in unidirectional prediction.
In some examples, the constraint parameter is applied when the first block is a bi-directionally predicted 4 x 8, 8 x 4, 4 x 16, or 16 x 4 block.
In some examples, the constraint parameter is not applied when the first block is a uni-directionally predicted 4 x 8, 8 x 4, 4 x 16, or 16 x 4 block.
In some examples, the constraint parameter is applied when the first block is a uni-directionally predicted 4 x 4 block or a bi-directionally predicted 4 x 4 block.
In some examples, for bi-predicted blocks, the maximum number of fractional MV components is 3, 2, 1, or 0.
In some examples, the maximum number of fractional MV components is 1 or 0 for a uni-directional prediction block.
In some examples, for bi-predicted blocks, the maximum number of quarter-pixel MV components is 3, 2, 1, or 0.
In some examples, the maximum number of quarter-pixel MV components is 1 or 0 for a uni-directional prediction block.
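A sketch of checking such a constraint parameter; the flat component list and the 1/16-pel masking are assumptions:

```python
def satisfies_constraint(mv_components, max_fractional):
    """Check that at most `max_fractional` MV components are fractional.

    `mv_components` lists every horizontal and vertical component of the
    block's MVs (4 entries for bi-prediction, 2 for uni-prediction); 0xF
    extracts the fractional part under assumed 1/16-pel storage.
    """
    return sum(1 for v in mv_components if v & 0xF) <= max_fractional
```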
In some examples, the characteristic of the first block includes at least one of a shape and a size parameter, wherein the size parameter includes at least one of a width, a height, a ratio of the width and the height, a dimension of the width by the height, and a shape of the first block.
In some examples, the constraint parameter is different for different sizes or shapes of the first block.
In some examples, the characteristic of the first block includes a mode parameter indicating a codec mode of the first block.
In some examples, the codec mode includes a triangle mode, in which the current block is divided into two partitions, each partition having at least one MV.
In some examples, the constraint parameter is applied when the first block is a 4 x 16 or 16 x 4 block that is coded in triangle mode.
Fig. 31 is a flow diagram of a method 3100 of video processing. The method 3100 comprises: obtaining (3102) an indication that no signaling of at least one of bi-directional prediction and uni-directional prediction is allowed when a characteristic of the block satisfies a predetermined rule; determining (3104) characteristics of a first block for a transition between the first block of video and a bitstream representation of the first block; and performing (3106) the conversion by using the indication when the characteristic of the first block satisfies a predetermined rule.
Fig. 32 is a flow diagram of a method 3200 of video processing. The method 3200 includes: signaling (3202) an indication that at least one of bi-directional prediction and uni-directional prediction is not allowed when a characteristic of the block satisfies a predetermined rule; determining (3204) characteristics of a first block for a transition between the first block of video and a bitstream representation of the first block; the conversion is performed (3206) based on characteristics of the first block, wherein during the conversion, at least one of bi-directional prediction and uni-directional prediction is disabled when the characteristics of the first block satisfy a predetermined rule.
In some examples, the indication is signaled in a sequence parameter set/picture parameter set/sequence header/picture header/slice header/group of slices header/Codec Tree Unit (CTU) row/region/other high level syntax.
In some examples, the characteristic of the first block includes a size parameter, wherein the size parameter includes at least one of a width, a height, a ratio of the width and the height, a dimension of the width by the height, and a shape of the first block.
In some examples, the predetermined rules include: the first block has a particular block size.
In some examples, the characteristic of the first block includes a mode parameter indicating a codec mode of the first block.
In some examples, the predetermined rules include: the first block is coded in a non-affine mode.
In some examples, when at least one of uni-directional prediction and bi-directional prediction is not allowed for the first block, signaling of an Advanced Motion Vector Resolution (AMVR) parameter of the first block is modified accordingly.
In some examples, signaling of an Advanced Motion Vector Resolution (AMVR) parameter is modified such that only integer pixel precision is allowed for the first block.
In some examples, signaling of the Advanced Motion Vector Resolution (AMVR) parameter is modified such that different Motion Vector (MV) accuracies are utilized.
In some examples, the block size of the first block is at least one of 4 × 16, 16 × 4, 4 × 8, 8 × 4, 4 × 4.
In some examples, the bitstream representation of the first block follows a rule depending on a size parameter of the first block, wherein only integer-pixel MVs are allowed for the bi-predictive codec block.
Fig. 33 is a flow diagram of a method 3300 of video processing. Method 3300 includes: determining (3302) for a conversion between a first block of video and a bitstream representation of the first block whether fractional Motion Vector (MV) or Motion Vector Difference (MVD) precision is allowed for the first block; signaling (3304) an Advanced Motion Vector Resolution (AMVR) parameter for the first block based on the determination; and performing (3306) the conversion by using the AMVR parameters.
Fig. 34 is a flow diagram of a method 3400 of video processing. The method 3400 comprises: determining (3402) for a conversion between a first block of video and a bitstream representation of the first block whether fractional Motion Vector (MV) or Motion Vector Difference (MVD) precision is allowed for the first block; based on the determination, obtaining (3404) an Advanced Motion Vector Resolution (AMVR) parameter for the first block; and performing (3406) the conversion by using the AMVR parameters.
In some examples, if fractional MV or MVD precision is not allowed for the first block, the AMVR parameter indicating whether the MV/MVD precision of the current block is fractional is skipped and implicitly derived as false.
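The conditional signaling in this last example could be sketched as below; `read_flag` is a hypothetical bitstream-parsing helper:

```python
def parse_amvr_fractional_flag(bitstream, fractional_allowed):
    """Parse (or infer) the AMVR flag selecting fractional MV/MVD precision.

    When fractional precision is not allowed for the block, nothing is
    parsed and the flag is implicitly derived as false, matching the
    behavior described above.
    """
    if not fractional_allowed:
        return False
    return bitstream.read_flag()
```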
5. Examples of the embodiments
In the following embodiments, PW and PH are designed for 4 × 16, 16 × 4, 4 × 4, 8 × 4, and 4 × 8 blocks.
Assume that the MV of a block in reference list X is MVX, that the horizontal and vertical components of MVX are MVX[0] and MVX[1], respectively, and that the integer parts of MVX[0] and MVX[1] are MVXInt[0] and MVXInt[1], respectively, where X is 0 or 1. Assume that the interpolation filter used in motion compensation has N taps (e.g., 8, 6, 4, or 2), that the current block size is W × H, and that the position of the current block (i.e., the position of its top-left pixel) is (x, y). Row and column indices start at 1; e.g., the H rows are rows 1, …, H.
The following boundary pixel repetition process is performed only when MVX [0] and MVX [1] are both fractional.
5.1 Example
For both 4 × 16 and 16 × 4 uni-directional and bi-directional prediction blocks, PW and PH are both set equal to 1 for prediction direction X. First, (W + N - 2) × (H + N - 2) reference pixels are extracted from the reference picture, where the top-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 1, MVXInt[1] + y - N/2 + 1). Then, the (W + N - 1)-th column is generated by copying the (W + N - 2)-th column. Finally, the (H + N - 1)-th row is generated by copying the (H + N - 2)-th row.
For a 4 × 4 uni-directional prediction block, PW and PH are set equal to 0 and 1, respectively. First, (W + N - 1) × (H + N - 2) reference pixels are extracted from the reference picture, where the top-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 1, MVXInt[1] + y - N/2 + 1). Then, the (H + N - 1)-th row is generated by copying the (H + N - 2)-th row.
For the 4 × 8 and 8 × 4 uni-directional and bi-directional prediction blocks, PW and PH are set equal to 2 and 3, respectively. First, (W + N - 3) × (H + N - 4) reference pixels are extracted from the reference picture, where the top-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 2, MVXInt[1] + y - N/2 + 2). Then, the 1st column is copied to its left side to obtain W + N - 2 columns, after which the (W + N - 1)-th column is generated by copying the (W + N - 2)-th column. Finally, the 1st row is copied to its upper side to obtain H + N - 3 rows, after which the (H + N - 2)-th and (H + N - 1)-th rows are generated by copying the (H + N - 3)-th row.
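As a sketch, the first case of this embodiment (4 × 16 and 16 × 4 blocks, PW = PH = 1) could be implemented as follows, assuming the reference picture is a 2-D numpy array indexed [row, column] and ignoring picture-boundary clipping:

```python
import numpy as np

def fetch_padded_ref(ref_pic, x, y, mv_int, w, h, n):
    """Embodiment 5.1, first case: fetch (W+N-2) x (H+N-2) reference
    pixels, then replicate the last column and last row once each to
    reach (W+N-1) x (H+N-1), as described above (PW = PH = 1).
    """
    x0 = mv_int[0] + x - n // 2 + 1   # top-left of the fetch window
    y0 = mv_int[1] + y - n // 2 + 1
    block = ref_pic[y0:y0 + h + n - 2, x0:x0 + w + n - 2]
    block = np.concatenate([block, block[:, -1:]], axis=1)  # column W+N-1
    block = np.concatenate([block, block[-1:, :]], axis=0)  # row H+N-1
    return block
```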
5.2 Example
For both 4 × 16 and 16 × 4 uni-directional and bi-directional prediction blocks, PW and PH are both set equal to 1 for prediction direction X. First, (W + N - 2) × (H + N - 2) reference pixels are extracted from the reference picture, where the top-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 2, MVXInt[1] + y - N/2 + 2). Then, the 1st column is copied to its left side to obtain W + N - 1 columns. Finally, the 1st row is copied to its upper side to obtain H + N - 1 rows.
For a 4 × 4 uni-directional prediction block, PW and PH are set equal to 0 and 1, respectively. First, (W + N - 1) × (H + N - 2) reference pixels are extracted from the reference picture, where the top-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 1, MVXInt[1] + y - N/2 + 2). Then, the 1st row is copied to its upper side to obtain H + N - 1 rows.
For the 4 × 8 and 8 × 4 uni-directional and bi-directional prediction blocks, PW and PH are set equal to 2 and 3, respectively. First, (W + N - 3) × (H + N - 4) reference pixels are extracted from the reference picture, where the top-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 2, MVXInt[1] + y - N/2 + 2). Then, the 1st column is copied to its left side to obtain W + N - 2 columns, after which the (W + N - 1)-th column is generated by copying the (W + N - 2)-th column. Finally, the 1st row is copied to its upper side to obtain H + N - 3 rows, after which the (H + N - 2)-th and (H + N - 1)-th rows are generated by copying the (H + N - 3)-th row.
It should be appreciated that the disclosed techniques may be embodied in a video encoder or decoder to improve compression efficiency when the shape of the compressed codec unit is significantly different from a conventional square block or a half-square rectangular block. For example, new codec tools using long or tall codec units, such as units of 4 x 32 or 32 x 4 size, may benefit from the disclosed techniques.
The disclosed and other solutions, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described herein can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not require such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few embodiments and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.
Claims (62)
1. A method of video processing, comprising:
determining a characteristic of a first block of video for a transition between the first block and a bitstream representation of the first block;
performing a rounding process on a Motion Vector (MV) of the first block based on a characteristic of the first block; and
the conversion is performed by using the rounded MV.
2. The method of claim 1, wherein performing a rounding process on the MV comprises rounding the MV to integer-pixel precision or half-pixel precision.
3. The method of claim 2, wherein the MV is rounded to the nearest integer-pel precision MV or half-pel precision MV.
4. The method of claim 1 or 2, wherein performing a rounding process on the MV comprises rounding the MV up, rounding it down, rounding it toward zero, or rounding it away from zero.
5. The method according to any of claims 1-4, wherein the characteristic of the first block comprises a size parameter, wherein the size parameter comprises at least one of a width, a height, a ratio of width and height, a dimension of width by height of the first block.
6. The method of claim 5, wherein a rounding process is performed on horizontal or/and vertical components of the MV when the size of the first block is less than and/or equal to a threshold L.
7. The method of claim 5, wherein a rounding process is performed on horizontal or/and vertical components of the MV when the size of the first block is greater than and/or equal to a threshold L.
8. The method of claim 5, wherein a rounding process is performed on the horizontal component of the MV when the width of the first block is less than and/or equal to a second threshold L1, or
When the height of the first block is less than and/or equal to a second threshold L1, a rounding process is performed on the vertical component of the MV.
9. The method of any of claims 6-8, wherein the thresholds L and L1 are different for bi-directional and uni-directional prediction blocks.
10. The method of claim 5, wherein a rounding process is performed on the MV when a ratio between width and height is greater than a third threshold L3 or less than a fourth threshold L4.
11. The method of any of claims 1-10, wherein a rounding process is performed on the MV when both a horizontal component and a vertical component of the MV are fractional.
12. The method of any of claims 1-4, wherein the characteristic of the first block comprises a prediction parameter indicating whether the first block is bi-directional predicted or uni-directional predicted.
13. The method of claim 12, wherein whether to perform a rounding process on the MV depends on the prediction parameters.
14. The method of claim 13, wherein a rounding process is performed on the MV only when the first block is bi-directionally predicted.
15. The method according to any of claims 1-4, wherein the characteristic of the first block comprises an MV indicating a prediction direction and/or an associated MV from either List 0 or List 1.
16. The method of claim 15, wherein whether to perform a rounding process on the MV depends on a prediction direction of the first block and/or the MV.
17. The method of claim 16, wherein, in case the first block is a bi-prediction block, whether to perform a rounding process on the MVs is different for different prediction directions.
18. The method of claim 17, wherein if the MV of the prediction direction X has fractional components in both horizontal and vertical directions, X being 0 or 1, a rounding process is performed on N MV components for the prediction direction X, N being an integer in the range from 0 to 2;
otherwise, the rounding process is not executed.
19. The method of claim 16, wherein if N1 MV components have fractional precision, a rounding process is performed on M MV components of the N1 MV components, where N1 and M are integers and 0 <= M <= N1.
20. The method of claim 19, wherein N1 and M are different for bi-directional and uni-directional prediction blocks.
21. The method of claim 20, wherein, for a bi-prediction block,
N1 equals 4 and M equals 4, or
N1 equals 4 and M equals 3, or
N1 equal to 4 and M equal to 2, or
N1 equals 4 and M equals 1, or
N1 equals 3 and M equals 3, or
N1 equal to 3 and M equal to 2, or
N1 equals 3 and M equals 1, or
N1 equals 2 and M equals 2, or
N1 equals 2 and M equals 1, or
N1 equals 1 and M equals 1.
22. The method of claim 20, wherein, for a uni-directional prediction block,
N1 equals 2 and M equals 2, or
N1 equals 2 and M equals 1, or
N1 equals 1 and M equals 1.
23. The method of claim 19, wherein N1 and M are different for different size parameters, wherein the size parameters include at least one of a width, a height, a ratio of width to height, and dimensions of width by height of the first block.
24. The method of claim 19, wherein K MV components of the M MV components are rounded to integer pixel precision and M-K MV components are rounded to half pixel precision, where K is an integer ranging from 0 to M-1.
25. The method of any of claims 1-4, wherein the characteristic of the first block comprises a color component of the first block.
26. The method of claim 25, wherein whether to perform a rounding process on the MV is different for different color components of the first block.
27. The method of claim 26, wherein the color components comprise Y, Cb and Cr.
28. The method of any of claims 1-4, wherein the characteristic of the first block comprises a color format of the first block.
29. The method of claim 28 wherein whether to perform a rounding process on the MV depends on the color format of the first block.
30. The method of claim 29, wherein the color format comprises 4:2:0, 4:2:2, or 4:4:4.
31. The method according to any of claims 1-30, wherein whether and/or how a rounding process is performed on the MV depends on the characteristics of the block.
32. The method of claim 31, wherein one or more MV components of the 4 x 16 or/and 16 x 4 bi-directionally predicted luma block or/and uni-directionally predicted luma block are rounded to half-pixel precision.
33. The method of claim 31, wherein one or more MV components of the 4 x 16 or/and 16 x 4 bi-directionally predicted luma block or/and uni-directionally predicted luma block are rounded to integer-pel precision.
34. The method of claim 31, wherein one or more MV components of the 4 x 4 uni-directionally predicted luma block or/and the bi-directionally predicted luma block are rounded to integer-pel precision.
35. The method of claim 31, wherein one or more MV components of a 4 x 8 or/and 8 x 4 bi-directionally predicted luma block or/and a uni-directionally predicted luma block are rounded to integer-pel precision.
36. The method of any of claims 1-4, wherein the characteristic of the first block comprises whether the first block is coded with a sub-block based prediction method comprising an affine prediction mode and a sub-block based temporal motion vector prediction (SbTMVP) mode.
37. The method of claim 36, wherein if the first block is coded in affine prediction mode, then no rounding of the MV is applied.
38. The method of claim 36, wherein if the first block is coded in SbTMVP mode, a rounding process is applied for the MV and performed for each sub-block of the first block.
39. The method of claim 5, wherein performing a rounding process on a Motion Vector (MV) of the first block based on the characteristic of the first block comprises:
determining whether at least one MV of the first block has fractional precision when a size parameter of the first block satisfies a predetermined rule; and
in response to determining that at least one MV of the first block has fractional precision, performing a rounding process on the at least one MV to generate a rounded MV with integer precision.
40. The method of claim 5, wherein the bitstream representation of the first block follows a rule depending on a size parameter of the first block, wherein only integer-pixel MVs are allowed for bi-directionally predicted codec blocks.
41. The method of claim 5 or 40, wherein the size parameter of the first block is 4 x 16, 16 x 4, 4 x 8, 8 x 4 or 4 x 4.
42. The method of any of claims 1-41, wherein performing the conversion using the rounded MVs comprises:
performing motion compensation on the first block by using the rounded MVs.
43. A method of video processing, comprising:
determining a characteristic of a first block of video for a transition between the first block and a bitstream representation of the first block;
performing motion compensation on the first block using MVs having a first precision; and
storing the MVs with the second precision for the first block;
wherein the first precision is different from the second precision.
44. The method of claim 43, wherein the characteristic of the first block comprises a size parameter, wherein the size parameter comprises at least one of a width, a height, a ratio of width and height, a dimension of width by height of the first block.
45. The method of claim 43, wherein the first precision is an integer precision and the second precision is a fractional precision.
46. A method of video processing, comprising:
determining a codec mode for a first block of video for a transition between the first block and a bitstream representation of the first block;
performing a rounding process on a Motion Vector (MV) of the first block if a codec mode of the first block satisfies a predetermined rule; and
performing motion compensation of the first block by using the rounded MVs.
47. The method of claim 46, wherein the predetermined rule comprises: the first block is coded in a Merge mode, a non-intra mode, or a non-Advanced Motion Vector Prediction (AMVP) mode.
48. A method of video processing, comprising:
generating a first Motion Vector (MV) candidate list for a first block of video for conversion between the first block and a bitstream representation of the first block;
performing a rounding process on MVs of at least one candidate prior to adding the at least one candidate to the first MV candidate list; and
performing the conversion by using the first MV candidate list.
49. The method of claim 48, wherein the first block is coded in Merge mode, non-intra mode, or non-Advanced Motion Vector Prediction (AMVP) mode, and the MV candidate lists comprise a Merge candidate list and a non-Merge candidate list.
50. The method of claim 48 or 49, wherein candidates with a fractional MV are excluded from the first MV candidate list.
51. The method of claim 50, wherein the at least one candidate comprises: candidates derived from spatial domain blocks, candidates derived from temporal domain blocks, candidates derived from a Historical Motion Vector Prediction (HMVP) table, or pairwise bidirectional prediction Merge candidates.
52. The method of claim 46 or 48, further comprising:
a separate HMVP table is provided to store candidates for MVs with integer precision.
53. The method of any one of claims 46 to 49, further comprising:
performing a rounding process on the MVs or on candidate MVs in the candidate list based on a characteristic of the first block.
54. The method of claim 53, wherein the characteristic of the first block comprises a size parameter, wherein the size parameter comprises at least one of a width, a height, a ratio of width to height, a dimension of width by height of the first block.
55. The method of claim 54, wherein the size parameter comprises at least one of 4 x 16, 16 x 4, 4 x 8, 8 x 4, 4 x 4.
56. The method of claim 53, wherein the characteristic of the first block comprises a prediction parameter indicating whether the first block is bi-directionally predicted or uni-directionally predicted, and
the rounding process performed on the MV comprises: performing a rounding process on the MVs or on candidate MVs in the candidate list only if the candidate is a bi-directional prediction candidate.
57. The method of claim 46 or 48, wherein the first block is coded in AMVP mode and the candidate is an AMVP candidate.
58. The method of claim 46 or 48, wherein the first block is coded in a non-affine mode.
59. The method of any of claims 1-58, wherein the converting generates a first block of video from the bitstream representation.
60. The method of any of claims 1 to 58, wherein the converting generates the bitstream representation from a first block of video.
61. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1-60.
62. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of any of claims 1-60.
Applications Claiming Priority (7)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2019071396 | 2019-01-11 | | |
| CNPCT/CN2019/071396 | 2019-01-11 | | |
| CNPCT/CN2019/071503 | 2019-01-12 | | |
| CN2019071503 | 2019-01-12 | | |
| CN2019077171 | 2019-03-06 | | |
| CNPCT/CN2019/077171 | 2019-03-06 | | |
| PCT/CN2020/071770 (WO2020143830A1) | 2019-01-11 | 2020-01-13 | Integer MV motion compensation |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN113302938A | 2021-08-24 |
| CN113302938B | 2024-08-16 |