CN110719475B - Shape dependent interpolation order - Google Patents

Shape dependent interpolation order

Info

Publication number
CN110719475B
CN110719475B
Authority
CN
China
Prior art keywords
interpolation
video block
horizontal
vertical
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910637388.9A
Other languages
Chinese (zh)
Other versions
CN110719475A (en)
Inventor
Hongbin Liu (刘鸿彬)
Li Zhang (张莉)
Kai Zhang (张凯)
Yue Wang (王悦)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd, ByteDance Inc filed Critical Beijing ByteDance Network Technology Co Ltd
Publication of CN110719475A
Application granted
Publication of CN110719475B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: ... using adaptive coding
    • H04N19/102: ... using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117: Filters, e.g. for pre-processing or post-processing
    • H04N19/134: ... using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/169: ... using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: ... the unit being an image region, e.g. an object
    • H04N19/176: ... the region being a block, e.g. a macroblock
    • H04N19/186: ... the unit being a colour or a chrominance component
    • H04N19/50: ... using predictive coding
    • H04N19/503: ... using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/513: Processing of motion vectors
    • H04N19/80: Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82: ... involving filtering within a prediction loop

Abstract

The application provides a video bit stream processing method, a video decoding device and a video encoding device, wherein the method comprises the following steps: determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and performing horizontal and vertical interpolation on the video blocks in a sequence indicated by the interpolation order to reconstruct a decoded representation of the video blocks.

Description

Shape dependent interpolation order
Cross Reference to Related Applications
This application claims the priority and benefit of International Patent Application No. PCT/CN2018/095576, filed on July 13, 2018, in accordance with applicable patent law and/or the rules of the Paris Convention. The entire disclosure of International Patent Application No. PCT/CN2018/095576 is incorporated herein by reference as part of the disclosure of this application.
Technical Field
This document relates to video encoding techniques, devices and systems.
Background
Despite advances in video compression, digital video still accounts for the largest share of bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video grows, the bandwidth demand for digital video usage is expected to continue to grow.
Disclosure of Invention
The disclosed techniques may be used by a video decoder or encoder embodiment, where a block-wise interpolation order technique is used to improve interpolation.
In one example aspect, a video bitstream processing method is disclosed. The method comprises the following steps: determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and performing horizontal and vertical interpolation on the video blocks in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video blocks.
In another exemplary aspect, a video bitstream processing method includes: determining a feature of a motion vector associated with a video block; determining an interpolation order indicating a sequence of performing horizontal interpolation and vertical interpolation based on the feature of the motion vector; and performing horizontal and vertical interpolation on the video blocks in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video blocks.
In another example aspect, a video bitstream processing method is disclosed. The method comprises the following steps: determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and performing horizontal and vertical interpolation on the video blocks in the sequence indicated by the interpolation order to construct an encoded representation of the video blocks.
In another example aspect, a video bitstream processing method is disclosed. The method comprises the following steps: determining a feature of a motion vector associated with a video block; determining an interpolation order indicating a sequence of performing horizontal interpolation and vertical interpolation based on the feature of the motion vector; and performing horizontal and vertical interpolation on the video blocks in the sequence indicated by the interpolation order to construct an encoded representation of the video blocks.
In one example aspect, a video processing method is disclosed. The method comprises the following steps: determining a first prediction mode to apply to a first video block; performing a first conversion between the first video block and an encoded representation of the first video block by applying horizontal interpolation and/or vertical interpolation to the first video block; determining a second prediction mode to apply to a second video block; and performing a second conversion between the second video block and an encoded representation of the second video block by applying horizontal interpolation and/or vertical interpolation to the second video block, wherein, based on a determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode, one or both of the horizontal interpolation and the vertical interpolation of the first video block uses a shorter-tap filter than the filter used for the second video block.
In another example aspect, a video decoding apparatus implementing the video processing method described herein is disclosed.
In yet another example aspect, a video encoding apparatus implementing the video processing method described herein is disclosed.
In yet another exemplary aspect, the various techniques described herein may be implemented as a computer program product stored on a non-transitory computer-readable medium. The computer program product comprises program code for performing the methods described herein.
In yet another example aspect, an apparatus in a video system is disclosed. The apparatus includes a processor and a non-transitory memory having instructions thereon, wherein execution of the instructions by the processor causes the processor to implement the method described above.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a diagram of a quadtree plus binary tree (QTBT) structure.
Figure 2 shows an example derivation process for the Merge candidate list construction.
Fig. 3 shows example positions of spatial Merge candidates.
Fig. 4 shows an example of candidate pairs considered for redundancy checking of spatial Merge candidates.
Fig. 5 shows an example of the location of a second Prediction Unit (PU) of Nx2N and 2NxN partitions.
Figure 6 is a graphical representation of the motion vector scaling of the temporal Merge candidate.
FIG. 7 shows example candidate locations for time domain Merge candidates C0 and C1.
Fig. 8 shows an example of combined bidirectional predictive Merge candidates.
Fig. 9 shows an example of a derivation process of a motion vector prediction candidate.
Fig. 10 is a diagram of motion vector scaling of spatial motion vector candidates.
Fig. 11 illustrates an example of Advanced Temporal Motion Vector Prediction (ATMVP) motion prediction of a Coding Unit (CU).
Fig. 12 shows an example of one CU with four sub-blocks (a-D) and their neighboring blocks (a-D).
Fig. 13 shows a non-adjacent Merge candidate proposed in J0021.
Fig. 14 shows the non-adjacent Merge candidate proposed in J0058.
Fig. 15 shows a non-adjacent Merge candidate proposed in J0059.
Fig. 16 shows an example of integer sample and fractional sample positions for quarter sample luminance interpolation.
Fig. 17 is a block diagram of an example of a video processing apparatus.
Fig. 18 shows a block diagram of an example implementation of a video encoder.
Fig. 19 is a flowchart of an example of a video bitstream processing method.
Fig. 20 is a flowchart of an example of a video bitstream processing method.
Fig. 21 is a flowchart of an example of a video processing method.
Fig. 22 is a flowchart of an example of a video bitstream processing method.
Fig. 23 is a flowchart of an example of a video bitstream processing method.
Detailed Description
This document provides various techniques that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video. Moreover, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
Section headings are used in this document to facilitate understanding, and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments from one section may be combined with embodiments from other sections.
1. Summary of the invention
The present invention relates to video coding technologies, and in particular to interpolation in video coding. It can be applied to existing video coding standards such as HEVC, to the standard to be finalized (Versatile Video Coding, VVC), or to future video coding standards or video codecs.
2. Background of the invention
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure in which temporal prediction plus transform coding is employed. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded jointly by VCEG and MPEG in 2015. Since then, many new methods have been adopted by JVET and put into reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bit-rate reduction compared with HEVC.
Fig. 18 is a block diagram of an example implementation of a video encoder.
2.1 Quadtree plus binary tree (QTBT) block structure with larger CTUs
In HEVC, various local characteristics are accommodated by dividing the CTUs into CUs using a quadtree structure (denoted as coding tree). It is decided at the CU level whether to encode a picture region using inter (temporal) prediction or intra (spatial) prediction. Each CU may be further divided into one, two, or four PUs depending on the partition type of the PU. In one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. After a residual block is obtained by applying a prediction process based on a PU partition type, a CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to a coding tree of the CU. An important feature of the HEVC structure is that it has multiple partitioning concepts, including CU, PU, and TU.
The QTBT structure eliminates the concept of multiple partition types, i.e., it removes the separation of the CU, PU, and TU concepts and supports more flexibility in CU partition shapes. In the QTBT block structure, a CU may be square or rectangular. As shown in fig. 1, a Coding Tree Unit (CTU) is first partitioned using a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure. There are two split types in binary tree splitting: symmetric horizontal splitting and symmetric vertical splitting. The binary tree leaf nodes are called Coding Units (CUs), and this segmentation is used for the prediction and transform processing without any further partitioning. This means that CU, PU and TU have the same block size in the QTBT coding block structure. In JEM, a CU sometimes consists of Coding Blocks (CBs) of different color components; for example, in a P slice and a B slice of the 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs.
The following parameters are defined for the QTBT segmentation scheme.
-CTU size: the root node size of the quadtree is the same as the concept in HEVC.
-MinQTSize: minimum allowed quadtree leaf node size
-MaxBTSize: maximum allowed binary tree root node size
-MaxBTDepth: maximum allowed binary tree depth
-MinBTSize: minimum allowed binary tree leaf node size
In one example of the QTBT segmentation structure, the CTU size is set to 128 × 128 luma samples with two corresponding 64 × 64 chroma sample blocks, MinQTSize is set to 16 × 16, MaxBTSize is set to 64 × 64, MinBTSize (for both width and height) is set to 4 × 4, and MaxBTDepth is set to 4. Quadtree partitioning is first applied to the CTU to generate quadtree leaf nodes. The quadtree leaf nodes may have sizes from 16 × 16 (i.e., MinQTSize) to 128 × 128 (i.e., the CTU size). If the leaf quadtree node is 128 × 128, it is not further partitioned by the binary tree because its size exceeds MaxBTSize (i.e., 64 × 64). Otherwise, the leaf quadtree node may be further partitioned by the binary tree. Thus, the quadtree leaf node is also the root node of a binary tree, and its binary tree depth is 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further partitioning is considered. When the width of a binary tree node is equal to MinBTSize (i.e., 4), no further horizontal partitioning is considered. Likewise, when the height of a binary tree node is equal to MinBTSize, no further vertical partitioning is considered. The leaf nodes of the binary tree are further processed by the prediction and transform processes without any further partitioning. In JEM, the maximum CTU size is 256 × 256 luma samples.
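As a rough illustration of how these parameters interact, the following Python sketch (not part of the JEM reference software; the function name and the way the constraints are expressed are simplifying assumptions) decides which further splits are still permitted for a block under the example configuration above.

```python
# Hypothetical sketch of the QTBT split constraints described above.
# Values follow the example configuration (CTU 128x128, MinQTSize 16,
# MaxBTSize 64, MaxBTDepth 4, MinBTSize 4); names are illustrative, not JEM's.

MIN_QT_SIZE = 16
MAX_BT_SIZE = 64
MAX_BT_DEPTH = 4
MIN_BT_SIZE = 4

def allowed_splits(width, height, bt_depth, is_qt_node):
    """Return the splits still permitted for a block of the given size."""
    splits = []
    # Quadtree splitting is only possible before any binary split has occurred,
    # and only while the (square) node is larger than MinQTSize.
    if is_qt_node and width == height and width > MIN_QT_SIZE:
        splits.append("QT")
    # A quadtree leaf may enter the binary tree only if it does not exceed MaxBTSize.
    can_start_bt = is_qt_node and width <= MAX_BT_SIZE and height <= MAX_BT_SIZE
    if (can_start_bt or not is_qt_node) and bt_depth < MAX_BT_DEPTH:
        if width > MIN_BT_SIZE:
            splits.append("BT_VER")   # symmetric vertical split (halves the width)
        if height > MIN_BT_SIZE:
            splits.append("BT_HOR")   # symmetric horizontal split (halves the height)
    return splits

# A 128x128 quadtree leaf exceeds MaxBTSize, so only quadtree splitting is allowed.
print(allowed_splits(128, 128, bt_depth=0, is_qt_node=True))   # ['QT']
print(allowed_splits(64, 64, bt_depth=0, is_qt_node=True))     # ['QT', 'BT_VER', 'BT_HOR']
print(allowed_splits(8, 4, bt_depth=3, is_qt_node=False))      # ['BT_VER']
```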
Fig. 1 (left) illustrates an example of block partitioning by using QTBT, and fig. 1 (right) illustrates the corresponding tree representation. The solid lines represent quadtree partitions and the dashed lines represent binary tree partitions. In each partition (i.e., non-leaf) node of the binary tree, a flag is signaled to indicate which partition type (i.e., horizontal or vertical) to use, where 0 represents horizontal partition and 1 represents vertical partition. For quadtree partitioning, there is no need to specify the partition type, because quadtree partitioning always divides one block horizontally and vertically to generate 4 sub-blocks of the same size.
Furthermore, the QTBT scheme supports the ability for luma and chroma to have separate QTBT structures. Currently, the luma and chroma CTBs in one CTU share the same QTBT structure for P slices and B slices. However, for I slices, the luma CTB is partitioned into CUs with one QTBT structure and the chroma CTBs are partitioned into chroma CUs with another QTBT structure. This means that a CU in an I slice consists of a coding block of the luma component or coding blocks of two chroma components, while a CU in a P slice or B slice consists of coding blocks of all three color components.
In HEVC, to reduce memory access for motion compensation, inter prediction of small blocks is restricted such that 4 × 8 and 8 × 4 blocks do not support bi-prediction and 4 × 4 blocks do not support inter prediction. In the QTBT of JEM, these restrictions are removed.
2.2 Inter prediction in HEVC/H.265
Each inter-predicted PU has motion parameters of one or two reference picture lists. The motion parameters include a motion vector and a reference picture index. The use of one of the two reference picture lists may also be signaled using inter_pred_idc. Motion vectors can be explicitly coded as deltas with respect to the predictor.
When a CU is coded in skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta, and no reference picture index. A Merge mode is specified whereby the motion parameters of the current PU are obtained from neighboring PUs, including spatial and temporal candidates. The Merge mode can be applied to any inter-predicted PU, not only to skip mode. The alternative to the Merge mode is the explicit transmission of motion parameters, where the motion vector (more precisely, the motion vector difference relative to a motion vector predictor), the reference picture index corresponding to each reference picture list, and the reference picture list usage are all signaled explicitly per PU. In this document, this mode is referred to as Advanced Motion Vector Prediction (AMVP).
When the signaling indicates that one of the two reference picture lists is to be used, the PU is generated from one block of samples. This is referred to as "uni-prediction". Uni-prediction is available for both P slices and B slices.
When the signaling indicates that both reference picture lists are to be used, the PU is generated from two blocks of samples. This is referred to as "bi-prediction". Bi-prediction is available only for B slices.
The following text provides details of the inter prediction modes specified in HEVC. The description will start with the Merge mode.
2.2.1 Merge mode
Derivation of candidates for 2.2.1.1Merge mode
When a PU is predicted using the Merge mode, an index pointing to an entry in the Merge candidate list is parsed from the bitstream and used to retrieve the motion information. The construction of this list is specified in the HEVC standard and can be summarized according to the following sequence of steps:
step 1: initial candidate derivation
Step 1.1: spatial domain candidate derivation
Step 1.2: redundancy checking of spatial domain candidates
Step 1.3: time domain candidate derivation
Step 2: Additional candidate insertion
Step 2.1: creation of bi-directional prediction candidates
Step 2.2: insertion of zero motion candidates
These steps are also schematically depicted in fig. 2. For spatial Merge candidate derivation, a maximum of four Merge candidates are selected among candidates located at five different positions. For temporal Merge candidate derivation, at most one Merge candidate is selected between two candidates. Since the number of candidates per PU is assumed to be constant at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best Merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all PUs of the current CU share a single Merge candidate list, which is identical to the Merge candidate list of the 2N × 2N prediction unit.
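The order of these steps can be outlined with the following Python sketch; the per-candidate derivations are abstracted into hypothetical callables (derive_spatial, derive_temporal, derive_combined_bi, zero_candidate), so only the list-construction order described above is illustrated.

```python
# Illustrative outline of the Merge candidate list construction order above.
# The derivation callables are hypothetical placeholders for sections 2.2.1.2-2.2.1.4.

def build_merge_list(max_num_merge_cand, derive_spatial, derive_temporal,
                     derive_combined_bi, zero_candidate):
    merge_list = []

    # Steps 1.1/1.2: up to four spatial candidates, after redundancy checking.
    for cand in derive_spatial():
        if len(merge_list) < 4 and cand not in merge_list:
            merge_list.append(cand)

    # Step 1.3: at most one temporal candidate.
    tmvp = derive_temporal()
    if tmvp is not None:
        merge_list.append(tmvp)

    # Step 2.1: combined bi-predictive candidates (B slices only).
    for cand in derive_combined_bi(merge_list):
        if len(merge_list) >= max_num_merge_cand:
            break
        merge_list.append(cand)

    # Step 2.2: pad with zero-motion candidates up to MaxNumMergeCand.
    ref_idx = 0
    while len(merge_list) < max_num_merge_cand:
        merge_list.append(zero_candidate(ref_idx))
        ref_idx += 1

    return merge_list[:max_num_merge_cand]
```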
The operations associated with the above steps are described in detail below.
2.2.1.2 spatial domain candidate derivation
In the derivation of the spatial Merge candidates, a maximum of four Merge candidates are selected among candidates located at the positions shown in fig. 3. The order of derivation is A1, B1, B0, A0, and B2. Position B2 is considered only when any PU at position A1, B1, B0, or A0 is unavailable (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list, thereby improving coding efficiency. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with arrows in fig. 4 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with a partition other than 2N × 2N. As an example, fig. 5 depicts the second PU for the N × 2N and 2N × N cases, respectively. When the current PU is partitioned as N × 2N, the candidate at position A1 is not considered for list construction; in some embodiments, adding this candidate may result in two prediction units with the same motion information, which is redundant for having only one PU in the coding unit. Likewise, position B1 is not considered when the current PU is partitioned as 2N × N.
2.2.1.3 time-domain candidate derivation
In this step, only one candidate is added to the list. In particular, in the derivation of this temporal Merge candidate, a scaled motion vector is derived based on the collocated PU belonging to the picture that has the smallest picture order count (POC) difference from the current picture within the given reference picture list. The reference picture list to be used for derivation of the collocated PU is explicitly signaled in the slice header. The dashed line in fig. 6 shows the derivation of the scaled motion vector for the temporal Merge candidate, which is scaled from the motion vector of the collocated PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal Merge candidate is set to zero. A practical realization of the scaling process is described in the HEVC specification. For a B slice, two motion vectors (one for reference picture list 0 and the other for reference picture list 1) are obtained and combined to form the bi-predictive Merge candidate.
Figure 6 is an illustration of motion vector scaling for temporal Merge candidates.
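The POC-distance scaling sketched in Fig. 6 can be written as the following fixed-point routine; it mirrors the scaling process described in the HEVC specification (the clipping ranges shown assume a 16-bit motion vector representation), and it is reused below for spatial candidate scaling.

```python
# Fixed-point motion vector scaling by the POC distances tb and td.

def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def scale_mv(mv, tb, td):
    """Scale one MV component; tb and td are the POC distances defined above."""
    tb = clip3(-128, 127, tb)
    td = clip3(-128, 127, td)
    num = 16384 + (abs(td) >> 1)
    tx = num // td if td > 0 else -(num // -td)        # ~1/td in Q14, truncated toward zero
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    prod = dist_scale * mv
    sign = -1 if prod < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))

# Example: the collocated reference is twice as far away as the current reference.
print(scale_mv(mv=64, tb=2, td=4))   # 32
```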
In the collocated PU (Y) belonging to the reference frame, the position of the temporal candidate is selected between candidates C0 and C1, as shown in fig. 7. If the PU at position C0 is unavailable, is intra coded, or is outside the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
2.2.1.4 additional candidate insertions
In addition to the spatial and temporal Merge candidates, there are two additional types of Merge candidates: combined bi-predictive Merge candidates and zero Merge candidates. Combined bi-predictive Merge candidates are generated using the spatial and temporal Merge candidates, and are used for B slices only. A combined bi-predictive candidate is generated by combining the first-reference-picture-list motion parameters of an initial candidate with the second-reference-picture-list motion parameters of another candidate. If these two tuples provide different motion hypotheses, they form a new bi-predictive candidate. As an example, fig. 8 shows the case where two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive Merge candidate added to the final list (on the right). Numerous rules regarding the combinations considered to generate these additional Merge candidates are defined in the prior art.
Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list and thereby reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index that starts from zero and increases each time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one for uni-directional prediction and two for bi-directional prediction. Finally, no redundancy check is performed on these candidates.
2.2.1.5 parallel-processed motion estimation regions
To speed up the encoding process, motion estimation may be performed in parallel, whereby the motion vectors of all prediction units inside a given region are derived simultaneously. Deriving Merge candidates from the spatial neighborhood may interfere with parallel processing, because one prediction unit cannot derive motion parameters from an adjacent PU until the associated motion estimation is completed. To mitigate the trade-off between coding efficiency and processing latency, HEVC defines a Motion Estimation Region (MER), whose size is signaled in the picture parameter set using the syntax element "log2_parallel_merge_level_minus2". When an MER is defined, Merge candidates falling into the same region are marked as unavailable and are therefore not considered in the list construction.
2.2.2 AMVP
AMVP exploits the spatio-temporal correlation of a motion vector with neighboring PUs, which is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by first checking the availability of the left and above temporally neighboring PU positions, removing redundant candidates, and adding a zero vector so that the candidate list has a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to Merge index signaling, the index of the best motion vector candidate is encoded using a truncated unary code. The maximum value to be encoded in this case is 2 (see fig. 9). In the following sections, the derivation process of the motion vector prediction candidates is described in detail.
2.2.2.1 Derivation of AMVP candidates
Fig. 9 summarizes the derivation of motion vector prediction candidates.
In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For the derivation of spatial motion vector candidates, two motion vector candidates are finally derived based on the motion vectors of each PU located at five different positions as shown in fig. 3.
For the derivation of temporal motion vector candidates, one motion vector candidate is selected from two candidates, which are derived based on two different collocated positions. After the first list of spatio-temporal candidates is made, the duplicate motion vector candidates in the list are removed. If the number of potential candidates is greater than two, then the motion vector candidate with a reference picture index greater than 1 in the associated reference picture list is removed from the list. If the number of spatial-temporal motion vector candidates is less than two, additional zero motion vector candidates are added to the list.
2.2.2.2 spatial motion vector candidates
In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are taken from PUs at the positions depicted in fig. 3; those positions are the same as those of the motion Merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as motion vector candidates, with two cases not requiring spatial scaling and two cases using spatial scaling. The four different cases are summarized as follows:
-no spatial scaling
(1) Same reference picture list, and same reference picture index (same POC)
(2) Different reference picture lists, but the same reference picture index (same POC)
-spatial scaling
(3) Same reference picture list, but different reference picture indices (different POCs)
(4) Different reference picture lists, and different reference picture indices (different POCs)
The no-spatial-scaling cases are checked first, followed by the spatial-scaling cases. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are unavailable or are intra coded, scaling of the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
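Assuming each neighboring candidate carries its motion vector and the POC of its reference picture (a hypothetical field layout), the per-candidate decision can be sketched as follows, reusing the scale_mv routine from the sketch above.

```python
# Hypothetical helper for one spatial AMVP candidate: cases (1)/(2) need no scaling,
# cases (3)/(4) scale the neighbour's MV by the POC distances, as in Fig. 10.

def spatial_mv_candidate(neigh_mv, neigh_ref_poc, cur_ref_poc, cur_poc, scaling_allowed):
    if neigh_ref_poc == cur_ref_poc:       # cases (1) and (2): same POC, keep the MV
        return neigh_mv
    if not scaling_allowed:                # e.g. scaling of the above MV is disallowed
        return None
    tb = cur_poc - cur_ref_poc             # distance current picture -> its reference
    td = cur_poc - neigh_ref_poc           # distance current picture -> neighbour's reference
    return scale_mv(neigh_mv, tb, td)
```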
Fig. 10 is an illustration of motion vector scaling of spatial motion vector candidates.
In the spatial scaling process, the motion vector of the neighboring PU is scaled in a similar manner as for temporal scaling, as shown in fig. 10. The main difference is that the reference picture list and the index of the current PU are given as input; the actual scaling process is the same as that of temporal scaling.
2.2.2.3 temporal motion vector candidates
Apart from the derivation of the reference picture index, all processes for the derivation of the temporal Merge candidate are the same as for the derivation of spatial motion vector candidates (see fig. 7). The reference picture index is signaled to the decoder.
2.3 New interframe Merge candidate in JEM
2.3.1 sub-CU-based motion vector prediction
In JEM with QTBT, each CU may have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by partitioning a large CU into sub-CUs and deriving motion information for all sub-CUs of the large CU. An Alternative Temporal Motion Vector Prediction (ATMVP) method allows each CU to obtain multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, the motion vector of a sub-CU is recursively derived by using a temporal motion vector predictor and a spatial neighboring motion vector.
In order to maintain a more accurate motion field for sub-CU motion prediction, motion compression of the reference frame is currently disabled.
2.3.1.1 Alternative temporal motion vector prediction
In the Alternative Temporal Motion Vector Prediction (ATMVP) method, temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. As shown in fig. 11, a sub-CU is a square N × N block (N is set to 4 by default).
ATMVP predicts motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture with a so-called temporal vector. The reference picture is called a motion source picture. The second step is to divide the current CU into sub-CUs and obtain the motion vector and the reference index of each sub-CU from the corresponding block of each sub-CU, as shown in fig. 11.
In the first step, the reference picture and the corresponding block are determined from the motion information of the spatially neighboring blocks of the current CU. To avoid a repeated scanning process of the neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU is used. The first available motion vector and its associated reference index are set to be the temporal vector and the index to the motion source picture. In this way, in ATMVP, the corresponding block can be identified more accurately than in TMVP, where the corresponding block (sometimes called the collocated block) is always in a bottom-right or center position relative to the current CU.
In the second step, the corresponding block of each sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of a corresponding N × N block is identified, it is converted into motion vectors and reference indices of the current sub-CU, in the same way as the TMVP process of HEVC, in which motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy (with X being equal to 0 or 1 and Y being equal to 1-X) for each sub-CU.
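The two ATMVP steps can be outlined as below; the Merge-candidate fields and the motion-field accessor are hypothetical stand-ins for decoder-internal data structures, so this sketch only shows the control flow (motion scaling and the low-delay check are omitted).

```python
# Outline of the two-step ATMVP derivation described above (default sub-CU size N = 4).
# merge_list[0].mv / .ref_idx and motion_source.motion_at(...) are hypothetical names.

N = 4

def atmvp_sub_cu_motion(cu_x, cu_y, cu_w, cu_h, merge_list, ref_pictures):
    # Step 1: temporal vector and motion source picture from the first Merge candidate.
    first = merge_list[0]
    temp_vec = first.mv
    motion_source = ref_pictures[first.ref_idx]

    # Step 2: fetch the motion of the corresponding block for every NxN sub-CU.
    sub_cu_motion = {}
    for y in range(cu_y, cu_y + cu_h, N):
        for x in range(cu_x, cu_x + cu_w, N):
            # centre sample of the corresponding block in the motion source picture
            cx = x + N // 2 + temp_vec.x
            cy = y + N // 2 + temp_vec.y
            sub_cu_motion[(x, y)] = motion_source.motion_at(cx, cy)
    return sub_cu_motion
```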
2.3.1.2 spatio-temporal motion vector prediction
In this method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Fig. 12 illustrates this concept. Consider an 8 × 8 CU that contains four 4 × 4 sub-CUs A, B, C, and D. The neighboring 4 × 4 blocks in the current frame are labeled a, b, c, and d.
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N × N block above sub-CU A (block c). If this block c is not available or is intra coded, the other N × N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU A (block b). If block b is not available or is intra coded, the other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure as the TMVP derivation specified in HEVC: the motion information of the collocated block at position D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
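For a single sub-CU and one reference list, the final averaging step can be sketched as follows; the neighbor fetching and TMVP derivation are assumed to have already produced (possibly unavailable) motion vectors scaled to the first reference frame of the list.

```python
# Average of up to three available motion vectors for one sub-CU and one reference list.

def stmvp_average(above_mv, left_mv, tmvp_mv):
    """Each argument is an (mvx, mvy) tuple, or None when that predictor is unavailable."""
    available = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
    if not available:
        return None
    return (sum(mv[0] for mv in available) // len(available),
            sum(mv[1] for mv in available) // len(available))

# Example: the left neighbour is intra coded, so only two vectors are averaged.
print(stmvp_average((8, -4), None, (4, 0)))   # (6, -2)
```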
2.3.1.3 sub-CU motion prediction mode signaling
The sub-CU modes are enabled as additional Merge candidates, and no additional syntax element is required to signal these modes. Two additional Merge candidates are added to the Merge candidate list of each CU to represent the ATMVP mode and the STMVP mode. If the sequence parameter set indicates that ATMVP and STMVP are enabled, up to seven Merge candidates are used. The encoding logic for the additional Merge candidates is the same as for the Merge candidates in HM, which means that, for each CU in a P or B slice, two more RD checks are needed for the two additional Merge candidates.
In JEM, all bins of the Merge index are context coded by CABAC, whereas in HEVC only the first bin is context coded and the remaining bins are bypass coded.
2.3.2 non-neighboring Merge candidates
In J0021, Qualcomm proposed deriving additional spatial Merge candidates from non-adjacent neighboring positions, labeled 6 to 49 in fig. 13. The derived candidates are added after the TMVP candidate in the Merge candidate list.
In J0058, it is proposed to derive additional spatial Merge candidates from positions in an outer reference area that has an offset of (-96, -96) relative to the current block.
As shown in fig. 14, the positions are labeled A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). Each candidate B(i, j) or C(i, j) has an offset of 16 in the vertical direction compared with its previous B or C candidate. Each candidate A(i, j) or D(i, j) has an offset of 16 in the horizontal direction compared with its previous A or D candidate. Each E(i, j) has an offset of 16 in both the horizontal and the vertical direction compared with its previous E candidate. The candidates are checked from the inside out, and the order of the candidates is A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). Whether the number of Merge candidates can be further reduced is left for further study. The candidates are added after the TMVP candidate in the Merge candidate list.
In J0059, the extended spatial positions from 6 to 27 in fig. 15 are checked according to their numerical order after the time domain candidates. To save MV line buffering, all spatial candidates are restricted to two CTU lines.
2.4 Interpolation filters in HEVC/JEM
2.4.1 Luma and chroma interpolation filtering
For the luminance interpolation filtering, an 8-tap separable DCT-based interpolation filter is used for 2/4 precision samples, and a 7-tap separable DCT-based interpolation filter is used for 1/4 precision samples, as shown in Table 1.
Table 1: 8-tap DCT-IF coefficients for 1/4 luminance interpolation.
Position   Filter coefficients
1/4        {-1, 4, -10, 58, 17, -5, 1}
2/4        {-1, 4, -11, 40, 40, -11, 4, -1}
3/4        {1, -5, 17, 58, -10, 4, -1}
Similarly, a 4-tap separable DCT-based interpolation filter is used for the chrominance interpolation filter, as shown in table 2.
Table 2: 4-tap DCT-IF coefficients for 1/8 chroma interpolation.
Position   Filter coefficients
1/8        {-2, 58, 10, -2}
2/8        {-4, 54, 16, -2}
3/8        {-6, 46, 28, -4}
4/8        {-4, 36, 36, -4}
5/8        {-4, 28, 46, -6}
6/8        {-2, 16, 54, -4}
7/8        {-2, 10, 58, -2}
For the vertical interpolation of 4.
For bi-directional prediction, the bit depth of the output of the interpolation filter is maintained at 14-bit accuracy, regardless of the source bit depth, before the averaging of the two prediction signals. The actual averaging process is done implicitly together with the bit-depth reduction process:
predSamples[x, y] = (predSamplesL0[x, y] + predSamplesL1[x, y] + offset) >> shift
where shift = (15 - BitDepth) and offset = 1 << (shift - 1)
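For a single sample, the averaging formula above can be transcribed directly as a short Python sketch; the 14-bit intermediate values used here are arbitrary example inputs.

```python
# Bi-prediction averaging with implicit bit-depth reduction, following the formula above.

def bi_average(pred_l0, pred_l1, bit_depth):
    """pred_l0 and pred_l1 are interpolated samples kept at 14-bit intermediate precision."""
    shift = 15 - bit_depth
    offset = 1 << (shift - 1)
    return (pred_l0 + pred_l1 + offset) >> shift

# Two 14-bit intermediate predictions combined into one 10-bit output sample.
print(bi_average(8192, 8192, bit_depth=10))   # 512
```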
If both the horizontal and the vertical component of the motion vector point to sub-pixel positions, horizontal interpolation is always performed first, followed by vertical interpolation. For example, to interpolate the sub-pixel sample j0,0 shown in fig. 16, b0,k (k = -3, -2, ..., 4) is interpolated first according to equation (2-1), and then j0,0 is interpolated according to equation (2-2). Here, shift1 = Min(4, BitDepthY - 8) and shift2 = 6, where BitDepthY is the bit depth of the video block, more specifically, the bit depth of the luma component of the video block.
b0,k = (-A-3,k + 4*A-2,k - 11*A-1,k + 40*A0,k + 40*A1,k - 11*A2,k + 4*A3,k - A4,k) >> shift1   (2-1)
j0,0 = (-b0,-3 + 4*b0,-2 - 11*b0,-1 + 40*b0,0 + 40*b0,1 - 11*b0,2 + 4*b0,3 - b0,4) >> shift2   (2-2)
Alternatively, vertical interpolation can be performed first, followed by horizontal interpolation. In that case, to interpolate j0,0, hk,0 (k = -3, -2, ..., 4) is interpolated first according to equation (2-3), and then j0,0 is interpolated according to equation (2-4). When BitDepthY is smaller than or equal to 8, shift1 is 0 and there is no loss in the first interpolation stage; therefore, the final interpolation result is not changed by the interpolation order. However, when BitDepthY is larger than 8, shift1 is larger than 0, and in this case the final interpolation result may differ when a different interpolation order is applied.
hk,0 = (-Ak,-3 + 4*Ak,-2 - 11*Ak,-1 + 40*Ak,0 + 40*Ak,1 - 11*Ak,2 + 4*Ak,3 - Ak,4) >> shift1   (2-3)
j0,0 = (-h-3,0 + 4*h-2,0 - 11*h-1,0 + 40*h0,0 + 40*h1,0 - 11*h2,0 + 4*h3,0 - h4,0) >> shift2   (2-4)
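The order dependence can be made concrete with the following sketch, which evaluates equations (2-1)/(2-2) and (2-3)/(2-4) for the half-pel position j0,0 using the 8-tap filter of Table 1; the 10-bit input samples are arbitrary test data, and the two printed results need not be equal once shift1 > 0.

```python
# Half-pel interpolation of j0,0 in both orders (equations 2-1 to 2-4).

FILT = [-1, 4, -11, 40, 40, -11, 4, -1]   # 8-tap half-pel filter from Table 1
TAPS = list(range(-3, 5))                  # k = -3, -2, ..., 4

def interp_j00(A, bit_depth, horizontal_first):
    shift1 = min(4, bit_depth - 8)
    shift2 = 6
    if horizontal_first:
        # eq. (2-1): horizontal half-pel samples b0,k for each row k
        b = {k: sum(c * A[(i, k)] for c, i in zip(FILT, TAPS)) >> shift1 for k in TAPS}
        # eq. (2-2): vertical filtering of the b0,k samples
        return sum(c * b[k] for c, k in zip(FILT, TAPS)) >> shift2
    # eq. (2-3): vertical half-pel samples hk,0 for each column k
    h = {k: sum(c * A[(k, j)] for c, j in zip(FILT, TAPS)) >> shift1 for k in TAPS}
    # eq. (2-4): horizontal filtering of the hk,0 samples
    return sum(c * h[k] for c, k in zip(FILT, TAPS)) >> shift2

# Arbitrary 10-bit samples A[(x, y)] around the interpolated position.
A = {(x, y): (37 * x + 53 * y * y + 511) % 1024 for x in TAPS for y in TAPS}
print(interp_j00(A, 10, horizontal_first=True), interp_j00(A, 10, horizontal_first=False))
```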
3. Example of problems addressed by embodiments
For the luma block size WxH, if we always perform horizontal interpolation first, the required interpolation (per pixel) is shown in table 3.
Table 3: interpolation required by HEVC/JEM on WxH luminance component
On the other hand, if we first perform vertical interpolation, the required interpolation is shown in table 4. Obviously, the optimal interpolation order is an interpolation order requiring a smaller number of interpolations between table 3 and table 4.
Table 4: interpolation required for WxH luminance component when the interpolation order is reversed
For the chroma components, if horizontal interpolation is always performed first, the required number of interpolation operations per pixel is ((H + 3) × W + W × H)/(W × H) = 2 + 3/H. If vertical interpolation is always performed first, the required number of interpolation operations per pixel is ((W + 3) × H + W × H)/(W × H) = 2 + 3/W.
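Under the same accounting (one interpolation operation per produced sample, with the filter length as a parameter so that the 4-tap chroma case above and an assumed 8-tap luma case share the same formula), the cheaper order for a given block shape can be computed as follows.

```python
# Per-pixel interpolation counts for a WxH block whose MV is fractional in both directions.
# taps = 4 reproduces the chroma formulas above; taps = 8 is an assumed luma worst case.

def per_pixel_ops(w, h, taps, horizontal_first):
    ext = taps - 1                          # extra rows/columns needed by the first pass
    first_pass = (h + ext) * w if horizontal_first else (w + ext) * h
    second_pass = w * h
    return (first_pass + second_pass) / (w * h)   # = 2 + ext/h or 2 + ext/w

def cheaper_order(w, h, taps=8):
    hf = per_pixel_ops(w, h, taps, horizontal_first=True)
    vf = per_pixel_ops(w, h, taps, horizontal_first=False)
    return "horizontal-first" if hf <= vf else "vertical-first"

# A wide block (32x4) favours vertical-first; a tall block (4x32) favours horizontal-first.
print(cheaper_order(32, 4), cheaper_order(4, 32))
```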
As described above, when the bit depth of the input video is greater than 8, different interpolation orders may result in different interpolation results. Therefore, the interpolation order should be implicitly defined in the encoder and decoder.
4. Examples of the embodiments
To address these problems and provide other benefits, we propose a shape dependent interpolation order.
The following detailed examples should be considered as examples to explain the general concept. These inventions should not be construed in a narrow manner. Furthermore, these inventions may be combined in any manner.
1. It is proposed that the interpolation order depends on the current coding block shape (e.g., the coding block is a CU); a sketch combining this item with item 2 follows this list.
a. In one example, for blocks with width > height, such as a CU, PU, or sub-block used in sub-block based prediction (e.g., affine, ATMVP, or BIO), vertical interpolation is performed first, followed by horizontal interpolation; for example, the pixels dk,0, hk,0, and nk,0 are interpolated first, and then e0,0 to r0,0 are interpolated. An example for j0,0 is shown in equations (2-3) and (2-4).
i. Alternatively, for blocks with width >= height, such as a CU, PU, or sub-block used in sub-block based prediction (e.g., affine, ATMVP, or BIO), vertical interpolation is performed first, followed by horizontal interpolation.
b. In one example, for blocks with width <= height, such as a CU, PU, or sub-block used in sub-block based prediction (e.g., affine, ATMVP, or BIO), horizontal interpolation is performed first, followed by vertical interpolation.
i. Alternatively, for a block of width < height, such as a CU, PU, or sub-block used in sub-block based prediction (e.g., affine, ATMVP, or BIO), horizontal interpolation is performed first, followed by vertical interpolation.
c. In one example, both the luma component and the chroma component follow the same interpolation order.
d. Alternatively, when one chroma coding block corresponds to multiple luma coding blocks (e.g., one 4 × 4 chroma block may correspond to two 8 × 4 or 4 × 8 luma blocks for the 4:2:0 color format), the chroma component may use an interpolation order different from that of the luma component.
e. In one example, when different interpolation orders are utilized, the scaling factors (i.e., shift1 and shift2) in the multiple stages may be further changed accordingly.
2. Alternatively, or in addition, it is proposed that the interpolation order of the luma component may also depend on the MV.
a. In one example, if the vertical MV component points to a quarter-pixel position and the horizontal MV component points to a half-pixel position, then horizontal interpolation is performed first, followed by vertical interpolation.
b. In one example, if the vertical MV component points to a half-pixel position and the horizontal MV component points to a quarter-pixel position, then vertical interpolation is performed first, followed by horizontal interpolation.
c. In one example, the proposed method is only applied to square coded blocks.
3. The proposed method may be applied to certain modes, block sizes/shapes and/or certain sub-block sizes.
a. The proposed method can be applied to certain modes, such as bi-predictive mode.
b. The proposed method can be applied to certain block sizes.
i. In one example, it only applies to blocks with w × h <= T1, where w and h are the width and height of the current block, and T1 is a first threshold, which may be a predefined value depending on design requirements, such as 16, 32, or 64.
ii. In one example, it only applies to blocks with h <= T2, where T2 is a second threshold, which may be a predefined value depending on design requirements, such as 4 or 8.
c. The proposed method may be applied to certain color components (such as only the luminance component).
4. It is proposed that, when multi-hypothesis prediction is applied to a block, shorter-tap or different interpolation filters may be applied compared with those applied in the normal prediction modes.
a. In one example, a bilinear filter may be used.
b. The short-tap or second interpolation filter may be applied to a reference picture list involving multiple reference blocks, while for another reference picture list having only one reference block, the same filter as used for the normal prediction mode may be applied.
c. The proposed method may be applied under certain conditions, such as the block being contained in certain temporal layer(s), or the quantization parameter of the block/slice/picture being within a range (such as greater than a threshold).
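A compact decision rule combining items 1 and 2 above could look like the following sketch; the quarter-pel MV representation, the half-pel/quarter-pel classification, and the default for the remaining square-block cases are illustrative assumptions rather than normative behavior.

```python
# Illustrative interpolation-order decision combining items 1 (block shape) and 2 (MV).
# MV components are assumed to be in quarter-pel units, so a fraction of 2 is half-pel.

def interpolation_order(width, height, mv_x, mv_y):
    """Return 'horizontal-first' or 'vertical-first' for a luma block."""
    frac_x, frac_y = mv_x & 3, mv_y & 3
    if width == height:
        # Item 2 (square blocks only, per 2.c): let the MV fractions decide.
        if frac_y == 2 and frac_x in (1, 3):   # vertical half-pel, horizontal quarter-pel
            return "vertical-first"
        if frac_x == 2 and frac_y in (1, 3):   # horizontal half-pel, vertical quarter-pel
            return "horizontal-first"
        return "horizontal-first"              # assumed default, as in HEVC/JEM
    # Item 1: shape-dependent rule (fewer intermediate rows/columns in the first pass).
    return "vertical-first" if width > height else "horizontal-first"

print(interpolation_order(32, 4, mv_x=5, mv_y=6))   # wide block: vertical-first
print(interpolation_order(8, 8, mv_x=2, mv_y=1))    # square, MV case 2.a: horizontal-first
```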
Fig. 17 is a block diagram of the video processing apparatus 1700. The apparatus 1700 may be used to implement one or more of the methods described herein. The apparatus 1700 may be embedded in a smartphone, tablet, computer, internet of things (IoT) receiver, and/or the like. The apparatus 1700 may include one or more processors 1702, one or more memories 1704, and video processing hardware 1706. The processor(s) 1702 may be configured to implement one or more of the methods described in this document. Memory(s) 1704 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 1706 may be used to implement some of the techniques described in this document in hardware circuits.
Fig. 19 is a flow chart of a method 1900 of video bitstream processing. Method 1900 includes determining (1905) a shape of a video block, determining (1910) an interpolation order based on the shape of the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed, and performing the horizontal and vertical interpolation on the video block in the sequence indicated by the interpolation order to reconstruct (1915) a decoded representation of the video block.
Fig. 20 is a flow chart of a method 2000 of video bitstream processing. Method 2000 includes determining (2005) a characteristic of a motion vector associated with the video block, determining (2010) an interpolation order for the video block based on the characteristic of the motion vector, the interpolation order indicating a sequence in which horizontal and vertical interpolation are performed, and performing the horizontal and vertical interpolation according to the interpolation order for the video block to reconstruct (2015) a decoded representation of the video block.
Some examples of sequences that perform horizontal interpolation and vertical interpolation and their uses are described in section 4 of this document with reference to methods 1900 and 2000. For example, as described in section 4, under different shapes of video blocks, one of horizontal interpolation or vertical interpolation may be performed first. In some embodiments, horizontal interpolation is performed prior to vertical interpolation, and in some embodiments, vertical interpolation is performed prior to horizontal interpolation.
Referring to methods 1900 and 2000, video blocks may be encoded in a video bitstream, where bit efficiency may be achieved by using bitstream generation rules that relate to an interpolation order, which also depends on the shape of the video blocks.
It should be appreciated that the disclosed techniques may be embedded in a video encoder or decoder to improve compression efficiency when the compressed coding unit has a shape that is significantly different from a conventional square or half-square rectangular block. For example, new coding tools using long or high coding units, such as 4 x 32 or 32 x 4 size units, may benefit from the disclosed techniques.
Fig. 21 is a flow chart of an example of a video processing method 2100. The method 2100 includes: determining (2102) a first prediction mode to apply to the first video block; performing (2104) a first transformation between the first video block and the encoded representation of the first video block by applying horizontal interpolation and/or vertical interpolation to the first video block; determining (2106) a second prediction mode to apply to the second video block; performing (2108) a second conversion between the second video block and the encoded representation of the second video block by applying horizontal interpolation and/or vertical interpolation to the second video block, wherein one or both of the horizontal interpolation and the vertical interpolation of the first video block uses a shorter tap filter than the filter used for the second video block based on a determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode.
Fig. 22 is a flow chart of a method 2200 of video bitstream processing. The method comprises the following steps: determining (2205) a shape of the video block; an interpolation order is determined (2210) based on the shape of the video block, the interpolation order indicating a sequence in which horizontal and vertical interpolation is performed, and the horizontal and vertical interpolation is performed on the video block in the sequence indicated by the interpolation order to construct (2215) an encoded representation of the video block.
Fig. 23 is a flow chart of a method 2300 of video bitstream processing. The method comprises the following steps: determining (2305) characteristics of a motion vector associated with the video block; determining (2310) an interpolation order based on the feature of the motion vector, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and performing horizontal and vertical interpolation on the video blocks in the sequence indicated by the interpolation order to construct (2315) an encoded representation of the video block.
Various embodiments and techniques disclosed in this document may be described in the following list of embodiments.
1. A video processing method, comprising: determining a first prediction mode to apply to the first video block; performing a first conversion between the first video block and the encoded representation of the first video block by applying horizontal interpolation and/or vertical interpolation to the first video block; determining a second prediction mode to apply to the second video block; performing a second conversion between the second video block and the encoded representation of the second video block by applying horizontal interpolation and/or vertical interpolation to the second video block, wherein one or both of the horizontal interpolation and the vertical interpolation of the first video block uses a shorter tap filter than the filter used for the second video block based on a determination that the first prediction mode is a multi-hypothesis prediction mode and the second prediction mode is not a multi-hypothesis prediction mode.
2. The method of example 1, wherein the first video block is converted using bi-prediction with more than two reference blocks, such that more than one reference block is used for at least one reference picture list.
3. The method of example 1, wherein the first video block is converted using uni-directional prediction with more than one reference block.
4. The method according to any one of examples 1-3, wherein the shorter tap filter is a bilinear filter.
5. The method of any of examples 1-3, wherein one or both of the horizontal interpolation and the vertical interpolation uses shorter tap filters for reference picture lists associated with multiple reference blocks.
6. The method of any of examples 1-5, wherein one or both of the horizontal interpolation or the vertical interpolation uses the same filter as used for the normal prediction mode when the reference picture list relates to a single reference block.
7. The method of any of examples 1-6, wherein the method is applied based on a determination of one or more of: the temporal layer being used, or the quantization parameter of one or more blocks, slices, or pictures comprising the video block being within a threshold range.
8. The method of example 7, wherein the quantization parameter within the threshold range includes a quantization parameter greater than a threshold.
9. The method of example 6, wherein the normal prediction mode includes a uni-directional prediction mode that predicts sample values of samples in the block using inter-prediction with at most one motion vector and one reference index, or a bi-directional prediction mode that predicts sample values of samples in the block using inter-prediction with at most two motion vectors and two reference indices.
10. A video decoding apparatus comprising a processor configured to implement the method of one or more of examples 1 to 9.
11. A video encoding apparatus comprising a processor configured to implement the method of one or more of examples 1 to 9.
12. A computer readable program medium having code stored thereon, the code comprising instructions which, when executed by a processor, cause the processor to implement the method of one or more of examples 1 to 9.
13. A video bitstream processing method, comprising: determining a shape of a video block; determining an interpolation order based on the shape of the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and performing horizontal and vertical interpolation on the video block in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video block.
14. The method of example 13, wherein the shape of the video block is represented by a width and a height of the video block, and determining the interpolation order further comprises:
determining to perform vertical interpolation before horizontal interpolation as the interpolation order when the width of the video block is greater than the height of the video block.
15. The method of example 13, wherein the shape of the video block is represented by a width and a height, and determining the interpolation order further comprises:
determining to perform vertical interpolation before horizontal interpolation as the interpolation order when the width of the video block is greater than or equal to the height of the video block.
16. The method of example 13, wherein the shape of the video block is represented by a width and a height, and determining the interpolation order further comprises:
determining to perform horizontal interpolation before vertical interpolation as the interpolation order when the height of the video block is greater than or equal to the width of the video block.
17. The method of example 13, wherein the shape of the video block is represented by a width and a height, and determining the interpolation order further comprises:
determining to perform horizontal interpolation before vertical interpolation as the interpolation order when the height of the video block is greater than the width of the video block.
18. The method of example 13, wherein the luma component and the chroma components of the video block are interpolated based on the interpolation order or based on a different interpolation order.
19. The method of example 18, wherein when each chroma block of the chroma component corresponds to multiple luma blocks of the luma component, the luma component and the chroma component of the video block are interpolated using different interpolation orders.
20. The method of example 13, wherein the luma component and the chroma component of the video block are interpolated using different interpolation orders, and wherein the scaling factors used in the horizontal interpolation and the vertical interpolation are different for the luma component and the chroma component.
21. A video bitstream processing method, comprising: determining a feature of a motion vector associated with a video block; determining an interpolation order based on the feature of the motion vector, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and performing horizontal and vertical interpolation on the video block in the sequence indicated by the interpolation order to reconstruct a decoded representation of the video block.
22. The method of example 21, wherein the feature of the motion vector is represented by a quarter-pixel position and a half-pixel position to which the motion vector points, the motion vector includes a vertical component and a horizontal component, and determining the interpolation order includes: when the vertical component points to a quarter-pixel position and the horizontal component points to a half-pixel position, it is determined that horizontal interpolation is performed before vertical interpolation as an interpolation order.
23. The method of example 21, wherein the feature of the motion vector is represented by a quarter-pixel position and a half-pixel position to which the motion vector points, the motion vector includes a vertical component and a horizontal component, and determining the interpolation order includes: when the vertical component points to a half-pixel position and the horizontal component points to a quarter-pixel position, it is determined that vertical interpolation is performed before horizontal interpolation.
24. The method of any of examples 21-23, wherein the shape of the video block is square.
25. The method of any of examples 21-24, wherein the method is applied to bi-predictive mode.
26. The method of any of examples 21-25, wherein the method is applied when a height of the video block multiplied by a width of the video block is less than or equal to T1, T1 being a first threshold.
27. The method of any of examples 21-25, wherein the method is applied when the video block has a height less than or equal to T2, T2 being a second threshold.
28. The method of any of examples 21-25, wherein the method is applied to a luma component of a video block (an illustrative gating sketch follows this list of examples).
29. A video bitstream processing method, comprising:
determining a shape of a video block;
determining an interpolation order based on the shape of the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and
performing horizontal and vertical interpolation on the video block in the sequence indicated by the interpolation order to construct an encoded representation of the video block.
30. A video bitstream processing method, comprising:
determining a feature of a motion vector associated with a video block;
determining an interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed, based on the feature of the motion vector; and
performing horizontal and vertical interpolation on the video block in the sequence indicated by the interpolation order to construct an encoded representation of the video block.
31. A video decoding apparatus comprising a processor configured to implement the method of one or more of examples 21 to 28.
32. A video encoding apparatus comprising a processor configured to implement the method of example 29 or 30.
33. A computer program product having stored thereon computer code which, when executed by a processor, causes the processor to implement the method of any of examples 13 to 30.
34. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of examples 13 to 30.
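The applicability conditions recited in examples 25-28 can be summarized by a small gating function such as the C++ sketch below. The thresholds T1 and T2 are not fixed by the examples, so the default values shown are placeholders, and combining the two size conditions with a logical OR is likewise an assumption (examples 26 and 27 present them as alternatives).

    struct BlockInfo {
        int width;
        int height;
        bool isBiPredicted;
        bool isLuma;
    };

    // Gate the motion-vector-based ordering rule of examples 21-24.
    bool mvBasedOrderingApplies(const BlockInfo& b, int t1 = 64, int t2 = 4) {
        if (!b.isBiPredicted || !b.isLuma) return false;        // examples 25 and 28
        return (b.width * b.height <= t1) || (b.height <= t2);  // examples 26 and 27
    }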
From the foregoing, it will be appreciated that specific embodiments of the disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the disclosed technology is not limited except as by the appended claims.
The implementation and functional operations of the subject matter described in this patent document can be implemented in various systems, digital electronic circuitry, or computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" includes all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
It is intended that the specification, together with the drawings, be considered exemplary only, where exemplary means an example. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, the use of "or" is intended to include "and/or" unless the context clearly indicates otherwise.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Likewise, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples have been described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (22)

1. A video bitstream processing method, comprising:
determining a shape of a video block;
determining an interpolation order based on a shape of the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and
performing the horizontal interpolation and the vertical interpolation on the video block in a sequence indicated by an interpolation order to reconstruct a decoded representation of the video block;
wherein the scaling factors used in the horizontal interpolation and the vertical interpolation are different when different interpolation orders are used.
2. The method of claim 1, wherein the shape of the video block is represented by a width and a height of the video block, and determining an interpolation order further comprises:
determining to perform the vertical interpolation prior to the horizontal interpolation as the interpolation order when the width of the video block is greater than the height of the video block.
3. The method of claim 1, wherein the shape of the video block is represented by a width and a height of the video block, and determining an interpolation order further comprises:
determining to perform the vertical interpolation prior to the horizontal interpolation as the interpolation order when the width of the video block is greater than or equal to the height of the video block.
4. The method of claim 1, wherein the shape of the video block is represented by a width and a height of the video block, and determining an interpolation order further comprises:
determining to perform the horizontal interpolation prior to the vertical interpolation as the interpolation order when the height of the video block is greater than or equal to the width of the video block.
5. The method of claim 1, wherein the shape of the video block is represented by a width and a height of the video block, and determining an interpolation order further comprises:
determining to perform the horizontal interpolation prior to the vertical interpolation as the interpolation order when the height of the video block is greater than the width of the video block.
6. The method of claim 1, wherein the luma component and the chroma components of the video block are interpolated based on the interpolation order or based on a different interpolation order.
7. The method of claim 6, wherein when each chroma block of the chroma component corresponds to multiple luma blocks of the luma component, the luma component and the chroma components of the video block are interpolated using different interpolation orders.
8. The method of claim 1, wherein, when the vertical interpolation is performed before the horizontal interpolation, the scaling factor in the vertical interpolation is equal to 0 when a bit depth of the video block is less than or equal to 8, and the scaling factor in the vertical interpolation is greater than 0 when the bit depth is greater than 8.
9. The method of claim 1, wherein, when the horizontal interpolation is performed before the vertical interpolation, a scaling factor in the horizontal interpolation is equal to Min(4, BitDepthY - 8) and a scaling factor in the vertical interpolation is equal to 6, and wherein BitDepthY is a bit depth of the video block.
10. A video bitstream processing method, comprising:
determining a feature of a motion vector associated with a video block;
determining an interpolation order indicating a sequence of performing horizontal interpolation and vertical interpolation based on the feature of the motion vector; and
performing the horizontal interpolation and the vertical interpolation on the video block in a sequence indicated by the interpolation order to reconstruct a decoded representation of the video block;
wherein the feature of the motion vector is represented by a quarter-pixel position and a half-pixel position to which the motion vector points, the motion vector comprises a vertical component and a horizontal component, and determining the interpolation order comprises:
determining to perform the horizontal interpolation prior to the vertical interpolation as the interpolation order when the vertical component points to the quarter-pixel position and the horizontal component points to the half-pixel position.
11. The method of claim 10, wherein the feature of the motion vector is represented by a quarter-pixel position and a half-pixel position to which the motion vector points, the motion vector includes a vertical component and a horizontal component, and determining the interpolation order comprises:
determining to perform the vertical interpolation prior to the horizontal interpolation when the vertical component points to the half-pixel location and the horizontal component points to the quarter-pixel location.
12. The method of claim 10, wherein the shape of the video block is square.
13. The method according to any of claims 1-12, wherein the method is applied in bi-predictive mode.
14. The method of any one of claims 1-12, wherein the method is applied when a height of the video block multiplied by a width of the video block is less than or equal to T1, T1 being a first threshold.
15. The method of any one of claims 1-12, wherein the method is applied when the video block has a height less than or equal to T2, T2 being a second threshold.
16. The method of any one of claims 1-12, wherein the method is applied to a luma component of the video block.
17. A video bitstream processing method, comprising:
determining a shape of a video block;
determining an interpolation order based on a shape of the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed; and
performing the horizontal interpolation and the vertical interpolation on the video block in a sequence indicated by the interpolation order to construct an encoded representation of the video block;
wherein the scaling factors used in the horizontal interpolation and the vertical interpolation are different when different interpolation orders are used.
18. A video bitstream processing method, comprising:
determining a feature of a motion vector associated with a video block;
determining an interpolation order indicating a sequence of performing horizontal interpolation and vertical interpolation based on the feature of the motion vector; and
performing the horizontal interpolation and the vertical interpolation on the video block in a sequence indicated by the interpolation order to construct an encoded representation of the video block;
wherein the feature of the motion vector is represented by a quarter-pixel position and a half-pixel position to which the motion vector points, the motion vector comprises a vertical component and a horizontal component, and determining the interpolation order comprises:
determining to perform the horizontal interpolation prior to the vertical interpolation as the interpolation order when the vertical component points to the quarter-pixel position and the horizontal component points to the half-pixel position.
19. A video decoding apparatus comprising a processor configured to implement the method of any of claims 1 to 16.
20. A video encoding apparatus comprising a processor configured to implement the method of claim 17 or 18.
21. A non-transitory computer readable medium having stored thereon a computer program product comprising computer code, which when executed by a processor, causes the processor to implement the method of any one of claims 1 to 18.
22. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1-18.
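By way of a worked illustration of the scaling factors recited in claims 8 and 9 above, the following C++ sketch returns the right-shift applied after each interpolation pass for a given interpolation order and bit depth. Values that the claims do not fix (the first-pass shift for bit depths above 8 in the vertical-first case, and the second-pass shift in that case) are illustrative assumptions.

    #include <algorithm>

    enum class InterpOrder { HorizontalFirst, VerticalFirst };

    struct StageShifts {
        int firstPass;   // right-shift after the first interpolation pass
        int secondPass;  // right-shift after the second interpolation pass
    };

    StageShifts stageShifts(InterpOrder order, int bitDepthY) {
        if (order == InterpOrder::HorizontalFirst) {
            // Claim 9: Min(4, BitDepthY - 8) for the horizontal pass, 6 for the vertical pass.
            return { std::min(4, bitDepthY - 8), 6 };
        }
        // Claim 8: the first (vertical) pass shift is 0 for bit depths of 8 or less,
        // and is only required to be positive otherwise; BitDepthY - 8 is assumed here.
        const int firstShift = (bitDepthY <= 8) ? 0 : bitDepthY - 8;
        return { firstShift, 6 };  // second-pass shift of 6 is assumed
    }

For an 8-bit block interpolated horizontally first, this gives shifts of 0 and 6.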
CN201910637388.9A 2018-07-13 2019-07-15 Shape dependent interpolation order Active CN110719475B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2018/095576 2018-07-13
CN2018095576 2018-07-13

Publications (2)

Publication Number Publication Date
CN110719475A (en) 2020-01-21
CN110719475B (en) 2022-12-09

Family

ID=67989031

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910637842.0A Active CN110719466B (en) 2018-07-13 2019-07-15 Method, apparatus and storage medium for video processing
CN201910637388.9A Active CN110719475B (en) 2018-07-13 2019-07-15 Shape dependent interpolation order

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910637842.0A Active CN110719466B (en) 2018-07-13 2019-07-15 Method, apparatus and storage medium for video processing

Country Status (3)

Country Link
CN (2) CN110719466B (en)
TW (2) TWI722486B (en)
WO (2) WO2020012449A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023198120A1 (en) * 2022-04-13 2023-10-19 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6807231B1 (en) * 1997-09-12 2004-10-19 8×8, Inc. Multi-hypothesis motion-compensated video image predictor
US8509316B2 (en) * 2007-01-09 2013-08-13 Core Wireless Licensing, S.a.r.l. Adaptive interpolation filters for video coding
US20120008686A1 (en) * 2010-07-06 2012-01-12 Apple Inc. Motion compensation using vector quantized interpolation filters
WO2012100085A1 (en) * 2011-01-19 2012-07-26 General Instrument Corporation High efficiency low complexity interpolation filters
US20120230393A1 (en) * 2011-03-08 2012-09-13 Sue Mon Thet Naing Methods and apparatuses for encoding and decoding video using adaptive interpolation filter length
US9313519B2 (en) * 2011-03-11 2016-04-12 Google Technology Holdings LLC Interpolation filter selection using prediction unit (PU) size
US11122262B2 (en) * 2014-06-27 2021-09-14 Samsung Electronics Co., Ltd. System and method for motion compensation in video coding

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1666429A (en) * 2002-07-09 2005-09-07 诺基亚有限公司 Method and system for selecting interpolation filter type in video coding
CN101527847A (en) * 2009-01-04 2009-09-09 炬力集成电路设计有限公司 Motion compensation interpolation device and method
CN102665080A (en) * 2012-05-08 2012-09-12 美商威睿电通公司 Electronic device for motion compensation and motion compensation method
CN104881843A (en) * 2015-06-10 2015-09-02 京东方科技集团股份有限公司 Image interpolation method and image interpolation apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PU-size dependent motion compensation filtering order; Madhukar Budagavi et al.; Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 7th Meeting: Geneva, CH, 21-30 November 2011, JCTVC-G131; 2011-11-30; Abstract, Sections 1 and 6, Table 1 *

Also Published As

Publication number Publication date
WO2020012449A1 (en) 2020-01-16
TWI722486B (en) 2021-03-21
CN110719466A (en) 2020-01-21
WO2020012448A3 (en) 2020-04-16
TWI704799B (en) 2020-09-11
WO2020012448A2 (en) 2020-01-16
CN110719466B (en) 2022-12-23
TW202013960A (en) 2020-04-01
TW202023276A (en) 2020-06-16
CN110719475A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN113711589B (en) Half-pixel interpolation filter in inter-frame coding and decoding mode
CN113170181A (en) Affine inheritance method in intra-block copy mode
CN110944170A (en) Extended Merge prediction
CN112970258A (en) Multiple hypotheses for sub-block prediction block
CN110677668B (en) Spatial motion compression
CN113424525A (en) Size selective application of decoder-side refinement tools
CN110858901B (en) Overlapped block motion compensation using temporal neighbors
CN113196777B (en) Reference pixel padding for motion compensation
CN110719475B (en) Shape dependent interpolation order
CN113366839A (en) Refined quantization step in video coding and decoding
CN113273216B (en) Improvement of MMVD
CN110677650B (en) Reducing complexity of non-adjacent mere designs
CN113302938A (en) Integer MV motion compensation
CN113273208A (en) Improvement of affine prediction mode
CN113574867B (en) MV precision constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant