WO2020143837A1 - Improvement of MMVD

Improvement of MMVD

Info

Publication number: WO2020143837A1
Application number: PCT/CN2020/071848
Authority: WIPO (PCT)
Prior art keywords: precision, video block, motion, block, current video
Other languages: English (en)
Inventors: Hongbin Liu, Li Zhang, Kai Zhang, Yue Wang
Original Assignee: Beijing Bytedance Network Technology Co., Ltd.; Bytedance Inc.
Application filed by Beijing Bytedance Network Technology Co., Ltd. and Bytedance Inc.
Priority to CN202080008062.0A (patent CN113273216B)
Publication of WO2020143837A1

Classifications

    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/11: Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/176: Adaptive coding characterised by the coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/52: Processing of motion vectors by encoding by predictive encoding

Definitions

  • This document is related to video coding technologies.
  • Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.
  • the disclosed techniques may be used by video decoder or encoder embodiments in which interpolation is improved using a block-shape-dependent interpolation order technique.
  • a method of video bitstream processing includes determining a shape of a first video block, determining an interpolation order based on the shape of the first video block, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the first video block in the sequence in accordance with the interpolation order to reconstruct a decoded representation of the first video block.
  • a method of video bitstream processing includes determining characteristics of a motion vector related to a first video block, determining an interpolation order based on the characteristics of the motion vector, the interpolation order indicative of a sequence of performing a horizontal interpolation and a vertical interpolation, and performing the horizontal interpolation and the vertical interpolation for the first video block in the sequence in accordance with the interpolation order to reconstruct a decoded representation of the first video block.
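  • The following is a minimal sketch (not the normative process) of how a decoder might derive the interpolation order from the block shape as described above; the function name and return convention are illustrative assumptions, consistent with the width > height rule given later in this document:

```python
def interpolation_order(width: int, height: int):
    """Return the sequence of 1-D interpolation passes for a W x H block."""
    if width > height:
        # Wider-than-tall block: vertical interpolation first, then horizontal.
        return ("vertical", "horizontal")
    # Otherwise: horizontal interpolation first, then vertical.
    return ("horizontal", "vertical")

print(interpolation_order(16, 4))  # ('vertical', 'horizontal')
print(interpolation_order(4, 16))  # ('horizontal', 'vertical')
```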
  • a method for video bitstream processing includes determining, by a processor, dimension characteristics of a first video block; determining, by the processor, that a first interpolation filter is to be applied to the first video block based on the determination of the dimension characteristics; and performing further processing of the first video block using the first interpolation filter.
  • a method for video bitstream processing includes determining, by a processor, first characteristics of a first video block; determining, by the processor, that a first interpolation filter is to be applied to the first video block based on the first characteristics; performing further processing of the first video block using the first interpolation filter; determining, by a processor, second characteristics of a second video block; determining, by the processor, that a second interpolation filter is to be applied to the second video block based on the second characteristics, the first interpolation filter and the second interpolation filter being different short-tap filters; and performing further processing of the second video block using the second interpolation filter.
  • a method for video bitstream processing includes determining, by a processor, characteristics of a first video block, the characteristics including one or more of: a dimension information of a first video block, a prediction direction of the first video block, or a motion information of the first video block; rounding motion vectors (MVs) related to the first video block to integer-pel precision or half-pel precision based on the determination of the characteristics of the first video block; and performing further processing of the first video block using the motion vectors that are rounded.
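  • As a rough illustration of the rounding step above, assuming MVs stored in 1/16-pel units as in VVC; the block-size policy below is a hypothetical example, not taken from this disclosure:

```python
def round_mv_component(v: int, target: str = "integer") -> int:
    """Round one MV component (in 1/16-pel units) to integer- or half-pel."""
    s = 4 if target == "integer" else 3   # 16 or 8 sub-units per step
    off = 1 << (s - 1)                    # round to nearest
    return ((v + off) >> s) << s

def round_mv(mv, width, height, bi_predicted):
    # Hypothetical policy: round small bi-predicted blocks to integer-pel.
    if bi_predicted and width * height <= 64:
        return tuple(round_mv_component(c, "integer") for c in mv)
    return mv

print(round_mv((37, -21), 8, 8, True))  # -> (32, -16)
```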
  • a method for video bitstream processing includes determining, by a processor, that a first video block is coded with a merge mode; rounding motion information related to the first video block to integer precision to generate modified motion information based on the determination that the first video block is coded with the merge mode; and performing a motion compensation process for the first video block using the modified motion information.
  • a method for video bitstream processing includes determining characteristics of a first video block, the characteristics being one or both of: a size of the first video block, or a shape of the first video block; modifying motion vectors related to the first video block to integer-pel precision or half-pel precision to generate modified motion vectors; and performing further processing of the first video block using the modified motion vectors.
  • a method for video bitstream processing includes determining characteristics of a first video block, the characteristics being one or both of: a size dimension of the first video block, or a prediction direction of the first video block; determining MMVD side information based on the determination of the characteristics of the first video block; and performing further processing of the first video block using the MMVD side information.
  • a method for video bitstream processing includes determining characteristics of a first video block, the characteristics being one or both of: a size of the first video block, or a shape of the first video block; determining a threshold number of half-pel motion vector (MV) components or quarter-pel MV components to be constrained based on the determination of the characteristics of the first video block; and performing further processing of the first video block using the threshold number.
  • a method for video bitstream processing includes determining characteristics of a first video block, the characteristics including a size of the first video block; modifying motion vectors (MVs) related to the first video block from fractional precision to integer precision based on the determination of the characteristics of the first video block; and performing motion compensation for the first video block using the modified MVs.
  • a method for video bitstream processing includes determining a first dimension of a first video block; determining a first precision for motion vectors (MVs) related to the first video block based on the determination of the first dimension; determining a second dimension of a second video block, the first dimension and the second dimension being different dimensions; determining a second precision for MVs related to the second video block based on the determination of the second dimension, the first precision and the second precision being different precisions; and performing further processing of the first video block using the first precision and of the second video block using the second precision.
  • a method for video bitstream processing includes determining, during a conversion between a current video block and a bitstream representation of the current video block, one or more parameters of the current video block, wherein the one or more parameters of the current video block comprise at least one of a dimension and a prediction direction of the current video block; determining MMVD (Merge mode with Motion Vector Difference) side information at least based on the one or more parameters of the current video block; and performing, at least based on the MMVD side information, the conversion; wherein the MMVD mode uses a motion vector expression that includes a starting point which is a base merge candidate, a motion magnitude distance and a motion direction for the current video block.
  • a method for video bitstream processing includes determining, during a conversion between a current video block and a bitstream representation of the current video block, one or more parameters of the current video block, wherein the one or more parameters of the current video block comprise at least one of a size of the current video block and a shape of the current video block; determining, at least based on the one or more parameters of the current video block, a motion vector precision for the current video block; and performing, based on the determined precision, the conversion; wherein the current video block is converted in an MMVD mode, and the MMVD mode uses a motion vector expression that includes a starting point which is a base merge candidate, a motion magnitude distance and a motion direction for the current video block.
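  • A compact sketch of the MMVD expression referenced above: the final MV is the base merge candidate plus a signalled distance applied along a signalled direction. The distance and direction tables below mirror a typical MMVD design and are assumptions for illustration, not the normative tables:

```python
DISTANCE_TABLE = [0.25, 0.5, 1, 2, 4, 8, 16, 32]      # in luma pels
DIRECTION_TABLE = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # (sign_x, sign_y)

def mmvd_motion_vector(base_mv, distance_idx, direction_idx):
    """Derive the MMVD motion vector (1/16-pel units) from its expression."""
    dist = int(DISTANCE_TABLE[distance_idx] * 16)     # to 1/16-pel units
    sx, sy = DIRECTION_TABLE[direction_idx]
    return (base_mv[0] + sx * dist, base_mv[1] + sy * dist)

# Base candidate (12, -4) in 1/16-pel, distance index 2 (1 pel), direction +x:
print(mmvd_motion_vector((12, -4), 2, 0))             # -> (28, -4)
```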
  • the above-described methods may be implemented by a video decoder apparatus that comprises a processor.
  • the above-described methods may be implemented by a video encoder apparatus comprising a processor configured to decode encoded video during the video encoding process.
  • these methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
  • FIG. 1 is an illustration of a quadtree plus binary tree (QTBT) structure.
  • FIG. 2 shows an example derivation process for merge candidates list construction.
  • FIG. 3 shows example positions of spatial merge candidates.
  • FIG. 4 shows an example of candidate pairs considered for redundancy check of spatial merge candidates.
  • FIGS. 5A and 5B show examples of positions for the second prediction unit (PU) of N×2N and 2N×N partitions.
  • FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
  • FIG. 7 shows example candidate positions for temporal merge candidate, C0 and C1.
  • FIG. 8 shows an example of combined bi-predictive merge candidate.
  • FIG. 9 shows an example of a derivation process for motion vector prediction candidates.
  • FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
  • FIG. 11 shows an example of advanced temporal motion vector prediction (ATMVP) motion prediction for a coding unit (CU) .
  • FIG. 12 shows an example of one CU with four sub-blocks (A-D) and its neighbouring blocks (a–d) .
  • FIG. 13 illustrates proposed non-adjacent merge candidates in J0021.
  • FIG. 14 illustrates proposed non-adjacent merge candidates in J0058.
  • FIG. 15 illustrates proposed non-adjacent merge candidates in J0059.
  • FIG. 16 shows an example of integer samples and fractional sample positions for quarter sample luma interpolation.
  • FIG. 17 is a block diagram of an example of a video processing apparatus.
  • FIG. 18 shows a block diagram of an example implementation of a video encoder.
  • FIG. 19 is a flowchart for an example of a video bitstream processing method.
  • FIG. 20 is a flowchart for an example of a video bitstream processing method.
  • FIG. 21 shows an example of repeating boundary pixels of a reference block before interpolation.
  • FIG. 22 is a flowchart for an example of a video bitstream processing method.
  • FIG. 23 is a flowchart for an example of a video bitstream processing method.
  • FIG. 24 is a flowchart for an example of a video bitstream processing method.
  • FIG. 25 is a flowchart for an example of a video bitstream processing method.
  • the present document provides various techniques that can be used by a decoder of video bitstreams to improve the quality of decompressed or decoded digital video. Furthermore, a video encoder may also implement these techniques during the process of encoding in order to reconstruct decoded frames used for further encoding.
  • Section headings are used in the present document for ease of understanding and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments from one section can be combined with embodiments from other sections.
  • This invention is related to video coding technologies. Specifically, it is related to interpolation in video coding. It may be applied to existing video coding standards like HEVC, or to the Versatile Video Coding (VVC) standard to be finalized. It may also be applicable to future video coding standards or video codecs.
  • Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards.
  • the ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/High Efficiency Video Coding (HEVC) standards.
  • the video coding standards are based on the hybrid video coding structure wherein temporal prediction plus transform coding are utilized.
  • The Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Many new methods have since been adopted by JVET and put into the reference software named the Joint Exploration Model (JEM).
  • FIG. 18 is a block diagram of an example implementation of a video encoder.
  • Quadtree plus binary tree (QTBT) block structure with larger CTUs
  • a CTU is split into CUs by using a quadtree structure denoted as coding tree to adapt to various local characteristics.
  • the decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level.
  • Each CU can be further split into one, two or four PUs according to the PU splitting type. Inside one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis.
  • a CU can be partitioned into transform units (TUs) according to another quadtree structure similar to the coding tree for the CU.
  • TUs transform units
  • the QTBT structure removes the concepts of multiple partition types, i.e. it removes the separation of the CU, PU and TU concepts, and supports more flexibility for CU partition shapes.
  • a CU can have either a square or rectangular shape.
  • a coding tree unit (CTU) is first partitioned by a quadtree structure.
  • the quadtree leaf nodes are further partitioned by a binary tree structure.
  • the binary tree leaf nodes are called coding units (CUs) , and that segmentation is used for prediction and transform processing without any further partitioning.
  • a CU sometimes consists of coding blocks (CBs) of different colour components, e.g. one CU contains one luma CB and two chroma CBs in the case of P and B slices of the 4:2:0 chroma format, and sometimes consists of a CB of a single component, e.g., one CU contains only one luma CB or just two chroma CBs in the case of I slices.
  • CTU size: the root node size of a quadtree, the same concept as in HEVC
  • MinQTSize: the minimum allowed quadtree leaf node size
  • MaxBTSize: the maximum allowed binary tree root node size
  • MaxBTDepth: the maximum allowed binary tree depth
  • MinBTSize: the minimum allowed binary tree leaf node size
  • the CTU size is set as 128×128 luma samples with two corresponding 64×64 blocks of chroma samples
  • the MinQTSize is set as 16×16
  • the MaxBTSize is set as 64×64
  • the MinBTSize (for both width and height) is set as 4×4
  • the MaxBTDepth is set as 4.
  • the quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes.
  • the quadtree leaf nodes may have a size from 16×16 (i.e., the MinQTSize) to 128×128 (i.e., the CTU size).
  • the quadtree leaf node is also the root node for the binary tree and it has the binary tree depth as 0.
  • when the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered. When the binary tree node has width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered. Similarly, when the binary tree node has height equal to MinBTSize, no further vertical splitting is considered.
  • the leaf nodes of the binary tree are further processed by prediction and transform processing without any further partitioning. In the JEM, the maximum CTU size is 256×256 luma samples.
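  • The QTBT constraints above can be summarized in a short sketch that reports which splits remain allowed for a node, using the example parameter values (CTU 128×128, MinQTSize 16, MaxBTSize 64, MaxBTDepth 4, MinBTSize 4); this follows the text's convention that a width equal to MinBTSize stops horizontal splitting:

```python
MIN_QT_SIZE, MAX_BT_SIZE, MAX_BT_DEPTH, MIN_BT_SIZE = 16, 64, 4, 4

def allowed_splits(width, height, bt_depth, is_qt_node):
    splits = []
    # Quadtree splitting continues only in the quadtree part of the tree and
    # only while the resulting leaves stay at or above MinQTSize.
    if is_qt_node and width == height and width // 2 >= MIN_QT_SIZE:
        splits.append("quad")
    # Binary splitting: the BT root must not exceed MaxBTSize, the BT depth is
    # capped, and a dimension already at MinBTSize cannot be split further.
    if max(width, height) <= MAX_BT_SIZE and bt_depth < MAX_BT_DEPTH:
        if width > MIN_BT_SIZE:
            splits.append("horizontal")  # stops once width == MinBTSize
        if height > MIN_BT_SIZE:
            splits.append("vertical")    # stops once height == MinBTSize
    return splits

print(allowed_splits(128, 128, 0, True))  # ['quad']; too large for BT
print(allowed_splits(32, 16, 1, False))   # ['horizontal', 'vertical']
```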
  • FIG. 1 (left) illustrates an example of block partitioning by using QTBT, and FIG. 1 (right) illustrates the corresponding tree representation.
  • the solid lines indicate quadtree splitting and dotted lines indicate binary tree splitting.
  • For each splitting (i.e., non-leaf) node of the binary tree, one flag is signalled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting.
  • For the quadtree splitting, there is no need to indicate the splitting type since quadtree splitting always splits a block both horizontally and vertically to produce 4 sub-blocks of equal size.
  • the QTBT scheme supports the ability for the luma and chroma to have a separate QTBT structure.
  • the luma and chroma CTBs in one CTU share the same QTBT structure.
  • the luma CTB is partitioned into CUs by a QTBT structure
  • the chroma CTBs are partitioned into chroma CUs by another QTBT structure. This means that a CU in an I slice consists of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice consists of coding blocks of all three colour components.
  • inter prediction for small blocks is restricted to reduce the memory access of motion compensation, such that bi-prediction is not supported for 4×8 and 8×4 blocks, and inter prediction is not supported for 4×4 blocks.
  • these restrictions are removed.
  • Each inter-predicted PU has motion parameters for one or two reference picture lists.
  • Motion parameters include a motion vector and a reference picture index. Usage of one of the two reference picture lists may also be signalled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
  • a merge mode is specified whereby the motion parameters for the current PU are obtained from neighbouring PUs, including spatial and temporal candidates.
  • the merge mode can be applied to any inter-predicted PU, not only for skip mode.
  • the alternative to merge mode is the explicit transmission of motion parameters, where the motion vector (to be more precise, the motion vector difference compared to a motion vector predictor), the corresponding reference picture index for each reference picture list and the reference picture list usage are signalled explicitly for each PU.
  • Such mode is named Advanced motion vector prediction (AMVP) in this disclosure.
  • When signalling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as ‘uni-prediction’. Uni-prediction is available both for P-slices and B-slices.
  • When signalling indicates that both of the reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as ‘bi-prediction’. Bi-prediction is available for B-slices only.
  • Step 1.2 Redundancy check for spatial candidates
  • a maximum of four merge candidates are selected among candidates that are located in five different positions.
  • a maximum of one merge candidate is selected among two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of merge candidates (MaxNumMergeCand) which is signalled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all the PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2N×2N prediction unit.
  • TU truncated unary binarization
  • a maximum of four merge candidates are selected among candidates located in the positions depicted in FIG. 3.
  • the order of derivation is A1, B1, B0, A0 and B2.
  • Position B2 is considered only when any PU of position A1, B1, B0, A0 is not available (e.g. because it belongs to another slice or tile) or is intra coded.
  • After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved.
  • To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in FIG. 4 are considered, and a candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information.
  • FIGS. 5A and 5B depict the second PU for the cases of N×2N and 2N×N, respectively.
  • When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. In fact, adding this candidate would lead to two prediction units having the same motion information, which is redundant to having just one PU in a coding unit.
  • Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
  • a scaled motion vector is derived based on co-located PU belonging to the picture which has the smallest POC difference with current picture within the given reference picture list.
  • the reference picture list to be used for derivation of the co-located PU is explicitly signalled in the slice header.
  • the scaled motion vector for the temporal merge candidate is obtained as illustrated by the dashed line in FIG. 6, scaled from the motion vector of the co-located PU using the POC distances tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture, and td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture.
  • the reference picture index of temporal merge candidate is set equal to zero.
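  • A sketch of this POC-distance scaling, following the HEVC-style fixed-point formulation (an illustration, not the exact normative text); tb and td are the POC distances defined above:

```python
def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def scale_mv(mv, tb, td):
    """Scale a co-located MV by tb/td using fixed-point arithmetic."""
    tx = (16384 + (abs(td) >> 1)) // td               # ~ 2^14 / td
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    def scale_one(c):
        p = dist_scale * c
        sign = -1 if p < 0 else 1
        return clip3(-32768, 32767, sign * ((abs(p) + 127) >> 8))
    return (scale_one(mv[0]), scale_one(mv[1]))

# Co-located MV (64, -32), current distance tb = 2, co-located distance td = 4:
print(scale_mv((64, -32), 2, 4))                      # -> (32, -16)
```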
  • FIG. 6 is an illustration of motion vector scaling for temporal merge candidate.
  • the position for the temporal candidate is selected between candidates C0 and C1, as depicted in FIG. 7. If the PU at position C0 is not available, is intra coded, or is outside of the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
  • Combined bi-predictive merge candidates are generated by utilizing spatial and temporal merge candidates. The combined bi-predictive merge candidate is used for B-slices only. The combined bi-predictive candidates are generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate. FIG. 8 shows an example of this process.
  • Zero motion candidates are inserted to fill the remaining entries in the merge candidates list and therefore hit the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index which starts from zero and increases every time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni and bi-directional prediction, respectively. Finally, no redundancy check is performed on these candidates.
  • HEVC defines the motion estimation region (MER) whose size is signalled in the picture parameter set using the “log2_parallel_merge_level_minus2” syntax element. When a MER is defined, merge candidates falling in the same region are marked as unavailable and therefore not considered in the list construction.
  • AMVP exploits spatio-temporal correlation of motion vector with neighboring PUs, which is used for explicit transmission of motion parameters.
  • a motion vector candidate list is constructed by firstly checking availability of left, above, and temporally neighbouring PU positions, removing redundant candidates and adding zero vectors to make the candidate list a constant length. Then, the encoder can select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to merge index signalling, the index of the best motion vector candidate is encoded using truncated unary. The maximum value to be encoded in this case is 2 (see FIG. 9).
  • FIG. 9 summarizes the derivation process for motion vector prediction candidates.
  • In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates.
  • For spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on motion vectors of each PU located in the five different positions depicted in FIG. 3.
  • For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
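  • A compact sketch of this AMVP list construction (simplified: it ignores the reference-index pruning rule above and keeps a fixed list length of two):

```python
def build_amvp_list(spatial_candidates, temporal_candidate, max_len=2):
    candidates = []
    for mv in spatial_candidates:                # left, then above derivation
        if mv is not None and mv not in candidates:
            candidates.append(mv)
        if len(candidates) == max_len:
            break
    if len(candidates) < max_len and temporal_candidate is not None:
        if temporal_candidate not in candidates:
            candidates.append(temporal_candidate)
    while len(candidates) < max_len:             # pad to a constant length
        candidates.append((0, 0))
    return candidates

print(build_amvp_list([(4, 0), (4, 0)], (8, -4)))  # [(4, 0), (8, -4)]
```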
  • a maximum of two candidates are considered among five potential candidates, which are derived from PUs located in positions as depicted in FIG. 3, those positions being the same as those of motion merge.
  • the order of derivation for the left side of the current PU is defined as A0, A1, and scaled A0, scaled A1.
  • the order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2.
  • the no-spatial-scaling cases are checked first followed by the spatial scaling. Spatial scaling is considered when the POC is different between the reference picture of the neighboring PU and that of the current PU regardless of reference picture list. If all PUs of left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help parallel derivation of left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
  • FIG. 10 is an illustration of motion vector scaling for spatial motion vector candidate.
  • the motion vector of the neighboring PU is scaled in a similar manner as for temporal scaling, as depicted as FIG. 10.
  • the main difference is that the reference picture list and index of current PU is given as input; the actual scaling process is the same as that of temporal scaling.
  • each CU can have at most one set of motion parameters for each prediction direction.
  • Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU.
  • Alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture.
  • In the spatial-temporal motion vector prediction (STMVP) method, the motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and the spatial neighbouring motion vectors.
  • the motion compression for the reference frames is currently disabled.
  • in the ATMVP method, the temporal motion vector prediction is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU.
  • the sub-CUs are square N×N blocks (N is set to 4 by default).
  • ATMVP predicts the motion vectors of the sub-CUs within a CU in two steps.
  • the first step is to identify the corresponding block in a reference picture with a so-called temporal vector.
  • the reference picture is called the motion source picture.
  • the second step is to split the current CU into sub-CUs and obtain the motion vectors as well as the reference indices of each sub-CU from the block corresponding to each sub-CU, as shown in FIG. 11.
  • in the first step, a reference picture and the corresponding block are determined by the motion information of the spatial neighbouring blocks of the current CU.
  • the first merge candidate in the merge candidate list of the current CU is used.
  • the first available motion vector as well as its associated reference index are set to be the temporal vector and the index to the motion source picture. This way, in ATMVP, the corresponding block may be more accurately identified, compared with TMVP, wherein the corresponding block (sometimes called collocated block) is always in a bottom-right or center position relative to the current CU.
  • a corresponding block of the sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinates of the current CU.
  • the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU.
  • after the motion information of a corresponding N×N block is identified, it is converted to the motion vectors and reference indices of the current sub-CU, in the same way as TMVP of HEVC, wherein motion scaling and other procedures apply.
  • the decoder checks whether the low-delay condition (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) is fulfilled and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy (with X being equal to 0 or 1 and Y being equal to 1-X).
  • FIG. 12 illustrates this concept. Let us consider an 8×8 CU which contains four 4×4 sub-CUs A, B, C, and D. The neighbouring 4×4 blocks in the current frame are labelled as a, b, c, and d.
  • the motion derivation for sub-CU A starts by identifying its two spatial neighbours.
  • the first neighbour is the N×N block above sub-CU A (block c). If this block c is not available or is intra coded, the other N×N blocks above sub-CU A are checked (from left to right, starting at block c).
  • the second neighbour is a block to the left of the sub-CU A (block b). If block b is not available or is intra coded, other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b).
  • the motion information obtained from the neighbouring blocks for each list is scaled to the first reference frame for a given list.
  • temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure of TMVP derivation as specified in HEVC.
  • the motion information of the collocated block at location D is fetched and scaled accordingly.
  • all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
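  • A sketch of this combination step (per reference list; the integer averaging below is an illustrative simplification):

```python
def stmvp_average(candidates):
    """Average the available (x, y) MVs (above, left, TMVP) for one list."""
    available = [mv for mv in candidates if mv is not None]
    if not available:
        return None
    n = len(available)
    return (sum(mv[0] for mv in available) // n,
            sum(mv[1] for mv in available) // n)

# Above neighbour available, left neighbour missing, TMVP available:
print(stmvp_average([(8, 4), None, (16, -4)]))  # -> (12, 0)
```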
  • the sub-CU modes are enabled as additional merge candidates and there is no additional syntax element required to signal the modes.
  • Two additional merge candidates are added to the merge candidate list of each CU to represent the ATMVP mode and the STMVP mode. Up to seven merge candidates are used, if the sequence parameter set indicates that ATMVP and STMVP are enabled.
  • the encoding logic of the additional merge candidates is the same as for the merge candidates in the HM, which means that, for each CU in a P or B slice, two more RD checks are needed for the two additional merge candidates.
  • Tencent proposes to derive additional spatial merge candidates from positions in an outer reference area which has an offset of (-96, -96) relative to the current block.
  • each candidate B(i, j) or C(i, j) has an offset of 16 in the vertical direction compared to its previous B or C candidates.
  • each candidate A(i, j) or D(i, j) has an offset of 16 in the horizontal direction compared to its previous A or D candidates.
  • each E(i, j) has an offset of 16 in both the horizontal direction and the vertical direction compared to its previous E candidates. The candidates are checked from the inside to the outside.
  • the order of the candidates is A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j).
  • the candidates are added after TMVP candidates in the merge candidate list.
  • the extended spatial positions from 6 to 27 as in FIG. 15 are checked according to their numerical order after the temporal candidate.
  • all the spatial candidates are restricted within two CTU lines.
  • an 8-tap separable DCT-based interpolation filter is used for 2/4 precision samples and a 7-tap separable DCT-based interpolation filter is used for 1/4 precision samples, as shown in Table 1.
  • Table 1 8-tap DCT-IF coefficients for 1/4th luma interpolation.
  • a 4-tap separable DCT-based interpolation filter is used for the chroma interpolation filter, as shown in Table 2.
  • Table 2 4-tap DCT-IF coefficients for 1/8th chroma interpolation.
  • the bit-depth of the output of the interpolation filter is maintained at 14-bit accuracy, regardless of the source bit-depth, before the averaging of the two prediction signals.
  • the actual averaging process is done implicitly with the bit-depth reduction process as:
  • predSamples[x, y] = (predSamplesL0[x, y] + predSamplesL1[x, y] + offset) >> shift
  • hk,0 = (-Ak,-3 + 4*Ak,-2 - 11*Ak,-1 + 40*Ak,0 + 40*Ak,1 - 11*Ak,2 + 4*Ak,3 - Ak,4) >> shift1 (2-3)
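  • The following sketch illustrates the two formulas above: the 8-tap half-sample filter (-1, 4, -11, 40, 40, -11, 4, -1) kept at 14-bit intermediate accuracy, then the implicit averaging of two prediction signals by a rounded right shift. Parameter values follow the HEVC-style design and are given for illustration:

```python
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)

def half_pel_sample(samples, pos, bit_depth=8):
    """Half-pel value between samples[pos] and samples[pos + 1] (14-bit)."""
    shift1 = bit_depth - 8                  # keeps intermediates at 14 bits
    acc = sum(c * samples[pos - 3 + i] for i, c in enumerate(HALF_PEL_TAPS))
    return acc >> shift1 if shift1 > 0 else acc

def bi_average(pred_l0, pred_l1, bit_depth=8):
    """Implicit average of two 14-bit predictions back to the output depth."""
    shift = 15 - bit_depth
    offset = 1 << (shift - 1)
    return (pred_l0 + pred_l1 + offset) >> shift

row = [100, 102, 104, 110, 120, 118, 112, 108, 104]
h = half_pel_sample(row, 3)                 # between row[3] and row[4]
print(h, bi_average(h, h))                  # 7406 116
```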
  • Table 4 interpolation required for WxH luma component when the interpolation order is reversed.
  • different interpolation orders can lead to different interpolation results when the bit depth of the input video is greater than 8. Therefore, the interpolation order shall be defined implicitly in both the encoder and the decoder.
  • Suppose the interpolation filter tap in motion compensation is N (for example, 8, 6, 4, or 2), the current block size is W×H, and the number of allowed MVDs in MMVD (such as the number of entries in the distance table) is M.
  • the interpolation order depends on the current coding block shape (e.g., the coding block is a CU) .
  • for a block (such as a CU, PU or sub-block used in sub-block based prediction like affine, ATMVP or BIO) with width > height, vertical interpolation is firstly performed, and then horizontal interpolation is performed; e.g., pixels dk,0, hk,0 and nk,0 are firstly interpolated and e0,0 to r0,0 are then interpolated. An example of j0,0 is shown in equations 2-3 and 2-4.
  • otherwise, for a block (such as a CU, PU or sub-block used in sub-block based prediction like affine, ATMVP or BIO) whose width is not greater than its height, horizontal interpolation is firstly performed, and then vertical interpolation is performed.
  • both the luma component and the chroma components follow the same interpolation order.
  • when one chroma coding block corresponds to multiple luma coding blocks (e.g., for the 4:2:0 color format, one chroma 4x4 block may correspond to two 8x4 or 4x8 luma blocks), luma and chroma may use different interpolation orders.
  • the scaling factors in the multiple stages may be further changed accordingly.
  • the interpolation order of luma component can further depend on the MV.
  • the proposed methods are only applied to square coding blocks.
  • the associated motion information may be modified to integer precision (e.g., via rounding) before invoking motion compensation process.
  • merge candidates with fractional motion vectors may be excluded from the merge list.
  • fractional motion vectors may be firstly modified to integer precision (e.g., via rounding) before being added to the merge list.
  • a separate HMVP table may be kept on-the-fly to store motion candidates with integer precisions.
  • the above methods may be only applied when the merge candidate is a bi-prediction candidate.
  • the above methods may be applied to certain block dimensions, such as 4x16, 16x4, 4x8, 8x4, 4x4.
  • the above methods may be applied to the AMVP coded blocks wherein the merge candidate may be replaced by an AMVP candidate.
  • the above methods may be applied to certain block modes, such as non-affine mode.
  • the MMVD side information (such as distance table, directions) may be dependent on block dimension and/or prediction direction (e.g., uni-prediction or bi-prediction) .
  • a distance table with all integer precisions may be defined or signaled.
  • the base merge candidate may be firstly modified (such as via rounding) to integer precision and then used to derive the final motion vectors for motion compensation.
  • the MV in MMVD mode may be constrained to integer-pel precision or half-pel precision for some block sizes or block shapes.
  • the base merge candidates used in MMVD may be firstly modified to integer-pel precision (such as via rounding) .
  • the base merge candidates used in MMVD may be modified to half-pel precision (such as via rounding) .
  • rounding may be performed in the base merge list construction process, therefore, rounded MVs are used in pruning.
  • rounding may be performed after the base merge list construction process, therefore, unrounded MVs are used in pruning.
  • binarization of the MVD index may be modified because the maximum MVD index is M - K - 1 instead of M - 1.
  • different context may be used in CABAC coding.
  • rounding may be performed after deriving the MV in MMVD mode.
  • the constraint may be different for bi-prediction and uni-prediction.
  • the constraint may not be applied in uni-prediction.
  • the constraint may be different for different block sizes or block shapes.
  • half-pel MV components or/and quarter-pel MV components may be constrained for some block sizes or block shapes.
  • bitstream shall conform to the constraint.
  • the constraint may be different for bi-prediction and uni-prediction.
  • the constraint may not be applied in uni-prediction.
  • the constraint may be different for different block sizes or block shapes.
  • some components of a MV may be rounded to integer-pel precision or half-pel precision depending on the dimension (e.g., width and/or height, ratios of width and height) , or/and prediction direction or/and motion information of a block.
  • MV is rounded to the nearest integer-pel precision MV or/and half-pel precision MV.
  • rounding down, rounding up, rounding towards zero or rounding away from zero may be used.
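  • For concreteness, a sketch of these alternative rounding modes for one MV component, assuming 1/16-pel storage units:

```python
def round_to_integer_pel(v, mode="nearest", sub=16):
    """Round one MV component (in 1/sub-pel units) to integer-pel precision."""
    if mode == "nearest":
        q = (v + sub // 2) // sub        # nearest (half rounds up)
    elif mode == "down":
        q = v // sub                     # floor
    elif mode == "up":
        q = -((-v) // sub)               # ceil
    elif mode == "towards_zero":
        q = v // sub if v >= 0 else -((-v) // sub)
    else:                                # away from zero
        q = -((-v) // sub) if v >= 0 else v // sub
    return q * sub

for m in ("nearest", "down", "up", "towards_zero", "away_from_zero"):
    print(m, round_to_integer_pel(-21, m))
# nearest -16, down -32, up -16, towards_zero -16, away_from_zero -32
```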
  • MV rounding may be applied to the horizontal or/and vertical MV component.
  • MV rounding may be applied to the horizontal (or vertical) MV component.
  • thresholds L and L1 may be different for bi-predicted blocks and uni-predicted blocks. For example, smaller thresholds may be used for bi-predicted blocks.
  • MV rounding may be applied.
  • MV rounding may be applied only when both horizontal and vertical components of the MV are fractional, i.e., they point to a fractional pixel position instead of an integer pixel position.
  • Whether MV rounding is applied or not may depend on whether the current block is bi-predicted or uni-predicted.
  • MV rounding may be applied only when the current block is bi-predicted.
  • Whether MV rounding is applied or not may depend on the prediction direction (e.g., from List 0 or List 1) and/or the associated motion vectors. In one example, for bi-predicted blocks, whether MV rounding is applied or not may be different for different prediction directions.
  • MV rounding may be applied to N MV components for prediction direction X; otherwise, MV rounding may not be applied.
  • N may be 0, 1 or 2.
  • N and M may be different for bi-predicted blocks and uni-predicted blocks.
  • N and M may be different for different block sizes (width or/and height or/and width *height) .
  • N is equal to 4 and M is equal to 4.
  • N is equal to 4 and M is equal to 3.
  • N is equal to 4 and M is equal to 2.
  • N is equal to 4 and M is equal to 1.
  • N is equal to 3 and M is equal to 3.
  • N is equal to 3 and M is equal to 2.
  • N is equal to 3 and M is equal to 1.
  • N is equal to 2 and M is equal to 2.
  • N is equal to 2 and M is equal to 1.
  • N is equal to 1 and M is equal to 1.
  • N is equal to 2 and M is equal to 2.
  • N is equal to 2 and M is equal to 1.
  • N is equal to 1 and M is equal to 1.
  • Whether MV rounding is applied or not may be different for different color components such as Y, Cb and Cr.
  • Whether and how to apply MV rounding may depend on color formats such as 4:2:0, 4:2:2 or 4:4:4.
  • Whether and how to apply MV rounding may depend on the block size (or width, height), block shape, prediction direction, etc.
  • some MV components of 4x16 or/and 16x4 bi-predicted or/and uni-predicted luma blocks may be rounded to half-pel precision.
  • some MV components of 4x4 uni-predicted or/and bi-predicted luma blocks may be rounded to integer-pel precision.
  • some MV components of 4x8 or/and 8x4 bi-predicted or/and uni-predicted luma blocks may be rounded to integer-pel precision.
  • the MV rounding may be not applied on sub-block prediction, such as affine prediction.
  • the MV rounding may be applied on sub-block prediction, such as ATMVP prediction.
  • each sub-block is treated as a coding block to judge whether and how to apply MV rounding.
  • motion vectors of one block shall be modified to integer precision before being utilized for motion compensation, for example, if they have fractional precision.
  • the stored motion vectors and those utilized for motion compensation may be in different precisions.
  • sub-pel precision (a.k.a., fractional precision, such as 1/4-pel, 1/16-pel) may be stored for blocks with certain block dimensions, but the motion compensation process is based on integer version of those motion vectors (such as via rounding) .
  • an indication of disallowing bi-prediction for certain block dimensions may be signaled in sequence parameter set/picture parameter set/sequence header/picture header/tile header/tile group header/CTU rows/regions/other high-level syntax.
  • an indication of disallowing bi-prediction and/or uni-prediction for certain block dimensions may be signaled in sequence parameter set/picture parameter set/sequence header/picture header/tile header/tile group header/CTU rows/regions/other high-level syntax.
  • such indications may be only applied to certain modes, such as non-affine mode.
  • the signaling of AMVR indices may be modified accordingly, such as only integer-pel precisions are allowed, or different MV precisions may be utilized instead.
  • a conformance bitstream shall follow the rule that for certain block dimensions, only integer-pel motion vectors are allowed for bi-prediction coded blocks.
  • block dimensions mentioned above are, for example, 4x16, 16x4, 4x8, 8x4, 4x4.
  • different interpolation filters may be used in interpolation depending on the dimension (e.g., width and/or height, ratio of width and height) of a block.
  • Different filters may be used for vertical interpolation and horizontal interpolation. For example, shorter tap filter may be applied for vertical interpolation compared to that for horizontal interpolation.
  • interpolation filters with fewer taps than the interpolation filters in VTM-3.0 may be applied in some cases. These interpolation filters with fewer taps are also called “short-tap filters”.
  • for certain block dimensions, different filters (e.g., short-tap filters) from those used for other kinds of blocks may be selected.
  • the short-tap filters may be used only when both horizontal and vertical components of the MV are fractional, i.e., they point to a fractional pixel position instead of an integer pixel position.
  • Which filter to be used may depend on whether the current block is bi-predicted or uni-predicted.
  • the short-tap filters may be used only when the current block is bi-predicted.
  • Which filter to be used may depend on the prediction direction (e.g., from List 0 or List 1) and/or the associated motion vectors. In one example, for bi-predicted blocks, whether short-tap filters are used or not may be different for different prediction directions.
  • N and M may be different for bi-predicted blocks and uni-predicted blocks.
  • N and M may be different for different block sizes (width or/and height or/and width *height) .
  • N is equal to 4 and M is equal to 4.
  • N is equal to 4 and M is equal to 3.
  • N is equal to 4 and M is equal to 2.
  • N is equal to 4 and M is equal to 1.
  • N is equal to 3 and M is equal to 3.
  • N is equal to 3 and M is equal to 2.
  • N is equal to 3 and M is equal to 1.
  • N is equal to 2 and M is equal to 2.
  • N is equal to 2 and M is equal to 1.
  • N is equal to 1 and M is equal to 1.
  • N is equal to 2 and M is equal to 2.
  • N is equal to 2 and M is equal to 1.
  • N is equal to 1 and M is equal to 1.
  • K of the M MV components use an S1-tap filter.
  • S1 is equal to 6 and S2 is equal to 4.
  • different filters may be used only for some pixels. For example, they are used only for boundary pixels of the block.
  • short-tap filters may be different for uni-predicted blocks and bi-predicted blocks.
  • short-tap filters may be different for different color components such as Y, Cb and Cr.
  • whether to and how to apply short-tap filters may depend on color formats such as 4:2:0, 4:2:2 or 4:4:4.
  • Different short-tap filters may be used for different blocks.
  • the selected short-tap filters may depend on the block size (or width, height) , block shapes, prediction direction etc.
  • a 7-tap filter is used for horizontal and vertical interpolation of 4x16 or/and 16x4 bi-predicted or/and uni-predicted luma blocks.
  • a 7-tap filter is used for horizontal (or vertical) interpolation of 4x4 uni-predicted or/and bi-predicted luma blocks.
  • a 6-tap filter is used for horizontal and vertical interpolation of 4x8 or/and 8x4 bi-predicted or/and uni-predicted luma blocks.
  • a 6-tap filter and a 5-tap filter are used in horizontal interpolation and vertical interpolation respectively for 4x8 or/and 8x4 bi-predicted or/and uni-predicted luma blocks.
  • Different short-tap filters may be used for different kinds of motion vectors.
  • longer tap length filters may be used for motion vectors that only have fractional components in one direction (i.e., either the horizontal or the vertical direction).
  • shorter tap length filters may be used for motion vectors that have fractional components in both the horizontal and vertical directions.
  • an 8-tap filter is used for 4x16 or/and 16x4 or/and 4x8 or/and 8x4 or/and 4x4 bi-predicted or/and uni-predicted blocks that only have fractional MV components in one direction, and the short-tap filters described in bullet 3.h are used for 4x16 or/and 16x4 or/and 4x8 or/and 8x4 or/and 4x4 bi-predicted or/and uni-predicted blocks that have fractional MV components in both directions.
  • interpolation filters used for affine motion may be different from that used for translational motion vectors.
  • shorter-tap interpolation filters may be used for affine motion compared to those used for translational motion vectors.
  • the short-tap filters may not be applied on sub-block prediction, such as affine prediction.
  • the short-tap filters may be applied on sub-block prediction, such as ATMVP prediction.
  • each sub-block is treated as a coding block to judge whether and how to apply short-tap filters.
  • whether to apply short-tap filters and/or how to apply short-tap filters may depend on the block dimension, coded information, etc.
  • short-tap filters may be applied.
  • padding or derivation from fetched reference samples may be applied.
  • pixels at the reference block boundaries are repeated to generate a (W + N - 1) × (H + N - 1) block, which is used for the final interpolation.
  • the fetched reference pixels may be identified by (x + MVXInt - N/2 + offSet1, y + MVYInt - N/2 + offSet2), wherein (x, y) is the top-left position of the current block, (MVXInt, MVYInt) is the integer part of the MV, and offSet1 and offSet2 are integers such as -2, -1, 0, 1, 2, etc.
  • PH is zero, and only left or/and right boundaries are repeated.
  • PW is zero, and only top or/and bottom boundaries are repeated.
  • both PW and PH are greater than zero, and first the left or/and the right boundaries are repeated, and then the top or/and bottom boundaries are repeated.
  • both PW and PH are greater than zero, and first the top or/and bottom boundaries are repeated, and then the left or/and right boundaries are repeated.
  • when M1 (or PW - M1) is greater than 1, instead of repeating the first left (or right) column M1 times, multiple columns may be utilized; for example, the M1 left columns (or PW - M1 right columns) may be repeated.
  • when M2 (or PH - M2) is greater than 1, instead of repeating the first top (or bottom) row M2 times, multiple rows may be utilized; for example, the M2 top rows (or PH - M2 bottom rows) may be repeated.
  • some default values may be used for boundary padding.
  • the boundary pixels repeating method may be used only when both horizontal and vertical components of the MV are fractional, i.e., they point to a fractional pixel position instead of an integer pixel position.
  • boundary pixels repeating method may be applied to some of or all reference blocks.
  • N and M may be different for bi-predicted blocks and uni-predicted blocks.
  • N and M may be different for different block sizes (width or/and height or/and width *height) .
  • N is equal to 4 and M is equal to 4.
  • N is equal to 4 and M is equal to 3.
  • N is equal to 4 and M is equal to 2.
  • N is equal to 4 and M is equal to 1.
  • N is equal to 3 and M is equal to 3.
  • N is equal to 3 and M is equal to 2.
  • N is equal to 3 and M is equal to 1.
  • N is equal to 2 and M is equal to 2.
  • N is equal to 2 and M is equal to 1.
  • N is equal to 1 and M is equal to 1.
  • N is equal to 2 and M is equal to 2.
  • N is equal to 2 and M is equal to 1.
  • N is equal to 1 and M is equal to 1.
  • Different boundary pixel repeating method may be used for the M MV components.
  • PW and/or PH may be different for different color components such as Y, Cb and Cr.
  • whether to and how to apply boundary pixel repeating may depend on color formats such as 4:2:0, 4:2:2 or 4:4:4.
  • PW and/or PH may be different for different block size or shape.
  • PW and PH are set equal to 1 for 4x16 or/and 16x4 bi-predicted or/and uni-predicted blocks.
  • PW and PH are set equal to 0 and 1 (or 1 and 0) , respectively, for 4x4 bi-predicted or/and uni-predicted blocks.
  • PW and PH are set equal to 2 for 4x8 or/and 8x4 bi-predicted or/and uni-predicted blocks.
  • PW and PH are set equal to 2 and 3 (or 3 and 2) respectively for 4x8 or/and 8x4 bi-predicted or/and uni-predicted blocks.
  • PW and PH may be different for uni-prediction and bi-prediction.
  • PW and PH may be different for different kinds of motion vectors.
  • PW and PH may be smaller (or even zero) for motion vectors that only have fractional components in one direction (i.e., either the horizontal or the vertical direction), and they may be larger for motion vectors that have fractional components in both the horizontal and vertical directions.
  • PW and PH are set equal to 0 for 4x16 or/and 16x4 or/and 4x8 or/and 8x4 or/and 4x4 bi-predicted or/and uni-predicted blocks that only have fractional MV components in one direction, and the PW and PH described in bullet 4.i are used for 4x16 or/and 16x4 or/and 4x8 or/and 8x4 or/and 4x4 bi-predicted or/and uni-predicted blocks that have fractional MV components in both directions.
  • Figure 21 shows an example of repeating the boundary pixels of a reference block before interpolation; a code sketch of this repetition follows below.
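The boundary repetition described in the bullets above can be illustrated with a minimal, non-normative Python sketch. It assumes NumPy arrays and that the fetched block has already been reduced by PW columns and PH rows; the split into M1 left / PW – M1 right columns and M2 top / PH – M2 bottom rows follows the bullets, but the function name and interface are illustrative only.

```python
import numpy as np

def repeat_boundary(fetched, pw, ph, m1=0, m2=0):
    """Grow a fetched reference block back to (H + N - 1) x (W + N - 1)
    samples by repeating boundary pixels: m1 columns on the left,
    pw - m1 on the right, m2 rows on top, ph - m2 on the bottom."""
    block = fetched
    if pw > 0:
        left = np.repeat(block[:, :1], m1, axis=1)
        right = np.repeat(block[:, -1:], pw - m1, axis=1)
        block = np.concatenate([left, block, right], axis=1)
    if ph > 0:
        top = np.repeat(block[:1, :], m2, axis=0)
        bottom = np.repeat(block[-1:, :], ph - m2, axis=0)
        block = np.concatenate([top, block, bottom], axis=0)
    return block
```

With PW = PH = 1 and M1 = M2 = 0 this reproduces the first embodiment listed at the end of this document, where only the last column and last row are copied.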
  • the proposed methods may be applied to certain modes, block sizes/shapes, and/or certain sub-block sizes.
  • the proposed methods may be applied to certain modes, such as bi-predicted mode.
  • the proposed methods may be applied to certain block sizes.
  • the proposed methods may be applied to certain color components (such as only the luma component).
  • Shift (x, s) is defined as Shift (x, s) = (x + off) >> s.
  • off is an integer such as 0 or 2^(s-1).
  • It may be defined as the rounding used for motion vectors in the AMVR process, the affine process, or other process modules.
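A minimal sketch of the Shift operation as reconstructed from the two bullets above; the interpretation (x + off) >> s is an assumption consistent with the stated off values of 0 and 2^(s-1):

```python
def shift(x: int, s: int, off: int = 0) -> int:
    """Shift(x, s) = (x + off) >> s.
    off = 2 ** (s - 1) gives round-to-nearest;
    off = 0 rounds towards minus infinity (Python's >> floors)."""
    return (x + off) >> s
```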
  • how to round the MVs may depend on the MV component.
  • for example, the y-component of the MV is rounded to integer-pel precision but the x-component of the MV is not rounded.
  • the MV may be rounded to integer pixels before motion compensation for the luma component, but rounded to 2-pel precision before motion compensation for the chroma components when the color format is 4:2:0.
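A hedged sketch of component- and plane-dependent MV rounding. The 1/16-pel internal storage and the function interface are assumptions for illustration; target_shift = 0 rounds to integer-pel, and target_shift = -1 rounds to 2-pel as described for 4:2:0 chroma above.

```python
def round_mv_component(mv: int, target_shift: int, storage_shift: int = 4) -> int:
    """Round one MV component, stored in 1/2**storage_shift pel units,
    to a precision step of 1/2**target_shift pel."""
    s = storage_shift - target_shift
    if s <= 0:
        return mv                      # already at or below target precision
    off = 1 << (s - 1)                 # round to nearest
    return ((mv + off) >> s) << s

# e.g. luma rounded to integer-pel, chroma rounded to 2-pel (4:2:0):
# luma_mv_x   = round_mv_component(mv_x, target_shift=0)
# chroma_mv_x = round_mv_component(mv_x, target_shift=-1)
```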
  • a bi-linear filter may be used to perform the interpolation filtering in one or multiple specific cases.
  • a short-tap or second interpolation filter may be applied to a reference picture list that involves multiple reference blocks, while for another reference picture list with only one reference block, the same filter as that used for the normal prediction mode may be applied; a hypothetical selection rule is sketched below.
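A hypothetical filter-selection rule in the spirit of the two bullets above; the block sizes and tap counts are assumptions, not values mandated by this document:

```python
def select_interpolation_filter(width: int, height: int, bi_predicted: bool) -> str:
    """Use a short-tap (here: bilinear) filter only for the smallest,
    most bandwidth-critical blocks; otherwise keep the default filter."""
    small = (width, height) in {(4, 4), (4, 8), (8, 4)}
    if small and bi_predicted:
        return "bilinear_2tap"
    return "default_8tap"
```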
  • the proposed method may be applied under certain conditions, such as certain temporal layer(s), or when the quantization parameter of a block/tile/slice/picture containing the block is within a range (such as larger than a threshold).
  • FIG. 17 is a block diagram of a video processing apparatus 1700.
  • the apparatus 1700 may be used to implement one or more of the methods described herein.
  • the apparatus 1700 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and so on.
  • the apparatus 1700 may include one or more processors 1702, one or more memories 1704 and video processing hardware 1706.
  • the processor (s) 1702 may be configured to implement one or more methods described in the present document.
  • the memory (memories) 1704 may be used for storing data and code used for implementing the methods and techniques described herein.
  • the video processing hardware 1706 may be used to implement, in hardware circuitry, some techniques described in the present document.
  • FIG. 19 is a flowchart for a method 1900 of video bitstream processing.
  • the method 1900 includes determining (1905) a shape of a video block, determining (1910) an interpolation order based on the video block, the interpolation order being indicative of a sequence of performing horizontal interpolation and vertical interpolation, and performing the horizontal interpolation and the vertical interpolation in accordance with the interpolation order for the video block to reconstruct (1915) a decoded representation of the video block.
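One plausible reading of the shape-dependent interpolation order of method 1900, sketched in Python; the specific rule (width >= height implies horizontal first) is an assumption for illustration, not a quote from the method:

```python
def interpolation_order(width: int, height: int) -> tuple[str, str]:
    """Return the order of the two 1-D interpolation passes for a WxH block."""
    if width >= height:
        return ("horizontal", "vertical")
    return ("vertical", "horizontal")
```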
  • FIG. 20 is a flowchart for a method 2000 of video bitstream processing.
  • the method 2000 includes determining (2005) characteristics of a motion vector related to a video block, determining (2010) an interpolation order of the video block based on the characteristics of the motion vector, the interpolation order being indicative of a sequence of performing horizontal interpolation and vertical interpolation, and performing the horizontal interpolation and the vertical interpolation in accordance with the interpolation order for the video block to reconstruct (2015) a decoded representation of the video block.
  • FIG. 22 is a flowchart for a method 2200 of video bitstream processing.
  • the method 2200 includes determining (2205) dimension characteristics of a first video block, determining (2210) that a first interpolation filter is to be applied to the first video block based on the determination of the dimension characteristics, and performing (2215) further processing of the first video block using the first interpolation filter.
  • FIG. 23 is a flowchart for a method 2300 of video bitstream processing.
  • the method 2300 includes determining (2305) first characteristics of a first video block, determining (2310) that a first interpolation filter is to be applied to the first video block based on the determination of the first characteristics, performing (2315) further processing of the first video block using the first interpolation filter, determining (2320) second characteristics of a second video block, determining (2325) that a second interpolation filter is to be applied to the second video block based on the second characteristics, the first interpolation filter and the second interpolation filter being different short-tap filters, and performing (2330) further processing of the second video block using the second interpolation filter.
  • FIG. 24 is a flowchart for a method 2400 of video bitstream processing.
  • the method 2400 includes: determining (2405) , during a conversion between a current video block and a bitstream representation of the current video block, one or more parameters of the current video block, wherein the one or more parameters of the current video block comprise at least one of a dimension and a prediction direction of the current video block; determining (2410) MMVD (Merge mode with Motion Vector Difference) side information at least based on the one or more parameters of the current video block; and performing (2415) , at least based on the MMVD side information, the conversion; wherein the MMVD mode uses a motion vector expression that includes a starting point which is a base merge candidate, a motion magnitude distance and a motion direction for the current video block.
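The motion vector expression of the MMVD mode can be sketched as the base merge candidate (the starting point) plus a signed magnitude offset. The distance and direction tables below are VVC-like values used here as assumptions; this document does not fix them at this point.

```python
MMVD_DISTANCES = (0.25, 0.5, 1, 2, 4, 8, 16, 32)      # in luma samples (assumed)
MMVD_DIRECTIONS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # (sign_x, sign_y)

def mmvd_motion_vector(base_mv, distance_idx, direction_idx):
    """Starting point (base merge candidate) plus distance along a direction."""
    sx, sy = MMVD_DIRECTIONS[direction_idx]
    d = MMVD_DISTANCES[distance_idx]
    return (base_mv[0] + sx * d, base_mv[1] + sy * d)
```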
  • FIG. 25 is a flowchart for a method 2500 of video bitstream processing.
  • the method 2500 includes: determining (2505), during a conversion between a current video block and a bitstream representation of the current video block, one or more parameters of the current video block, wherein the one or more parameters of the current video block comprise at least one of a size of the current video block and a shape of the current video block; determining (2510), at least based on the one or more parameters of the current video block, a motion vector precision for the current video block; and performing (2515), based on the determined precision, the conversion; wherein the current video block is converted in an MMVD mode, and the MMVD mode uses a motion vector expression that includes a starting point which is a base merge candidate, a motion magnitude distance and a motion direction for the current video block.
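A hypothetical precision rule for method 2500, with the 64-sample threshold chosen purely for illustration:

```python
def mmvd_mv_precision(width: int, height: int, bi_predicted: bool) -> str:
    """Pick a coarser MMVD motion vector precision for small blocks."""
    if width * height <= 64:
        return "integer_pel" if bi_predicted else "half_pel"
    return "quarter_pel"
```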
  • For example, as described in Section 4, under different shapes of the video block, a preference may be given to performing one of the horizontal interpolation or the vertical interpolation first.
  • in some embodiments, the horizontal interpolation is performed before the vertical interpolation, and in other embodiments the vertical interpolation is performed before the horizontal interpolation.
  • the video block may be encoded in the video bitstream in which bit efficiency may be achieved by using a bitstream generation rule related to interpolation orders that also depends on the shape of the video block.
  • the methods can include wherein rounding the motion vectors includes one or more of: rounding to a nearest integer-pel precision MV, or rounding to a half-pel precision MV.
  • the methods can include wherein rounding the MVs includes one or more of: rounding down, rounding up, rounding towards zero, or rounding away from zero.
  • the methods can include wherein the dimension information represents that a size of the first video block is less than a threshold value, and rounding the MVs is applied to one or both of a horizontal MV component or a vertical MV component based on the dimension information representing that the size of the first video block is less than the threshold value.
  • the methods can include wherein the dimension information represents that a width or a height of the first video block is less than a threshold value, and rounding the MVs is applied to one or both of a horizontal MV component or a vertical MV component based on the dimension information representing that the width or the height of the first video block is less than the threshold value.
  • the methods can include wherein the threshold value is different for bi-predicted blocks and uni-predicted blocks.
  • the methods can include wherein the dimension information represents that a ratio between a width and a height of the first video block is larger than a first threshold value or smaller than a second threshold value, and wherein the rounding of the MVs is based on the determination of the dimension information.
  • the methods can include wherein rounding the MVs is further based on both horizontal and vertical components of the MVs being fractional.
  • the methods can include wherein rounding the MVs is further based on the first video block being bi-predicted or uni-predicted.
  • the methods can include wherein rounding the MVs is further based on a prediction direction related to the first video block.
  • the methods can include wherein rounding the MVs is further based on color components of the first video block.
  • the methods can include wherein rounding the MVs is further based on a size of the first video block, a shape of the first video block, or a prediction shape of the first video block.
  • the methods can include wherein rounding the MVs is applied on sub-block prediction.
  • the methods can include wherein a short-tap filter is applied to MV components based on the MV components having fractional precision.
  • the methods can include wherein short-tap filters are applied based on a dimension of the first video block, or coded information of the first video block.
  • the methods can include wherein short-tap filters are applied based on a mode of the first video block.
  • the methods can include wherein default values are used for boundary padding related to the first video block.
  • the methods can include wherein the merge mode is one or more of: a regular merge list, a triangular merge list, an affine merge list, or other non-intra or non-AMVP mode.
  • the methods can include wherein merge candidates with fractional motion vectors are excluded from a merge list.
  • the methods can include wherein rounding the motion information includes rounding a merge candidate associated with fractional motion vectors to integer precision, and the modified motion information is inserted into a merge list.
  • the methods can include wherein the motion information is a bi-prediction candidate.
  • the methods can include wherein MMVD is merge mode with motion vector difference.
  • the methods can include wherein the motion vectors are in MMVD mode.
  • the methods can include wherein the first video block is an MMVD coded block to be associated with integer-pel precision, and wherein base merge candidates used in MMVD are modified to integer-pel precision via rounding.
  • the methods can include wherein the first video block is an MMVD coded block to be associated with half-pel precision, and wherein base merge candidates used in MMVD are modified to half-pel precision via rounding.
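A sketch of the base-candidate alignment described in the two bullets above, assuming MVs are stored in 1/16-pel units; target_shift = 0 yields integer-pel and target_shift = 1 yields half-pel base candidates:

```python
def align_base_merge_candidates(candidates, target_shift, storage_shift=4):
    """Round base merge candidates to the precision required by the
    MMVD-coded block before they are used."""
    s = storage_shift - target_shift
    off = 1 << (s - 1)
    return [tuple(((c + off) >> s) << s for c in mv) for mv in candidates]

# align_base_merge_candidates([(13, -7)], target_shift=0) -> [(16, 0)]
```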
  • the methods can include wherein the threshold number is a maximum number of allowed half-pel MV components or quarter-pel MV components.
  • the methods can include wherein the threshold number is different between bi-prediction and uni-prediction.
  • the methods can include wherein an indication disallowing bi-prediction is signaled in a sequence parameter set, a picture parameter set, a sequence header, a picture header, a tile header, a tile group header, a CTU row, a region, or other high-level syntax.
  • the methods can include wherein the methods are in conformance with a bitstream rule that allows for only integer-pel motion vectors for bi-prediction coded blocks having particular dimensions.
  • the methods can include wherein the first video block has a size of: 4x16, 16x4, 4x8, 8x4, or 4x4.
  • the methods can include wherein modifying or rounding the motion information includes modifying different MV components differently.
  • the methods can include wherein a y-component of a first MV is modified or rounded to integer-pixel, and an x-component of the first MV is not modified or rounded.
  • the methods can include wherein a luma component of a first MV is rounded to integer pixels, and a chroma component of the first MV is rounded to 2-pel pixels.
  • the methods can include wherein the first MV is related to a video block having a color format that is 4: 2: 0.
  • the methods can include wherein the bilinear filter is used for 4x4 uni-prediction, 4x8 bi-prediction, 8x4 bi-prediction, 4x16 bi-prediction, 16x4 bi-prediction, 8x8 bi-prediction, 8x4 uni-prediction, or 4x8 uni-prediction.
  • the disclosed techniques may be embodied in video encoders or decoders to improve compression efficiency when the coding units being compressed have shapes that are significantly different from the traditional square-shaped blocks or rectangular blocks that are half-square shaped.
  • new coding tools that use long or tall coding units such as 4x32 or 32x4 sized units may benefit from the disclosed techniques.
  • the disclosed and other solutions, examples, embodiments, modules and the functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them.
  • the disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • the computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
  • data processing apparatus encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document) , in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code) .
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit) .
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read only memory or a random-access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
  • a computer need not have such devices.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • PW and PH are designed for 4x16, 16x4, 4x4, 8x4 and 4x8 blocks.
  • the MV of the block in reference list X is MVX
  • the interpolation filter tap (in motion compensation) is N (for example, 8, 6, 4, or 2)
  • the current block size is WxH
  • the position (i.e., the position of the top-left pixel) of the current block is (x, y).
  • the indices of the rows and columns start from 1; for example, the H rows include the 1st, ..., Hth rows.
  • PW and PH are both set equal to 1 for prediction direction X.
  • (W + N – 2) * (H + N – 2) reference pixels are fetched from the reference picture, wherein the top-left position of the reference pixels is identified by (MVXInt[0] + x – N/2 + 1, MVXInt[1] + y – N/2 + 1).
  • the (W + N – 1)th column is generated by copying the (W + N – 2)th column.
  • the (H + N – 1)th row is generated by copying the (H + N – 2)th row.
  • PW and PH are set equal to 0 and 1 respectively.
  • (W + N – 1) * (H + N – 2) reference pixels are fetched from the reference picture, wherein the top-left position of the reference pixels is identified by (MVXInt[0] + x – N/2 + 1, MVXInt[1] + y – N/2 + 1).
  • the (H + N – 1)th row is generated by copying the (H + N – 2)th row.
  • PW and PH are set equal to 2 and 3 respectively.
  • (W + N – 3) * (H + N – 4) reference pixels are fetched from the reference picture, wherein the top-left position of the reference pixels is identified by (MVXInt[0] + x – N/2 + 2, MVXInt[1] + y – N/2 + 2).
  • the 1st column is copied to its left side to obtain W + N – 2 columns; after that, the (W + N – 1)th column is generated by copying the (W + N – 2)th column.
  • PW and PH are both set equal to 1 for prediction direction X.
  • (W + N – 2) * (H + N – 2) reference pixels are fetched from the reference picture, wherein the top-left position of the reference pixels is identified by (MVXInt[0] + x – N/2 + 2, MVXInt[1] + y – N/2 + 2).
  • the 1st column is copied to its left side to obtain W + N – 1 columns.
  • the 1st row is copied above it to obtain H + N – 1 rows.
  • PW and PH are set equal to 0 and 1 respectively.
  • (W + N – 1) * (H + N – 2) reference pixels are fetched from the reference picture, wherein the top-left position of the reference pixels is identified by (MVXInt[0] + x – N/2 + 1, MVXInt[1] + y – N/2 + 2).
  • the 1st row is copied above it to obtain H + N – 1 rows.
  • PW and PH are set equal to 2 and 3 respectively.
  • (W + N – 3) * (H + N – 4) reference pixels are fetched from the reference picture, wherein the top-left position of the reference pixels is identified by (MVXInt[0] + x – N/2 + 2, MVXInt[1] + y – N/2 + 2).
  • the 1st column is copied to its left side to obtain W + N – 2 columns; after that, the (W + N – 1)th column is generated by copying the (W + N – 2)th column.
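The reference-fetch footprint of the embodiments above can be checked with a small helper; the function name and interface are illustrative only:

```python
def fetched_samples(w: int, h: int, n: int, pw: int = 0, ph: int = 0) -> int:
    """Reference samples fetched for a WxH block with an N-tap filter."""
    return (w + n - 1 - pw) * (h + n - 1 - ph)

# a 4x4 block with an 8-tap filter:
#   fetched_samples(4, 4, 8)        -> 121 samples (no padding)
#   fetched_samples(4, 4, 8, 1, 1)  -> 100 samples (PW = PH = 1)
```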

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An improvement of MMVD (merge mode with motion vector difference) is disclosed. A video processing method comprises: determining, during a conversion between a current video block and a bitstream representation of the current video block, one or more parameters of the current video block, the one or more parameters of the current video block comprising at least one of a dimension and a prediction direction of the current video block; determining MMVD (merge mode with motion vector difference) side information at least based on the one or more parameters of the current video block; and performing the conversion at least based on the MMVD side information. The MMVD mode uses a motion vector expression that includes a starting point which is a base merge candidate, a motion magnitude distance and a motion direction for the current video block.
PCT/CN2020/071848 2019-01-12 2020-01-13 MMVD improvement WO2020143837A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202080008062.0A CN113273216B (zh) 2019-01-12 2020-01-13 Mmvd改进 (MMVD improvement)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2019/071503 2019-01-12
CN2019071503 2019-01-12

Publications (1)

Publication Number Publication Date
WO2020143837A1 true WO2020143837A1 (fr) 2020-07-16

Family

ID=71520954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/071848 WO2020143837A1 (fr) 2020-01-13 MMVD improvement

Country Status (2)

Country Link
CN (1) CN113273216B (fr)
WO (1) WO2020143837A1 (fr)


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112188207B (zh) * 2014-10-31 2023-10-20 三星电子株式会社 使用高精度跳过编码的视频编码设备和视频解码设备及其方法
CN107113424B (zh) * 2014-11-18 2019-11-22 联发科技股份有限公司 以帧间预测模式编码的块的视频编码和解码方法
US10887597B2 (en) * 2015-06-09 2021-01-05 Qualcomm Incorporated Systems and methods of determining illumination compensation parameters for video coding
US10271064B2 (en) * 2015-06-11 2019-04-23 Qualcomm Incorporated Sub-prediction unit motion vector prediction using spatial and/or temporal motion information
EP3357245A4 (fr) * 2015-11-05 2019-03-13 MediaTek Inc. Procédé et appareil d'inter prédiction utilisant un vecteur de mouvement moyen pour le codage vidéo
US10560718B2 (en) * 2016-05-13 2020-02-11 Qualcomm Incorporated Merge candidates for motion vector prediction for video coding

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180352246A1 (en) * 2011-03-09 2018-12-06 Canon Kabushiki Kaisha Video encoding and decoding
US9686559B2 (en) * 2012-07-03 2017-06-20 Sharp Kabushiki Kaisha Image decoding device, and image encoding device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TOMONORI HASHIMOTO , TOMOHIRO IKAI: "Crosscheck of JVET-M0314 (CE4-related: MMVD improving with signaling distance table)", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, no. M0597, 18 January 2019 (2019-01-18), Marrakech MA, pages 1 - 2, XP030214316 *
XU CHEN, JIANHUA ZHENG: "CE 4: Extension on MMVD (Test 4.2.5)", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, no. M0291, 18 January 2019 (2019-01-18), Marrakech MA, pages 1 - 4, XP030197896 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023118273A1 (fr) * 2021-12-21 2023-06-29 Interdigital Vc Holdings France, Sas MMVD (merge motion vector difference) using a depth map and/or a motion map

Also Published As

Publication number Publication date
CN113273216A (zh) 2021-08-17
CN113273216B (zh) 2022-09-13

Similar Documents

Publication Publication Date Title
US11997253B2 (en) Conditions for starting checking HMVP candidates depend on total number minus K
US11070820B2 (en) Condition dependent inter prediction with geometric partitioning
US11909951B2 (en) Interaction between lut and shared merge list
US11616945B2 (en) Simplified history based motion vector prediction
US11146785B2 (en) Selection of coded motion information for LUT updating
US11589071B2 (en) Invoke of LUT updating
US11595641B2 (en) Alternative interpolation filters in video coding
US11641483B2 (en) Interaction between merge list construction and other tools
US11503288B2 (en) Selective use of alternative interpolation filters in video processing
WO2020140862A1 Conditional application of inter prediction with geometric partitioning in video processing
WO2020125628A1 Shape dependent interpolation filter
WO2020156515A1 Refined quantization steps in video coding
WO2020143837A1 MMVD improvement
WO2020143830A1 Motion compensation with integer MVs
WO2020143831A1 MV precision constraints
WO2020012448A2 Shape dependent interpolation order

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20738362

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 20738362

Country of ref document: EP

Kind code of ref document: A1