CN114424530A - Skip mode signaling - Google Patents

Skip mode signaling

Info

Publication number
CN114424530A
CN114424530A (application CN202080064305.2A)
Authority
CN
China
Prior art keywords
video
block
video block
indication
codec
Prior art date
Legal status
Pending
Application number
CN202080064305.2A
Other languages
Chinese (zh)
Inventor
王洋
刘鸿彬
张莉
张凯
许继征
王悦
张娜
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd, ByteDance Inc filed Critical Beijing ByteDance Network Technology Co Ltd
Publication of CN114424530A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - ... using adaptive coding
    • H04N19/102 - ... characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/134 - ... characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/169 - ... characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - ... the unit being an image region, e.g. an object
    • H04N19/176 - ... the region being a block, e.g. a macroblock
    • H04N19/70 - ... characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method of video processing is described. The method includes performing a conversion between a video block of a video and a codec representation of the video, wherein the codec representation complies with a format rule that specifies selectively including, in the codec representation, an indication of skip mode coding of the video block based on a dimension of the video block, and wherein the skip mode coding allows the conversion to be performed without generating or coding a residual of the video block.

Description

Skip mode signaling
Cross Reference to Related Applications
Under the applicable patent law and/or rules pursuant to the Paris Convention, this application timely claims the priority to and benefits of International Patent Application No. PCT/CN2019/105825, filed on September 13, 2019, and International Patent Application No. PCT/CN2019/107107, filed on September 20, 2019. The entire disclosures of the above applications are incorporated by reference as part of the disclosure of this application.
Technical Field
This document relates to video and picture coding and decoding techniques.
Background
Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth demand for digital video usage is expected to continue to grow.
Disclosure of Invention
Devices, systems, and methods related to digital video coding and decoding, and in particular to collocated motion vectors in video coding and decoding, are described. The described methods may be applied to existing video codec standards (e.g., High Efficiency Video Coding (HEVC)) and future video codec standards (e.g., Versatile Video Coding (VVC)) or codecs.
In one exemplary aspect, a video processing method is disclosed. The method includes performing a conversion between a video block of the video and a codec representation of the video, wherein the codec representation conforms to a format rule that specifies selectively including an indication of a skip mode codec of the video block in the codec representation based on a dimension of the video block, wherein the skip mode codec allows the conversion to be performed without generating or codec a residual of the video block.
In another exemplary aspect, a video processing method is disclosed. The method includes determining, for a conversion between a video block of a video and a codec representation of the video, the applicability of a particular codec tool to the video block of a first color component of the video based on whether the particular codec tool is applied to one or more corresponding video blocks of a second color component of the video; and performing the conversion based on the determination.
In yet another exemplary aspect, a video processing method is disclosed. The method includes determining, for a conversion between a chroma video block of a video and a codec representation of the video, that a position-dependent intra prediction combination (PDPC) method is not allowed for coding the chroma video block because a corresponding luma block is coded using a particular codec mode; and performing the conversion based on the determination, wherein the PDPC method combines neighboring samples with a prediction signal of the chroma video block to generate a refined prediction signal of the chroma video block.
In yet another exemplary aspect, a video processing method is disclosed. The method includes performing a conversion between a video block of a video and a codec representation of the video, wherein the codec representation complies with a format rule specifying that, for a codec condition satisfied by the video block, an indication of use of an optional half-pixel interpolation filter instead of a default half-pixel interpolation filter is set to false.
In yet another exemplary aspect, a video processing method is disclosed. The method includes performing a conversion between a video block of a video and a codec representation of the video according to a rule, wherein a prediction of the video block is generated based on a weighted average of predictions of a plurality of partitions of the video block, wherein at least one partition is an angular partition, and wherein the rule specifies the use of a selectable half-pixel interpolation filter for interpolating sample values at half-pixel positions when determining the predictions.
In yet another exemplary aspect, a video processing method is disclosed. The method includes performing a conversion between a video block of a video and a codec representation of the video according to a rule, wherein the rule specifies a codec condition for an indication applied to a motion vector prediction candidate that is to be added to a Merge list, based on information associated with one or more motion candidates present in the Merge list.
In yet another exemplary aspect, a method of video processing is disclosed. The method includes for a conversion between a video block of a video and a codec representation of the video, determining a control point motion vector for a control point of the video block that was codec using an affine Merge mode based on motion information of neighboring blocks of the video block according to a rule, and performing the conversion based on the determination, wherein the rule specifies that an indication to use a selectable half-pixel interpolation filter for the video block is set equal to an indication of the motion vector in the neighboring blocks.
In yet another exemplary aspect, a video processing method is disclosed. The method includes performing a conversion between a video block of the video and a codec representation of the video according to a rule, wherein the rule specifies that if the video block is codec using an affine Merge mode inherited from a neighboring block of the video block, an indication to use a selectable half-pixel interpolation filter for the video block is inherited from the neighboring block.
In yet another exemplary aspect, a video processing method is disclosed. The method includes deriving motion information for a video block of a video by examining Merge candidates according to a rule during a Merge candidate construction process; and performing a conversion between the video block and a codec representation of the video, wherein the rule specifies that, during the Merge candidate construction process, two Merge candidates being compared are considered different if their indications of use of the selectable half-pixel interpolation filter differ.
In yet another example aspect, the method described above may be implemented by a video encoder apparatus comprising a processor.
In yet another example aspect, the methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
These and other aspects are further described in this document.
Drawings
Fig. 1 shows a block diagram of an example encoder.
Fig. 2 shows an example of 67 intra prediction modes.
Fig. 3 shows examples of horizontal and vertical traversal scans.
Fig. 4 shows an example of motion vector scaling of the temporal domain Merge candidate.
Fig. 5 shows the candidate positions of the time domain Merge candidates.
Fig. 6A shows an example of spatial neighboring blocks used by an Alternative Temporal Motion Vector Prediction (ATMVP).
Fig. 6B shows an example of deriving sub-CU motion fields.
Fig. 7 shows an example of a search point of Merge using a motion vector difference (MMVD) mode.
Fig. 8 is a flow chart of an example of a video processing method.
Fig. 9 is a flowchart of another example of a video processing method.
Fig. 10 is a flow chart of yet another example of a video processing method.
Fig. 11 is a flowchart of yet another example of a video processing method.
Fig. 12 is a flowchart of yet another example of a video processing method.
Fig. 13 is a flowchart of yet another example of a video processing method.
Fig. 14A and 14B are block diagrams of example hardware platforms for implementing the visual media encoding and decoding techniques described in this document.
Fig. 15 shows an example of triangle partition based inter prediction.
Fig. 16 shows an example of uni-directional prediction MV selection for the triangle partition mode.
Fig. 17 shows an example of weights used in the mixing process.
Fig. 18 shows an example of the positions of inherited affine motion predictors.
Fig. 19 shows an example of control point motion vector inheritance.
Fig. 20 shows an example of the locations of candidate positions for the constructed affine Merge mode.
Figs. 21A-21E are flowcharts of example methods of video processing based on some embodiments of the disclosed technology.
Detailed Description
This document provides various techniques that may be used by a decoder of a picture or video bitstream to improve the quality of decompressed or decoded digital video or pictures. For the sake of brevity, the term "video" is used herein to include a series of pictures (conventionally referred to as video) and a single picture. In addition, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
For ease of understanding, section headings are used in this document and do not limit embodiments and techniques to corresponding sections. As such, embodiments from one section may be combined with embodiments from other sections.
1. Abstract
This document relates to video encoding and decoding techniques. In particular, the present document relates to collocated motion vectors and other codec tools. It may be applied to existing video codec standards, such as HEVC, or to a standard to be finalized (Versatile Video Coding, VVC). It may also be applicable to future video codec standards or video codecs.
2. Preliminary discussion
Video codec standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced the H.261 and H.263 standards, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video standard, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, and the H.265/HEVC standard. Since H.262, video codec standards have been based on a hybrid video coding structure in which temporal prediction plus transform coding is used. To explore future video codec technologies beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, JVET has adopted many new methods and put them into a reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
2.1 color space and chroma subsampling
A color space, also known as a color model (or color system), is an abstract mathematical model that simply describes a range of colors as tuples of numbers, typically of 3 or 4 values or color components (e.g., RGB). Basically, a color space is an elaboration of a coordinate system and a sub-space.
For video compression, the most commonly used color spaces are YCbCr and RGB.
YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of color spaces used as a part of the color picture pipeline in video and digital picture systems. Y′ is the luma component, and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance, meaning that light intensity is non-linearly encoded based on gamma-corrected RGB primaries.
Chroma subsampling is the practice of encoding pictures with a lower resolution for chroma information than for luma information, taking advantage of the human visual system's lower acuity for color differences than for luminance.
2.1.1 4:4:4
Each of the three Y' CbCr components has the same sampling rate and therefore no chrominance subsampling. This scheme is sometimes used in high-end movie scanners and movie post-production.
2.1.2 4:2:2
Two chrominance components are sampled at half the sampling rate of the luminance: the horizontal chrominance resolution is halved. This reduces the bandwidth of the uncompressed video signal by one third with little to no visual difference.
2.1.3 4:2:0
In 4:2:0, the horizontal sampling is doubled compared to 4:1:1, but since in this scheme the Cb and Cr channels are sampled only on each alternate line, the vertical resolution is halved. Therefore, the data rates are the same. Cb and Cr are sub-sampled by a factor of 2 in both the horizontal and vertical directions. There are three variations of the 4:2:0 scheme, with different horizontal and vertical orientations.
In MPEG-2, Cb and Cr are horizontally co-located. Cb and Cr are located between the pixels in the vertical direction (in the middle).
In JPEG/JFIF, H.261, and MPEG-1, Cb and Cr are located midway between the alternating luminance samples.
In 4:2:0DV, Cb and Cr are co-located in the horizontal direction. In the vertical direction, they are located together on alternating lines.
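To make the three subsampling schemes concrete, the following C++ sketch (our own illustration; the helper name chromaDims is not from any standard or reference software) derives the chroma plane dimensions from the luma dimensions and the per-axis subsampling factors:

#include <cstdio>

// Chroma plane dimensions for a given subsampling scheme. The factors are:
// 4:4:4 -> (1, 1), 4:2:2 -> (2, 1), 4:2:0 -> (2, 2).
void chromaDims(int lumaW, int lumaH, int subsampX, int subsampY,
                int* chromaW, int* chromaH) {
    *chromaW = lumaW / subsampX;
    *chromaH = lumaH / subsampY;
}

int main() {
    int cw, ch;
    chromaDims(1920, 1080, 2, 2, &cw, &ch);                 // 4:2:0
    std::printf("4:2:0 chroma plane: %dx%d\n", cw, ch);     // 960x540
    chromaDims(1920, 1080, 2, 1, &cw, &ch);                 // 4:2:2
    std::printf("4:2:2 chroma plane: %dx%d\n", cw, ch);     // 960x1080
    return 0;
}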
2.2 codec flow for typical video codecs
Fig. 1 shows an example of an encoder block diagram for VVC, which contains three in-loop filter blocks: the deblocking filter (DF), sample adaptive offset (SAO), and ALF. Unlike DF, which uses predefined filters, SAO and ALF utilize the original samples of the current picture to reduce the mean square error between the original samples and the reconstructed samples, by adding an offset and by applying a finite impulse response (FIR) filter, respectively, with coded side information signaling the offsets and filter coefficients. ALF is located at the last processing stage of each picture and can be regarded as a tool that tries to catch and fix artifacts created by the previous stages.
2.3 Intra mode codec with 67 Intra prediction modes
To capture the arbitrary edge directions present in natural video, the number of directional intra modes is extended from the 33 used in HEVC to 65. The additional directional modes are depicted as dashed arrows in fig. 2, and the planar and DC modes remain the same. These denser directional intra prediction modes apply to all block sizes and to both luma and chroma intra prediction.
As shown in fig. 2, the conventional angular intra prediction directions are defined from 45 degrees to -135 degrees in the clockwise direction. In VTM6, for non-square blocks, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes. The replaced modes are signaled using the original method and remapped to the indices of the wide-angle modes after parsing. The total number of intra prediction modes is unchanged at 67, and the intra mode coding is unchanged.
In HEVC, every intra coded block has a square shape and the length of each of its sides is a power of 2. Thus, no division operations are required to generate an intra prediction value using the DC mode. In VVC, blocks can have a rectangular shape, which in the general case necessitates a division operation per block. To avoid division operations for DC prediction, only the longer side is used to compute the average for non-square blocks.
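As an illustration of the longer-side rule, the following C++ sketch (names and structure are ours, not taken from the VTM reference software) averages both edges for square blocks and only the longer edge otherwise, so the divisor is always a power of two and the division reduces to a shift:

#include <vector>

// Integer log2 for power-of-two sizes.
static int log2u(int v) { int n = 0; while (v > 1) { v >>= 1; ++n; } return n; }

// topRefs holds W reference samples above the block; leftRefs holds H
// reference samples to the left of the block. W and H are powers of 2.
int dcPredValue(const std::vector<int>& topRefs,
                const std::vector<int>& leftRefs, int W, int H) {
    int sum = 0;
    if (W == H) {                                    // square: use both edges
        for (int x = 0; x < W; ++x) sum += topRefs[x];
        for (int y = 0; y < H; ++y) sum += leftRefs[y];
        return (sum + W) >> (log2u(W) + 1);          // divide by W + H = 2W
    }
    if (W > H) {                                     // wide: top edge only
        for (int x = 0; x < W; ++x) sum += topRefs[x];
        return (sum + (W >> 1)) >> log2u(W);
    }
    for (int y = 0; y < H; ++y) sum += leftRefs[y];  // tall: left edge only
    return (sum + (H >> 1)) >> log2u(H);
}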
2.4 inter prediction
For each inter-predicted CU, the motion parameters, consisting of motion vectors, reference picture indices, a reference picture list usage index, and additional information needed for the new coding features of VVC, are used for inter prediction sample generation. The motion parameters can be signaled explicitly or implicitly. When a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta, and no reference picture index. A Merge mode is specified whereby the motion parameters of the current CU, including spatial and temporal candidates as well as additional schemes introduced in VVC, are obtained from neighboring CUs. The Merge mode can be applied to any inter-predicted CU, not only to skip mode. The alternative to the Merge mode is the explicit transmission of motion parameters, where the motion vector, the corresponding reference picture index for each reference picture list, the reference picture list usage flag, and other needed information are signaled explicitly per CU.
2.5 Intra Block Copy (IBC)
Intra Block Copy (IBC) is a tool adopted in the HEVC extensions for screen content coding (SCC). It is well known to significantly improve the coding efficiency of screen content material. Since the IBC mode is implemented as a block-level codec mode, block matching (BM) is performed at the encoder to find the optimal block vector (or motion vector) for each CU. Here, a block vector is used to indicate the displacement from the current block to a reference block that has already been reconstructed inside the current picture. The luma block vector of an IBC-coded CU is in integer precision. The chroma block vector is rounded to integer precision as well. When combined with AMVR, the IBC mode can switch between 1-pel and 4-pel motion vector precision. An IBC-coded CU is treated as a third prediction mode besides the intra and inter prediction modes. The IBC mode is applicable to CUs with both width and height smaller than or equal to 64 luma samples.
At the encoder side, hash-based motion estimation is performed on the IBC. The encoder performs an RD check on blocks that are not larger than 16 luminance samples in width or height. For non-Merge mode, a block vector search is first performed using a hash-based search. If the hash search does not return a valid candidate, a local search based on block matching will be performed.
In a hash-based search, hash key value matching (32-bit CRC) between the current block and the reference block is extended to all allowed block sizes. The hash key calculation for each position in the current picture is based on 4 x 4 sub-blocks. For a current block having a large size, when all hash keys of all 4 × 4 sub-blocks match the hash key in the corresponding reference position, it is determined that the hash key matches the hash key of the reference block. If the hash keys of the plurality of reference blocks are found to match the hash key of the current block, the block vector cost of each matching reference is calculated and the matching reference with the smallest block vector cost is selected.
In the block matching search, the search range is set to cover both the previous CTU and the current CTU.
At the CU level, the IBC mode is signaled by a flag and can be signaled as IBC AMVP mode or IBC skip/Merge mode as follows:
IBC skip/Merge mode: a Merge candidate index is used to indicate which of the block vectors in the list, from neighboring candidate IBC-coded blocks, is used to predict the current block. The Merge list consists of spatial candidates, HMVP candidates, and pairwise candidates.
-IBC AMVP mode: the block vector difference is coded in the same manner as the motion vector difference. The block vector prediction method uses two candidates as predictors, one from the left neighbor and one from the upper neighbor (if IBC coded). When none of the neighbors is available, the default block vector will be used as the predictor. Signaling a flag to indicate a block vector predictor index.
2.6 palette mode
For palette mode signaling, the palette mode is coded as a prediction mode of the coding unit, i.e., the prediction mode of a coding unit can be MODE_INTRA, MODE_INTER, MODE_IBC, or MODE_PLT. If the palette mode is used, the pixel values in the CU are represented by a small set of representative color values. This set is referred to as the palette. For pixels with values close to the palette colors, the palette indices are signaled. For pixels with values outside the palette, the pixels are denoted by an escape symbol and the quantized pixel values are signaled directly.
To decode a palette coded block, the decoder needs to decode the palette colors and the indices. The palette colors are described by a palette table and encoded by a palette table coding tool. An escape flag is signaled for each CU to indicate whether an escape symbol is present in the current CU. If an escape symbol is present, the palette table size is increased by one and the last index is assigned to the escape mode. The palette indices of all pixels in a CU form a palette index map, which is encoded by a palette index map coding tool.
For coding of the palette table, a palette predictor is maintained. The predictor is initialized at the beginning of each slice, where it is reset to 0. For each entry in the palette predictor, a reuse flag is signaled to indicate whether it is part of the current palette. The reuse flags are sent using run-length coding of zeros. After this, the number of new palette entries is signaled using an exponential Golomb code of order 0. Finally, the component values of the new palette entries are signaled. After encoding the current CU, the palette predictor is updated with the current palette, and entries from the previous palette predictor that are not reused in the current palette are added at the end of the new palette predictor until the maximum allowed size is reached (palette stuffing).
To encode the palette index map, the indices are coded using horizontal or vertical traverse scans, as shown in fig. 3. The scan order is explicitly signaled in the bitstream using palette_transpose_flag.
The palette indices are coded using two main palette sample modes: "INDEX" and "COPY_ABOVE". The mode is signaled using a flag, except for the top row when a horizontal scan is used, for the first column when a vertical scan is used, or when the previous mode was "COPY_ABOVE". In the "COPY_ABOVE" mode, the palette index of the sample in the row above is copied. In the "INDEX" mode, the palette index is explicitly signaled. For both the "INDEX" and "COPY_ABOVE" modes, a run value is signaled specifying the number of pixels coded with the same mode.
The coding order of the index map is as follows: first, the number of index values for the CU is signaled. The actual index values for the entire CU are then signaled using truncated binary coding. Both the number of indices and the index values are coded in bypass mode. This groups the index-related bypass bins together. The palette sample mode (INDEX or COPY_ABOVE) and runs are then signaled in an interleaved manner. Finally, the component escape values corresponding to the escape samples of the entire CU are grouped together and coded in bypass mode. An additional syntax element, last_run_type_flag, is signaled after the index values; this syntax element, in conjunction with the number of indices, removes the need to signal the run value of the last run in the block.
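The run-based structure described above can be summarized with a short C++ sketch (a simplification that assumes a plain raster scan rather than the actual traverse scan; all names are illustrative, not from the VVC reference software):

#include <vector>

enum PaletteMode { INDEX, COPY_ABOVE };

// One parsed run: its mode, its length, and (for INDEX) the signaled index.
struct PaletteRun { PaletteMode mode; int run; int index; };

// Rebuilds the palette index map of a width x height CU from parsed runs.
// COPY_ABOVE copies the index at the same position one row up; it is never
// signaled for the top row, so pos - width stays in range.
std::vector<int> reconstructIndexMap(const std::vector<PaletteRun>& runs,
                                     int width, int height) {
    std::vector<int> map(width * height, 0);
    int pos = 0;
    for (const PaletteRun& r : runs)
        for (int i = 0; i < r.run && pos < width * height; ++i, ++pos)
            map[pos] = (r.mode == COPY_ABOVE) ? map[pos - width] : r.index;
    return map;
}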
In VTM5.0, a dual tree is enabled for I slices, which separates the coding unit partitioning for luma and chroma. Thus, in this scheme, the palette is applied separately to luma (Y component) and chroma (Cb and Cr components). If the dual tree is disabled, the palette is applied jointly to the Y, Cb, and Cr components, as in the HEVC palette.
2.7 Temporal Motion Vector Prediction (TMVP) in VVC
In the derivation process of the temporal Merge candidate, a scaled motion vector is derived based on the collocated CU belonging to the collocated reference picture. The reference picture list used for deriving the collocated CU is explicitly signaled in the slice header. As indicated by the dashed line in fig. 4, the scaled motion vector of the temporal Merge candidate is obtained by scaling the motion vector of the collocated CU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal Merge candidate is set equal to zero.
As shown in fig. 5, the location of the time domain candidate is selected between candidates C0 and C1. Position C1 is used if the CU at position C0 is not available, is intra-coded, or is outside the current row of CTUs. Otherwise, position C0 is used for the derivation of the time domain Merge candidate.
2.8 Subblock-based temporal motion vector prediction (SbTMVP) in VVC
The VTM supports the subblock-based temporal motion vector prediction (SbTMVP) method. Similar to temporal motion vector prediction (TMVP) in HEVC, SbTMVP uses the motion field in the collocated picture to improve the motion vector prediction and Merge mode of CUs in the current picture. The same collocated picture used by TMVP is used for SbTMVP. SbTMVP differs from TMVP in two main aspects:
1. TMVP predicts motion at the CU level, whereas SbTMVP predicts motion at the sub-CU level;
2. although TMVP obtains a temporal motion vector from a collocated block in a collocated picture (the collocated block is the bottom-right block or center block relative to the current CU), SbTMVP applies motion displacement, which is obtained from a motion vector from one of the spatial neighboring blocks of the current CU, before obtaining temporal motion information from the collocated picture.
The SbTMVP process is illustrated in figs. 6A-6B. SbTMVP predicts the motion vectors of the sub-CUs within the current CU in two steps. In the first step, the spatial neighbor A1 in fig. 6A is examined. If A1 has a motion vector that uses the collocated picture as its reference picture, this motion vector is selected as the motion displacement to be applied. If no such motion is identified, the motion displacement is set to (0, 0).
In the second step, the motion displacement identified in step 1 is applied (i.e., added to the coordinates of the current block), as shown in fig. 6B, to obtain sub-CU-level motion information (motion vectors and reference indices) from the collocated picture. The example in fig. 6B assumes the motion displacement is set to the motion of block A1. Then, for each sub-CU, the motion information of its corresponding block (the smallest motion grid covering the center sample) in the collocated picture is used to derive the motion information for the sub-CU. After the motion information of the collocated sub-CU is identified, it is converted to the motion vectors and reference indices of the current sub-CU in a similar way to the TMVP process of HEVC, where temporal motion scaling is applied to align the reference pictures of the temporal motion vectors to those of the current CU.
In VTM6, a combined subblock-based Merge list containing both SbTMVP candidates and affine Merge candidates is used for the signaling of the subblock-based Merge mode. The SbTMVP mode is enabled/disabled by a sequence parameter set (SPS) flag. If the SbTMVP mode is enabled, the SbTMVP predictor is added as the first entry of the list of subblock-based Merge candidates, followed by the affine Merge candidates. The size of the subblock-based Merge list is signaled in the SPS, and the maximum allowed size of the subblock-based Merge list is 5 in VTM6.
The sub-CU size used in SbTMVP is fixed to 8 × 8, and the SbTMVP mode is applicable only to CUs having a width and height greater than or equal to 8, as in the affine Merge mode.
The encoding logic of additional SbTMVP Merge candidates is the same as that of other Merge candidates, i.e. for each CU in a P-slice or B-slice, an additional RD check is performed to decide whether to use an SbTMVP candidate.
The TMVP and SbTMVP descriptions in the working draft are as follows.
8.5.2.11 derivation of temporal luma motion vector prediction
The inputs to this process:
-a luma position of an upper left sample of the current luma coding block relative to an upper left luma sample of the current picture (xCb, yCb),
a variable cbWidth specifying the width of the current codec block in luminance samples,
a variable cbHeight specifying the height of the current codec block in the luminance samples,
-a reference index refIdxLX, wherein X is 0 or 1.
The output of this process:
- the motion vector prediction mvLXCol in 1/16 fractional-sample accuracy,
the availability flag availableFlagLXCol.
The variable currCb specifies the current luma codec block at luma location (xCb, yCb).
The variables mvLXCol and availableFlagLXCol are derived as follows:
- if slice_temporal_mvp_enabled_flag is equal to 0 or (cbWidth * cbHeight) is less than or equal to 32, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
-otherwise (slice _ temporal _ mvp _ enabled _ flag equal to 1), the following sequential steps apply:
1. the bottom-right collocated motion vector and the right and bottom boundary sample point locations are derived as follows:
xColBr=xCb+cbWidth (8-421)
yColBr=yCb+cbHeight (8-422)
rightBoundaryPos=subpic_treated_as_pic_flag[SubPicIdx]?SubPicRightBoundaryPos:pic_width_in_luma_samples-1 (8-423)
botBoundaryPos=subpic_treated_as_pic_flag[SubPicIdx]?SubPicBotBoundaryPos:pic_height_in_luma_samples-1 (8-424)
- if yCb >> CtbLog2SizeY is equal to yColBr >> CtbLog2SizeY, yColBr is less than or equal to botBoundaryPos, and xColBr is less than or equal to rightBoundaryPos, the following applies:
- the variable colCb specifies the luma coding block covering the modified location given by ( ( xColBr >> 3 ) << 3, ( yColBr >> 3 ) << 3 ) inside the collocated picture specified by ColPic.
- the luma location ( xColCb, yColCb ) is set equal to the top-left sample of the collocated luma coding block specified by colCb relative to the top-left luma sample of the collocated picture specified by ColPic.
- the derivation process for collocated motion vectors as specified in clause 8.5.2.12 is invoked with currCb, colCb, ( xColCb, yColCb ), refIdxLX, and sbFlag set equal to 0 as inputs, and the output is assigned to mvLXCol and availableFlagLXCol.
Otherwise, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
2. When availableFlagLXCol is equal to 0, the center-collocated motion vector is derived as follows:
xColCtr=xCb+(cbWidth>>1) (8-425)
yColCtr=yCb+(cbHeight>>1) (8-426)
the variable colCb specifies the luma coding block covering the modified location given by ( ( xColCtr >> 3 ) << 3, ( yColCtr >> 3 ) << 3 ) inside the collocated picture specified by ColPic.
- the luma location ( xColCb, yColCb ) is set equal to the top-left sample of the collocated luma coding block specified by colCb relative to the top-left luma sample of the collocated picture specified by ColPic.
- the derivation process for collocated motion vectors as specified in clause 8.5.2.12 is invoked with currCb, colCb, ( xColCb, yColCb ), refIdxLX, and sbFlag set equal to 0 as inputs, and the output is assigned to mvLXCol and availableFlagLXCol.
8.5.2.12 derivation of collocated motion vectors
The inputs to this process:
a variable currCb specifying the current codec block,
a variable colCb specifying the collocated codec block within the collocated picture specified by the ColPic,
-specifying a luma position (xColCb, yColCb) of a top left sample of a collocated luma coding block specified by colCb relative to a top left luma sample of a collocated picture specified by ColPic,
-a reference index refIdxLX, wherein X is 0 or 1,
a flag sbFlag indicating a subblock time-domain merge candidate.
The output of this process:
- the motion vector prediction mvLXCol in 1/16 fractional-sample accuracy,
the availability flag availableFlagLXCol.
The variable currPic specifies the current picture.
The arrays predFlagL0Col[ x ][ y ], mvL0Col[ x ][ y ], and refIdxL0Col[ x ][ y ] are set equal to PredFlagL0[ x ][ y ], MvDmvrL0[ x ][ y ], and RefIdxL0[ x ][ y ], respectively, of the collocated picture specified by ColPic, and the arrays predFlagL1Col[ x ][ y ], mvL1Col[ x ][ y ], and refIdxL1Col[ x ][ y ] are set equal to PredFlagL1[ x ][ y ], MvDmvrL1[ x ][ y ], and RefIdxL1[ x ][ y ], respectively, of the collocated picture specified by ColPic.
The variables mvLXCol and availableFlagLXCol are derived as follows:
-if colCb is coded in intra or IBC prediction mode, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
Otherwise, the motion vector mvCol, the reference index refIdxCol and the reference list identifier listCol are derived as follows:
2.8.1 Derivation of collocated motion vectors when colCb is coded in an inter prediction mode
-if sbFlag is equal to 0, availableFlagLXCol is set to 1 and the following applies:
- if predFlagL0Col[ xColCb ][ yColCb ] is equal to 0, mvCol, refIdxCol, and listCol are set equal to mvL1Col[ xColCb ][ yColCb ], refIdxL1Col[ xColCb ][ yColCb ], and L1, respectively.
- otherwise, if predFlagL0Col[ xColCb ][ yColCb ] is equal to 1 and predFlagL1Col[ xColCb ][ yColCb ] is equal to 0, mvCol, refIdxCol, and listCol are set equal to mvL0Col[ xColCb ][ yColCb ], refIdxL0Col[ xColCb ][ yColCb ], and L0, respectively.
Else (predFlagL0Col [ xColCb ] [ yColCb ] equals 1, and predFlagL1Col [ xColCb ] [ yColCb ] equals 1), the following assignment is made:
- if NoBackwardPredFlag is equal to 1, mvCol, refIdxCol, and listCol are set equal to mvLXCol[ xColCb ][ yColCb ], refIdxLXCol[ xColCb ][ yColCb ], and LX, respectively.
- otherwise, mvCol, refIdxCol, and listCol are set equal to mvLNCol[ xColCb ][ yColCb ], refIdxLNCol[ xColCb ][ yColCb ], and LN, respectively, where N is the value of collocated_from_l0_flag.
Else (sbFlag equal to 1), the following applies:
- if predFlagLXCol[ xColCb ][ yColCb ] is equal to 1, mvCol, refIdxCol, and listCol are set equal to mvLXCol[ xColCb ][ yColCb ], refIdxLXCol[ xColCb ][ yColCb ], and LX, respectively, and availableFlagLXCol is set to 1.
- otherwise (predFlagLXCol[ xColCb ][ yColCb ] is equal to 0), the following applies:
- if NoBackwardPredFlag is equal to 1 and predFlagLYCol[ xColCb ][ yColCb ] is equal to 1, mvCol, refIdxCol, and listCol are set equal to mvLYCol[ xColCb ][ yColCb ], refIdxLYCol[ xColCb ][ yColCb ], and LY, respectively, with Y equal to !X, where X is the value of X this process is invoked for, and availableFlagLXCol is set equal to 1.
Otherwise, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
-when availableFlagLXCol is equal to TRUE, the derivation of mvLXCol and availableFlagLXCol is as follows:
- if LongTermRefPic( currPic, currCb, refIdxLX, LX ) is not equal to LongTermRefPic( ColPic, colCb, refIdxCol, listCol ), both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
Otherwise, the variable availableFlagLXCol is set equal to 1, refPicList [ listCol ] [ refIdxCol ] is set to the picture with reference index refIdxCol in the reference picture list listCol of the slice containing the coded block colCb in the collocated picture specified by ColPic, and the following applies:
colPocDiff=DiffPicOrderCnt(ColPic,refPicList[listCol][refIdxCol])(8-427)
currPocDiff=DiffPicOrderCnt(currPic,RefPicList[X][refIdxLX])(8-428)
-invoking a temporal motion buffer compression process of the collocated motion vector specified in section 8.5.2.15, with mvCol as input and modified mvCol as output.
-if RefPicList [ X ] [ refIdxLX ] is a long-term reference picture, or colPocDiff is equal to currPocDiff, then mvLXCol is derived as follows:
mvLXCol=mvCol(8-429)
otherwise, mvLXCol is derived as a scaled version of the motion vector mvCol, as follows:
tx=(16384+(Abs(td)>>1))/td (8-430)
distScaleFactor=Clip3(-4096,4095,(tb*tx+32)>>6) (8-431)
mvLXCol=Clip3(-131072,131071,(distScaleFactor*mvCol+128-(distScaleFactor*mvCol>=0))>>8) (8-432)
where td and tb are derived as follows:
td=Clip3(-128,127,colPocDiff) (8-433)
tb=Clip3(-128,127,currPocDiff) (8-434)
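For reference, equations (8-430) through (8-434) transcribe directly into the following C++ sketch (function and variable names are ours):

#include <algorithm>
#include <cstdlib>

static int clip3(int lo, int hi, int v) { return std::min(std::max(v, lo), hi); }

// Scales one component of the collocated motion vector mvCol by the ratio of
// the POC distances tb (current) and td (collocated).
int scaleCollocatedMv(int mvCol, int currPocDiff, int colPocDiff) {
    int td = clip3(-128, 127, colPocDiff);                          // (8-433)
    int tb = clip3(-128, 127, currPocDiff);                         // (8-434)
    int tx = (16384 + (std::abs(td) >> 1)) / td;                    // (8-430)
    int distScaleFactor = clip3(-4096, 4095, (tb * tx + 32) >> 6);  // (8-431)
    return clip3(-131072, 131071,                                   // (8-432)
                 (distScaleFactor * mvCol + 128 -
                  (distScaleFactor * mvCol >= 0)) >> 8);
}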
2.9 Bi-prediction with CU level weights (BCW)
In HEVC, a bi-directional prediction signal is generated by averaging two prediction signals obtained from two different reference pictures and/or using two different motion vectors. In VTM6, the bi-directional prediction mode is extended beyond simple averaging to allow weighted averaging of the two prediction signals.
Pbi-pred=((8-w)*P0+w*P1+4)>>3 (3-19)
Five weights are allowed in weighted average bi-directional prediction, w ∈ { -2,3,4,5,10 }. For each bi-directionally predicted CU, the weight w is determined in one of two ways: 1) for non-Merge CUs, signaling a weight index after the motion vector difference; 2) for a Merge CU, a weight index is inferred from neighboring blocks based on the Merge candidate index. Weighted average bi-prediction is only applicable to CUs with 256 or more luma samples (i.e., CU width multiplied by CU height is greater than or equal to 256). For low delay pictures, all 5 weights are used. For non-low delay pictures, only 3 weights (w ∈ {3,4,5}) apply.
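The weighted average is straightforward to express in code. A minimal C++ sketch (names are illustrative) applying the formula above per sample; note that w = 4 reproduces the conventional average:

#include <cstddef>
#include <vector>

// Pbi-pred = ((8 - w) * P0 + w * P1 + 4) >> 3, with w in {-2, 3, 4, 5, 10}.
std::vector<int> bcwBlend(const std::vector<int>& p0,
                          const std::vector<int>& p1, int w) {
    std::vector<int> out(p0.size());
    for (std::size_t i = 0; i < p0.size(); ++i)
        out[i] = ((8 - w) * p0[i] + w * p1[i] + 4) >> 3;
    return out;
}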
- A fast search algorithm is applied at the encoder to find the weight index without significantly increasing encoder complexity. These algorithms are summarized below; for more details, the reader is referred to the VTM software and document JVET-L0646. When combined with AMVR, unequal weights are only conditionally checked for 1-pel and 4-pel motion vector precision if the current picture is a low-delay picture.
When combined with affine, affine ME is performed on unequal weights if and only if the affine mode is selected as the current best mode.
-only conditionally checking for unequal weights when the two reference pictures in the bi-prediction are identical.
-not searching for unequal weights when certain conditions are met, depending on POC distance, codec QP and temporal level between the current picture and its reference picture.
The BCW weight index is coded using one context-coded bin followed by bypass-coded bins. The first context-coded bin indicates whether equal weight is used; if unequal weight is used, additional bins are signaled using bypass coding to indicate which unequal weight is used.
Weighted Prediction (WP) is a codec tool supported by the h.264/AVC and HEVC standards to efficiently codec fading video content. Support for WP is also added to the VVC standard. WP allows signaling of weighting parameters (weights and offsets) for each reference picture in each reference picture list L0 and L1. Then, during motion compensation, the weights and offsets of the corresponding reference pictures are applied. WP and BCW are designed for different types of video content. To avoid interaction between WP and BCW, which would complicate the VVC decoder design, if a CU uses WP, the BCW weight index is not signaled and w is inferred to be 4 (i.e. equal weight is applied). For a Merge CU, a weight index is inferred from neighboring blocks based on the Merge candidate index. This can be applied to the normal Merge mode and the inherited affine Merge mode. For the constructed affine Merge mode, affine motion information is constructed based on motion information of at most 3 blocks. The following procedure is used to derive the BCW index of a CU using the constructed affine Merge mode.
1. The range of BCW indices {0, 1, 2, 3, 4} is divided into three groups: {0}, {1, 2, 3}, and {4}. If the BCW indices of all control points are from the same group, the BCW index is derived per step 2; otherwise, the BCW index is set to 2.
2. If at least two control points have the same BCW index, that BCW index value is assigned to the candidate; otherwise, the BCW index of the currently constructed candidate is set to 2 (a sketch of this derivation follows).
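A hedged C++ sketch of the two-step derivation above (the group boundaries {0}, {1, 2, 3}, {4} are our reading of the grouping in step 1; names are illustrative):

#include <cstddef>
#include <vector>

// Group of a BCW index: {0} -> 0, {1, 2, 3} -> 1, {4} -> 2 (assumed grouping).
static int bcwGroup(int idx) { return idx == 0 ? 0 : (idx == 4 ? 2 : 1); }

// cpBcwIdx holds the BCW indices of the (up to 3) control points.
int deriveConstructedBcwIndex(const std::vector<int>& cpBcwIdx) {
    // Step 1: all control points must come from the same group.
    for (std::size_t i = 1; i < cpBcwIdx.size(); ++i)
        if (bcwGroup(cpBcwIdx[i]) != bcwGroup(cpBcwIdx[0]))
            return 2;                                // default: equal weight
    // Step 2: assign an index shared by at least two control points.
    for (std::size_t i = 0; i < cpBcwIdx.size(); ++i)
        for (std::size_t j = i + 1; j < cpBcwIdx.size(); ++j)
            if (cpBcwIdx[i] == cpBcwIdx[j]) return cpBcwIdx[i];
    return 2;
}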
2.10 Merge mode with MVD (MMVD)
In addition to the Merge mode, in which implicitly derived motion information is directly used for prediction sample generation of the current CU, the Merge mode with motion vector differences (MMVD) is introduced in VVC. An MMVD flag is signaled right after sending the skip flag and the Merge flag to specify whether MMVD mode is used for a CU.
In MMVD, after a Merge candidate is selected, it is further refined by the signaled MVD information. The further information includes a Merge candidate flag, an index specifying the motion magnitude, and an index indicating the motion direction. In MMVD mode, one of the first two candidates in the Merge list is selected to be used as the MV basis. The Merge candidate flag is signaled to specify which one is used.
The distance index specifies motion magnitude information and indicates a predefined offset from the starting point. As shown in fig. 7, the offset is added to the horizontal component or the vertical component of the starting MV. Table 1 specifies the distance index versus the predefined offset.
Table 1: distance index versus predefined offset
Distance IDX                         0     1     2     3     4     5     6     7
Offset (in units of luma samples)  1/4   1/2     1     2     4     8    16    32
The direction index represents the direction of the MVD relative to the starting point. The direction index can represent the four directions shown in table 2. Note that the meaning of the MVD sign can vary according to the information of the starting MVs. When the starting MV is a uni-prediction MV, or bi-prediction MVs with both lists pointing to the same side of the current picture (i.e., the POCs of both references are greater than the POC of the current picture, or are both smaller), the signs in table 2 specify the sign of the MV offset added to the starting MV. When the starting MVs are bi-prediction MVs with the two MVs pointing to different sides of the current picture (i.e., the POC of one reference is greater than the POC of the current picture and the POC of the other reference is smaller), the signs in table 2 specify the sign of the MV offset added to the list0 MV component of the starting MV, and the sign for the list1 MV has the opposite value.
Table 2: sign of MV offset specified by direction index
Direction IDX    00    01    10    11
x-axis            +     -    N/A   N/A
y-axis           N/A   N/A    +     -
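Putting tables 1 and 2 together, the following C++ sketch (our illustration, with MVs in quarter-luma-sample units) derives the refined MV for the simple case in which a single sign applies, i.e., uni-prediction or bi-prediction with both references on the same side of the current picture:

#include <array>

struct Mv { int x, y; };

// offsets[] follows table 1, converted to quarter-luma-sample units
// (1/4 pel = 1, ..., 32 pel = 128); the switch follows table 2.
Mv applyMmvd(Mv start, int distanceIdx, int directionIdx) {
    static const std::array<int, 8> offsets = {1, 2, 4, 8, 16, 32, 64, 128};
    const int off = offsets[distanceIdx];
    switch (directionIdx) {
        case 0: start.x += off; break;  // 00: positive x
        case 1: start.x -= off; break;  // 01: negative x
        case 2: start.y += off; break;  // 10: positive y
        case 3: start.y -= off; break;  // 11: negative y
    }
    return start;
}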
2.11 selectable luminance half-pixel interpolation filter
In JVET-N0309, an alternative half-pel interpolation filter is proposed.
Switching of the half-pel luma interpolation filter is done depending on the motion vector accuracy. In addition to the existing quarter-pel, full-pel, and 4-pel AMVR modes, a new half-pel accuracy AMVR mode is introduced. An alternative half-pel luma interpolation filter can be selected only in the case of half-pel motion vector accuracy.
2.11.1 half-pixel AMVR mode
An additional AMVR mode for non-affine non-Merge inter-coded CUs is proposed that allows signaling of motion vector differences at half-pel accuracy. The existing AMVR scheme of the current VVC draft is extended straightforwardly as follows: directly following the syntax element amvr_flag, if amvr_flag is 1, there is a new context-modeled binary syntax element hpel_amvr_flag, which indicates usage of the new half-pel AMVR mode if hpel_amvr_flag is 1. Otherwise, i.e., if hpel_amvr_flag is 0, the selection between full-pel and 4-pel AMVR modes is indicated by the syntax element amvr_precision_flag as in the current VVC draft.
2.11.2 selectable luma half-pixel interpolation filter
For a non-affine non-Merge inter-coded CU that uses half-pel motion vector accuracy (i.e., the half-pel AMVR mode), switching between the HEVC/VVC half-pel luma interpolation filter and one or more alternative half-pel interpolation filters is made based on the value of a new syntax element if_idx. The syntax element if_idx is only signaled in the case of half-pel AMVR mode. In the case of skip/Merge mode using a spatial Merge candidate, the value of if_idx is inherited from the neighboring block.
2.11.2.1 test 1: selectable half-pixel interpolation filter
In this test case, there is a 6-tap interpolation filter as a substitute for the normal HEVC/VVC half-pixel interpolation filter. The following table shows the mapping between the value of the syntax element if _ idx and the selected half-pixel luma interpolation filter:
if_idx   Binarization   Filter             Interpolation filter coefficients
0        0              Gauss (6-tap)      [0, 3, 9, 20, 20, 9, 3, 0]
1        1              HEVC/VVC (8-tap)   [-1, 4, -11, 40, 40, -11, 4, -1]
2.11.2.2 Test 2: two selectable half-pixel interpolation filters
In this test case, there are two 8-tap interpolation filters as a replacement for the normal HEVC/VVC half-pixel interpolation filter. The following table shows the mapping between the value of the syntax element if _ idx and the selected half-pixel luma interpolation filter:
if_idx   Binarization   Filter             Interpolation filter coefficients
0        0              Filter 1 (8-tap)   [3, 6, 10, 13, 13, 10, 6, 3]
1        10             Filter 2 (8-tap)   [-1, -1, 9, 25, 25, 9, -1, -1]
2        11             HEVC/VVC (8-tap)   [-1, 4, -11, 40, 40, -11, 4, -1]
amvr_precision_idx is signaled to indicate whether the current CU employs 1/2-pel, 1-pel, or 4-pel MV precision. Up to 2 bins need to be coded.
hpel_if_idx is signaled to indicate whether the default half-pel interpolation filter or an alternative half-pel interpolation filter is used. When 2 alternative half-pel interpolation filters are used, up to 2 bins need to be coded.
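As an illustration of the filter switch in test 1, the following C++ sketch (names are ours) applies either the 6-tap Gauss filter or the default 8-tap HEVC/VVC filter at a half-pel position; both coefficient sets sum to 64, so the same normalizing shift applies:

#include <vector>

// ref holds integer-position luma samples; the caller guarantees that
// positions pos - 3 .. pos + 4 are valid. hpelIfIdx = 0 selects the 6-tap
// Gauss filter (stored zero-padded to 8 taps), 1 selects the default filter.
int interpolateHalfPel(const std::vector<int>& ref, int pos, int hpelIfIdx) {
    static const int gauss6[8] = {0, 3, 9, 20, 20, 9, 3, 0};
    static const int hevc8[8]  = {-1, 4, -11, 40, 40, -11, 4, -1};
    const int* c = (hpelIfIdx == 0) ? gauss6 : hevc8;
    int sum = 0;
    for (int k = 0; k < 8; ++k)
        sum += c[k] * ref[pos - 3 + k];
    return (sum + 32) >> 6;  // both filters sum to 64
}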
2.12 triangle partitioning for inter-frame prediction (TPM)
In VTM6, a triangle partition mode is supported for inter prediction. The triangle partition mode is only applied to CUs that are 8×8 or larger. The triangle partition mode is signaled using a CU-level flag as one kind of Merge mode, with other Merge modes including the regular Merge mode, the MMVD mode, the CIIP mode, and the subblock Merge mode.
Using this mode, a CU is uniformly divided into two triangular partitions using either diagonal or anti-diagonal partitioning (fig. 15). Each triangle partition in a CU uses its own motion for inter prediction; each partition allows only unidirectional prediction, i.e. each partition has one motion vector and one reference index. Unidirectional prediction motion constraints are applied to ensure that only two motion compensated predictions are needed per CU, as in conventional bi-directional prediction. The unidirectional predictive motion for each partition is derived using the process described in 2.12.1.
If the triangle partition mode is used for the current CU, a flag indicating the direction of the triangle partition (diagonal or anti-diagonal) and two Merge indices (one for each partition) are further signaled. The maximum TPM candidate size is signaled explicitly at the slice level and specifies the syntax binarization for the TPM Merge indices. After predicting each of the triangle partitions, the sample values along the diagonal or anti-diagonal edge are adjusted using a blending process with adaptive weights. This gives the prediction signal for the whole CU, and the transform and quantization process is applied to the whole CU as in other prediction modes. Finally, the motion field of a CU predicted using the triangle partition mode is stored in 4×4 units, as described in 2.12.3.
The triangle partition mode is not used in combination with SBT; i.e., when the signaled triangle mode is equal to 1, cu_sbt_flag is inferred to be 0 without signaling.
2.12.1 unidirectional prediction candidate list construction
The uni-prediction candidate list is derived directly from the Merge candidate list constructed according to the extended Merge prediction process. Denote n as the index of the uni-prediction motion in the triangle uni-prediction candidate list. The LX motion vector of the n-th extended Merge candidate, with X equal to the parity of n, is used as the n-th uni-prediction motion vector for the triangle partition mode. These motion vectors are marked with "x" in fig. 16. In case the corresponding LX motion vector of the n-th extended Merge candidate does not exist, the L(1-X) motion vector of the same candidate is used instead as the uni-prediction motion vector for the triangle partition mode.
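The parity rule above can be sketched in C++ as follows (types and names are illustrative, not from the VTM software):

struct MergeCand {
    bool hasL0, hasL1;
    int mvL0x, mvL0y, refIdxL0;
    int mvL1x, mvL1y, refIdxL1;
};
struct UniPred { int mvx, mvy, refIdx, list; };

// The n-th TPM candidate takes the LX motion of the n-th extended Merge
// candidate with X = n % 2, falling back to L(1-X) when LX is absent.
UniPred tpmUniPred(const MergeCand& c, int n) {
    const int X = n & 1;
    const bool useL0 = (X == 0) ? c.hasL0 : !c.hasL1;
    if (useL0) return {c.mvL0x, c.mvL0y, c.refIdxL0, 0};
    return {c.mvL1x, c.mvL1y, c.refIdxL1, 1};
}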
2.12.2 blending along the triangle split edges
After each triangle partition is predicted using its own motion, a blend is applied to the two prediction signals to derive samples around diagonal or anti-diagonal edges. The following weights are used for the blending process: {7/8,6/8,5/8,4/8,3/8,2/8,1/8} for luminance, and {6/8,4/8,2/8} for chrominance, as shown in FIG. 17.
2.12.3 Motion field storage
The motion vectors of a CU coded in the triangle partition mode are stored in 4×4 units. Depending on the position of each 4×4 unit, either a uni-prediction or a bi-prediction motion vector is stored. Denote Mv1 and Mv2 as the uni-prediction motion vectors for partition 1 and partition 2, respectively. If a 4×4 unit is located in the non-weighted area shown in the example of fig. 17, either Mv1 or Mv2 is stored for that 4×4 unit. Otherwise, if the 4×4 unit is located in the weighted area, a bi-prediction motion vector is stored. The bi-prediction motion vector is derived from Mv1 and Mv2 according to the following process (a code sketch follows the list):
(1) if Mv1 and Mv2 are from different reference picture lists (one from L0 and the other from L1), Mv1 and Mv2 are simply combined to form bi-predictive motion vectors.
(2) Otherwise, if Mv1 and Mv2 are from the same list, assume without loss of generality that they are both from L0. In this case,
(2.a) if a reference picture of Mv2 (or Mv1) appears in L1, then the Mv2 (or Mv1) is converted to an L1 motion vector using the reference picture in L1. The two motion vectors are then combined to form a bi-directional predicted motion vector;
(2.b) otherwise, instead of bi-directionally predicting motion, only the uni-directionally predicted motion Mv1 is stored.
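A hedged C++ sketch of steps (1)-(2) above (types are illustrative; the caller supplies, as mv2RefIdxInOtherList, the reference index of Mv2's picture in the other list, or -1 if that picture does not appear there):

struct UniMv { int mvx, mvy, refIdx, list; };  // list: 0 or 1
struct StoredMv { UniMv l0, l1; bool bi; };    // for uni-prediction, l0 holds
                                               // the single MV (see its .list)

StoredMv combineForWeightedArea(UniMv mv1, UniMv mv2, int mv2RefIdxInOtherList) {
    StoredMv out{};
    if (mv1.list != mv2.list) {                 // (1) different lists: combine
        out.bi = true;
        out.l0 = (mv1.list == 0) ? mv1 : mv2;
        out.l1 = (mv1.list == 0) ? mv2 : mv1;
    } else if (mv2RefIdxInOtherList >= 0) {     // (2.a) convert Mv2, combine
        mv2.list = 1 - mv2.list;
        mv2.refIdx = mv2RefIdxInOtherList;
        out.bi = true;
        out.l0 = (mv1.list == 0) ? mv1 : mv2;
        out.l1 = (mv1.list == 0) ? mv2 : mv1;
    } else {                                    // (2.b) store Mv1 only
        out.bi = false;
        out.l0 = mv1;
    }
    return out;
}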
2.13 affine Merge prediction
The AF_MERGE mode can be applied to CUs with both width and height greater than or equal to 8. In this mode, the CPMVs of the current CU are generated based on the motion information of spatially neighboring CUs. There can be up to five CPMVP candidates, and an index is signaled to indicate which one is used for the current CU. The following three types of CPMV candidates are used to form the affine Merge candidate list:
1) inherited affine Merge candidates inferred from CPMVs of neighboring CUs
2) Constructed affine Merge candidates (CPMVPs) derived using the translational MVs of neighboring CUs
3) Zero MV
In VTM6, there are at most two inherited affine candidates, which are derived from the affine motion models of neighboring blocks: one from the left neighboring CU and one from the above neighboring CU. The candidate blocks are shown in fig. 18. For the left predictor, the scan order is A0 -> A1, and for the above predictor, the scan order is B0 -> B1 -> B2. Only the first inherited candidate from each side is selected. No pruning check is performed between the two inherited candidates. When a neighboring affine CU is identified, its control point motion vectors are used to derive the CPMVP candidate in the affine Merge list of the current CU. As shown in fig. 19, if the neighboring bottom-left block A is coded in affine mode, the motion vectors v_2, v_3, and v_4 of the top-left corner, top-right corner, and bottom-left corner of the CU containing block A are obtained. When block A is coded with the 4-parameter affine model, the two CPMVs of the current CU are calculated according to v_2 and v_3. When block A is coded with the 6-parameter affine model, the three CPMVs of the current CU are calculated according to v_2, v_3, and v_4.
A constructed affine candidate is a candidate constructed by combining the neighboring translational motion information of each control point. The motion information of the control points is derived from the specified spatial and temporal neighbors shown in fig. 20. CPMVk (k = 1, 2, 3, 4) denotes the k-th control point. For CPMV1, the blocks B2 -> B3 -> A2 are checked and the MV of the first available block is used. For CPMV2, the blocks B1 -> B0 are checked, and for CPMV3, the blocks A1 -> A0 are checked. The TMVP, if available, is used as CPMV4.
After obtaining the MVs of the four control points, affine Merge candidates are constructed based on those motion information. The following combinations of control points MV are used for the sequential construction:
{CPMV1,CPMV2,CPMV3},{CPMV1,CPMV2,CPMV4},{CPMV1,CPMV3,CPMV4},
{CPMV2,CPMV3,CPMV4},{CPMV1,CPMV2},{CPMV1,CPMV3}
A combination of 3 CPMVs constructs a 6-parameter affine Merge candidate, and a combination of 2 CPMVs constructs a 4-parameter affine Merge candidate. To avoid the motion scaling process, a combination of control point MVs is discarded if the reference indices of its control points differ.
After checking the inherited affine Merge candidate and constructing the affine Merge candidate, if the list is still not full, a zero MV is inserted at the end of the list.
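A minimal sketch of the construction loop described above (Python; the cpmv dictionary and all names are illustrative assumptions, not specification text):

```python
# cpmv maps each available control point k in {1, 2, 3, 4} to (mv, ref_idx).
COMBINATIONS = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4), (1, 2), (1, 3)]

def build_constructed_affine_candidates(cpmv):
    candidates = []
    for combo in COMBINATIONS:
        if not all(k in cpmv for k in combo):
            continue  # a control point of this combination is unavailable
        # Discard the combination when the control points use different
        # reference indices, to avoid the motion scaling process.
        if len({cpmv[k][1] for k in combo}) != 1:
            continue
        model = "6-parameter" if len(combo) == 3 else "4-parameter"
        candidates.append((model, [cpmv[k][0] for k in combo]))
    return candidates
```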
3. Disadvantages of the existing embodiments
The current design of the derivation of collocated motion vectors in inter prediction has the following problems:
1. The derivation of the collocated motion vectors in TMVP and SbTMVP depends on the prediction mode of the collocated codec block. In the current VVC, if the collocated codec block is coded in intra prediction mode or IBC prediction mode, the collocated motion vector is set equal to a zero motion vector. However, if the collocated block is palette coded, an undefined collocated motion vector may be returned, because no motion vectors are associated with the palette prediction mode.
2. In BCW, the derivation of weighted sample prediction may not be efficient.
3. In the current VVC, the MV is clipped to 18 bits. However, the Merge motion vector difference is clipped to 16 bits, which may result in accuracy loss.
4. The signaled cu_skip_flag may introduce overhead bits. In the current VVC, the maximum width and height of an IBC codec unit is 64. For blocks in an I slice with width or height greater than 64, cu_skip_flag does not need to be signaled.
5. The alternative luma half-pixel interpolation filter flag may be set equal to true even if the CU/PU/block has no half-pixel or coarser MV components.
6. An alternative luma half-pixel interpolation filter may be used for TPM mode.
7. An alternative luma half-pixel interpolation filter may be used in TMVP/SbTMVP/affine Merge.
8. BCW may be used in TMVP/SbTMVP.
4. Example methods for collocated motion vectors in video coding and decoding
The following detailed descriptions should be considered as examples to explain the general concepts. These inventions should not be interpreted narrowly. Furthermore, these inventions may be combined in any manner.
Derivation of collocated motion vectors
1. How to derive the collocated motion vector and/or the availability of the collocated motion vector may depend on whether the prediction mode of the collocated codec block is inter, instead of checking whether the collocated codec block is intra coded or IBC coded. In this case, there is no need to store all four prediction modes for each block; only 1 bit is required to determine whether the block is inter or non-inter (see the sketch after item 2 below).
a. In one example, for the case of a collocated coded block coded in palette prediction mode, the availability of how to derive a collocated motion vector and/or collocated motion vector may be the same as for the case of a collocated coded block coded in intra/IBC prediction mode.
b. In one example, when a collocated coded block is coded in a non-inter prediction mode (e.g., intra, palette, or IBC), the collocated motion vector may be set to unavailable.
c. Alternatively, when the collocated coded block is coded in a non-inter prediction mode (e.g., intra, palette, or IBC), the collocated motion vector may be marked as available and the default motion vector may be designated as the collocated motion vector.
d. Alternatively, when a collocated codec block is coded in a non-inter prediction mode, other blocks may be examined (e.g., one adjacent inter codec block of the collocated codec block).
i. In one example, the neighboring block may be the nearest inter-coded block to the left/right/top/bottom of the collocated codec block.
2. The determination of the collocated motion vector may depend on a reference list and/or a reference index of the collocated coded block.
a. In one example, when a reference index of a reference list X (e.g., L1) of a collocated codec block is not equal to a specific value (e.g., 0), a collocated motion vector may be derived (e.g., using the prior art described in 2.8.1).
b. In one example, the derivation of the collocated motion vector may be invoked when the reference index of the collocated codec block's reference list X (e.g., L1) is equal to a particular value (e.g., 0).
c. Alternatively, further, when the collocated codec block is codec in a non-inter prediction mode (including or not including IBC), or when its reference picture does not satisfy a given condition, the collocated MV is set to unavailable.
i. Alternatively, the collocated MV is set to a default value.
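Items 1.b and 1.c can be sketched as follows (Python; the ColBlock attributes and all names are hypothetical):

```python
# Only a single "is inter" bit per block is consulted, instead of storing
# and distinguishing the four prediction modes (intra/inter/IBC/palette).
def collocated_mv(col_block, use_default=False, default_mv=(0, 0)):
    if not col_block.is_inter:         # intra, IBC or palette: one 1-bit check
        if use_default:
            return (True, default_mv)  # item 1.c: default MV marked available
        return (False, None)           # item 1.b: marked unavailable
    return (True, col_block.mv)
```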
Sample prediction for BCW
3. The weight table applied in BCW codec mode may be asymmetric.
a. In one example, for a weight W that is an entry of the table, (1 − W) may not be an entry of the table, where W is in the range [a, b] and (a + b) equals 1.
b. In one example, for a weight W that is an entry of the table, (2^N − W) may not be an entry of the table, assuming the final prediction block is generated as (W0 * P0 + W1 * P1) >> N, where W0 and W1 are the two weights applied to the two prediction blocks P0 and P1, respectively, and (W0 + W1) is equal to (1 << N).
4. The weights in the weight table applied in BCW codec mode may not be in monotonically increasing order.
a. In one example, the (i+1)-th entry of the table may have a smaller value than the i-th entry of the table.
5. In one example, the weighted sample prediction process of BCW may depend on different weight lookup tables.
a. In one example, {4,5,3,10,2}/{4,3,5,10,2}/{4,5,3,10,1}/{4,3,5,10,1}/{4,5,3,10, -1} may be used as a weight lookup table for the BCW.
6. In one example, in a weighted sample prediction process of BCW, the intermediate predicted samples in each prediction direction may be converted (e.g., when the first bit depth is not equal to the bit depth of the intermediate predicted samples, if necessary) to a first bit depth, then weighted prediction may be applied, and the final predicted samples may be converted to a second bit depth.
a. In one example, the second bit depth is the same as the input bit depth of the current color component.
b. In one example, in converting between different bit depths, a right shift (e.g., converting samples from a higher bit depth to a lower bit depth) or a left shift (e.g., converting samples from a lower bit depth to a higher bit depth) may be applied.
i. Alternatively, further, an offset may be added before the right shift or the left shift.
c. In one example, the first bit depth is the same as the bit depth of the intermediate predicted samples. The weighted sample prediction for BCW can be derived as:
pbSamples[x][y] = Clip3(0, (1 << bitDepth) - 1, (w0 * predSamplesL0[x][y] + w1 * predSamplesL1[x][y] + offset3) >> (shift1 + 3)), where shift1 is set equal to Max(2, 14 - bitDepth) and the variable offset3 is set equal to 1 << (shift1 + 2).
d. In one example, the first bit depth is the same as the input bit depth of the current color component.
The weighted sample prediction for BCW can be derived as:
pbSamples[x][y] = Clip3(0, (1 << bitDepth) - 1, (w0 * ((predSamplesL0[x][y] + offset1) >> shift1) + w1 * ((predSamplesL1[x][y] + offset1) >> shift1) + 4) >> 3), where shift1 is set equal to Max(2, 14 - bitDepth) and the variable offset1 is set equal to 1 << (shift1 - 1).
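A sketch of the per-sample computation in 6.d (Python; names are illustrative, and w0 + w1 = 8 is assumed as in the BCW weight table):

```python
def bcw_blend_sample(p0, p1, w0, w1, bit_depth):
    """Blend two intermediate prediction samples at the input bit depth."""
    shift1 = max(2, 14 - bit_depth)
    offset1 = 1 << (shift1 - 1)
    s0 = (p0 + offset1) >> shift1       # convert L0 sample to input bit depth
    s1 = (p1 + offset1) >> shift1       # convert L1 sample to input bit depth
    val = (w0 * s0 + w1 * s1 + 4) >> 3  # weighted average with rounding (w0+w1=8)
    return min(max(val, 0), (1 << bit_depth) - 1)  # Clip3(0, 2^bd - 1, val)
```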
MMVD Range
7. In one example, the Merge motion vector difference may be clipped to the same range as the motion vector.
a. In one example, the Merge motion vector difference may be clipped to 18 bits, e.g., to [-2^17, 2^17 - 1], which is the same as the motion vector range in VVC.
b. In one example, the Merge motion vector difference may be clipped to [-2^17 + 1, 2^17 - 1].
c. In one example, the Merge motion vector difference may not be clipped.
i. For example, after the Merge motion vector difference is added to the motion vector prediction candidate, the result of the motion vector is clipped to 18 bits.
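A sketch of the clipping in item 7.a (Python; names are illustrative):

```python
MV_MIN, MV_MAX = -(1 << 17), (1 << 17) - 1  # 18-bit MV range in VVC

def clip_mmvd(mmvd):
    """Clip each Merge MVD component to the motion vector range."""
    return tuple(min(max(c, MV_MIN), MV_MAX) for c in mmvd)
```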
Signaling of cu_skip_flag
8. Depending on the size of the codec block, an indication (e.g., CU skip flag) of whether or not the CU/PU/block is codec in skip mode may be conditionally signaled.
a. In one example, the indication (e.g., cu_skip_flag) may be signaled when the current slice type is an I slice, sps_ibc_enabled_flag is equal to true, and both the block width and block height are less than or equal to N (N being an integer), e.g., N = 64.
b. In one example, when the current block is coded in IBC mode, the indication (e.g., cu_skip_flag) may be signaled when both the block width and the block height are less than or equal to N (N being an integer), e.g., N = 64.
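A sketch of the condition in items 8.a/8.b with N = 64 assumed (Python; names are illustrative and do not reproduce the VVC syntax table):

```python
def cu_skip_flag_signaled(slice_type, sps_ibc_enabled_flag,
                          cb_width, cb_height, n=64):
    if slice_type == "I":
        # Item 8.a: I slice -> signal only for blocks within the IBC size limit.
        return sps_ibc_enabled_flag and cb_width <= n and cb_height <= n
    return True  # non-I slices: cu_skip_flag follows the existing conditions
```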
Codec tools for chroma components
9. Whether a codec tool X (e.g., X is TMVP/ATMVP/BCW/MMVD/PDPC) can be applied to one block of a first color component may depend on whether it is applied to one or more corresponding blocks in a second color component.
a. In one example, the use of codec tool X for a first color component (e.g., chroma) block may be disabled when the codec tool is applied to a corresponding second color component (e.g., luma) block.
b. In one example, when a codec tool Y (where Y is different from X) is applied to a corresponding second color component (e.g., luma) block, use of codec tool X for the first color component (e.g., chroma) block may be disabled.
c. In one example, a message (e.g., a flag or index) may be conditionally signaled to indicate whether codec tool X applies to the first color component of the block. The condition may be defined as whether it applies to the corresponding second color component block. Alternatively, further, if it is not applied to the corresponding second color component block, it is not applied to the first component of the block without signaling.
i. In one example, codec tool X may be applied to different color components in different ways.
1) It can be signaled how to apply codec tool X on the luma component and the chroma component, respectively.
d. In one example, the first color component is a chrominance component and the second color component is a luminance component.
e. In one example, the first color component is one chroma color component and the second color component is another chroma color component.
f. In one example, the first color component is a luma color component and the second color component is a chroma color component.
g. In the above discussion, a "corresponding second color component block" may refer to a second color component block that includes at least one "corresponding sample point" of a first color component block.
i. In one example, the first color component is a chrominance component and the second color component is a luminance component.
ii. The positions of the samples may be scaled according to the color format (e.g., 4:4:4 or 4:2:0). Suppose the top-left position of the chroma block is (x0, y0) and the width and height of the chroma block are W and H; all these sample positions are then scaled to luma sample units (a sketch follows item 9).
1) In one example, the corresponding sample may be located at (x0, y0);
2) in one example, the corresponding sample may be located at (x0 + W - 1, y0 + H - 1);
3) in one example, the corresponding sample may be located at (x0 + W/2 - 1, y0 + H/2 - 1);
4) in one example, the corresponding sample may be located at (x0 + W/2, y0 + H/2);
5) in one example, the corresponding sample may be located at (x0 + W/2, y0 + H/2 - 1);
6) in one example, the corresponding sample may be located at (x0 + W/2 - 1, y0 + H/2).
h. In the above discussion, "chroma component" may refer to "one or more chroma components".
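The mapping in item 9.g can be sketched as follows (Python; names and the subsampling handling are illustrative assumptions):

```python
# Candidate "corresponding sample" positions of a chroma block with top-left
# (x0, y0) and size W x H (all in chroma sample units), scaled to luma units.
# sub_w/sub_h are the subsampling factors (2, 2 for 4:2:0; 1, 1 for 4:4:4).
def corresponding_luma_positions(x0, y0, w, h, sub_w=2, sub_h=2):
    cand = {
        "top_left":     (x0, y0),
        "bottom_right": (x0 + w - 1, y0 + h - 1),
        "center_tl":    (x0 + w // 2 - 1, y0 + h // 2 - 1),
        "center_br":    (x0 + w // 2, y0 + h // 2),
        "center_tr":    (x0 + w // 2, y0 + h // 2 - 1),
        "center_bl":    (x0 + w // 2 - 1, y0 + h // 2),
    }
    # Scale each candidate position to luma sample units.
    return {k: (x * sub_w, y * sub_h) for k, (x, y) in cand.items()}
```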
10. It is proposed that when co-located luma blocks of chroma blocks are coded in some modes, the location-dependent intra prediction sample filtering process (also known as PDPC) may be disabled for some chroma components.
a. In one example, such a process may be disabled when a collocated luma block of chroma blocks is coded in MIP (matrix-based intra prediction) mode.
b. In one example, such a process may be disabled when a collocated luma block of a chroma block is coded in MRL (multiple reference line) mode.
11. When a CU/PU/block has only MV components with higher than half-pixel precision (e.g., 1/4 pixels, 1/8 pixels, etc.), the indication of whether to use an optional half-pixel interpolation filter may be set to false. That is, a default half-pixel interpolation filter may be used instead of an alternative half-pixel interpolation filter.
a. In one example, if the reconstructed MV has only MV components with a higher precision than half a pixel, such an indication may be set to false in MMVD mode.
b. In one example, for a paired Merge candidate, if it only has MV components with higher precision than half a pixel, such an indication may be set equal to false.
c. In one example, such an indication may be set to false when the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information having only MV components with a higher precision than half a pixel (e.g., applied to small blocks such as 4 x 8 blocks or/and 8 x 4 blocks).
d. In one example, an indication set to false may be stored and used for subsequent codecs of the CU/PU/block.
12. When a CU/PU/block has N (N being an integer, e.g., N = 1) or more than N MV components (including horizontal and vertical components) with higher precision than half-pixel (e.g., 1/4 pixel, 1/8 pixel, etc.), the indication of whether to use the optional half-pixel interpolation filter may be set equal to false. That is, a default half-pixel interpolation filter may be used instead of the alternative half-pixel interpolation filter.
a. In one example, if the reconstructed MV has N or more MV components with higher precision than half a pixel, then in MMVD mode, such an indication may be set equal to false.
b. In one example, if a paired Merge candidate has N or more MV components with higher precision than half a pixel, such an indication may be set equal to false.
c. In one example, such an indication may be set equal to false when the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information having N or more than N MV components with greater than half-pixel precision (e.g., applied to small blocks such as 4 x 8 blocks or/and 8 x 4 blocks).
d. In one example, N may depend on whether a CU/PU/block is uni-directionally predicted or bi-directionally predicted.
i. For example, bi-directionally predicted CUs/PUs/blocks may use a larger N than uni-directionally predicted CUs/PUs/blocks.
For example, the same N may be used for both uni-directionally predicted and bi-directionally predicted CUs/PUs/blocks.
e. In one example, an indication set equal to false may be stored and used for subsequent codecs of the CU/PU/block.
13. In some cases, the indication of whether to use the optional half-pixel interpolation filter may always be set equal to false. That is, in these cases, a default half-pixel interpolation filter may always be used.
a. In one example, such an indication may always be set equal to false in MMVD mode.
b. In one example, when certain MVDs are selected, such an indication may always be set equal to false in MMVD mode.
i. For example, if an MVD with 1/4 pixel accuracy is selected, such an indication may be set equal to false.
c. In one example, such an indication may always be set equal to false for paired Merge candidates.
d. In one example, such an indication may always be set equal to false when the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information (e.g., applied to small blocks such as 4 x 8 blocks or/and 8 x 4 blocks).
e. In one example, an indication set equal to false may be stored and used for subsequent codecs of the CU/PU/block.
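Items 11-13 can be sketched as follows (Python; the 1/16-pel MV storage unit is an assumption, and all names are illustrative):

```python
# With MVs stored in 1/16-pel units, a component is at half-pel or coarser
# precision iff it is a multiple of 8.
def use_alt_half_pel_filter(mv_components, n=1, force_default=False):
    if force_default:          # item 13: always use the default filter
        return False
    finer = sum(1 for c in mv_components if c % 8 != 0)
    return finer < n           # item 12 (item 11 is the case n = 1)
```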
14. It is suggested that an optional half-pixel interpolation filter may be used in TPM mode.
a. In one example, the indication of the optional half-pixel interpolation filter in TPM mode may be inherited from the Merge candidate.
i. In one example, the indication of the selectable half-pixel interpolation filter may be inherited from one or more spatial candidates, and/or TMVP candidates, and/or HMVP candidates, and/or pair candidates.
1) In one example, the indication of the selectable half-pixel interpolation filter may be derived from one of the spatial candidates (e.g., the first spatial candidate) of the TMVP candidate.
2) Let hPelIf1 and hPelIf2 denote the indications of the first candidate and the second candidate used to form a pairwise candidate. In one example, when the indication of the optional half-pixel interpolation filter is inherited from the pairwise candidate, it may depend on hPelIf1 and/or hPelIf2.
a) In one example, the indication of the selectable half-pixel interpolation filter is set equal to true.
b) In one example, the indication of the selectable half-pixel interpolation filter is set equal to false.
c) In one example, the indication of the optional half-pixel interpolation filter is set equal to hPelIf 1.
d) In one example, the indication of the optional half-pixel interpolation filter is set equal to hPelIf 2.
e) In one example, the indication of the optional half-pixel interpolation filter is set equal to (hPelIf1& & hPelIf 2).
f) In one example, the indication of the optional half-pixel interpolation filter is set equal to (hPelIf1| | hPelIf 2).
in one example, the indication of the optional half-pixel interpolation filter in TPM mode may be set equal to true.
b. In one example, an optional half-pixel interpolation filter may be used during motion compensation of one or both partitions in TPM mode.
i. In one example, an optional half-pixel interpolation filter may be used for partition 1.
in one example, an optional half-pixel interpolation filter may be used for partition 2.
in one example, an optional half-pixel interpolation filter may be used for both partition 1 and partition 2.
c. In one example, an indication of whether the optional half-pixel interpolation filter is used in TPM mode may be stored together with the motion field. Let hPelIfPart1 and hPelIfPart2 denote the indications of the half-pixel interpolation filter for partition 1 and partition 2 in TPM mode.
i. In one example, the same indication of whether the optional half-pixel interpolation filter is used in TPM mode may be stored for all 4 x 4 cells.
1) In one example, during the process of motion field storage, the indication of whether to use the optional half-pixel interpolation filter is set equal to false.
2) In one example, during the motion field storage process, the indication of whether to use the optional half-pixel interpolation filter is set equal to (hPelIfPart1 && hPelIfPart2).
3) In one example, during the motion field storage process, the indication of whether to use the optional half-pixel interpolation filter is set equal to (hPelIfPart1 || hPelIfPart2).
in one example, different indications of whether to use the optional half-pixel interpolation filter in TPM mode may be stored for 4 x 4 cells in different partitions (e.g., partition 1, partition 2, or weighted area).
1) In one example, for X replaced by 1 or 2, the indication of whether the optional half-pixel interpolation filter is used for the 4 × 4 cells in partition X is set equal to hPelIfPartX during the motion field storage process.
2) In one example, the indication of whether the optional half-pixel interpolation filter is used for the 4 × 4 cells in the weighted region is set equal to (hPelIfPart1 && hPelIfPart2).
3) In one example, the indication of whether the optional half-pixel interpolation filter is used for the 4 × 4 cells in the weighted region is set equal to (hPelIfPart1 || hPelIfPart2).
4) In one example, the indication of whether the optional half-pixel interpolation filter is used for the 4 × 4 cells in the weighted region is set equal to hPelIfPart1.
5) In one example, the indication of whether the optional half-pixel interpolation filter is used for the 4 × 4 cells in the weighted region is set equal to hPelIfPart2.
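A sketch of the per-4 × 4 storage rule of item 14.c (Python; hpel1/hpel2 stand for hPelIfPart1/hPelIfPart2, and the weighted-area rule is selectable among the listed alternatives):

```python
def stored_hpel_flag(region, hpel1, hpel2, weighted_rule="and"):
    if region == "partition1":
        return hpel1
    if region == "partition2":
        return hpel2
    # Weighted area: alternatives 2)-5) above collapse to an AND or OR rule
    # (or to one of the two partition flags directly).
    return (hpel1 and hpel2) if weighted_rule == "and" else (hpel1 or hpel2)
```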
15. In one example, one or more methods disclosed in bullet 13 may be used for indication of BCW (also referred to as generalized bi-directional prediction (GBi) index).
16. Whether to enable or disable the codec tool for the TMVP candidate may depend on information of all or part of the motion candidates (named selected motion candidates) in the Merge list before adding the TMVP candidate.
a. In one example, the selected motion candidates may be the spatial Merge candidates;
b. in one example, the selected motion candidate may be the first spatial Merge candidate;
i. in one example, the indication of whether the optional half-pixel interpolation filter is used for the TMVP candidate may be set equal to the indication of the first spatial Merge candidate.
ii. in one example, the BCW index of the TMVP candidate may be set equal to the BCW index of the first spatial Merge candidate.
iii. in one example, the first spatial Merge candidate may be the left spatial Merge candidate.
c. In one example, the indication of whether to use the optional half-pixel interpolation filter and/or the BCW index may depend on information associated with the selected motion candidate.
i. In one example, if more of the selected candidates have the indication of whether to use the alternative half-pixel interpolation filter equal to 1 than have it equal to 0, the indication may be set to 1 (or 0) for the TMVP candidate.
17. Whether to enable or disable a codec tool for the SbTMVP candidate may depend on the information of the first spatial Merge candidate.
a. In one example, the indication of whether the optional half-pixel interpolation filter is used for the SbTMVP candidate may be set equal to the indication of the first spatial Merge candidate.
b. In one example, the BCW index of the SbTMVP candidate may be set equal to the BCW index of the first spatial Merge candidate.
c. In one example, the first spatial Merge candidate may be the left spatial Merge candidate.
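Items 16.b and 17 can be sketched as follows (Python; the candidate fields are illustrative):

```python
# The TMVP or SbTMVP candidate inherits the half-pel filter indication and
# the BCW index from the first spatial Merge candidate, when one exists.
def inherit_from_first_spatial(temporal_cand, merge_list):
    first = next((c for c in merge_list if c["type"] == "spatial"), None)
    if first is not None:
        temporal_cand["hpelIfIdx"] = first["hpelIfIdx"]
        temporal_cand["bcwIdx"] = first["bcwIdx"]
    return temporal_cand
```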
18. The indication of whether the optional half-pixel interpolation filter is used for a block using the constructed affine Merge mode may be set equal to the indication of the neighboring block whose MV is used to generate a control point MV (CPMV).
a. In one example, the MV of the neighboring block may generate the top-left corner CPMV.
b. In one example, the MV of the neighboring block may generate the top-right corner CPMV.
c. In one example, the MV of the neighboring block may generate the bottom-left corner CPMV.
19. The indication of whether the optional half-pixel interpolation filter is used for a block using an affine Merge mode inherited from a neighboring block may be inherited from the neighboring block.
20. In one example, two Merge candidates are considered different when compared in the Merge candidate list construction process if their indications of whether to use the optional half-pixel interpolation filter are different.
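A sketch of the comparison in item 20 (Python; the candidate fields are illustrative):

```python
# The pruning comparison in Merge list construction also takes the half-pel
# filter indication into account, so two candidates with identical motion
# but different indications are kept as distinct entries.
def merge_candidates_identical(a, b):
    return (a["mv"] == b["mv"] and a["ref_idx"] == b["ref_idx"]
            and a["hpelIfIdx"] == b["hpelIfIdx"])
```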
General
21. Whether and/or how to apply the above disclosed method may be signaled in a sequence level/picture level/slice level/group of slices level, e.g., sequence header/picture header/SPS/VPS/DPS/PPS/APS/slice header/group of slices header.
The above examples may be incorporated in the context of methods (e.g., methods 800, 900, 1000, 1100, 1200, and 1300) described below, which may be implemented at a video decoder or a video encoder.
Fig. 8 shows a flow diagram of an exemplary method of video processing. Method 800 includes, at step 810, determining availability of one or more collocated motion vectors based on a prediction mode of a collocated video block relative to a current video block.
The method 800 includes, at step 820, performing a conversion between a current block and a bitstream representation of the current block based on one or more collocated motion vectors, the indication of prediction mode comprising a bit indicating whether the current video block is coded in inter mode or non-inter mode.
Fig. 9 shows a flow diagram of an exemplary method of video processing. Method 900 includes, at step 910, determining a type of a collocated video block of the video block for a transition between a codec representation of the video block and the video block, the type taking only one of two possible values.
The method 900 includes, at step 920, performing a conversion based on the determination.
Fig. 10 shows a flow diagram of an exemplary method of video processing. The method 1000 includes, at step 1010, deriving prediction samples for a current block based on a first weight table associated with a bi-prediction with coding unit (CU) level weights (BCW) process, the first weight table being asymmetric.
The method 1000 includes, at step 1020, performing a conversion between the current block and a bitstream representation of the current block based on the prediction samples.
Fig. 11 shows a flow diagram of an exemplary method of video processing. Method 1100 includes, at step 1110, determining selective signaling of an indication of a skip mode codec for a current video block in a bitstream representation of the current video block based on a dimension of the current video block.
The method 1100 includes, at step 1120, performing a conversion between the current block and a bitstream representation of the current block based on the determination.
Fig. 12 shows a flow diagram of an exemplary method of video processing. Method 1200 includes, at step 1210, determining a selective application of a second coding tool to at least one block of a second color component of a current video block based on an application of the first coding tool to one or more blocks of a first color component of the current video block.
The method 1200 includes, at step 1220, performing a conversion between the current block and a bitstream representation of the current block based on the determination.
Fig. 13 shows a flow diagram of an exemplary method of video processing. Method 1300 includes, at step 1310, determining a selective signaling of an indication to use an alternate half-pixel interpolation filter instead of a default half-pixel interpolation filter based on a precision of a motion vector in a current video block.
The method 1300 includes, at step 1320, performing a conversion between the current block and a bitstream representation of the current block based on the determination.
Some embodiments of the disclosed technology include making a decision or decision to enable a video processing tool or mode. In an example, when a video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of the video blocks, but does not necessarily modify the resulting bitstream based on the use of the tool or mode. That is, the conversion from video blocks to a bitstream representation of the video will use a video processing tool or mode when enabled based on the decision or decision. In another example, when a video processing tool or mode is enabled, the decoder will process the bitstream knowing that the bitstream has been modified based on the video processing tool or mode. That is, the conversion from a bitstream representation of the video to video blocks will be performed using a video processing tool or mode that is enabled based on the decision or the decision.
Some embodiments of the disclosed technology include making a decision or decision to disable a video processing tool or mode. In an example, when a video processing tool or mode is disabled, the encoder will not use that tool or mode when converting video blocks into a bitstream representation of the video. In another example, when a video processing tool or mode is disabled, the decoder will process the bitstream knowing that the bitstream has not been modified using a decision-based or decision-disabled video processing tool or mode.
In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during conversion from a pixel representation of a video to a corresponding bitstream representation, and vice versa. For example, the bitstream representation of the current video block may correspond to bits that are collocated or scattered at different locations within the bitstream as defined by the syntax. For example, a macroblock may be encoded according to transform and codec error residual values, and may also be encoded using bits in the header and other fields in the bitstream.
5. Exemplary embodiments of the disclosed technology
The changes are highlighted in bold and italics. Deleted text is marked with double brackets (e.g., [ [ a ] ] representing the deleted letter "a").
5.1 example #1
The working draft specified in JVET-O2001-vE may be changed as follows.
8.5.2.12 derivation of collocated motion vectors
The variables mvLXCol and availableFlagLXCol are derived as follows:
-if colCb is coded in intra prediction mode or palette prediction mode or IBC prediction mode, then both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
Otherwise, the motion vector mvCol, the reference index refIdxCol and the reference list identifier listCol are derived as follows:
alternatively, the following may apply:
the variables mvLXCol and availableFlagLXCol are derived as follows:
- If colCb is not coded in [[intra or IBC]] inter prediction mode, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
Otherwise, the motion vector mvCol, the reference index refIdxCol and the reference list identifier listCol are derived as follows:
5.2 example #2
The working draft specified in JVET-O2001-vE may be changed as follows.
8.5.1 General decoding process for codec units coded in inter prediction mode
3. For xSbIdx = 0..numSbX - 1 and ySbIdx = 0..numSbY - 1, the decoder-side motion vector refined arrays of luma and chroma motion vectors (refMvLX[xSbIdx][ySbIdx] and refMvCLX[xSbIdx][ySbIdx], with X being 0 and 1) are derived as follows:
- If dmvrFlag is equal to 1, the derivation process for chroma motion vectors in section 8.5.2.13 is invoked with refMvLX[xSbIdx][ySbIdx] and refIdxLX as inputs and refMvCLX[xSbIdx][ySbIdx] as output, and the input refMvLX[xSbIdx][ySbIdx] is derived as follows:
refMvLX[xSbIdx][ySbIdx]=mvLX[xSbIdx][ySbIdx]+dMvLX[xSbIdx][ySbIdx] (8-287)
refMvLX[xSbIdx][ySbIdx][0]=Clip3(-2^17,2^17-1,refMvLX[xSbIdx][ySbIdx][0]) (8-288)
refMvLX[xSbIdx][ySbIdx][1]=Clip3(-2^17,2^17-1,refMvLX[xSbIdx][ySbIdx][1]) (8-289)
else (dmvrFlag equal to 0), the following applies:
refMvLX[xSbIdx][ySbIdx]=mvLX[xSbIdx][ySbIdx] (8-290)
refMvCLX[xSbIdx][ySbIdx]=mvCLX[xSbIdx][ySbIdx] (8-291)
note that array refMvLX is stored in MvDmvrLX and used in the derivation process for the collocated motion vector in section 8.5.2.12. After decoding the slice, MvDmvrLX [ xsbIdx ] [ ySbIdx ] and the corresponding reference index are all set equal to-1 when the codec block Cb [ xsbIdx ] [ ySbIdx ] is codec in IBC prediction mode. The array of non-refined luma motion vectors MvLX is used in the spatial motion vector prediction and deblocking boundary strength derivation process.
8.5.2.12 derivation of collocated motion vectors
The variables mvLXCol and availableFlagLXCol are derived as follows:
-if colCb is coded in intra prediction mode or IBC prediction mode, two components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
Otherwise, the motion vector mvCol, the reference index refIdxCol and the reference list identifier listCol are derived as follows:
If predFlagL0Col[xColCb][yColCb] is equal to 0 and predFlagL1Col[xColCb][yColCb] is equal to 0, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
Otherwise, the following applies:
-if sbFlag is equal to 0, availableFlagLXCol is set equal to 1 and the following applies:
- If predFlagL0Col[xColCb][yColCb] is equal to 0, mvCol, refIdxCol and listCol are set equal to mvL1Col[xColCb][yColCb], refIdxL1Col[xColCb][yColCb] and L1, respectively.
- Otherwise, if predFlagL0Col[xColCb][yColCb] is equal to 1 and predFlagL1Col[xColCb][yColCb] is equal to 0, mvCol, refIdxCol and listCol are set equal to mvL0Col[xColCb][yColCb], refIdxL0Col[xColCb][yColCb] and L0, respectively.
- Otherwise (predFlagL0Col[xColCb][yColCb] is equal to 1 and predFlagL1Col[xColCb][yColCb] is equal to 1), the following assignments are made:
- If NoBackwardPredFlag is equal to 1, mvCol, refIdxCol and listCol are set equal to mvLXCol[xColCb][yColCb], refIdxLXCol[xColCb][yColCb] and LX, respectively.
- Otherwise, mvCol, refIdxCol and listCol are set equal to mvLNCol[xColCb][yColCb], refIdxLNCol[xColCb][yColCb] and LN, respectively, where N is the value of collocated_from_l0_flag.
Else (sbFlag equal to 1), the following applies:
- If predFlagLXCol[xColCb][yColCb] is equal to 1, mvCol, refIdxCol and listCol are set equal to mvLXCol[xColCb][yColCb], refIdxLXCol[xColCb][yColCb] and LX, respectively, and availableFlagLXCol is set equal to 1.
- Otherwise (predFlagLXCol[xColCb][yColCb] is equal to 0), the following applies:
- If NoBackwardPredFlag is equal to 1 and predFlagLYCol[xColCb][yColCb] is equal to 1, mvCol, refIdxCol and listCol are set equal to mvLYCol[xColCb][yColCb], refIdxLYCol[xColCb][yColCb] and LY, respectively, where Y is equal to !X, with X being the value of X this process was invoked with. availableFlagLXCol is set equal to 1.
- Otherwise, both components of mvLXCol are set equal to 0 and availableFlagLXCol is set equal to 0.
5.3 example #3
The working draft specified in JVET-O2001-vE may be changed as follows.
8.5.6.6.2 Default weighted sample prediction Process
The variables shift1, shift2, offset1, offset2, and offset3 are derived as follows:
variable shift1 is set equal to Max (2,14-bitDepth) and variable shift2 is set equal to Max (3, 15-bitDepth).
The variable offset1 is set equal to 1< < (shift 1-1).
The variable offset2 is set equal to 1< < (shift 2-1).
The variable offset3 is set equal to 1 << (shift2 + 1[[2]]).
Else (predflag l0 equals 1 and predflag l1 equals 1), the following applies:
-if bcwIdx is equal to 0 or ciip _ flag [ xCb ] [ yCb ] is equal to 1, the prediction sample value is derived as follows:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x][y]+predSamplesL1[x][y]+offset2)>>shift2) (8-823)
else (bcwIdx not equal 0 and ciip _ flag [ xCb ] [ yCb ] equal 0), the following applies:
The variable w1 is set equal to bcwWLut[bcwIdx], where bcwWLut[k] = {4, 5, 3, 10, -2}.
The variable w0 is set equal to (8-w 1).
The predicted sample values are derived as follows.
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(w0*predSamplesL0[x][y]+w1*predSamplesL1[x][y]+offset3)>>(shift2+[[3]]2)) (8-824)
Alternatively, the following may apply:
the variables shift1, shift2, offset1, offset2, and offset3 are derived as follows:
The variable shift1 is set equal to Max(2, 14 - bitDepth)[[ and the variable shift2 is set equal to Max(3, 15 - bitDepth)]].
The variable offset1 is set equal to 1< < (shift 1-1).
The variable offset2 is set equal to 1 << (shift1 + [[2 -]]1).
The variable offset3 is set equal to 1 << (shift[[2]]1 + 2).
Else (predflag l0 equals 1 and predflag l1 equals 1), the following applies:
if bcwIdx is equal to 0 or ciip _ flag [ xCb ] [ yCb ] is equal to 1, the prediction sample values are derived as follows:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x][y]+predSamplesL1[x][y]+offset2)>>(shift1+1)[[2]]) (8-823)
else (bcwIdx not equal 0 and ciip _ flag [ xCb ] [ yCb ] equal 0), the following applies:
The variable w1 is set equal to bcwWLut[bcwIdx], where bcwWLut[k] = {4, 5, 3, 10, -2}.
The variable w0 is set equal to (8-w 1).
The predicted sample values are derived as follows.
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(w0*predSamplesL0[x][y]+w1*predSamplesL1[x][y]+offset3)>>(shift1+3[[2+3]])) (8-824)
Alternatively, the following may apply:
the variables shift1, shift2, offset1, offset2, and offset3 are derived as follows:
variable shift1 is set equal to Max (2,14-bitDepth) and variable shift2 is set equal to Max (3, 15-bitDepth).
The variable offset1 is set equal to 1< < (shift 1-1).
The variable offset2 is set equal to 1< < (shift 2-1).
- [[The variable offset3 is set equal to 1 << (shift2 + 2).]]
Else (predflag l0 equals 1 and predflag l1 equals 1), the following applies:
-if bcwIdx is equal to 0 or ciip _ flag [ xCb ] [ yCb ] is equal to 1, the prediction sample value is derived as follows:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x][y]+predSamplesL1[x][y]+offset2)>>shift2) (8-823)
else (bcwIdx not equal 0 and ciip _ flag [ xCb ] [ yCb ] equal 0), the following applies:
The variable w1 is set equal to bcwWLut[bcwIdx], where bcwWLut[k] = {4, 5, 3, 10, -2}.
The variable w0 is set equal to (8-w 1).
The predicted sample values are derived as follows.
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(w0*((predSamplesL0[x][y]+offset1)>>shift1)+w1*((predSamplesL1[x][y]+offset1)>>shift1)+4)>>3[[(shift2+3)]]) (8-824)
5.4 example #4
The working draft specified in JVET-O2001-vE may be changed as follows.
8.5.6.6.2 Default weighted sample prediction Process
Else (bcwIdx not equal 0 and ciip _ flag [ xCb ] [ yCb ] equal 0), the following applies:
The variable w1 is set equal to bcwWLut[bcwIdx], where bcwWLut[k] = {4, 5, 3, 10, 2[[-2]]} / {4, 5, 3, 10, 1} / {4, 3, 5, 10, 2} / {4, 3, 5, 10, 1} / {4, 5, 3, 10, -1}.
5.5 example #5
The working draft specified in JVET-O2001-vE may be changed as follows.
8.5.2.7 derivation process of Merge motion vector difference
mMvdL1[0]=Clip3(-2^[[15]]17,2^[[15]]17-1,(distScaleFactor*mMvdL0[0]+128-(distScaleFactor*mMvdL0[0]>=0))>>8) (8-394)
mMvdL1[1]=Clip3(-2^[[15]]17,2^[[15]]17-1,(distScaleFactor*mMvdL0[1]+128-(distScaleFactor*mMvdL0[1]>=0))>>8) (8-395)
mMvdL0[0]=Clip3(-2^[[15]]17,2^[[15]]17-1,(distScaleFactor*mMvdL1[0]+128-(distScaleFactor*mMvdL1[0]>=0))>>8) (8-404)
mMvdL0[1]=Clip3(-2^[[15]]17,2^[[15]]17-1,(distScaleFactor*mMvdL1[1]+128-(distScaleFactor*mMvdL1[1]>=0))>>8) (8-405)
5.6 example #6
The working draft specified in JVET-O2001-vE may be changed as follows.
7.3.8.5 Codec unit syntax
(The modified codec unit syntax table is provided as an image in the original publication and is not reproduced here.)
5.7 example #7
The working draft specified in JVET-O2001-vE may be changed as follows.
8.5.1 General decoding process for codec units coded in inter prediction mode
The decoding process for a codec unit coded in inter prediction mode consists of the following sequential steps:
1. the variable dmvrFlag is set equal to 0 and the variable hpeliffidx is set equal to 0.
2. The motion vector component and the reference index of the current codec unit are derived as follows:
- Otherwise, if MergeTriangleFlag[xCb][yCb] is equal to 1 and both inter_affine_flag[xCb][yCb] and merge_subblock_flag[xCb][yCb] are equal to 0, the derivation process for triangle motion vector components and reference indices specified in section 8.5.4.1 is invoked with the luma codec block position (xCb, yCb), the luma codec block width cbWidth and the luma codec block height cbHeight as inputs, and the luma motion vectors mvA and mvB, the chroma motion vectors mvCA and mvCB, the reference indices refIdxA and refIdxB, the prediction list flags predListFlagA and predListFlagB, and the half-sample interpolation filter indices hpelIfIdxA and hpelIfIdxB as outputs.
3. For xSbIdx = 0..numSbX - 1 and ySbIdx = 0..numSbY - 1, the arrays of luma and chroma motion vectors after decoder-side motion vector refinement (refMvLX[xSbIdx][ySbIdx] and refMvCLX[xSbIdx][ySbIdx], with X being 0 and 1) are derived as follows:
4. The prediction samples of the current codec unit are derived as follows:
-if MergeTriangleFlag [ xCb ] [ yCb ] is equal to 0, the predicted samples for the current codec unit are derived as follows:
- Otherwise (MergeTriangleFlag[xCb][yCb] is equal to 1), the decoding process for triangle inter blocks specified in section 8.5.7.1 is invoked with the luma codec block position (xCb, yCb), the luma codec block width cbWidth and the luma codec block height cbHeight, the luma motion vectors mvA and mvB, the chroma motion vectors mvCA and mvCB, the reference indices refIdxA and refIdxB, the prediction list flags predListFlagA and predListFlagB, and the half-sample interpolation filter indices hpelIfIdxA and hpelIfIdxB as inputs, and the (cbWidth) x (cbHeight) array predSamplesL of luma prediction samples and the two (cbWidth/SubWidthC) x (cbHeight/SubHeightC) arrays predSamplesCb and predSamplesCr of chroma prediction samples, one for each of the chroma components Cb and Cr, as outputs.
8.5.4 derivation of triangular motion vector components and reference indices
8.5.4.1 general rule
The inputs to this process are:
-the luminance position of the left top luma sample of the current luma codec block relative to the left top luma sample of the current picture (xCb, yCb),
a variable cbWidth specifying the width of the current codec block in luminance samples,
a variable cbHeight specifying the height of the current codec block in the luma samples.
The outputs of this process are:
1/16 fractional sample accuracy luminance motion vectors mvA and mvB,
1/32 fractional sample accuracy chroma motion vectors mvCA and mvCB,
reference indices refIdxA and refIdxB,
the prediction list flags predListFlagA and predListFlagB, and
the half-sample interpolation filter indices hpelIfIdxA and hpelIfIdxB.
The derivation process of the luminance motion vector of the triangle Merge mode specified in section 8.5.4.2 is called using the luminance position (xCb, yCb), with the variables cbWidth and cbHeight as inputs, and outputs as the luminance motion vectors mvA, mvB, the reference indices refIdxA, refIdxB, and the prediction list flags predlistflag a and predlistflag b.
The derivation process of the chroma motion vectors in section 8.5.2.13 is invoked, with mvA and refIdxA as inputs, and the output as mvCA.
The derivation process for chroma motion vectors in section 8.5.2.13 is invoked with mvB and refIdxB as inputs, and the output is mvCB.
8.5.4.2 Derivation process for luma motion vectors in the Merge triangle mode
This process is invoked only when the MergeTriangleFlag [ xCb ] [ yCb ] is equal to 1, where (xCb, yCb) specifies the top-left samples of the current luma codec block relative to the top-left luma samples of the current picture.
The inputs to this process are:
-luminance position (xCb, yCb) of a top-left sample of the current luminance codec block relative to a top-left luminance sample of the current picture,
a variable cbWidth specifying the width of the current codec block in luminance samples,
a variable cbHeight specifying the height of the current codec block in the luma samples.
The outputs of this process are:
-sampling luminance motion vectors of mvA and mvB with 1/16 fractional sample accuracy,
reference indices refIdxA and refIdxB,
the prediction list flags predListFlagA and predListFlagB, and
the half-sample interpolation filter indices hpelIfIdxA and hpelIfIdxB.
The motion vectors mvA and mvB, the reference indices refIdxA and refIdxB, and the prediction list flags predlistflag a and predlistflag b are derived by the following sequential steps:
1. The derivation process for luma motion vectors in Merge mode specified in section 8.5.2.2 is invoked with the luma position (xCb, yCb) and the variables cbWidth and cbHeight as inputs, and the outputs are the luma motion vectors mvL0[0][0] and mvL1[0][0], the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[0][0] and predFlagL1[0][0], the bi-prediction weight index bcwIdx, and the merging candidate list mergeCandList.
2. The variables m and n, being the merge indices of triangle partition 0 and partition 1, respectively, are derived using merge_triangle_idx0[xCb][yCb] and merge_triangle_idx1[xCb][yCb] as follows:
m=merge_triangle_idx0[xCb][yCb] (8-475)
n=merge_triangle_idx1[xCb][yCb]+(merge_triangle_idx1[xCb][yCb]>=m)?1:0 (8-476)
3. Let refIdxL0M and refIdxL1M, predFlagL0M and predFlagL1M, and mvL0M and mvL1M be the reference indices, prediction list utilization flags and motion vectors of the merging candidate M at position m in the merging candidate list mergeCandList (M = mergeCandList[m]).
4. The variable X is set equal to (m & 0x01).
5. When predFlagLXM is equal to 0, X is set equal to (1-X).
6. The following applies:
mvA[0]=mvLXM[0] (8-477)
mvA[1]=mvLXM[1] (8-478)
refIdxA=refIdxLXM (8-479)
predListFlagA=X (8-480)
hpelIfIdxA=hpelIfIdxM (8-xxx)
7. Let refIdxL0N and refIdxL1N, predFlagL0N and predFlagL1N, and mvL0N and mvL1N be the reference indices, prediction list utilization flags and motion vectors of the merging candidate N at position n in the merging candidate list mergeCandList (N = mergeCandList[n]).
8. The variable X is set equal to (n & 0x01).
9. When predFlagLXN is equal to 0, X is set equal to (1-X).
10. The following applies:
mvB[0]=mvLXN[0] (8-481)
mvB[1]=mvLXN[1] (8-482)
refIdxB=refIdxLXN (8-483)
predListFlagB=X (8-484)
hpelIfIdxB=hpelIfIdxN (8-xxx)
8.5.7 decoding procedure for triangular inter blocks
8.5.7.1 general rule
This process is invoked when decoding a codec cell that MergeTriangleFlag [ xCb ] [ yCb ] equals 1.
The inputs to this process are:
-specifying a luminance position of top left samples of the current codec block relative to top left luminance samples of the current picture (xCb, yCb),
a variable cbWidth specifying the width of the current codec block in luminance samples,
a variable cbHeight specifying the height of the current codec block in the luminance samples,
-sampling luminance motion vectors of mvA and mvB with 1/16 fractional sample accuracy,
-chrominance motion vectors mvCA and mvCB,
reference indices refIdxA and refIdxB,
the prediction list flags predListFlagA and predListFlagB, and
the half-sample interpolation filter indices hpelIfIdxA and hpelIfIdxB.
The outputs of this process are:
- a (cbWidth) x (cbHeight) array predSamplesL of luma prediction samples,
- a (cbWidth/SubWidthC) x (cbHeight/SubHeightC) array predSamplesCb of chroma prediction samples for the component Cb,
- a (cbWidth/SubWidthC) x (cbHeight/SubHeightC) array predSamplesCr of chroma prediction samples for the component Cr.
Let predSamplesLAL and predSamplesLBL be (cbWidth) x (cbHeight) arrays of predicted luma sample values, and let predSamplesLACb, predSamplesLBCb, predSamplesLACr and predSamplesLBCr be (cbWidth/SubWidthC) x (cbHeight/SubHeightC) arrays of predicted chroma sample values.
predSamplesL, predSamplesCb and predSamplesCr are derived by the following sequential steps:
1. For N being each of A and B, the following applies:
- The reference picture consisting of an ordered two-dimensional array refPicLNL of luma samples and two ordered two-dimensional arrays refPicLNCb and refPicLNCr of chroma samples is derived by invoking the process specified in section 8.5.6.2, with X set equal to predListFlagN and refIdxX set equal to refIdxN as inputs.
- The array predSamplesLNL is derived by invoking the fractional sample interpolation process specified in section 8.5.6.3, with the luma position (xCb, yCb), the luma codec block width sbWidth set equal to cbWidth, the luma codec block height sbHeight set equal to cbHeight, the motion vector offset mvOffset set equal to (0, 0), the motion vector mvLX set equal to mvN, the reference array refPicLXL set equal to refPicLNL, the variable bdofFlag set equal to false, hpelIfIdxN, and the variable cIdx set equal to 0 as inputs.
- The array predSamplesLNCb is derived by invoking the fractional sample interpolation process specified in section 8.5.6.3, with the luma position (xCb, yCb), the codec block width sbWidth set equal to cbWidth/SubWidthC, the codec block height sbHeight set equal to cbHeight/SubHeightC, the motion vector offset mvOffset set equal to (0, 0), the motion vector mvLX set equal to mvCN, the reference array refPicLXCb set equal to refPicLNCb, the variable bdofFlag set equal to false, hpelIfIdxN, and the variable cIdx set equal to 1 as inputs.
- The array predSamplesLNCr is derived by invoking the fractional sample interpolation process specified in section 8.5.6.3, with the luma position (xCb, yCb), the codec block width sbWidth set equal to cbWidth/SubWidthC, the codec block height sbHeight set equal to cbHeight/SubHeightC, the motion vector offset mvOffset set equal to (0, 0), the motion vector mvLX set equal to mvCN, the reference array refPicLXCr set equal to refPicLNCr, the variable bdofFlag set equal to false, hpelIfIdxN, and the variable cIdx set equal to 2 as inputs.
2. The variable triangleDir, specifying the partition direction of the Merge triangle mode, is set equal to merge_triangle_split_dir[xCb][yCb].
3. The predicted samples predSamplesL[xL][yL] (with xL = 0..cbWidth - 1 and yL = 0..cbHeight - 1) within the current luma codec block are derived by invoking the weighted sample prediction process for the triangle Merge mode specified in section 8.5.7.2, with the codec block width nCbW set equal to cbWidth, the codec block height nCbH set equal to cbHeight, the sample arrays predSamplesLAL and predSamplesLBL, the variable triangleDir, and the variable cIdx set equal to 0 as inputs.
4. The predicted samples predSamplesCb[xC][yC] (with xC = 0..cbWidth/SubWidthC - 1 and yC = 0..cbHeight/SubHeightC - 1) within the current chroma component Cb codec block are derived by invoking the weighted sample prediction process for the triangle Merge mode specified in section 8.5.7.2, with the codec block width nCbW set equal to cbWidth/SubWidthC, the codec block height nCbH set equal to cbHeight/SubHeightC, the sample arrays predSamplesLACb and predSamplesLBCb, the variable triangleDir, and the variable cIdx set equal to 1 as inputs.
5. The predicted samples predSamplesCr[xC][yC] (with xC = 0..cbWidth/SubWidthC - 1 and yC = 0..cbHeight/SubHeightC - 1) within the current chroma component Cr codec block are derived by invoking the weighted sample prediction process for the triangle Merge mode specified in section 8.5.7.2, with the codec block width nCbW set equal to cbWidth/SubWidthC, the codec block height nCbH set equal to cbHeight/SubHeightC, the sample arrays predSamplesLACr and predSamplesLBCr, the variable triangleDir, and the variable cIdx set equal to 2 as inputs.
6. The motion vector storing process for the Merge triangle mode specified in section 8.5.7.3 is invoked, with the luma codec block position (xCb, yCb), the luma codec block width cbWidth, the luma codec block height cbHeight, the partition direction triangleDir, the luma motion vectors mvA and mvB, the reference indices refIdxA and refIdxB, the prediction list flags predListFlagA and predListFlagB, and the half-sample interpolation filter indices hpelIfIdxA and hpelIfIdxB as inputs.
8.5.7.3 Motion vector storing process for the triangle Merge mode
This process is invoked when decoding a codec cell that MergeTriangleFlag [ xCb ] [ yCb ] equals 1.
The inputs to this process are:
a luma position (xCb, yCb) specifying the top-left sample of the current codec block relative to the top-left luma sample of the current picture,
a variable cbWidth specifying the width of the current codec block in luminance samples,
a variable cbHeight specifying the height of the current codec block in the luminance samples,
-a variable triangleDir specifying the segmentation direction,
-sampling luminance motion vectors of mvA and mvB with 1/16 fractional sample accuracy,
reference indices refIdxA and refIdxB,
the prediction list flags predListFlagA and predListFlagB, and
the half-sample interpolation filter indices hpelIfIdxA and hpelIfIdxB.
The variables numSbX and numSbY, specifying the number of 4 × 4 blocks of the current codec block in the horizontal and vertical directions, are set equal to numSbX = cbWidth >> 2 and numSbY = cbHeight >> 2.
The variable minSb is set equal to Min (numSbX, numSbY) -1.
The variable cbRatio is derived as follows:
cbRatio=(cbWidth>cbHeight)?(cbWidth/cbHeight):(cbHeight/cbWidth) (8-848)
For each 4 × 4 sub-block at sub-block index (xSbIdx, ySbIdx), with xSbIdx = 0..numSbX - 1 and ySbIdx = 0..numSbY - 1, the following applies:
the variables xIdx and yIdx are derived as follows:
xIdx=(cbWidth>cbHeight)?(xSbIdx/cbRatio):xSbIdx (8-849)
yIdx=(cbWidth>cbHeight)?ySbIdx:(ySbIdx/cbRatio) (8-850)
the variable sType is derived as follows:
-if triangleDir is equal to 0, the following applies:
sType=(xIdx==yIdx)?2:((xIdx>yIdx)?0:1) (8-851)
else (triangleDir equals 1), the following applies:
sType=(xIdx+yIdx==minSb)?2:((xIdx+yIdx<minSb)?0:1) (8-852)
Depending on the value of sType, the following assignments are made:
-if sType equals 0, then the following applies:
predFlagL0=(predListFlagA==0)?1:0
(8-853)
predFlagL1=(predListFlagA==0)?0:1
(8-854)
refIdxL0=(predListFlagA==0)?refIdxA:-1 (8-855)
refIdxL1=(predListFlagA==0)?-1:refIdxA (8-856)
mvL0[0]=(predListFlagA==0)?mvA[0]:0 (8-857)
mvL0[1]=(predListFlagA==0)?mvA[1]:0 (8-858)
mvL1[0]=(predListFlagA==0)?0:mvA[0] (8-859)
mvL1[1]=(predListFlagA==0)?0:mvA[1] (8-860)
hpelIfIdx=hpelIfIdxA (8-xxx)
otherwise, if sType equals 1 or (sType equals 2 and predlistflag a + predlistflag b does not equal 1), then the following applies:
predFlagL0=(predListFlagB==0)?1:0(8-861)
predFlagL1=(predListFlagB==0)?0:1(8-862)
refIdxL0=(predListFlagB==0)?refIdxB:-1 (8-863)
refIdxL1=(predListFlagB==0)?-1:refIdxB (8-864)
mvL0[0]=(predListFlagB==0)?mvB[0]:0 (8-865)
mvL0[1]=(predListFlagB==0)?mvB[1]:0 (8-866)
mvL1[0]=(predListFlagB==0)?0:mvB[0] (8-867)
mvL1[1]=(predListFlagB==0)?0:mvB[1] (8-868)
hpelIfIdx=hpelIfIdxB (8-xxx)
else (sType equals 2 and predlistflag a + predlistflag b equals 1), then the following applies:
predFlagL0=1 (8-869)
predFlagL1=1 (8-870)
refIdxL0=(predListFlagA==0)?refIdxA:refIdxB (8-871)
refIdxL1=(predListFlagA==0)?refIdxB:refIdxA (8-872)
mvL0[0]=(predListFlagA==0)?mvA[0]:mvB[0] (8-873)
mvL0[1]=(predListFlagA==0)?mvA[1]:mvB[1] (8-874)
mvL1[0]=(predListFlagA==0)?mvB[0]:mvA[0] (8-875)
mvL1[1]=(predListFlagA==0)?mvB[1]:mvA[1] (8-876)
hpelIfIdx=hpelIfIdxA&&hpelIfIdxB (8-xxx)
-for x 0..3 and y 0..3 the following assignments are made:
MvL0[(xSbIdx<<2)+x][(ySbIdx<<2)+y]=mvL0 (8-877)
MvL1[(xSbIdx<<2)+x][(ySbIdx<<2)+y]=mvL1 (8-878)
RefIdxL0[(xSbIdx<<2)+x][(ySbIdx<<2)+y]=refIdxL0 (8-879)
RefIdxL1[(xSbIdx<<2)+x][(ySbIdx<<2)+y]=refIdxL1 (8-880)
PredFlagL0[(xSbIdx<<2)+x][(ySbIdx<<2)+y]=predFlagL0 (8-881)
PredFlagL1[(xSbIdx<<2)+x][(ySbIdx<<2)+y]=predFlagL1 (8-882)
hpelIfIdx[(xSbIdx<<2)+x][(ySbIdx<<2)+y]=hpelIfIdx (8-xxx)
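The classification and storage choice above can be illustrated with a short sketch (Python; all names are illustrative and the sketch is not specification text):

```python
# Sketch of the sType derivation in equations (8-851)/(8-852) and the
# resulting choice of stored motion for each 4x4 sub-block.
def stype(x_idx, y_idx, triangle_dir, min_sb):
    if triangle_dir == 0:                       # diagonal split
        return 2 if x_idx == y_idx else (0 if x_idx > y_idx else 1)
    s = x_idx + y_idx                           # anti-diagonal split
    return 2 if s == min_sb else (0 if s < min_sb else 1)

def stored_motion_kind(s_type, pred_list_flag_a, pred_list_flag_b):
    if s_type == 0:
        return "uni: partition A motion"
    if s_type == 1 or pred_list_flag_a + pred_list_flag_b != 1:
        return "uni: partition B motion"
    return "bi: combined A/B motion"            # sType == 2, different lists
```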
fig. 14A is a block diagram of the video processing apparatus 1400. Device 1400 may be used to implement one or more of the methods described herein. The device 1400 may be embodied in a smartphone, tablet, computer, internet of things (IoT) receiver, and/or the like. Device 1400 may include one or more processors 1402, one or more memories 1404, and video processing hardware 1406. Processor 1402 may be configured to implement one or more of the methods described in this document (including, but not limited to, methods 800, 900, 1000, 1100, 1200, and 1300). The one or more memories 1404 may be used to store data and codes for implementing the methods and techniques described herein. Video processing hardware 1406 may be used to implement some of the techniques described in this document in hardware circuitry.
Fig. 14B is a block diagram illustrating an example video processing system 1410 in which various techniques disclosed herein may be implemented. Various embodiments may include some or all of the components of system 1410. The system 1410 may include an input 1412 for receiving video content. The video content may be received in a raw or uncompressed format (e.g., 8 or 10 bit multi-component pixel values), or may be received in a compressed or encoded format. Input 1412 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces, such as ethernet, Passive Optical Networks (PONs), etc., and wireless interfaces, such as Wi-Fi or cellular interfaces.
System 1410 may include a codec component 1414 that may implement various codecs or encoding methods described in this document. The codec component 1414 may reduce the average bit rate of the video from the input 1412 to the output of the codec component 1414 to produce a codec representation of the video. Thus, codec techniques are sometimes referred to as video compression or video transcoding techniques. The output of the codec component 1414 may be stored, or transmitted via a communication connection, as represented by the component 1416. The stored or transmitted bitstream (or codec) representation of the video received at the input 1412 may be used by the component 1418 to generate pixel values or displayable video that is sent to a display interface 1420. The process of generating user-viewable video from a bitstream representation is sometimes referred to as video decompression. Further, while certain video processing operations are referred to as "codec" operations or tools, it will be understood that codec tools or operations are used at the encoder and that corresponding decoding tools or operations that reverse the results of the encoding will be performed by the decoder.
Examples of a peripheral bus interface or display interface may include Universal Serial Bus (USB), High Definition Multimedia Interface (HDMI), DisplayPort, and so on. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and the like. The techniques described in this document may be embodied in various electronic devices such as mobile phones, laptop computers, smart phones, or other devices capable of performing digital data processing and/or video display.
In some embodiments, the video codec method may be implemented using an apparatus implemented on a hardware platform as described with respect to fig. 14A or fig. 14B.
The following list provides embodiments that may address the technical issues described in this document, as well as other issues. A first set of clauses describes certain features and aspects of the disclosed technology in the previous section.
1. A method for processing video, comprising: determining availability of one or more collocated motion vectors based on a prediction mode of a video block collocated with a current video block; and performing a conversion between the current block and a bitstream representation of the current block based on the one or more collocated motion vectors, wherein an indication of the prediction mode comprises a bit indicating whether the collocated video block is coded in an inter mode or a non-inter mode.
2. A method for processing video, comprising: determining, for a conversion between a codec representation of a video block and the video block, a type of a collocated video block of the video block, wherein the type takes only one of two possible values; and performing the conversion based on the determination.
3. The method of clause 2, wherein the two possible values include a first value indicating that the collocated video block is inter-coded and a second value indicating that the collocated video block is coded using a mode other than inter coding.
4. The method of any of clauses 1-3, wherein the determining is the same when the collocated video block is coded using the palette prediction mode and when the collocated video block is coded using the intra prediction mode or the Intra Block Copy (IBC) prediction mode.
5. The method of any of clauses 1-3, wherein one or more collocated motion vectors are determined to be unavailable when a collocated video block is coded using a non-inter prediction mode.
6. The method of any of clauses 1-3, wherein one or more collocated motion vectors are determined to be available and set to a default motion vector when a collocated video block is coded using a non-inter prediction mode.
7. The method of clause 5 or 6, wherein the non-inter prediction mode is an intra prediction mode, a palette prediction mode, or an Intra Block Copy (IBC) prediction mode.
8. The method of any of clauses 1-3, wherein determining is based on a reference index or a reference list of collocated video blocks.
9. The method of clause 8, wherein the reference index is a predetermined value.
10. The method of clause 8, wherein the reference index does not include a predetermined value.
11. The method of clause 9 or 10, wherein the predetermined value is 0.
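As a rough illustration of clauses 1 to 11, the availability check may be sketched as follows; the mode constants and the function name are hypothetical conveniences of this example, and treating palette the same way as intra and IBC reflects clause 4.

INTER, INTRA, IBC, PALETTE = range(4)  # hypothetical mode constants for this sketch

def collocated_mv_available(collocated_mode):
    # Clauses 5 to 7: a collocated block coded in any non-inter mode
    # (intra, palette, or IBC) yields no available collocated motion vector.
    return collocated_mode == INTER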
12. A method for video processing, comprising: deriving prediction samples for a current block based on a first weight table associated with a bi-prediction with Coding Unit (CU)-level weights (BCW) process, wherein the first weight table is asymmetric; and performing a conversion between the current block and a bitstream representation of the current block based on the prediction samples.
13. The method of clause 12, wherein the entries in the first weight table are non-monotonically increasing.
14. The method of clause 12, wherein the BCW procedure is further based on a second weight table different from the first weight table.
15. The method of clause 12, wherein deriving the predicted samples comprises: converting the intermediate predicted samples to a first bit depth; applying weights from the first weight table to the intermediate predicted samples to derive predicted samples; and converting the predicted samples to a second bit depth.
16. The method of clause 15, wherein the second bit depth is a bit depth of a color component of the current video block.
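Clause 15 may be pictured with the following Python sketch; the intermediate bit depth of 14, the weights summing to 8, and the rounding offset are illustrative assumptions of this example rather than normative values.

def bcw_blend(pred0, pred1, w1, sample_bit_depth=10, inter_bit_depth=14):
    # pred0 and pred1 are intermediate prediction samples already converted
    # to inter_bit_depth (the first bit depth of clause 15); w1 is the weight
    # for pred1 taken from a (possibly asymmetric) table, with the weights
    # assumed here to sum to 8.
    w0 = 8 - w1
    shift = 3 + (inter_bit_depth - sample_bit_depth)   # undo weighting and depth gap
    offset = 1 << (shift - 1)                          # round to nearest
    out = (w0 * pred0 + w1 * pred1 + offset) >> shift  # back to the second bit depth
    return min(max(out, 0), (1 << sample_bit_depth) - 1)  # clip to the sample range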
17. A method for video processing, comprising: based on the size of the current video block, making a decision regarding selective signaling of an indication of skip mode coding of the current video block in a bitstream representation of the current video block; and performing a conversion between the current block and the bitstream representation of the current block based on the decision.
18. The method of clause 17, wherein the indication is signaled when it is determined that the slice type of the slice that includes the current video block is an I slice, sps_ibc_enabled_flag is set to true, and the height and width of the current video block are less than or equal to N.
19. The method of clause 17, wherein the indication is signaled when it is determined that the current video block is coded using an Intra Block Copy (IBC) mode and the height and width of the current video block are less than or equal to N.
20. The method of clause 18 or 19, wherein N = 64.
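By way of illustration of clauses 18 to 20, the signaling condition may be sketched in Python as below, assuming N = 64; apart from sps_ibc_enabled_flag, the parameter and function names are hypothetical.

def signal_skip_indication(slice_type, sps_ibc_enabled_flag, width, height, n=64):
    # Clause 18: the indication is signaled in an I slice when IBC is
    # enabled in the SPS and the block is no larger than N x N.
    return (slice_type == "I" and sps_ibc_enabled_flag
            and width <= n and height <= n)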
21. A method for video processing, comprising: based on applying the first coding tool to one or more blocks in the first color component of the current video block, making a decision regarding selectively applying a second coding tool to at least one block of a second color component of the current video block; and performing a conversion between the current block and the bitstream representation of the current block based on the decision.
22. The method of clause 21, wherein the second codec tool is applied upon determining that the second codec tool is equal to the first codec tool.
23. The method of clause 21, wherein the second coding tool is not applied when it is determined that the second coding tool is different from the first coding tool.
24. The method of clause 21, wherein the decision is further based on an indication in the bitstream representation.
25. The method of any of clauses 21 to 24, wherein the first color component is a luma component and the second color component is a chroma component.
26. The method of any of clauses 21 to 24, wherein the first color component is a first chroma color component and the second color component is a second chroma color component.
27. The method according to any of clauses 21 to 26, wherein the first codec tool and the second codec tool are each one of Temporal Motion Vector Prediction (TMVP), Alternative Temporal Motion Vector Prediction (ATMVP), a bi-prediction with Coding Unit (CU)-level weights (BCW) process, Merge mode with motion vector differences (MMVD), or a Position Dependent Prediction Combination (PDPC) process.
28. A method for video processing, comprising: based on a precision of a motion vector in a current video block, making a decision regarding selective signaling of an indication to use an alternative half-pixel interpolation filter instead of a default half-pixel interpolation filter; and performing a conversion between the current block and the bitstream representation of the current block based on the decision.
29. The method of clause 28, wherein the default half-pixel interpolation filter is used when it is determined that the current video block is coded using Merge mode with motion vector differences (MMVD) and the precision of the reconstructed motion vector is finer than half-pixel.
30. The method of any of clauses 1 to 29, wherein performing a transition is further based on signaling in: a Decoder Parameter Set (DPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), an Adaptive Parameter Set (APS), a Video Parameter Set (VPS), a sequence header, a picture header, a slice header, or a slice group header.
31. The method of any of clauses 1-30, wherein performing a conversion comprises generating a bitstream representation from a current video block.
32. The method of any of clauses 1-30, wherein performing a conversion comprises generating a current video block from a bitstream representation.
33. A video decoding apparatus comprising a processor configured to implement the method according to any of clauses 1 to 32.
34. A computer program product having a non-transitory computer readable medium stored thereon, the computer program product comprising program code for performing the method according to any of clauses 1 to 32.
The second set of clauses describes certain features and aspects of the techniques disclosed in the previous section (e.g., example items 8-13 and 21).
1. A method (e.g., method 2110 shown in fig. 21A) for video processing, comprising: performing (2112) a conversion between a video block of the video and a codec representation of the video, wherein the codec representation complies with a format rule that specifies selectively including in the codec representation an indication of a skip mode codec of the video block based on a dimension of the video block, wherein the skip mode codec allows the conversion to be performed without generating a residual of the video block or coding a residual of the video block.
2. The method of clause 1, wherein the dimensions comprise a height and a width of the video block.
3. The method of clause 1, wherein the format rule further specifies selectively including the indication based on at least one of: a slice type of a slice that includes the video block, or sps_ibc_enabled_flag.
4. The method of clause 1, wherein the rule specifies that the indication is included in the codec representation if at least one of the following conditions is satisfied: 1) the slice type of the slice including the video block is an I slice, 2) sps_ibc_enabled_flag is set to true, 3) the height and width of the video block are less than or equal to N, where N is an integer; or 4) the video block is coded using an Intra Block Copy (IBC) mode.
5. The method of clause 4, wherein the at least one condition comprises: the slice type is an I slice, sps_ibc_enabled_flag is set to true, or the height and width of the video block are less than or equal to N, where N is an integer.
6. The method of clause 4, wherein the at least one condition comprises: the video block is coded using an Intra Block Copy (IBC) mode; or the height and width of the video block are less than or equal to N, where N is an integer.
7. The method of any of clauses 4-6, wherein N = 64.
8. The method of clause 4, wherein the at least one condition does not include a condition corresponding to sps_palette_enabled_flag.
9. A method for video processing (e.g., method 2120 shown in fig. 21B), comprising: determining (2122), for a conversion between a video block of a video and a codec representation of the video, applicability of a particular codec tool to the video block of a first color component of the video based on whether the particular codec tool applies to one or more corresponding video blocks of a second color component of the video; and performing (2124) the conversion based on the determining.
10. The method of clause 9, wherein the determining determines to disable a particular codec tool if the particular codec tool applies to one or more corresponding video blocks.
11. The method of clause 9, wherein the determining determines to disable the particular coding tool if the particular coding tool is different from the coding tool applied to the one or more corresponding video blocks.
12. The method of clause 9, wherein the message indicating the applicability of the particular coding tool to the video block is selectively signaled based on whether the particular coding tool applies to one or more corresponding video blocks.
13. The method of clause 9, wherein the determining determines that the particular coding tool is disabled and no signaling is required if the particular coding tool applies to one or more corresponding video blocks.
14. The method of clause 9, wherein the particular coding tool is applied differently to different color components of the video.
15. The method of clause 14, wherein how the particular codec tool is applied to the luma component and the chroma component of the video is signaled.
16. The method of any of clauses 9 to 15, wherein the first color component is a chroma color component and the second color component is a luma color component.
17. The method of any of clauses 9 to 15, wherein the first color component is a first chroma color component and the second color component is a second chroma color component.
18. The method of any of clauses 9 to 15, wherein the first color component is a luma color component and the second color component is a chroma color component.
19. The method of any of clauses 9-18, wherein the one or more corresponding video blocks cover at least one corresponding sample point of the video block.
20. The method of clause 19, wherein the position of the at least one corresponding sample point is scaled according to the color format of the video.
21. The method of clause 19 or 20, wherein the at least one corresponding sample point is located at (x0, y0), (x0 + W - 1, y0 + H - 1), (x0 + W/2 - 1, y0 + H/2 - 1), (x0 + W/2, y0 + H/2), (x0 + W/2, y0 + H/2 - 1), or (x0 + W/2 - 1, y0 + H/2), and wherein the upper left position of the video block is (x0, y0), and the width and height of the video block are W and H, respectively.
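Clauses 19 to 21 may be illustrated as follows; the choice of the center sample and the 4:2:0 scaling factors are assumptions of this sketch only.

def corresponding_luma_position(x0, y0, w, h, sub_width_c=2, sub_height_c=2):
    # (x0, y0) is the chroma block's top-left position and w x h its size.
    # Of the candidate positions in clause 21, the center sample
    # (x0 + w/2, y0 + h/2) is used here; the position is then scaled by the
    # color format (sub_width_c = sub_height_c = 2 for 4:2:0, per clause 20).
    cx, cy = x0 + w // 2, y0 + h // 2
    return cx * sub_width_c, cy * sub_height_c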
22. The method of any of clauses 9-21, wherein the particular codec tool is one of: Temporal Motion Vector Prediction (TMVP), Alternative Temporal Motion Vector Prediction (ATMVP), a bi-prediction with Coding Unit (CU)-level weights (BCW) process, Merge mode with motion vector difference (MMVD), or a Position Dependent Prediction Combination (PDPC) process.
23. A method (e.g., method 2130 shown in fig. 21C) for video processing, comprising: determining (2132), for a conversion between a chroma video block of a video and a codec representation of the video, that a position-dependent intra prediction combination (PDPC) method is not allowed for coding the chroma video block because a corresponding luma block is coded using a particular codec mode; and performing (2134) the conversion based on the determining, wherein the PDPC method combines neighboring samples with the prediction signal of the chroma video block to generate a refined prediction signal of the chroma video block.
24. The method of clause 23, wherein the particular codec mode corresponds to a matrix-based intra prediction (MIP) mode.
25. The method of clause 23, wherein the particular codec mode corresponds to a Multiple Reference Line (MRL) mode.
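For clauses 23 to 25, the gating condition might be sketched as below; the mode name strings are placeholders of this illustration.

def pdpc_allowed_for_chroma(corresponding_luma_mode):
    # Clauses 24 and 25: PDPC is not applied to a chroma block whose
    # corresponding luma block is coded with MIP or MRL.
    return corresponding_luma_mode not in ("MIP", "MRL")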
26. A method for video processing, comprising: performing a conversion between a video block of video and a codec representation of the video, wherein the codec representation complies with a format rule that specifies that, due to a codec condition satisfied by the video block, an indication of use of an optional half-pixel interpolation filter instead of a default half-pixel interpolation filter is set to false.
27. The method of clause 26, wherein if the codec representation includes an indication set to false, the default half-pixel interpolation filter is used for the conversion.
28. The method of clause 26, wherein the format rule specifies that the indication is set to false based on the precision of a motion vector in the video block.
29. The method of any of clauses 26 to 28, wherein the format rule specifies that the indication is set to false if the video block is coded using Merge mode with motion vector differences (MMVD) and the reconstructed motion information has a motion vector component with greater than half-pixel precision.
30. The method of any of clauses 26 to 28, wherein the format rule specifies that the indication is set to false for a pair of Merge candidates having a motion vector component with greater than half-pixel precision.
31. The method of any of clauses 26 to 28, wherein the indication is set to false if the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information having a motion vector component with greater than half-pixel precision.
32. The method of any of clauses 26 to 28, wherein the format rule specifies that the indication is always false.
33. The method of any of clauses 26 to 28, wherein the format rule specifies that the indication is always set to false for video blocks coded using Merge mode with motion vector difference (MMVD).
34. The method of any of clauses 26 to 28, wherein the format rule specifies that if a particular motion vector difference is selected for conversion, the indication is always set to false for video blocks coded using Merge mode with motion vector differences (MMVD).
35. The method of clause 34, wherein the particular motion vector difference has an accuracy of 1/4 pixels.
36. The method of clause 26 or 27, wherein the format rule specifies that the indication is always set to false for the paired Merge candidates.
37. The method of clause 26 or 27, wherein the format rule specifies that the indication is always set to false if the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information.
38. The method of any of clauses 32-37, wherein the format rule applies to an indication of use of a bi-prediction with coding unit (CU)-level weights (BCW) process.
39. The method of clause 26 or 27, wherein the format rule specifies that the indication is set to false if the video block has at least N motion vector components with greater than half-pixel precision, where N is an integer.
40. The method of clause 39, wherein the format rule specifies that the indication is set to false if the video block is coded using Merge mode with motion vector differences (MMVD) and the reconstructed motion information has at least N motion vector components with greater than half-pixel precision.
41. The method of clause 39, wherein the format rule specifies that the indication is set to false for a pair of Merge candidates having at least N motion vector components with greater than half-pixel precision.
42. The method of clause 39, wherein the format rule specifies that the indication is set to false if the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information having at least N motion vector components with greater than half-pixel precision.
43. The method of any of clauses 39 to 42, wherein N depends on whether the video block is uni-directionally predicted or bi-directionally predicted.
44. The method of any of clauses 39-42, wherein N is the same regardless of whether the video block is uni-directionally predicted or bi-directionally predicted.
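Clauses 39 to 44 amount to counting motion vector components finer than half-pixel precision; the following sketch assumes motion vectors stored in 1/16-pixel units, which is an assumption of this illustration only.

def count_finer_than_half_pel(mvs, units_per_pel=16):
    # mvs: list of (mv_x, mv_y) pairs in 1/16-pixel units; a component is
    # finer than half-pixel precision when it is not a multiple of 8 units.
    half_pel = units_per_pel // 2
    return sum(1 for mv in mvs for comp in mv if comp % half_pel != 0)

def hpel_filter_indication(mvs, n):
    # Clause 39: the indication is set to false once at least N components
    # have precision finer than half-pixel.
    return count_finer_than_half_pel(mvs) < n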
45. The method of any of clauses 26-44, wherein the indication set to false is stored and used for another video block.
46. A method for video processing, comprising: performing a conversion between a video block of a video and a codec representation of the video according to a rule, wherein a prediction vector of the video block is generated based on a weighted average of prediction vectors of a plurality of partitions of the video block, wherein at least one partition is angularly divided, and wherein the rule specifies that an optional half-pixel interpolation filter is used for interpolating sample values at half-pixel locations when determining the prediction vector.
47. The method of clause 46, wherein the indication indicative of the use of the optional half-pixel interpolation filter is inherited from the Merge candidate.
48. The method of clause 46, wherein the indication indicating the use of the optional half-pixel interpolation filter is inherited from: spatial domain candidates, TMVP (temporal motion vector prediction) candidates, HMVP (history-based motion vector prediction) candidates, and/or pair candidates.
49. The method of clause 48, wherein whether the indication is inherited from the pair candidate depends on hPelIf1 and/or hPelIf2, wherein hPelIf1 and hPelIf2 are indications of a first candidate and a second candidate, respectively, among the pair of candidates.
50. The method of clause 46, wherein the indication indicating use of the optional half-pixel interpolation filter is set to true for video blocks coded using geometric prediction mode.
51. The method of clause 46, wherein the rule further specifies that an optional half-pixel interpolation filter be used during motion compensation of the single partition or the multiple partitions.
52. The method of clause 46, wherein during the process of motion field storage, one or more indications are stored for one or more motion fields indicating the use of an optional half-pixel interpolation filter.
53. The method of clause 52, wherein the same indication of the use of the selectable half-pixel interpolation filter is stored for all 4 x 4 units of the video block.
54. The method of clause 53, wherein the same indication is set to false, (hPelIfPart1 && hPelIfPart2), or (hPelIfPart1 || hPelIfPart2), wherein hPelIfPart1 and hPelIfPart2 correspond to the indications of the use of the selectable half-pixel interpolation filter for the first partition and the second partition, respectively, of the video block.
55. The method of clause 52, wherein different indications of the use of the selectable half-pixel interpolation filter are stored for different partitions of the video block that comprise a 4 x 4 unit.
56. The method of clause 55, wherein the different indications comprise an indication hPelIfPartX that is set for some 4 x 4 units in partition X of the video block, wherein hPelIfPartX corresponds to an indication that an optional half-pixel interpolation filter is used for partition X, and X is 1 or 2.
57. The method of clause 55, wherein the different indications include an indication that the 4 x 4 units in the weighted region of the video block are set to (hPelIfPart1 && hPelIfPart2), (hPelIfPart1 || hPelIfPart2), hPelIfPart1, or hPelIfPart2, wherein hPelIfPart1 and hPelIfPart2 correspond to indications of using the selectable half-pixel interpolation filter for the first partition and the second partition, respectively, of the video block.
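One way to read clauses 52 to 57 is the following sketch of per-4 x 4-unit motion field storage; the region labels and the selectable combination rule are illustrative choices of this example, not normative behavior.

def stored_hpel_flag(region, hpel_part1, hpel_part2, combine="and"):
    # region: "part1", "part2", or "weighted" (the blending area along the
    # partition boundary); combine selects how the weighted area merges the
    # two partition flags (clause 57 lists AND, OR, or either single flag).
    if region == "part1":
        return hpel_part1  # clause 56: hPelIfPartX stored inside partition X
    if region == "part2":
        return hpel_part2
    if combine == "and":
        return hpel_part1 and hpel_part2
    return hpel_part1 or hpel_part2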
58. A method for video processing, comprising: performing a conversion between a video block of the video and a codec representation of the video according to a rule, wherein the rule specifies that an indication of a codec condition for a motion vector prediction candidate to be added to a Merge list is determined based on information associated with one or more motion candidates present in the Merge list.
59. The method of clause 58, wherein the motion vector prediction candidate corresponds to a Temporal Motion Vector Prediction (TMVP) candidate for predicting motion information of the video block based on motion information of a collocated video block of the video.
60. The method of clause 58, wherein the motion vector prediction candidate corresponds to a sub-block based temporal motion vector prediction (SbTMVP) for predicting motion information of a sub-block of the video block based on motion information of a collocated video block of the video block.
61. The method of clause 58 or 59, wherein the one or more motion candidates correspond to one or more spatial domain Merge candidates.
62. The method of clause 59 or 60, wherein the one or more motion candidates comprise a first spatial Merge candidate, and wherein the rule specifies that the indication to use the selectable half-pixel interpolation filter for the TMVP candidate or the SbTMVP candidate is set equal to the indication to use the selectable half-pixel interpolation filter for the first spatial Merge candidate.
63. The method of clause 59 or 60, wherein the one or more motion candidates comprise a first spatial Merge candidate, and wherein the rule specifies that a bi-prediction with coding unit (CU)-level weights (BCW) index of the TMVP candidate or the SbTMVP candidate is set equal to the BCW index of the first spatial Merge candidate.
64. The method of clause 59 or 60, wherein the one or more candidates comprise a first spatial domain Merge candidate that is a left spatial domain Merge candidate.
65. The method of clause 59, wherein the rule further specifies that the indication of using the optional half-pixel interpolation filter for the TMVP candidate and/or the bi-prediction with coding unit (CU)-level weights (BCW) index for the TMVP candidate depends on information associated with the one or more motion candidates.
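Clauses 62 to 65 could be realized along the following lines; the candidate record layout and the fallback defaults are conveniences of this sketch.

def fill_tmvp_side_info(merge_list, tmvp_candidate):
    # Clauses 62 and 63: the TMVP or SbTMVP candidate inherits the half-pixel
    # filter indication and the BCW index from the first spatial Merge
    # candidate (e.g., the left candidate, clause 64) when one is present.
    first_spatial = next((c for c in merge_list if c.get("is_spatial")), None)
    if first_spatial is not None:
        tmvp_candidate["use_alt_hpel_filter"] = first_spatial["use_alt_hpel_filter"]
        tmvp_candidate["bcw_idx"] = first_spatial["bcw_idx"]
    else:
        tmvp_candidate["use_alt_hpel_filter"] = False  # assumed defaults
        tmvp_candidate["bcw_idx"] = 0
    return tmvp_candidate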
66. A method (e.g., method 2140 of fig. 21D) for video processing, comprising: determining (2142), for a conversion between a video block of a video and a codec representation of the video, according to a rule, a control point motion vector for a control point of the video block coded using an affine Merge mode based on motion information of a neighboring block of the video block; and performing (2144) the conversion based on the determining, wherein the rule specifies that an indication to use the selectable half-pixel interpolation filter for the video block is set equal to the corresponding indication associated with the motion vector in the neighboring block.
67. The method of clause 66, wherein determining determines the control point motion vector located at the top left corner, the top right corner, or the bottom left corner of the video block based on the motion vectors in the neighboring blocks.
68. A method for video processing, comprising: the conversion between the video block of the video and the codec representation of the video is performed according to a rule, wherein the rule specifies that, if the video block is codec using an affine Merge mode inherited from a neighboring block of the video block, an indication that the selectable half-pixel interpolation filter is to be used for the video block is inherited from the neighboring block.
69. A method for video processing (e.g., method 2150 as shown in fig. 21E), comprising: deriving (2152) motion information of a video block of the video by checking Merge candidates according to a rule during a Merge candidate construction process; and performing a conversion between the video block and a codec representation of the video, wherein the rule specifies that two Merge candidates being compared are considered different during the Merge candidate construction process if their indications of the use of the selectable half-pixel interpolation filter are not the same.
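Clause 69's comparison rule might be sketched as follows; the field names are hypothetical.

def candidates_identical(a, b):
    # Clause 69: candidates that agree in motion information but differ in
    # their half-pixel interpolation filter indication are treated as
    # different, so both may remain in the Merge list.
    return (a["mv"] == b["mv"] and a["ref_idx"] == b["ref_idx"]
            and a["use_alt_hpel_filter"] == b["use_alt_hpel_filter"])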
70. The method of any of clauses 1-69, wherein performing a conversion is further based on signaling in: a Decoder Parameter Set (DPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), an Adaptive Parameter Set (APS), a Video Parameter Set (VPS), a sequence header, a picture header, a slice header, or a slice group header.
71. The method of any of clauses 1-70, wherein converting comprises encoding the video into the codec representation.
72. The method of any of clauses 1-70, wherein converting comprises decoding the codec representation to generate the video.
73. A video processing apparatus comprising a processor configured to implement the method of any one or more of clauses 1-72.
74. A computer-readable medium storing program code that, when executed, causes a processor to implement the method of any one or more of clauses 1-72.
75. A computer readable medium storing a codec representation or bitstream representation generated according to any of the above described methods.
From the foregoing, it will be appreciated that specific embodiments of the presently disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the presently disclosed technology is not limited except as by the appended claims.
Embodiments of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that contains other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
It is intended that the specification, together with the drawings, be considered exemplary only, where exemplary means an example. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Additionally, the use of "or" is intended to include "and/or", unless the context clearly indicates otherwise.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few embodiments and examples are described and other embodiments, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (75)

1. A video processing method, comprising:
performing a conversion between a video block of a video and a codec representation of the video,
wherein the codec representation conforms to a format rule that specifies selectively including an indication of a skip mode codec for the video block in the codec representation based on a dimension of the video block,
wherein the skip mode coding allows the converting to be performed without generating a residual of the video block or coding a residual of the video block.
2. The method of claim 1, wherein the dimensions comprise a height and a width of the video block.
3. The method of claim 1, wherein the format rule specifies selectively including the indication based further on at least one of: a slice type of a slice that includes the video block, or sps_ibc_enabled_flag.
4. The method of claim 1, wherein the rule specifies that the indication is included in the codec representation when at least one of the following conditions is satisfied: 1) a slice type of a slice that includes the video block is an I slice, 2) sps_ibc_enabled_flag is set to true, 3) a height and width of the video block are less than or equal to N, where N is an integer; or 4) the video block is coded using an Intra Block Copy (IBC) mode.
5. The method of claim 4, wherein the at least one condition comprises: the slice type is an I slice, the sps_ibc_enabled_flag is set to true, or the height and the width of the video block are less than or equal to N, where N is an integer.
6. The method of claim 4, wherein the at least one condition comprises: the video block is coded using an Intra Block Copy (IBC) mode; or the height and the width of the video block are less than or equal to N, where N is an integer.
7. The method of any one of claims 4-6, wherein N = 64.
8. The method of claim 4, wherein the at least one condition does not include a condition corresponding to sps_palette_enabled_flag.
9. A video processing method, comprising:
for a conversion between a video block of a video and a codec representation of the video, determining applicability of a particular codec tool to the video block as a first color component of the video based on whether the particular codec tool applies to one or more corresponding video blocks as a second color component of the video; and
performing the conversion based on the determination.
10. The method of claim 9, wherein the determining determines to disable the particular coding tool if the particular coding tool applies to the one or more corresponding video blocks.
11. The method of claim 9, wherein the determining determines to disable the particular coding tool if the particular coding tool is different from a coding tool applied to the one or more corresponding video blocks.
12. The method of claim 9, wherein a message is selectively signaled indicating applicability of the particular coding tool to the video block based on whether the particular coding tool applies to the one or more corresponding video blocks.
13. The method of claim 9, wherein the determining determines that the particular coding tool is disabled and signaling is not required if the particular coding tool applies to the one or more corresponding video blocks.
14. The method of claim 9, wherein the particular codec tool is applied differently to different color components of the video.
15. The method of claim 14, wherein how to apply the particular codec tool to luma and chroma components of the video is signaled.
16. The method of any of claims 9-15, wherein the first color component is a chroma color component and the second color component is a luma color component.
17. The method of any of claims 9-15, wherein the first color component is a first chroma color component and the second color component is a second chroma color component.
18. The method of any of claims 9-15, wherein the first color component is a luma color component and the second color component is a chroma color component.
19. The method of any of claims 9-18, wherein the one or more corresponding video blocks cover at least one corresponding sample point of the video block.
20. The method of claim 19, wherein the position of the at least one corresponding sample point is scaled according to a color format of the video.
21. The method of claim 19 or 20, wherein the at least one corresponding sample point is located at (x0, y0), (x0 + W - 1, y0 + H - 1), (x0 + W/2 - 1, y0 + H/2 - 1), (x0 + W/2, y0 + H/2), (x0 + W/2, y0 + H/2 - 1), or (x0 + W/2 - 1, y0 + H/2), and wherein the upper left position of the video block is (x0, y0) and the width and height of the video block are W and H, respectively.
22. The method of any of claims 9-21, wherein the particular codec tool is one of: Temporal Motion Vector Prediction (TMVP), Alternative Temporal Motion Vector Prediction (ATMVP), a bi-prediction with Coding Unit (CU)-level weights (BCW) process, Merge mode with motion vector difference (MMVD), or a Position Dependent Prediction Combining (PDPC) process.
23. A video processing method, comprising:
determining, for a conversion between a chroma video block of a video and a codec representation of the video, that a position-dependent intra prediction combining (PDPC) method is not allowed for coding the chroma video block because a corresponding luma block is coded using a particular codec mode; and
performing the conversion based on the determination,
wherein the PDPC method combines neighboring samples with the prediction signal of the chroma video block to generate a refined prediction signal of the chroma video block.
24. The method of claim 23, wherein the particular codec mode corresponds to a matrix-based intra prediction (MIP) mode.
25. The method of claim 23, wherein the particular codec mode corresponds to a Multiple Reference Line (MRL) mode.
26. A video processing method, comprising:
performing a conversion between video blocks of a video and a codec representation of the video,
wherein the codec representation complies with a format rule that specifies that, due to a codec condition satisfied by the video block, an indication of use of an optional half-pixel interpolation filter instead of a default half-pixel interpolation filter is set to false.
27. The method of claim 26, wherein the default half-pixel interpolation filter is used for the converting in the event that the codec representation includes the indication set to false.
28. The method of claim 26, wherein the format rule specifies that the indication is set to false based on a precision of a motion vector in the video block.
29. The method of any of claims 26-28, wherein the format rule specifies that the indication is set to false if the video block is coded using Merge mode with motion vector difference (MMVD) and the reconstructed motion information has a motion vector component with more than half-pixel precision.
30. The method of any of claims 26-28, wherein the format rule specifies that the indication is set to false for pairs of Merge candidates having motion vector components with more than half-pixel precision.
31. The method according to any of claims 26-28, wherein the indication is set to false in case the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information having a motion vector component with higher than half-pixel precision.
32. The method according to any of claims 26-28, wherein the format rule specifies that the indication is always false.
33. The method of any of claims 26-28, wherein the format rule specifies that the indication is always set to false for the video block that is coded using Merge mode with motion vector difference (MMVD).
34. The method of any of claims 26-28, wherein the format rule specifies that the indication is always set to false for the video block that is coded using Merge mode with motion vector difference (MMVD) if a particular motion vector difference is selected for the conversion.
35. The method of claim 34, wherein the particular motion vector difference value has a precision of 1/4 pixels.
36. The method of claim 26 or 27, wherein the format rule specifies that the indication is always set to false for a pair-wise Merge candidate.
37. The method according to claim 26 or 27, wherein the format rule specifies that the indication is always set to false in case the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information.
38. The method according to any of claims 32-37, wherein the format rule applies to an indication of use of a bi-prediction with coding unit (CU)-level weights (BCW) process.
39. The method of claim 26 or 27, wherein the format rule specifies that the indication is set to false if the video block has at least N motion vector components with greater than half-pixel precision, where N is an integer.
40. The method of claim 39, wherein the format rule specifies that the indication is set to false if the video block is coded using Merge mode with motion vector difference (MMVD) and reconstructed motion information has at least N motion vector components with greater than half-pixel precision.
41. The method of claim 39, wherein the format rule specifies that the indication is set to false for a pair of Merge candidates having at least N motion vector components with greater than half-pixel precision.
42. The method of claim 39, wherein the format rule specifies that the indication is set to false if the reconstructed bi-directionally predicted motion information is converted to uni-directionally predicted motion information having at least N motion vector components with greater than half-pixel precision.
43. The method of any of claims 39-42, wherein N depends on whether the video block is uni-directionally predicted or bi-directionally predicted.
44. The method of any of claims 39-42, wherein N is the same regardless of whether the video block is uni-directionally predicted or bi-directionally predicted.
45. The method of any of claims 26-44, wherein the indication set to false is stored and used for another video block.
46. A video processing method, comprising:
the conversion between video blocks of video and codec representations of the video is performed according to rules,
wherein a prediction vector of the video block is generated based on a weighted average of prediction vectors of a plurality of partitions of the video block, wherein at least one partition is angularly divided, and
wherein the rule provides for interpolating sample values at half-pixel locations using an optional half-pixel interpolation filter when determining the prediction vector.
47. The method of claim 46, wherein an indication indicating the use of the selectable half-pixel interpolation filter is inherited from Merge candidates.
48. The method of claim 46, wherein the indication indicating use of the selectable half-pixel interpolation filter is inherited from: spatial domain candidates, TMVP (temporal motion vector prediction) candidates, HMVP (history-based motion vector prediction) candidates, and/or pair candidates.
49. The method of claim 48, wherein whether the indication is inherited from the pair of candidates depends on hPelIf1 and/or hPelIf2, wherein hPelIf1 and hPelIf2 are indications of a first candidate and a second candidate, respectively, among the pair of candidates.
50. The method of claim 46, wherein the indication indicating use of the selectable half-pixel interpolation filter is set to true for the video block coded using geometric prediction mode.
51. The method of claim 46, wherein the rule further specifies that the selectable half-pixel interpolation filter is to be used during motion compensation of a single partition or multiple partitions.
52. A method according to claim 46, wherein during a motion field storage procedure, one or more indications are stored for one or more motion fields indicating the use of the selectable half-pixel interpolation filter.
53. The method of claim 52, wherein the same indication of the use of the selectable half-pixel interpolation filter is stored for all 4 x 4 units of the video block.
54. The method of claim 53, wherein the same indication is set to false, (hPelIfPart1 && hPelIfPart2), or (hPelIfPart1 || hPelIfPart2), wherein hPelIfPart1 and hPelIfPart2 correspond to indications to use the selectable half-pixel interpolation filter for first and second partitions of the video block, respectively.
55. The method of claim 52, wherein different indications of use of the selectable half-pixel interpolation filter are stored for different partitions of the video block that include 4 x 4 units.
56. The method of claim 55, wherein the different indications comprise an indication hPelIfPartX that is set for some 4 x 4 units in partition X of the video block, wherein hPelIfPartX corresponds to an indication that the selectable half-pixel interpolation filter is used for partition X, and X is 1 or 2.
57. The method of claim 55, wherein the different indications comprise an indication that some 4 x 4 units in a weighted region of the video block are set to (hPelIfPart1 && hPelIfPart2), (hPelIfPart1 || hPelIfPart2), hPelIfPart1, or hPelIfPart2, wherein hPelIfPart1 and hPelIfPart2 correspond to indications of using the selectable half-pixel interpolation filter for first and second partitions of the video block, respectively.
58. A video processing method, comprising:
performing a conversion between a video block of a video and a codec representation of said video according to a rule,
wherein the rule specifies that an indication of a codec condition for a motion vector prediction candidate to be added to a Merge list is determined based on information associated with one or more motion candidates present in the Merge list.
59. The method of claim 58, wherein the motion vector prediction candidate corresponds to a Temporal Motion Vector Prediction (TMVP) candidate used to predict motion information of the video block based on motion information of a collocated video block of the video.
60. The method of claim 58, wherein the motion vector prediction candidate corresponds to a sub-block based temporal motion vector prediction (SbTMVP) used to predict motion information for a sub-block of the video block based on motion information for a collocated video block of the video block.
61. The method of claim 58 or 59, wherein the one or more motion candidates correspond to one or more spatial Merge candidates.
62. The method of claim 59 or 60, wherein the one or more motion candidates comprise a first spatial Merge candidate, and wherein the rule specifies that the indication to use an optional half-pixel interpolation filter for the TMVP candidate or the SbTMVP candidate is set equal to an indication to use the optional half-pixel interpolation filter for the first spatial Merge candidate.
63. The method of claim 59 or 60, wherein the one or more motion candidates comprise a first spatial Merge candidate, and wherein the rule specifies that a bi-prediction with coding unit (CU)-level weights (BCW) index of the TMVP candidate or the SbTMVP candidate is set equal to a BCW index of the first spatial Merge candidate.
64. The method of claim 59 or 60, wherein the one or more motion candidates comprise a first spatial domain Merge candidate which is a left spatial domain Merge candidate.
65. The method of claim 59, wherein the rule further specifies that the indication of using an optional half-pixel interpolation filter for the TMVP candidate and/or a bi-prediction with coding unit (CU)-level weights (BCW) index for the TMVP candidate is dependent on the information associated with the one or more motion candidates.
66. A video processing method, comprising:
for a conversion between a video block of a video and a codec representation of the video, determining a control point motion vector for a control point of the video block that is coded using an affine Merge mode based on motion information of neighboring blocks of the video block according to a rule; and
performing the conversion based on the determination,
wherein the rule specifies that the indication to use the selectable half-pixel interpolation filter for the video block is set equal to the indication of the motion vector in the neighboring block.
67. The method of claim 66, wherein the determining determines the control point motion vector located at an upper left corner, an upper right corner, or a lower left corner of the video block based on the motion vectors in the neighboring blocks.
68. A video processing method, comprising:
performing a conversion between a video block of a video and a codec representation of said video according to a rule,
wherein the rule specifies that, if the video block is coded using an affine Merge mode inherited from a neighboring block of the video block, an indication to use a selectable half-pixel interpolation filter for the video block is inherited from the neighboring block.
69. A video processing method, comprising:
deriving motion information for a video block of the video by checking Merge candidates according to a rule during a Merge candidate construction process; and
performing a conversion between the video block and a codec representation of the video,
wherein the rule specifies that two Merge candidates to be compared are considered to be not identical during the Merge candidate construction process if the indications of their use of selectable half-pixel interpolation filters are not identical.
70. The method of any of claims 1-69, wherein performing the conversion is further based on signaling in: a Decoder Parameter Set (DPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), an Adaptive Parameter Set (APS), a Video Parameter Set (VPS), a sequence header, a picture header, a slice header, or a slice group header.
71. The method of any of claims 1-70, wherein the converting comprises encoding the video into the codec representation.
72. The method of any of claims 1-70, wherein the converting comprises decoding the codec representation to generate the video.
73. A video processing apparatus comprising a processor configured to implement the method of any one or more of claims 1-72.
74. A computer-readable medium storing program code that, when executed, causes a processor to implement the method of any one or more of claims 1-72.
75. A computer readable medium storing a codec representation or bitstream generated according to any of the above methods.