CN113597759B

CN113597759B - Motion vector refinement in video coding and decoding

Info

Publication number: CN113597759B
Application number: CN202080020813.0A
Authority: CN
Inventors: 张凯; 张莉; 刘鸿彬; 许继征; 王悦
Original assignee: Beijing ByteDance Network Technology Co Ltd; ByteDance Inc
Current assignee: Beijing ByteDance Network Technology Co Ltd; ByteDance Inc
Priority date: 2019-03-11
Filing date: 2020-03-11
Publication date: 2022-09-13
Anticipated expiration: 2040-03-11
Also published as: WO2020182140A1; CN115633169A; CN113597759A

Abstract

Motion vector refinement in video coding and decoding is disclosed. A method of video processing comprises: deriving, for a conversion between a first block of video and a bitstream representation of the first block of video, an initial search point in a decoder-side motion vector refinement (DMVR) process to be applied during the conversion, based on one or more motion vectors (MVs) and one or more offsets of a Merge candidate associated with the first block of video; and performing the conversion based on the initial search point.

Description

Motion vector refinement in video coding and decoding

Cross Reference to Related Applications

This application timely claims the priority to and benefit of International Patent Application No. PCT/CN2019/077639, filed on March 11, 2019, under the applicable patent law and/or rules pursuant to the Paris Convention. The entire disclosure of International Patent Application No. PCT/CN2019/077639 is incorporated by reference as part of the disclosure of this application.

Technical Field

This patent document relates to video encoding and decoding techniques, devices and systems.

Background

Efforts are currently underway to improve the performance of current video codec technologies, either to provide better compression ratios or to provide video coding and decoding schemes that allow for lower complexity or parallelized implementations. Industry experts have recently proposed several new video coding tools, and tests are currently underway to determine their effectiveness.

Disclosure of Invention

Devices, systems, and methods related to digital video coding and decoding, and in particular to the management of motion vectors, are described. The described methods may be applied to existing video coding standards (e.g., High Efficiency Video Coding (HEVC) or Versatile Video Coding (VVC)) and to future video coding standards or video codecs.

In one representative aspect, the disclosed technology can be used to perform a method of visual media processing. The method comprises: performing a conversion between a current video block and a bitstream representation of the current video block, wherein the conversion comprises a decoder-side motion vector refinement (DMVR) step for refining motion information signaled in the bitstream representation; and during the DMVR step, using at least one motion vector as a starting value for the refinement, wherein the at least one motion vector is equal to an offset added to a candidate motion vector from a set of candidate motion vectors.

In another representative aspect, the disclosed technology can be used to perform another method of visual media processing. The method comprises the following steps: performing a conversion between the current video block and a bitstream representation of the current video block, wherein the conversion includes using one or more of: a Decoder Motion Vector Refinement (DMVR) step, a bi-directional optical flow (BDOF) step, or a combined intra-inter prediction step, and wherein the coexistence of the DMVR step, the BDOF step, and the combined intra-inter prediction step is based at least on a size of the current video block.

In yet another representative aspect, the disclosed technology may be used to perform another method of visual media processing. The method comprises the following steps: performing a conversion between the current video block and a bitstream representation of the current video block, wherein the conversion comprises a Decoder Motion Vector Refinement (DMVR) step for refining the original motion information signaled in the bitstream representation, thereby producing refined motion information usable in the deblocking step; and calculating a difference between the refined motion information and the original motion information for at least a subset of the current video block.

In another representative aspect, the disclosed technology can be used to perform another method of visual media processing. The method comprises: deriving, for a conversion between a first block of video and a bitstream representation of the first block of video, an initial search point in a decoder-side motion vector refinement (DMVR) process to be applied during the conversion, based on one or more motion vectors (MVs) and one or more offsets of a Merge candidate associated with the first block of video; and performing the conversion based on the initial search point.

In another representative aspect, the disclosed technology can be used to perform another method of visual media processing. The method comprises: determining, based on a predetermined rule, that at least one of a decoder-side motion vector refinement (DMVR) process, a bi-directional optical flow (BDOF) process, and a combined intra-inter prediction process is disabled for a conversion between a first block of video and a bitstream representation of the first block of video; and performing the conversion based on the determination.

In another representative aspect, the disclosed technology can be used to perform another method of visual media processing. The method comprises: deriving, for a conversion between a first block of video and a bitstream representation of the first block of video, a motion vector (MV) associated with the first block, the MV being refined by applying a decoder-side motion vector refinement (DMVR) process; and performing the conversion by using the refined MV during deblocking.

In another representative aspect, the disclosed technology can be used to perform another method of visual media processing. The method comprises: calculating, for a conversion between a first block of video and a bitstream representation of the first block of video, an MV difference (dMV) between a refined motion vector (rMV) and a non-refined motion vector (nMV) associated with each basic block of the first block, the rMV being a motion vector refined by applying a decoder-side motion vector refinement (DMVR) process and the nMV being a motion vector not refined by the DMVR process; and performing the conversion by using the calculated MV differences.

Further, in a representative aspect, an apparatus in a video system is disclosed that includes a processor and a non-transitory memory having instructions thereon. The instructions, when executed by the processor, cause the processor to implement any one or more of the disclosed methods.

Furthermore, a computer program product stored on a non-transitory computer readable medium is disclosed, the computer program product comprising program code for performing any one or more of the disclosed methods.

The above and other aspects and features of the disclosed technology are described in more detail in the accompanying drawings, the description and the claims.

Drawings

Fig. 1 shows an example of building a Merge candidate list.

Fig. 2 shows an example of the positions of spatial domain candidates.

Fig. 3 shows an example of a candidate pair on which redundancy checking of the spatial domain Merge candidate is performed.

Fig. 4A and 4B illustrate examples of a location of a second Prediction Unit (PU) based on the size and shape of a current block.

Fig. 5 shows an example of motion vector scaling for temporal Merge candidates.

Fig. 6 shows an example of candidate positions of the time domain Merge candidate.

Fig. 7 shows an example of generating combined bidirectional predictive Merge candidates.

Fig. 8 shows an example of constructing motion vector prediction candidates.

Fig. 9 shows an example of motion vector scaling for spatial motion vector candidates.

Fig. 10 shows an example of neighboring samples used for deriving local illumination compensation parameters.

Fig. 11A and 11B show diagrams relating to a 4-parameter affine model and a 6-parameter affine model, respectively.

Fig. 12 shows an example of an affine motion vector field for each sub-block.

Fig. 13A and 13B show examples of a 4-parameter affine model and a 6-parameter affine model, respectively.

Fig. 14 shows an example of motion vector prediction for an affine inter mode of inherited affine candidates.

Fig. 15 shows an example of motion vector prediction of an affine inter mode for the constructed affine candidates.

Fig. 16A and 16B show diagrams relating to the affine Merge mode.

Fig. 17 shows an example of candidate positions of the affine Merge mode.

Fig. 18 shows an example of a search process for the Merge with motion vector differences (MMVD) mode.

Fig. 19 shows an example of MMVD search points.

Fig. 20 shows an example of decoder-side motion vector refinement (DMVR) in JEM 7.

Fig. 21 shows an example of Motion Vector Differences (MVDs) related to DMVR.

Fig. 22 shows an example illustrating the checking of motion vectors.

Fig. 23 is a block diagram of an example of a hardware platform for implementing the visual media decoding or visual media encoding techniques described in this document.

Fig. 24 shows a flow diagram of an example method for video coding.

Fig. 25 shows a flow diagram of an example method for video coding.

Fig. 26 shows a flow diagram of an example method for video coding.

Fig. 27 shows a flow diagram of an example method for video coding.

Fig. 28 shows a flow diagram of an example method for video coding.

Detailed Description

Video coding and decoding in HEVC/H.265

Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, JVET has adopted many new methods and placed them into reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was established to work on the VVC standard, targeting a 50% bit rate reduction compared to HEVC.

Inter prediction in HEVC/H.265

Each inter-predicted PU has motion parameters for one or two reference picture lists. The motion parameters include a motion vector and a reference picture index. The use of one of the two reference picture lists may also be signaled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.

When a CU is coded in skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta, and no reference picture index. A Merge mode is specified whereby the motion parameters of the current PU are obtained from neighboring PUs, including spatial and temporal candidates. The Merge mode can be applied to any inter-predicted PU, not only in skip mode. The alternative to the Merge mode is the explicit transmission of motion parameters, where the motion vector (more precisely, the motion vector difference (MVD) relative to a motion vector predictor), the corresponding reference picture index for each reference picture list, and the reference picture list usage are signaled explicitly per PU. Such a mode is named advanced motion vector prediction (AMVP) in this disclosure.

When the signaling indicates that one of the two reference picture lists is to be used, the PU is generated from one sample block. This is called "unidirectional prediction". Unidirectional prediction applies to both P-slices and B-slices.

When the signaling indicates that two reference picture lists are to be used, the PU is generated from two blocks of samples. This is called "bi-prediction". Bi-directional prediction only applies to B slices.

The following text provides details regarding the inter prediction modes specified in HEVC. The description will start with the Merge mode.

2.1.1. Reference picture list

In HEVC, the term inter prediction is used to denote a prediction derived from data elements (e.g., sample values or motion vectors) of reference pictures other than the currently decoded picture. As in h.264/AVC, pictures can be predicted from multiple reference pictures. Reference pictures used for inter-prediction are organized in one or more reference picture lists. The reference index identifies which reference picture in the list should be used to create the prediction signal.

A single reference picture list (list 0) is used for P slices, and two reference picture lists (list 0 and list 1) are used for B slices. It should be noted that the reference pictures included in list 0 and list 1 may be past or future pictures in terms of capture/display order.

2.1.2 Merge mode

2.1.2.1. Derivation of candidates for Merge mode

When predicting a PU using the Merge mode, the index pointing to an entry in the Merge candidate list is parsed from the bitstream and used to retrieve motion information. The construction of this list is specified in the HEVC standard and can be summarized according to the following sequence of steps:

Step 1: initial candidate derivation

Step 1.1: spatial domain candidate derivation

Step 1.2: redundancy check of spatial domain candidates

Step 1.3: time domain candidate derivation

Step 2: additional candidate insertions

Step 2.1: creating bi-directional prediction candidates

Step 2.2: inserting zero motion candidates

These steps are also schematically depicted in Fig. 1. For spatial Merge candidate derivation, a maximum of four Merge candidates are selected from among candidates located at five different positions. For temporal Merge candidate derivation, at most one Merge candidate is selected from among two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best Merge candidate is encoded using truncated unary (TU) binarization. If the size of the CU is equal to 8, all PUs of the current CU share a single Merge candidate list, which is identical to the Merge candidate list of the 2N×2N prediction unit.

Hereinafter, operations associated with the foregoing steps are described in detail.

2.1.2.2. Spatial domain candidate derivation

In the derivation of spatial Merge candidates, a maximum of four Merge candidates are selected from among candidates located at the positions depicted in Fig. 2. The order of derivation is A1, B1, B0, A0, and B2. Position B2 is considered only when any PU at positions A1, B1, B0, A0 is unavailable (e.g., because it belongs to another slice or tile) or is intra-coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in Fig. 3 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, Figs. 4A and 4B depict the second PU for the N×2N and 2N×N cases, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. Adding this candidate would lead to two prediction units having the same motion information, which is redundant with having just one PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.

2.1.2.3. Time domain candidate derivation

In this step, only one candidate is added to the list. In particular, in the derivation of this temporal Merge candidate, a scaled motion vector is derived based on the collocated PU belonging to the picture that has the smallest POC difference from the current picture within the given reference picture list. The reference picture list to be used for deriving the collocated PU is explicitly signaled in the slice header. The scaled motion vector for the temporal Merge candidate is obtained as illustrated by the dashed line in Fig. 5. It is scaled from the motion vector of the collocated PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal Merge candidate is set equal to zero. The actual implementation of the scaling process is described in the HEVC specification. For a B slice, two motion vectors are obtained, one for reference picture list 0 and the other for reference picture list 1, and are combined to form the bi-predictive Merge candidate.

As depicted in Fig. 6, within the collocated PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1. If the PU at position C0 is unavailable, is intra-coded, or is outside the current coding tree unit (CTU, also called LCU, largest coding unit) row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
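The POC-distance scaling described above can be sketched as follows. This is a minimal illustration that follows the fixed-point form commonly cited from the HEVC specification (clipping ranges, the constant 16384, and the shifts are recalled from that specification and should be checked against it); the function names `clip3`, `div_trunc`, and `scale_mv` are hypothetical and are not taken from this document.

```python
def clip3(lo, hi, x):
    """Clamp x into the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def div_trunc(a, b):
    """Integer division truncating toward zero (C-style), unlike Python's //."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def scale_mv(mv_col, tb, td):
    """Scale one MV component of the collocated PU by the ratio tb / td.

    tb: POC difference between the current picture's reference and the current picture.
    td: POC difference between the collocated picture's reference and the collocated picture.
    td is assumed to be non-zero.
    """
    td = clip3(-128, 127, td)
    tb = clip3(-128, 127, tb)
    tx = div_trunc(16384 + (abs(td) >> 1), td)
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    prod = dist_scale * mv_col
    sign = -1 if prod < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))


# Equal distances leave the vector unchanged; doubling tb doubles it.
print(scale_mv(10, 1, 1), scale_mv(10, 2, 1), scale_mv(10, 1, 2))
```

The same routine would be applied separately to the horizontal and vertical components of the collocated motion vector.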

2.1.2.4. Additional candidate insertions

In addition to the spatial and temporal Merge candidates, there are two additional types of Merge candidates: the combined bi-predictive Merge candidate and the zero Merge candidate. Combined bi-predictive Merge candidates are generated by using the spatial and temporal Merge candidates, and they are used for B slices only. A combined bi-predictive candidate is generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another candidate. If these two tuples provide different motion hypotheses, they form a new bi-predictive candidate. As an example, Fig. 7 depicts the case where two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive Merge candidate that is added to the final list (on the right). Numerous rules govern the combinations that are considered for generating these additional Merge candidates.

Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list and thereby reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index that starts from zero and increases each time a new zero motion candidate is added to the list. Finally, no redundancy check is performed on these candidates.
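The list construction steps described in this section can be sketched as follows. This is a simplified illustration, not the normative HEVC process: the redundancy check compares each spatial candidate against the whole list instead of only the pairs of Fig. 3, the pair order for combined candidates is a plain nested loop, "different motion hypotheses" is approximated by comparing (MV, reference index) tuples, and the wrap of the zero-candidate reference index to 0 is an assumption. The names `MergeCand` and `build_merge_list` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MV = Tuple[int, int]


@dataclass(frozen=True)
class MergeCand:
    mv_l0: Optional[MV] = None   # list 0 motion vector, None if list 0 is unused
    ref_l0: int = -1
    mv_l1: Optional[MV] = None   # list 1 motion vector, None if list 1 is unused
    ref_l1: int = -1


def build_merge_list(spatial, temporal, max_num, is_b_slice, num_ref_idx):
    cands: List[MergeCand] = []
    # Steps 1.1 and 1.2: up to four spatial candidates, duplicates dropped.
    for cand in spatial:
        if cand is not None and cand not in cands and len(cands) < min(4, max_num):
            cands.append(cand)
    # Step 1.3: at most one temporal candidate.
    if temporal is not None and len(cands) < max_num:
        cands.append(temporal)
    # Step 2.1: combined bi-predictive candidates (B slices only).
    if is_b_slice:
        initial = list(cands)
        for i, a in enumerate(initial):
            for j, b in enumerate(initial):
                if len(cands) >= max_num:
                    break
                if i == j or a.mv_l0 is None or b.mv_l1 is None:
                    continue
                if (a.mv_l0, a.ref_l0) != (b.mv_l1, b.ref_l1):
                    cands.append(MergeCand(a.mv_l0, a.ref_l0, b.mv_l1, b.ref_l1))
    # Step 2.2: zero candidates with an increasing reference index, no redundancy check.
    ref_idx = 0
    while len(cands) < max_num:
        r = ref_idx if ref_idx < num_ref_idx else 0
        if is_b_slice:
            cands.append(MergeCand((0, 0), r, (0, 0), r))
        else:
            cands.append(MergeCand((0, 0), r))
        ref_idx += 1
    return cands


a1 = MergeCand(mv_l0=(1, 0), ref_l0=0)
b1 = MergeCand(mv_l1=(2, 2), ref_l1=0)
merge_list = build_merge_list([a1, a1, None, b1], None, 5, True, 2)
```

In this usage example the duplicate of `a1` is dropped, one combined candidate is formed from `a1` (list 0) and `b1` (list 1), and two zero candidates fill the list up to five entries.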

2.1.3. AMVP

AMVP exploits the spatio-temporal correlation of a motion vector with those of neighboring PUs, which is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by first checking the availability of the left, above, and temporally neighboring PU positions, removing redundant candidates, and adding zero vectors to make the candidate list a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. As with Merge index signaling, the index of the best motion vector candidate is encoded using truncated unary binarization. The maximum value to be encoded in this case is 2 (see Fig. 8). The following sections provide details on the derivation process of motion vector prediction candidates.

2.1.3.1. Derivation of AMVP candidates
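As a small illustration of truncated unary binarization with a maximum value of 2, the sketch below writes a value as that many "1" bins followed by a terminating "0", with the terminator omitted when the value equals the maximum. The function name `truncated_unary` is hypothetical, and the bins are shown as a string purely for readability.

```python
def truncated_unary(value, c_max):
    """Return the truncated unary bin string of value, given the maximum c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    bins = "1" * value
    if value < c_max:
        bins += "0"   # terminator is only needed below the maximum
    return bins


codes = [truncated_unary(v, 2) for v in range(3)]
print(codes)
```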

Fig. 8 summarizes the derivation of motion vector prediction candidates.

In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For spatial motion vector candidate derivation, two motion vector candidates are finally derived based on the motion vectors of each PU located at five different positions as depicted in fig. 2.

For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different collocated positions. After the first list of spatio-temporal candidates is generated, duplicate motion vector candidates in the list are removed. If the number of potential candidates is greater than two, motion vector candidates whose reference picture index within the associated reference picture list is greater than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is less than two, additional zero motion vector candidates are added to the list.

2.1.3.2. Spatial motion vector candidates

In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located at the positions depicted in Fig. 2; these positions are the same as those of motion Merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, and scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, and scaled B2. For each side there are therefore four cases that can be used as motion vector candidates, two of which do not require spatial scaling and two of which use spatial scaling. The four different cases are summarized as follows:

No spatial scaling

- (1) Same reference picture list and same reference picture index (same POC)

- (2) Different reference picture list, but same reference picture (same POC)

Spatial scaling

- (3) Same reference picture list, but different reference picture (different POC)

- (4) Different reference picture list and different reference picture (different POC)

The no-spatial-scaling cases are checked first, followed by the spatial scaling cases. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are unavailable or intra-coded, scaling of the above motion vector is allowed, to facilitate parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
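The four cases above depend only on whether the neighboring PU uses the same reference picture list and whether its reference picture has the same POC as that of the current PU. A minimal sketch of that classification follows; the function name `spatial_candidate_case` and its parameters are hypothetical.

```python
def spatial_candidate_case(nb_list, nb_ref_poc, cur_list, cur_ref_poc):
    """Return (case number 1..4, whether spatial scaling is needed)."""
    same_list = nb_list == cur_list
    same_poc = nb_ref_poc == cur_ref_poc
    if same_poc:
        case = 1 if same_list else 2   # no spatial scaling
    else:
        case = 3 if same_list else 4   # spatial scaling
    return case, not same_poc


print(spatial_candidate_case(0, 8, 0, 8))
print(spatial_candidate_case(1, 4, 0, 8))
```

Scaling is required exactly when the POCs differ, independently of the reference picture list, which matches the rule stated above.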

As depicted in Fig. 9, in the spatial scaling process the motion vector of the neighboring PU is scaled in a manner similar to temporal scaling. The main difference is that the reference picture list and the index of the current PU are given as input; the actual scaling process is the same as that of temporal scaling.

2.1.3.3. Temporal motion vector candidates

Apart from the reference picture index derivation, all processes for deriving temporal motion vector candidates are the same as those for deriving temporal Merge candidates (see Fig. 6). The reference picture index is signaled to the decoder.

Local illumination compensation in JEM

Local illumination compensation (LIC) is based on a linear model of illumination changes, using a scaling factor a and an offset b. It is enabled or disabled adaptively for each inter-mode coded coding unit (CU).

When LIC is applied to a CU, a least-squares method is employed to derive the parameters a and b by using the neighboring samples of the current CU and their corresponding reference samples. More specifically, as shown in Fig. 10, the subsampled (2:1 subsampling) neighboring samples of the CU and the corresponding samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used.
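The least-squares derivation of a and b can be sketched as an ordinary linear regression of the current CU's neighboring samples on the corresponding reference samples. The sketch below uses floating-point arithmetic for clarity, whereas a codec would use integer arithmetic; the fallback for a degenerate denominator and the names `derive_lic_params` and `apply_lic` are assumptions for illustration.

```python
def derive_lic_params(ref_nb, cur_nb):
    """Fit cur ≈ a * ref + b over paired neighboring samples (least squares)."""
    n = len(ref_nb)
    if n == 0 or n != len(cur_nb):
        raise ValueError("sample lists must be non-empty and of equal length")
    sx, sy = sum(ref_nb), sum(cur_nb)
    sxx = sum(x * x for x in ref_nb)
    sxy = sum(x * y for x, y in zip(ref_nb, cur_nb))
    den = n * sxx - sx * sx
    if den == 0:
        # Assumed fallback: unit scale with a pure offset when the fit is degenerate.
        return 1.0, (sy - sx) / n
    a = (n * sxy - sx * sy) / den
    b = (sy - a * sx) / n
    return a, b


def apply_lic(pred, a, b):
    """Apply the linear illumination model to a list of prediction samples."""
    return [a * p + b for p in pred]


a, b = derive_lic_params([10, 20, 30, 40], [25, 45, 65, 85])
print(a, b, apply_lic([10], a, b))
```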

2.2.1. Derivation of prediction blocks

LIC parameters are derived and applied separately for each prediction direction. For each prediction direction, a first prediction block is generated using the decoded motion information, and then a provisional prediction block is obtained by applying the LIC model. Thereafter, a final prediction block is derived using the two provisional prediction blocks.

When a CU is coded in Merge mode, the LIC flag is copied from neighboring blocks in a manner similar to the copying of motion information in Merge mode; otherwise, an LIC flag is signaled for the CU to indicate whether LIC applies.

When LIC is enabled for a picture, an additional CU level RD check is needed to determine if LIC is applicable to a CU. When LIC is enabled for a CU, the Mean-Removed Sum of Absolute differences (MR-SAD) and the Mean-Removed Sum of Absolute Hadamard-Transformed differences (MR-SATD) are used for integer-pel motion search and fractional-pel motion search, respectively, instead of SAD and SATD.
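A mean-removed SAD can be sketched as follows: the mean of each block is subtracted before the absolute differences are summed, so a uniform brightness offset between the two blocks does not contribute to the cost. The blocks are given as flat sample lists, and the function names `sad` and `mr_sad` are hypothetical.

```python
def sad(cur, ref):
    """Plain sum of absolute differences between two equally sized blocks."""
    return sum(abs(c - r) for c, r in zip(cur, ref))


def mr_sad(cur, ref):
    """Mean-removed SAD: the difference of block means is removed first."""
    mean_diff = (sum(cur) - sum(ref)) / len(cur)
    return sum(abs(c - r - mean_diff) for c, r in zip(cur, ref))


cur_block = [10, 12, 14]
ref_block = [0, 2, 4]       # same texture, 10 levels darker
print(sad(cur_block, ref_block), mr_sad(cur_block, ref_block))
```

In the usage example the plain SAD is large because of the brightness offset, while the mean-removed SAD is zero, which is why a mean-removed metric suits motion search when an illumination model is applied afterwards.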

To reduce the encoding complexity, the following encoding scheme is applied in JEM:

LIC is disabled for the entire picture when there is no obvious illumination change between the current picture and its reference pictures. To identify this situation, histograms of the current picture and of every reference picture of the current picture are calculated at the encoder. If the histogram difference between the current picture and every reference picture of the current picture is smaller than a given threshold, LIC is disabled for the current picture; otherwise, LIC is enabled for the current picture.

Inter-frame prediction method in VVC

There are several new coding tools for inter prediction improvement, such as adaptive motion vector difference resolution (AMVR) for signaling MVD, affine prediction mode, triangular prediction mode (TPM), advanced TMVP (ATMVP, also known as SbTMVP), generalized bi-prediction (GBI), and bi-directional optical flow (BIO, also known as BDOF).

2.3.1. Coding block structure in VVC

In VVC, a quadtree/binary tree/ternary tree (QT/BT/TT) structure is used to divide a picture into square or rectangular blocks. In addition to QT/BT/TT, a separate tree (also called a dual coding tree) is used in VVC for I slices. With a separate tree, the coding block structure is signaled separately for the luma and chroma components.

2.3.2. Adaptive motion vector difference resolution

In HEVC, when use_integer_mv_flag is equal to 0 in the slice header, the motion vector difference (MVD) between the motion vector of a PU and its predicted motion vector is signaled in units of quarter luma samples. In VVC, locally adaptive motion vector resolution (AMVR) is introduced. In VVC, an MVD can be coded in units of quarter luma samples, integer luma samples, or four luma samples (i.e., 1/4-pel, 1-pel, 4-pel). The MVD resolution is controlled at the coding unit (CU) level, and MVD resolution flags are conditionally signaled for each CU that has at least one non-zero MVD component.

For a CU that has at least one non-zero MVD component, a first flag is signaled to indicate whether quarter-luma-sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter-luma-sample MV precision is not used, another flag is signaled to indicate whether integer-luma-sample MV precision or four-luma-sample MV precision is used.

When the first MVD resolution flag of a CU is zero, or is not coded for the CU (meaning that all MVDs in the CU are zero), quarter-luma-sample MV resolution is used for the CU. When a CU uses integer-luma-sample or four-luma-sample MV precision, the MVPs in the AMVP candidate list of the CU are rounded to the corresponding precision.
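The two-flag signaling and the rounding of MVPs can be sketched as follows. Several details are assumptions made only for illustration: motion vectors are stored in quarter-luma-sample units, a second flag equal to 0 selects integer precision and 1 selects four-sample precision, and rounding is to the nearest multiple with ties rounded upward. The names `amvr_shift` and `round_to_precision` are hypothetical.

```python
def amvr_shift(first_flag, second_flag=0):
    """Map the AMVR flags to a shift in quarter-sample units.

    0 -> quarter-sample, 2 -> integer-sample, 4 -> four-sample precision.
    The meaning of second_flag values is an assumption of this sketch.
    """
    if first_flag == 0:
        return 0
    return 2 if second_flag == 0 else 4


def round_to_precision(mv_quarter, shift):
    """Round one MV component (quarter-sample units) to the precision given by shift."""
    if shift == 0:
        return mv_quarter
    offset = 1 << (shift - 1)
    return ((mv_quarter + offset) >> shift) << shift


shift = amvr_shift(1, 0)                      # integer-sample precision
rounded = [round_to_precision(v, shift) for v in (5, 7, -5)]
print(shift, rounded)
```

With integer-sample precision, 5 and 7 quarter-samples (1.25 and 1.75 samples) are rounded to 4 and 8 quarter-samples (1 and 2 samples), respectively.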

2.3.3. Affine motion compensated prediction

In HEVC, only a translational motion model is applied for motion compensated prediction (MCP). In the real world, however, there are many kinds of motion, such as zooming in and out, rotation, perspective motion, and other irregular motion. In VVC, a simplified affine transform motion compensated prediction is applied with a 4-parameter affine model and a 6-parameter affine model. As shown in Figs. 11A and 11B, the affine motion field of a block is described by two control point motion vectors (CPMVs) for the 4-parameter affine model and by three CPMVs for the 6-parameter affine model.

The motion vector field (MVF) of a block is described by the following equations: the 4-parameter affine model in equation (1), in which the four parameters are defined as the variables a, b, e, and f, and the 6-parameter affine model in equation (2), in which the six parameters are defined as the variables a, b, c, d, e, and f:

\[
\begin{cases}
mv^h(x, y) = a x - b y + e = \dfrac{mv_1^h - mv_0^h}{w}\, x - \dfrac{mv_1^v - mv_0^v}{w}\, y + mv_0^h \\[2ex]
mv^v(x, y) = b x + a y + f = \dfrac{mv_1^v - mv_0^v}{w}\, x + \dfrac{mv_1^h - mv_0^h}{w}\, y + mv_0^v
\end{cases}
\tag{1}
\]

\[
\begin{cases}
mv^h(x, y) = a x + c y + e = \dfrac{mv_1^h - mv_0^h}{w}\, x + \dfrac{mv_2^h - mv_0^h}{h}\, y + mv_0^h \\[2ex]
mv^v(x, y) = b x + d y + f = \dfrac{mv_1^v - mv_0^v}{w}\, x + \dfrac{mv_2^v - mv_0^v}{h}\, y + mv_0^v
\end{cases}
\tag{2}
\]

where $(mv_0^h, mv_0^v)$ is the motion vector of the top-left corner control point, $(mv_1^h, mv_1^v)$ is the motion vector of the top-right corner control point, and $(mv_2^h, mv_2^v)$ is the motion vector of the bottom-left corner control point. All three motion vectors are called control point motion vectors (CPMVs). $(x, y)$ represents the coordinates of a representative point relative to the top-left sample within the current block, and $(mv^h(x, y), mv^v(x, y))$ is the motion vector derived for a sample located at $(x, y)$. The CP motion vectors may be signaled (as in affine AMVP mode) or derived on the fly (as in affine Merge mode). $w$ and $h$ are the width and height of the current block. In practice, the division is implemented by a right shift with a rounding operation. In the VTM, the representative point is defined as the center position of a sub-block; for example, when the coordinates of the top-left corner of a sub-block relative to the top-left sample within the current block are (xs, ys), the coordinates of the representative point are defined as (xs + 2, ys + 2). For each sub-block (i.e., 4×4 in the VTM), the representative point is used to derive the motion vector for the whole sub-block.

To further simplify motion compensated prediction, sub-block based affine transform prediction is applied. To derive the motion vector of each M×N sub-block (M and N are both set to 4 in the current VVC), the motion vector of the center sample of each sub-block (as shown in Fig. 12) is calculated according to equations (1) and (2) and rounded to 1/16 fractional accuracy. Motion compensation interpolation filters for 1/16-pel are then applied to generate the prediction of each sub-block with the derived motion vector. The 1/16-pel interpolation filters are introduced by the affine mode.
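The sub-block derivation can be sketched as follows, assuming the usual form of the two models: with two CPMVs the horizontal and vertical gradients are a = (mv1h − mv0h)/w and b = (mv1v − mv0v)/w with c = −b and d = a, and with three CPMVs c and d are taken from the bottom-left CPMV divided by h. The sketch uses floating-point division rather than the shift-based division mentioned in the text, takes CPMVs in luma-sample units, and returns motion vectors in 1/16-sample units; the names `to_sixteenth` and `affine_subblock_mvs` are hypothetical.

```python
import math


def to_sixteenth(v):
    """Round a value in luma samples to an integer number of 1/16 samples."""
    return int(math.floor(v * 16 + 0.5))


def affine_subblock_mvs(cpmvs, w, h, sb=4):
    """Return {(xs, ys): (mvh, mvv)} for every sb x sb sub-block of a w x h block.

    cpmvs holds two (4-parameter) or three (6-parameter) control point MVs,
    ordered top-left, top-right, bottom-left.
    """
    (mv0h, mv0v), (mv1h, mv1v) = cpmvs[0], cpmvs[1]
    a = (mv1h - mv0h) / w
    b = (mv1v - mv0v) / w
    if len(cpmvs) == 3:
        mv2h, mv2v = cpmvs[2]
        c = (mv2h - mv0h) / h
        d = (mv2v - mv0v) / h
    else:
        c, d = -b, a                       # 4-parameter model
    field = {}
    for ys in range(0, h, sb):
        for xs in range(0, w, sb):
            x, y = xs + sb // 2, ys + sb // 2   # representative (center) point
            mvh = a * x + c * y + mv0h
            mvv = b * x + d * y + mv0v
            field[(xs, ys)] = (to_sixteenth(mvh), to_sixteenth(mvv))
    return field


zoom = affine_subblock_mvs([(0, 0), (8, 0)], 8, 8)
shift_only = affine_subblock_mvs([(1.5, -2), (1.5, -2)], 8, 8)
```

In the usage example, the first call models a zoom (each sub-block's vector grows with its distance from the top-left corner), while the second degenerates to pure translation, so every sub-block receives the same vector.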

After MCP, the high precision motion vector of each sub-block is rounded and saved to the same precision as the standard motion vector.

2.3.3.1. Signaling of affine predictions

Similar to the translational motion model, there are also two modes for signaling side information due to affine prediction. They are AFFINE _ INTER and AFFINE _ MERGE modes.

AF _ INTER mode

For CUs with a width and height larger than 8, the AF _ INTER mode may be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF _ INTER mode is used.

In this mode, for each reference picture list (list 0 or list 1), an affine AMVP candidate list is constructed with three types of affine motion predictors in the following order, where each candidate includes the estimated CPMVs of the current block. The differences between the best CPMVs found at the encoder side (such as mv_0, mv_1 and mv_2 in fig. 15) and the estimated CPMVs are signaled. In addition, the index of the affine AMVP candidate from which the estimated CPMVs are derived is signaled.

1) Inherited affine motion predictor

The checking order is similar to that of spatial MVPs in the HEVC AMVP list. First, a left inherited affine motion predictor is derived from the first block in {A1, A0} that is affine coded and has the same reference picture as the current block. Second, an above inherited affine motion predictor is derived from the first block in {B1, B0, B2} that is affine coded and has the same reference picture as the current block. The five blocks A1, A0, B1, B0 and B2 are depicted in fig. 14.

Once a neighboring block is found to be coded in affine mode, the CPMVs of the coding unit covering that neighboring block are used to derive the predictors of the CPMVs of the current block. For example, if A1 is coded in a non-affine mode and A0 is coded in the 4-parameter affine mode, the left inherited affine MV predictor is derived from A0. In this case, the CPMVs of the CU covering A0 (its top-left and top-right CPMVs, as shown in fig. 16B) are used to derive the estimated CPMVs of the current block at its top-left (coordinates (x0, y0)), top-right (coordinates (x1, y1)) and bottom-right (coordinates (x2, y2)) positions.

2) Constructed affine motion predictor

As shown in fig. 15, a constructed affine motion predictor consists of control point motion vectors (CPMVs) derived from neighboring inter-coded blocks that have the same reference picture. The number of CPMVs is 2 if the current affine motion model is 4-parameter affine, and 3 if it is 6-parameter affine. The top-left CPMV is derived from the MV of the first block in the group {A, B, C} that is inter-coded and has the same reference picture as the current block. The top-right CPMV is derived from the MV of the first block in the group {D, E} that is inter-coded and has the same reference picture as the current block. The bottom-left CPMV is derived from the MV of the first block in the group {F, G} that is inter-coded and has the same reference picture as the current block.

If the current affine motion model is 4-parameter affine, the constructed affine motion predictor is inserted into the candidate list only if both the top-left and top-right CPMVs are found; these two CPMVs are then used as the estimated CPMVs for the top-left (coordinates (x0, y0)) and top-right (coordinates (x1, y1)) positions of the current block.

If the current affine motion model is 6-parameter affine, the constructed affine motion predictor is inserted into the candidate list only if the top-left, top-right and bottom-left CPMVs are all found; these three CPMVs are then used as the estimated CPMVs for the top-left (coordinates (x0, y0)), top-right (coordinates (x1, y1)) and bottom-right (coordinates (x2, y2)) positions of the current block.

When the constructed affine motion predictor is inserted into the candidate list, no pruning process is applied.

3) Normal AMVP motion predictors

The following applies until the number of affine motion predictors reaches the maximum.

1) to 3) An affine motion predictor is derived by setting all CPMVs equal to one of the CPMVs already derived for the constructed affine motion predictor, if available (one such CPMV per step).

4) An affine motion predictor is derived by setting all CPMVs equal to the HEVC TMVP, if available.

5) An affine motion predictor is derived by setting all CPMVs to the zero MV.

In AF_INTER mode, when the 4- or 6-parameter affine mode is used, 2 or 3 control points are required, so 2 or 3 MVDs need to be coded for these control points, as shown in fig. 13. It is proposed to derive the MVs such that mvd_1 and mvd_2 are predicted from mvd_0.

Here mvd_i and mv_i are the motion vector difference and the motion vector of the top-left pixel (i = 0), the top-right pixel (i = 1) or the bottom-left pixel (i = 2), respectively, each with its own predicted motion vector, as shown in fig. 13B. Note that the addition of two motion vectors (e.g., mvA(xA, yA) and mvB(xB, yB)) is performed separately on the two components, i.e., newMV = mvA + mvB, with the two components of newMV set to (xA + xB) and (yA + yB), respectively.

2.3.3.3. AF_MERGE mode

When a CU is coded in AF_MERGE mode, it takes the first block coded in affine mode from the valid neighboring reconstructed blocks. The selection order of the candidate blocks is left, above, above-right, bottom-left, above-left, as shown in fig. 16A (denoted by A, B, C, D, E in that order). For example, if the neighboring bottom-left block is coded in affine mode, as denoted by A0 in fig. 16B, the control point (CP) motion vectors mv_0^N, mv_1^N and mv_2^N of the top-left, top-right and bottom-left corners of the neighboring CU/PU that contains block A are fetched. The motion vectors mv_0^C, mv_1^C and mv_2^C (the last for the 6-parameter affine model only) of the top-left, top-right and bottom-left corners of the current CU/PU are then calculated based on mv_0^N, mv_1^N and mv_2^N. It should be noted that in the current VTM, if the current block is affine coded, the sub-block (e.g., a 4 × 4 block in the VTM) located at the top-left corner stores mv0 and the sub-block located at the top-right corner stores mv1. If the current block is coded with the 6-parameter affine model, the sub-block located at the bottom-left corner stores mv2; otherwise (with the 4-parameter affine model), the bottom-left sub-block (LB) stores mv2'. The other sub-blocks store the MVs used for MC.

After the CPMVs mv_0^C, mv_1^C and mv_2^C of the current CU are derived, the MVF of the current CU is generated according to the simplified affine motion model in equations (1) and (2). To identify whether the current CU is coded in AF_MERGE mode, an affine flag is signaled in the bitstream when at least one neighboring block is coded in affine mode.

The affine Merge candidate list is constructed by the following steps:

1) Insert inherited affine candidates

An inherited affine candidate is a candidate derived from the affine motion model of a valid neighboring affine-coded block. At most two inherited affine candidates are derived from the affine motion models of the neighboring blocks and inserted into the candidate list. For the left predictor, the scan order is {A0, A1}; for the above predictor, the scan order is {B0, B1, B2}.

2) Insert constructed affine candidates

If the number of candidates in the affine Merge candidate list is less than MaxNumAffineCand (e.g., 5), constructed affine candidates are inserted into the candidate list. A constructed affine candidate is a candidate constructed by combining the neighboring motion information of each control point.

a) The motion information of the control points is first derived from the specified spatial and temporal neighbors shown in fig. 17. CPk (k = 1, 2, 3, 4) denotes the k-th control point. A0, A1, A2, B0, B1, B2 and B3 are the spatial positions used to predict CPk (k = 1, 2, 3); T is the temporal position used to predict CP4.

The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.

The motion information of each control point is obtained according to the following priority order:

For CP1, the checking priority is B2 -> B3 -> A2. If B2 is available, B2 is used. Otherwise, if B3 is available, B3 is used. If neither B2 nor B3 is available, A2 is used. If none of the three candidates is available, the motion information of CP1 cannot be obtained.

For CP2, the checking priority is B1 -> B0.

For CP3, the checking priority is A1 -> A0.

For CP4, T is used.
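
The priority rules above amount to a first-available lookup per control point. A minimal sketch, assuming the availability of each neighboring position is supplied as a dictionary (a missing key or `None` meaning "not available"); the names `CP_PRIORITY` and `control_point_motion` are hypothetical:

```python
# Checking priority of the neighboring positions for each control point
CP_PRIORITY = {
    "CP1": ["B2", "B3", "A2"],
    "CP2": ["B1", "B0"],
    "CP3": ["A1", "A0"],
    "CP4": ["T"],
}


def control_point_motion(available):
    """Map each control point to the motion info of its first available neighbor.

    `available` maps a position name to its motion information.
    A control point whose candidates are all unavailable maps to None.
    """
    result = {}
    for cp, order in CP_PRIORITY.items():
        result[cp] = next(
            (available[pos] for pos in order if available.get(pos) is not None),
            None,
        )
    return result
```

For example, if B2 is unavailable but B3 and A2 are available, CP1 takes the motion information of B3.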

b) Next, affine Merge candidates are constructed from combinations of control points.

I. The motion information of three control points is needed to construct a 6-parameter affine candidate. The three control points can be selected from one of the following four combinations: {CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}. The combinations {CP1, CP2, CP3}, {CP2, CP3, CP4} and {CP1, CP3, CP4} are converted into a 6-parameter motion model represented by the top-left, top-right and bottom-left control points.

II. The motion information of two control points is needed to construct a 4-parameter affine candidate. The two control points can be selected from one of the two combinations {CP1, CP2} and {CP1, CP3}. The two combinations are converted into a 4-parameter motion model represented by the top-left and top-right control points.

III. The combinations of constructed affine candidates are inserted into the candidate list in the following order:

{CP1, CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1, CP2}, {CP1, CP3}

i. For each combination, the reference indices of list X of each CP are checked; if they are all the same, the combination has valid CPMVs for list X. If the combination does not have valid CPMVs for both list 0 and list 1, the combination is marked as invalid. Otherwise, it is valid, and the CPMVs are put into the sub-block Merge list.

3) Filling with zero motion vectors

If the number of candidates in the affine Merge candidate list is less than 5, a zero motion vector with a zero reference index is inserted into the candidate list until the list is full.

More specifically, for the sub-block Merge candidate list, a 4-parameter Merge candidate is added with its MVs set to (0, 0) and its prediction direction set to uni-prediction from list 0 (for P slices) or bi-prediction (for B slices).

2.3.4. Merge with motion vector difference (MMVD)

Ultimate motion vector expression (UMVE, also known as MMVD) is presented. UMVE is used for either skip or Merge mode with a proposed motion vector expression method.

FIG. 18 shows an example of an ultimate motion vector expression (UMVE) search process. FIG. 19 shows an example of UMVE search points. In VVC, UMVE reuses the same Merge candidates as those in the regular Merge candidate list. Among these Merge candidates, a base candidate can be selected and further expanded by the proposed motion vector expression method.

UMVE provides a new Motion Vector Difference (MVD) representation method in which MVDs are represented by a starting point, a motion magnitude, and a motion direction.

The proposed technique uses the Merge candidate list as it is. However, only candidates of the default Merge type (MRG_TYPE_DEFAULT_N) are considered for the expansion of UMVE.

The base candidate index defines a starting point. The base candidate index indicates the best candidate among the candidates in the list, as shown below.

TABLE 1 Base candidate IDX

Base candidate IDX	0	1	2	3
N-th MVP	1st MVP	2nd MVP	3rd MVP	4th MVP

If the number of base candidates is equal to 1, the base candidate IDX is not signaled.

The distance index is motion amplitude information. The distance index indicates a predefined distance from the start point information. The predefined distance is as follows:

TABLE 2 distance IDX

The direction index indicates the direction of the MVD with respect to the starting point. The direction index may represent four directions as shown below.

TABLE 3 Direction IDX

Direction IDX	00	01	10	11
x axis	+	-	N/A	N/A
y axis	N/A	N/A	+	-
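
As an illustration of how a starting point, a distance and a direction combine into a motion vector, the sketch below applies the direction signs of Table 3 to a base candidate MV. It is only a sketch: the distance is passed in directly as a value (in the same units as the MV) rather than looked up from a distance index, and the names `DIRECTIONS` and `mmvd_mv` are hypothetical.

```python
# Direction IDX -> (sign on x axis, sign on y axis), following Table 3
DIRECTIONS = {
    0b00: (1, 0),   # +x
    0b01: (-1, 0),  # -x
    0b10: (0, 1),   # +y
    0b11: (0, -1),  # -y
}


def mmvd_mv(base_mv, distance, direction_idx):
    """Offset the base candidate MV by `distance` along the signaled direction."""
    sign_x, sign_y = DIRECTIONS[direction_idx]
    return (base_mv[0] + sign_x * distance, base_mv[1] + sign_y * distance)
```

Because exactly one of the two signs is non-zero for every direction index, the resulting MVD always lies on the horizontal or vertical axis through the starting point.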

The UMVE flag is signaled right after the skip flag or the Merge flag is sent. If the skip or Merge flag is true, the UMVE flag is parsed. If the UMVE flag is equal to 1, the UMVE syntax is parsed; otherwise, the AFFINE flag is parsed. If the AFFINE flag is equal to 1, AFFINE mode is used; otherwise, the skip/Merge index is parsed for the VTM skip/Merge mode.

No additional line buffer is needed for UMVE candidates, because the skip/Merge candidates of the software are directly used as base candidates. Using the input UMVE index, the supplement to the MV is decided right before motion compensation, so there is no need to hold a long line buffer for this purpose.

Under the current common test conditions, either the first or the second Merge candidate in the Merge candidate list can be selected as the base candidate.

In VVC, UMVE is also known as Merge with MV differences (MMVD).

2.3.5. Decoder side motion vector refinement (DMVR)

In the bi-directional prediction operation, for prediction of one block region, two prediction blocks respectively formed using Motion Vectors (MVs) of list 0 and MVs of list 1 are combined to form a single prediction signal. In the decoder-side motion vector refinement (DMVR) method, the two motion vectors of the bi-prediction are further refined.

2.3.5.1. DMVR in JEM

In the JEM design, the motion vectors are refined by a bilateral template matching process. Bilateral template matching is applied in the decoder to perform a distortion-based search between a bilateral template and the reconstructed samples in the reference pictures in order to obtain refined MVs without transmitting additional motion information. An example is depicted in fig. 20. The bilateral template is generated as the weighted combination (i.e., average) of the two prediction blocks obtained from the initial MV0 of list 0 and the initial MV1 of list 1, respectively, as shown in fig. 20. The template matching operation consists of calculating cost measures between the generated template and the sample region (around the initial prediction block) in the reference picture. For each of the two reference pictures, the MV that yields the minimum template cost is taken as the updated MV of that list to replace the original one. In JEM, nine MV candidates are searched for each list. The nine MV candidates include the original MV and the 8 surrounding MVs offset from the original MV by one luma sample in the horizontal direction, the vertical direction, or both. Finally, the two new MVs, i.e., MV0' and MV1' as shown in fig. 20, are used to generate the final bi-prediction result. The sum of absolute differences (SAD) is used as the cost measure. Note that when calculating the cost of a prediction block generated by a surrounding MV, the rounded MV (to integer pel) is actually used to obtain the prediction block instead of the real MV.

2.3.5.2. DMVR in VVC

For DMVR in VVC, MVDs mirrored between list 0 and list 1 are assumed, as shown in fig. 21, and bilateral matching is performed to refine the MVs, i.e., to find the best MVD among several MVD candidates. The MVs of the two reference picture lists are denoted by MVL0(L0X, L0Y) and MVL1(L1X, L1Y). The MVD for list 0, denoted by (MvdX, MvdY), that minimizes the cost function (e.g., SAD) is defined as the best MVD. The SAD function is defined as the SAD between the list 0 reference block, derived with the motion vector (L0X + MvdX, L0Y + MvdY) in the list 0 reference picture, and the list 1 reference block, derived with the motion vector (L1X - MvdX, L1Y - MvdY) in the list 1 reference picture.
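
The mirrored-MVD cost can be sketched as below. This is an illustrative outline only: reference blocks are obtained through caller-supplied functions (`fetch_l0`, `fetch_l1`, both hypothetical) that return a block of samples for a given motion vector, and blocks are plain lists of rows.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized sample blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_mirrored_mvd(fetch_l0, fetch_l1, mv_l0, mv_l1, candidates):
    """Return the candidate MVD (for list 0) with the lowest bilateral SAD.

    The MVD is added to the list 0 MV and subtracted from the list 1 MV.
    """
    def cost(mvd):
        ref0 = fetch_l0((mv_l0[0] + mvd[0], mv_l0[1] + mvd[1]))
        ref1 = fetch_l1((mv_l1[0] - mvd[0], mv_l1[1] - mvd[1]))
        return sad(ref0, ref1)

    return min(candidates, key=cost)
```

Ties are resolved in favor of the earliest candidate in the list, which is a choice of this sketch rather than something stated in the text.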

The motion vector refinement process may iterate twice. In each iteration, at most 6 MVDs (with integer-pixel precision) may be checked in two steps, as shown in fig. 22. In the first step, the MVDs (0, 0), (-1, 0), (1, 0), (0, -1) and (0, 1) are checked. In the second step, one of the MVDs (-1, -1), (-1, 1), (1, -1) or (1, 1) may be selected and further checked. Suppose the function Sad(x, y) returns the SAD value of the MVD (x, y). The MVD, denoted by (MvdX, MvdY), checked in the second step is decided as follows:
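
A minimal sketch of one such two-step check is given below. The rule used for the second step is an assumption of this sketch: the diagonal is chosen toward whichever horizontal neighbor and whichever vertical neighbor has the lower SAD. The function names `dmvr_iteration` and `dmvr_search` are hypothetical, and `sad_at` stands in for the Sad(x, y) function.

```python
def dmvr_iteration(sad_at):
    """Check up to 6 integer MVDs in two steps and return the best one."""
    first_step = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    costs = {mvd: sad_at(mvd) for mvd in first_step}
    # Assumed rule: move toward the cheaper side on each axis
    mvd_x = 1 if costs[(1, 0)] < costs[(-1, 0)] else -1
    mvd_y = 1 if costs[(0, 1)] < costs[(0, -1)] else -1
    costs[(mvd_x, mvd_y)] = sad_at((mvd_x, mvd_y))
    return min(costs, key=costs.get)


def dmvr_search(sad_at_total, iterations=2):
    """Run the iteration twice; the second starts from the first best MVD."""
    total = (0, 0)
    for _ in range(iterations):
        step = dmvr_iteration(
            lambda d: sad_at_total((total[0] + d[0], total[1] + d[1]))
        )
        total = (total[0] + step[0], total[1] + step[1])
    return total
```

Here `sad_at_total` receives the accumulated MVD relative to the MV of the Merge candidate, so the second iteration is centered on the result of the first.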

In the first iteration, the starting point is the MV of the regular Merge candidate; in the second iteration, it is that MV plus the best MVD selected in the first iteration. DMVR applies only when one reference picture precedes the current picture and the other follows it, and both reference pictures have the same picture order count (POC) distance to the current picture.

To further simplify the process of DMVR, several changes to the design in JEM have been proposed.

More specifically, the DMVR design adopted in VTM-4.0 (to be released) has the following main features:

Early termination when the SAD at the (0, 0) position between list 0 and list 1 is smaller than a threshold.

Early termination when the SAD between list 0 and list 1 is zero for some position.

Block sizes for DMVR: W × H >= 64 && H >= 8, where W and H are the width and height of the block.

A CU is split into multiple 16 × 16 sub-blocks for DMVR when the CU size is larger than 16 × 16. If only the width or the height of the CU is larger than 16, it is split only in the vertical or horizontal direction.

Reference block size (W + 7) × (H + 7) (for luma).

25-point SAD-based integer-pixel search (i.e., (±2) refinement search range, single stage).

Bilinear-interpolation-based DMVR.

Sub-pixel refinement based on the "parametric error surface equation". This process is performed only when the minimum SAD cost is not equal to zero and the best MVD is (0, 0) in the last MV refinement iteration.

Luma/chroma MC with reference block padding (if needed).

Refined MVs are used for MC and TMVP only.

2.3.5.2.1. Usage of DMVR

DMVR may be enabled when all of the following conditions are true:

- The DMVR enabling flag in the SPS (i.e., sps_dmvr_enabled_flag) is equal to 1

- The TPM flag, the inter affine flag, the sub-block Merge flag (ATMVP or affine Merge) and the MMVD flag are all equal to 0

- The Merge flag is equal to 1

- The current block is bi-predicted, and the POC distance between the current picture and the reference picture in list 1 is equal to the POC distance between the reference picture in list 0 and the current picture

- The height of the current CU is greater than or equal to 8

- The number of luma samples (CU width × height) is greater than or equal to 64
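
The conditions above can be collected into a single predicate. The sketch below is illustrative only; the `BlockInfo` container and its field names are hypothetical and do not correspond to VTM data structures. The POC check is written with signed differences, which is an interpretation that also enforces that the two reference pictures lie on opposite sides of the current picture.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    sps_dmvr_enabled: bool
    merge_flag: bool
    tpm_flag: bool
    inter_affine_flag: bool
    subblock_merge_flag: bool
    mmvd_flag: bool
    bi_predicted: bool
    poc_cur: int
    poc_ref0: int  # POC of the list 0 reference picture
    poc_ref1: int  # POC of the list 1 reference picture
    width: int
    height: int


def dmvr_allowed(b):
    """True when every enabling condition listed above holds for block b."""
    no_excluding_mode = not (
        b.tpm_flag or b.inter_affine_flag or b.subblock_merge_flag or b.mmvd_flag
    )
    equal_poc_distance = (b.poc_ref1 - b.poc_cur) == (b.poc_cur - b.poc_ref0)
    return (
        b.sps_dmvr_enabled
        and b.merge_flag
        and no_excluding_mode
        and b.bi_predicted
        and equal_poc_distance
        and b.height >= 8
        and b.width * b.height >= 64
    )
```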

2.3.5.2.2. Sub-pixel refinement based on the "parametric error surface equation"

The method is summarized as follows:

1. The parametric error surface fit is computed only if the center position is the best cost position in a given iteration.

2. The cost at the center position and the costs at the positions (-1, 0), (0, -1), (1, 0) and (0, 1) relative to the center are used to fit a 2-D parabolic error surface equation of the form

E(x, y) = A(x - x_0)^2 + B(y - y_0)^2 + C

where (x_0, y_0) corresponds to the position with the least cost and C corresponds to the minimum cost value. By solving the 5 equations in 5 unknowns, (x_0, y_0) is computed as:

x_0 = (E(-1, 0) - E(1, 0)) / (2(E(-1, 0) + E(1, 0) - 2E(0, 0)))

y_0 = (E(0, -1) - E(0, 1)) / (2(E(0, -1) + E(0, 1) - 2E(0, 0)))

(x_0, y_0) can be computed to any required sub-pixel precision by adjusting the precision at which the division is performed (i.e., how many bits of the quotient are computed). For 1/16-pixel accuracy, only 4 bits in the absolute value of the quotient need to be computed, which lends itself to a fast shifted-subtraction-based implementation of the 2 divisions required per CU.

3. The computed (x_0, y_0) is added to the integer-distance refinement MV to obtain the sub-pixel-accurate refinement delta MV.
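
The two closed-form expressions for (x_0, y_0) can be evaluated directly, as in the sketch below. It uses exact fractions rather than the shifted-subtraction division described above, and it returns 0 for an axis whose denominator is zero; that guard is an assumption of this sketch, as are the function names.

```python
import math
from fractions import Fraction


def subpel_offset(e_center, e_left, e_right, e_up, e_down):
    """Minimum of the fitted parabolic surface.

    e_left = E(-1, 0), e_right = E(1, 0), e_up = E(0, -1), e_down = E(0, 1).
    """
    def axis(e_minus, e_plus):
        denom = 2 * (e_minus + e_plus - 2 * e_center)
        if denom == 0:
            return Fraction(0)  # flat surface along this axis (assumed handling)
        return Fraction(e_minus - e_plus, denom)

    return axis(e_left, e_right), axis(e_up, e_down)


def to_sixteenth_pel(value):
    """Round a fractional offset to the nearest multiple of 1/16 sample."""
    return math.floor(value * 16 + Fraction(1, 2))
```

Sampling a surface with A = 4, B = 2, (x_0, y_0) = (1/4, -1/8) and C = 10 at the five positions recovers exactly (1/4, -1/8), i.e. (4, -2) in 1/16-sample units.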

2.3.6. Combined intra and inter prediction

Multi-hypothesis prediction is proposed, where combined intra and inter prediction is one way to generate multi-hypotheses.

When multi-hypothesis prediction is applied to improve intra mode, it combines one intra prediction and one Merge-indexed prediction. In a Merge CU, a flag is signaled for Merge mode, and when the flag is true an intra mode is selected from an intra candidate list. For the luma component, the intra candidate list is derived from 4 intra prediction modes, namely the DC, planar, horizontal and vertical modes, and the size of the intra candidate list can be 3 or 4 depending on the block shape. When the CU width is larger than twice the CU height, the horizontal mode is excluded from the intra mode list, and when the CU height is larger than twice the CU width, the vertical mode is removed from the intra mode list. One intra prediction mode selected by the intra mode index and one Merge-indexed prediction selected by the Merge index are combined using a weighted average. For the chroma component, DM is always applied without extra signaling. The weights for combining the predictions are as follows. Equal weights are applied when the DC or planar mode is selected, or when the CB width or height is smaller than 4. For CBs with width and height larger than or equal to 4, when the horizontal/vertical mode is selected, the CB is first split vertically/horizontally into four equal-area regions. A weight set, denoted as (w_intra_i, w_inter_i), where i is from 1 to 4 and (w_intra_1, w_inter_1) = (6, 2), (w_intra_2, w_inter_2) = (5, 3), (w_intra_3, w_inter_3) = (3, 5) and (w_intra_4, w_inter_4) = (2, 6), is applied to the corresponding region. (w_intra_1, w_inter_1) is for the region closest to the reference samples, and (w_intra_4, w_inter_4) is for the region farthest from the reference samples. The combined prediction is then calculated by summing the two weighted predictions and right-shifting by 3 bits.
Moreover, the intra prediction mode of the intra hypothesis of the predictor can be saved for reference by subsequent neighboring CUs.
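
The region-based weighting can be illustrated on a single line of samples running away from the reference samples. This is a sketch under simplifying assumptions: it handles one row (horizontal mode) or one column (vertical mode) at a time, assumes the line length is a multiple of 4, and uses hypothetical names (`REGION_WEIGHTS`, `combine`, `ciip_line`). The equal-weight case corresponds to weights (4, 4).

```python
# (w_intra, w_inter) for regions 1..4, region 1 being closest to the reference samples
REGION_WEIGHTS = [(6, 2), (5, 3), (3, 5), (2, 6)]


def combine(p_intra, p_inter, w_intra, w_inter):
    """Weighted sum of the two predictions, right-shifted by 3 bits."""
    return (w_intra * p_intra + w_inter * p_inter) >> 3


def ciip_line(intra, inter):
    """Combine one line of samples ordered from nearest to farthest from the reference."""
    n = len(intra)  # assumed to be a multiple of 4
    out = []
    for pos, (p_intra, p_inter) in enumerate(zip(intra, inter)):
        w_intra, w_inter = REGION_WEIGHTS[pos * 4 // n]
        out.append(combine(p_intra, p_inter, w_intra, w_inter))
    return out
```

Each weight pair sums to 8, which is why a right shift by 3 bits normalizes the result.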

3. Drawbacks of existing implementations

The current DMVR may have the following problems:

1. The initial search point in DMVR can only be the MV of a Merge candidate.

2. In the worst case, for bi-prediction, the decoder can sequentially do DMVR, BDOF, and combined inter-intra prediction.

3. The non-refined MV is used for spatial motion vector prediction and deblocking filter, while the refined MV is used as TMVP. Additional memory is required to store the refined MV.

4. Example techniques and embodiments

The detailed embodiments described below should be considered as examples to explain the general concept. These examples should not be construed in a narrow manner. Furthermore, these embodiments may be combined in any manner.

In the discussion that follows, SatShift (x, n) is defined as

Shift(x, n) is defined as Shift(x, n) = (x + offset0) >> n.

In one example, offset0 and/or offset1 are set to (1 << n) >> 1 or (1 << (n - 1)). In another example, offset0 and/or offset1 are set to 0.

Clip3(min, max, x) is defined as

In the following discussion, an operation between two motion vectors means that the operation is applied to both components of the motion vectors. For example, MV3 = MV1 + MV2 is equivalent to MV3_x = MV1_x + MV2_x and MV3_y = MV1_y + MV2_y.
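
For reference, the helper operations named above might be written as follows. Shift follows the definition given in the text. The bodies of SatShift and Clip3 are assumptions of this sketch: SatShift is taken to be the sign-symmetric variant of Shift that uses offset0 for non-negative inputs and offset1 for negative inputs, and Clip3 is taken to be the usual three-operand clamp.

```python
def shift(x, n, offset0=0):
    """Shift(x, n) = (x + offset0) >> n."""
    return (x + offset0) >> n


def sat_shift(x, n, offset0=0, offset1=0):
    """Assumed form: shift the magnitude and keep the sign."""
    if x >= 0:
        return (x + offset0) >> n
    return -((-x + offset1) >> n)


def clip3(lo, hi, x):
    """Assumed form: clamp x to the inclusive range [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def mv_add(mv1, mv2):
    """Component-wise addition of two motion vectors."""
    return (mv1[0] + mv2[0], mv1[1] + mv2[1])
```

With zero offsets, the two shifts differ on negative inputs: `shift(-3, 1)` rounds toward minus infinity, whereas `sat_shift(-3, 1)` rounds toward zero.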

1. It is proposed that the initial search point in DMVR may be the MV of a regular Merge candidate plus an offset.

a. In one example, supposing the MVs of the Merge candidate for reference list 0 and reference list 1 are denoted MV0 and MV1, respectively, the initial search points in DMVR may be MV0 + offset0 and MV1 + offset1.

i. In one example, offset0 and/or offset1 may be predefined. For example, offset0 = (4, 0) and offset1 = (4, 0).

ii. Alternatively, offset0 and offset1 may be signaled from the encoder to the decoder.

b. In one example, the offset may be signaled from the encoder to the decoder in MMVD mode.

i. For example, DMVR may be applied when the current block is coded in MMVD mode and/or MMVD skip mode.

ii. When the current block is coded in MMVD mode and/or MMVD skip mode, the initial search point in DMVR is set to the MV of the MMVD mode, which is derived as the MV of the Merge candidate plus the signaled distance.
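
Item 1.a can be sketched in a few lines. The function name `initial_search_points` is hypothetical, and the offsets are simply passed in, whether they are predefined or signaled.

```python
def initial_search_points(mv0, mv1, offset0, offset1):
    """Initial DMVR search points: MV0 + offset0 for list 0, MV1 + offset1 for list 1."""
    start0 = (mv0[0] + offset0[0], mv0[1] + offset0[1])
    start1 = (mv1[0] + offset1[0], mv1[1] + offset1[1])
    return start0, start1
```

Passing zero offsets reproduces the existing behavior, in which the search starts from the MVs of the Merge candidate themselves.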

2. It is proposed that DMVR, BDOF and combined inter-intra prediction cannot all be applied.

a. In one example, DMVR, BDOF, and combined inter-intra prediction cannot all be applied when the size of the current block satisfies certain conditions. Assume that the width and height of the current block are W and H, respectively.

i. For example, when W >= T1 and H >= T2, DMVR, BDOF and combined inter-intra prediction cannot all be applied. For example, T1 = T2 = 16.

1) Alternatively, when W > T1 and H > T2.

ii. For example, when W <= T1 and H <= T2, DMVR, BDOF and combined inter-intra prediction cannot all be applied. For example, T1 = T2 = 16.

1) Alternatively, when W < T1 and H < T2.

iii. For example, when W >= T1 or H >= T2, DMVR, BDOF and combined inter-intra prediction cannot all be applied. For example, T1 = T2 = 16.

1) Alternatively, when W > T1 or H > T2.

iv. For example, when W <= T1 or H <= T2, DMVR, BDOF and combined inter-intra prediction cannot all be applied. For example, T1 = T2 = 16.

1) Alternatively, when W < T1 or H < T2.

v. For example, when W × H >= T1, DMVR, BDOF and combined inter-intra prediction cannot all be applied. For example, T1 = 64.

1) Alternatively, when W × H > T1.

vi. For example, when W × H <= T1, DMVR, BDOF and combined inter-intra prediction cannot all be applied. For example, T1 = 64.

1) Alternatively, when W × H < T1.

b. In one example, DMVR cannot be used when BDOF and inter-intra prediction are applied.

c. In one example, BDOF cannot be used when DMVR and inter-intra prediction are applied.

d. In one example, inter-intra prediction cannot be used when DMVR and BDOF are applied.
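
One of the size-based variants above (item 2.a.i, with T1 = T2 = 16) can be expressed as a simple check. The function name `combination_permitted` is hypothetical, and the sketch only reports whether a requested combination is allowed; which tool is dropped when it is not allowed corresponds to the separate options b to d.

```python
def combination_permitted(width, height, dmvr, bdof, ciip, t1=16, t2=16):
    """False only when all three tools are requested for a block with W >= T1 and H >= T2."""
    size_restricted = width >= t1 and height >= t2
    return not (size_restricted and dmvr and bdof and ciip)
```

The other variants differ only in the comparison used to compute `size_restricted`.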

3. It is proposed that the MVs refined by DMVR are used in the deblocking process.

4. It is proposed that, after a block is decoded, the difference (denoted dMV) between the refined MV (denoted rMV) and the non-refined MV (denoted nMV) is calculated and stored for each basic block of size w × h in the block. For example, w = h = 4, or w = h = 8, or w = h = 16.

a. In one example, dMV is derived as dMV = rMV - nMV.

i. Alternatively, dMV = nMV - rMV.

b. In one example, rMV' is calculated as rMV' = dMV + nMV before the deblocking process, and is used as the MV of the basic block in the subsequent deblocking and temporal prediction processes.

i. Alternatively, rMV' is calculated as rMV' = -(dMV + nMV).

c. In one example, dMVx and dMVy may be restricted to a range in a conformance bitstream. For example, dMVx and dMVy may satisfy T1x <= dMVx < T2x and T1y <= dMVy < T2y. For example, T1x = T1y = -2^K and T2x = T2y = 2^K - 1, where K is an integer such as 3 or 4.

i. In one example, the search range in DMVR may ensure that dMV satisfies the constraint.

ii. In one example, T1x/T2x/T1y/T2y/K may be signaled from the encoder to the decoder.

iii. In one example, T1x/T2x/T1y/T2y/K may depend on the search range of DMVR.

iv. In one example, T1x/T2x/T1y/T2y/K may depend on the profile and/or level and/or tier of the standard.

d. In one example, dMV may be clipped.

i. For example, dMVx is set to Clip3(T1x, T2x, dMVx), and dMVy is set to Clip3(T1y, T2y, dMVy). For example, T1x = T1y = -2^K and T2x = T2y = 2^K - 1, where K is an integer such as 3 or 4.

e. In one example, dMV may be quantized to dMV', and dMV' is stored.

i. For example, dMVx' is set to Shift(dMVx, Nx), and dMVy' is set to Shift(dMVy, Ny), e.g., Nx = Ny = 1.

1) Alternatively, dMVx' is set to SatShift(dMVx, Nx), and dMVy' is set to SatShift(dMVy, Ny), e.g., Nx = Ny = 1.

2) In one example, Nx/Ny may be signaled from the encoder to the decoder.

3) In one example, Nx/Ny may depend on the search range of DMVR.

4) In one example, Nx/Ny may depend on the profile and/or level and/or tier of the standard.

ii. In one example, dMV may be dequantized from dMV' before rMV' is derived.

1) For example, dMVx = dMVx' << Nx and dMVy = dMVy' << Ny.

iii. In one example, dMV may be clipped before quantization.

1) Alternatively, dMV' may be clipped after quantization.

f. In one example, rMV' may be used in the motion compensation process instead of rMV.
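
Items 4.a to 4.e can be chained into a small store/restore round trip. The sketch below picks one combination of the options: dMV = rMV - nMV, clipping to [-2^K, 2^K - 1] before quantization, quantization with Shift using the rounding offset (1 << n) >> 1, dequantization by a left shift, and rMV' = dMV + nMV. The function names and default values (K = 3, Nx = Ny = 1) are illustrative choices, not requirements.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def store_dmv(rmv, nmv, k=3, n=1):
    """Compute, clip and quantize the per-component difference dMV = rMV - nMV."""
    lo, hi = -(1 << k), (1 << k) - 1
    offset = (1 << n) >> 1  # rounding offset used by Shift
    stored = []
    for refined, non_refined in zip(rmv, nmv):
        dmv = clip3(lo, hi, refined - non_refined)
        stored.append((dmv + offset) >> n)
    return tuple(stored)


def restore_rmv(dmv_q, nmv, n=1):
    """Dequantize the stored difference and rebuild rMV' = dMV + nMV."""
    return tuple((d << n) + non_refined for d, non_refined in zip(dmv_q, nmv))
```

Because of the clipping and quantization, rMV' may differ from rMV: with rMV = (20, -7) and nMV = (16, -4), the stored value is (2, -1) and the restored rMV' is (20, -6).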

5. Example embodiments of the disclosed technology

Fig. 23 is a block diagram of the video processing device 2300. The apparatus 2300 may be used to implement one or more of the methods described herein. The device 2300 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, or the like. The apparatus 2300 may include one or more processors 2302, one or more memories 2304, and video processing hardware 2306. The processor(s) 2302 may be configured to implement one or more of the methods described in this document. The memory(s) 2304 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 2306 may be used to implement some of the techniques described in this document in hardware circuitry, and may be partially or completely part of the processor 2302 (e.g., a graphics processor core GPU or other signal processing circuitry).

In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during the conversion from a pixel representation of the video to a corresponding bitstream representation, and vice versa. The bitstream representation of the current video block may, for example, correspond to bits that are collocated or dispersed at different locations within the bitstream, as defined by the syntax. For example, a macroblock may be encoded in terms of transform and codec error residual values and also using bits in the header and other fields in the bitstream.

It should be appreciated that the disclosed methods and techniques would be beneficial to video encoder and/or decoder embodiments incorporated within video processing devices, such as smart phones, laptops, desktops, and the like, by allowing the use of the techniques disclosed in this document.

Fig. 24 is a flowchart of an example method 2400 of video processing. The method 2400 includes, at step 2410, performing a conversion between a current video block and a bitstream representation of the current video block, wherein the conversion includes a decoder-side motion vector refinement (DMVR) step for refining motion information signaled in the bitstream representation. The method includes, at step 2420, using at least one motion vector as a starting value for the refinement during the DMVR step, wherein the at least one motion vector is equal to a candidate motion vector, taken from a set of candidate motion vectors, plus an offset.

Some embodiments may be described using the following clause-based format.

1. A method of visual media processing, comprising:

performing a conversion between a current video block and a bitstream representation of the current video block, wherein the conversion includes a decoder-side motion vector refinement (DMVR) step for refining motion information signaled in the bitstream representation; and

using at least one motion vector as a starting value for the refinement during the DMVR step, wherein the at least one motion vector is equal to a candidate motion vector, taken from a set of candidate motion vectors, plus an offset.

2. The method of clause 1, wherein the candidate motion vector is included in a Merge list.

3. The method of clause 1, wherein the offset added to the candidate motion vector is predefined.

4. The method of clause 1, wherein the offset added to the candidate motion vector is signaled as a parameter in the bitstream representation.

5. The method of clause 1, wherein the current video block is coded in Merge with motion vector difference (MMVD) mode.

6. The method of clause 1, wherein the current video block is coded in Merge with motion vector difference (MMVD) skip mode.

7. The method of any one or more of clauses 5-6, wherein the candidate motion vector is included in a Merge list, and the offset added to the candidate motion vector is signaled as a parameter in the bitstream representation.

8. A method of visual media processing, comprising:

performing a conversion between the current video block and a bitstream representation of the current video block, wherein the conversion includes using one or more of: a Decoder Motion Vector Refinement (DMVR) step, a bi-directional optical flow (BDOF) step, or a combined intra-inter prediction step, and wherein the coexistence of the DMVR step, the BDOF step, and the combined intra-inter prediction step is based at least on a size of the current video block.

9. The method of clause 8, further comprising:

in response to determining that the width of the current video block is greater than or equal to a first threshold and/or the height of the current video block is greater than or equal to a second threshold, disabling the concurrent use of the DMVR, BDOF, and combined intra-inter prediction steps.

10. The method of clause 8, further comprising:

in response to determining that the width of the current video block is less than or equal to a first threshold and/or the height of the current video block is less than or equal to a second threshold, disabling the concurrent use of the DMVR, BDOF, and combined intra-inter prediction steps.

11. The method of any one or more of clauses 9-10, wherein the first threshold and the second threshold are both sixteen (16).

12. The method of clause 8, further comprising:

in response to determining that a product of a width of the current video block and a height of the current video block is greater than or equal to a first threshold, disabling the concurrent use of the DMVR step, the BDOF step, and the combined intra-inter prediction step.

13. The method of clause 8, further comprising:

in response to determining that a product of a width of the current video block and a height of the current video block is less than or equal to a first threshold, disabling the concurrent use of the DMVR step, the BDOF step, and the combined intra-inter prediction step.

14. The method of any one or more of clauses 12-13, wherein the first threshold is sixty-four (64).

15. The method of any one or more of clauses 8-14, further comprising:

enabling the concurrent use of any two of: the DMVR step, the BDOF step, or the combined intra-inter prediction step.

16. A method of visual media processing, comprising:

performing a conversion between the current video block and a bitstream representation of the current video block, wherein the conversion comprises a Decoder Motion Vector Refinement (DMVR) step for refining the original motion information signaled in the bitstream representation, thereby producing refined motion information usable in the deblocking step; and

calculating a difference between the refined motion information and the original motion information for at least a subset of blocks of the current video block.

17. The method of clause 8, wherein the size of the subset block is 4 x 4, 8 x 8, or 16 x 16.

18. The method of any one or more of clauses 16-17, wherein calculating the difference comprises subtracting the refined motion information from the original motion information.

19. The method of any one or more of clauses 16-17, wherein calculating the difference comprises subtracting the original motion information from the refined motion information.

20. The method of clause 16, wherein, prior to the deblocking step, first motion information is calculated as a sum of (i) the difference between the refined motion information and the original motion information and (ii) the original motion information, wherein the first motion information is usable in the deblocking step.

21. The method of clause 20, wherein the first motion information is multiplied by negative one.

22. The method of clause 16, wherein the difference of the refined motion information and the original motion information is a difference vector, wherein an X-component of the difference vector is greater than an X-lower bound and/or less than an X-upper bound.

23. The method of clause 16, wherein the difference of the refined motion information and the original motion information is a difference vector, wherein a Y component of the difference vector is greater than a Y lower bound and/or less than a Y upper bound.

24. The method of any one or more of clauses 22-23, wherein the upper x bound, the lower x bound, the upper y bound, and the lower y bound are signaled in a bitstream representation.

25. The method of any one or more of clauses 22-23, wherein the search range for the DMVR step is defined such that (i) the X component of the difference vector is greater than the lower X bound and/or less than the upper X bound and (ii) the Y component of the difference vector is greater than the lower Y bound and/or less than the upper Y bound.

26. The method of any one or more of clauses 22-23, wherein the difference vector is clipped such that the X component of the difference vector is clipped according to the function Clip3(lower X bound, upper X bound, X component of the difference vector) and the Y component of the difference vector is clipped according to the function Clip3(lower Y bound, upper Y bound, Y component of the difference vector), where Clip3(min, max, x) is defined as

$$\mathrm{Clip3}(min, max, x) = \begin{cases} min, & x < min \\ max, & x > max \\ x, & \text{otherwise} \end{cases}$$

27. The method of any one or more of clauses 22-23, wherein the difference vector is quantized.

28. The method of clause 27, wherein the X component of the difference vector is quantized according to a function Shift(X component of the difference vector, first value) and the Y component of the difference vector is quantized according to a function Shift(Y component of the difference vector, second value), wherein

Shift(x, n) = (x + offset0) >> n,

where offset0 is (1 << n) >> 1, and the first value and the second value are scalars.

29. The method of clause 27, wherein the X component of the difference vector is quantized according to a function Shift(X component of the difference vector, first value) and the Y component of the difference vector is quantized according to a function Shift(Y component of the difference vector, second value), wherein

Shift(x, n) = (x + offset0) >> n,

where offset0 is (1 << (n-1)), and the first value and the second value are scalars.

30. The method of clause 27, wherein the X component of the difference vector is quantized according to a function Shift(X component of the difference vector, first value) and the Y component of the difference vector is quantized according to a function Shift(Y component of the difference vector, second value), wherein

Shift(x, n) = (x + offset0) >> n,

where offset0 is 0, and the first value and the second value are scalars.

31. The method of clause 27, wherein the X component of the difference vector is quantized according to a function SatShift(X component of the difference vector, first value) and the Y component of the difference vector is quantized according to a function SatShift(Y component of the difference vector, second value), wherein

$$\mathrm{SatShift}(x, n) = \begin{cases} (x + \mathit{offset0}) \gg n, & x \ge 0 \\ -\big((-x + \mathit{offset1}) \gg n\big), & x < 0 \end{cases}$$

where offset0 and offset1 are set to (1 << n) >> 1 or (1 << (n-1)), and the first value and the second value are scalars.

32. The method of clause 27, wherein the X component of the difference vector is quantized according to a function SatShift(X component of the difference vector, first value) and the Y component of the difference vector is quantized according to a function SatShift(Y component of the difference vector, second value), wherein

$$\mathrm{SatShift}(x, n) = \begin{cases} (x + \mathit{offset0}) \gg n, & x \ge 0 \\ -\big((-x + \mathit{offset1}) \gg n\big), & x < 0 \end{cases}$$

where offset0 and offset1 are both set to 0, and the first value and the second value are scalars.

33. The method of any one or more of clauses 28-32, wherein the first value and the second value are signaled in a bitstream representation.

34. The method of any one or more of clauses 28-33, wherein the first value and the second value are associated with a search range of a DMVR step.

35. The method of any one or more of clauses 28-33, wherein the first value and the second value are associated with profile information or tier information of the current video block.

36. The method of any one or more of clauses 1-35, wherein the visual media processing is an encoder-side implementation.

37. The method of any one or more of clauses 1-35, wherein the visual media processing is a decoder-side implementation.

38. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any one or more of clauses 1-37.

39. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method according to any one or more of clauses 1 to 37.

Fig. 25 is a flow diagram of an example method 2500 of video processing. The method 2500 includes, at 2502, deriving, for a conversion between a first block of video and a bitstream representation of the first block of video, an initial search point in a decoder-side motion vector refinement (DMVR) process to be applied during the conversion, based on one or more offsets and one or more Motion Vectors (MVs) of a Merge candidate associated with the first block of video; and, at 2504, performing the conversion based on the initial search point.

In some examples, the initial search point is derived as the one or more MVs of the Merge candidate with the offset added.

In some examples, when the one or more MVs of the Merge candidate include a first MV (MV0) of reference list 0 and a second MV (MV1) of reference list 1, the initial search point is derived as MV0 + offset0 and MV1 + offset1, where offset0 is an offset corresponding to the first MV (MV0) and offset1 is an offset corresponding to the second MV (MV1).

In some examples, offset0 and/or offset1 are predefined.

In some examples, offset0 is (4, 0) and offset1 is (-4, 0).

In some examples, offset0 and/or offset1 are signaled from the encoder to the decoder.

In some examples, the offset is signaled from the encoder to the decoder in a Merge with motion vector difference (MMVD) mode.

In some examples, the DMVR process is applied when the first block is coded in MMVD and/or MMVD skip mode.

In some examples, when the first block is coded in MMVD and/or MMVD skip mode, the initial search point in the DMVR process is set to the MV of the MMVD mode, which is derived as the MV of the Merge candidate plus the signaled offset.
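The derivation of the initial search point can be sketched as follows. This is a minimal illustration, not the codec's actual implementation: the `MV` type and the function name `initial_search_point` are hypothetical, the MV units are left unspecified, and the example offsets are read as (4, 0) and (-4, 0).

```python
from typing import NamedTuple, Tuple


class MV(NamedTuple):
    """A motion vector or an offset with horizontal (x) and vertical (y) parts."""
    x: int
    y: int


def add_offset(mv: MV, offset: MV) -> MV:
    """Return mv + offset, component by component."""
    return MV(mv.x + offset.x, mv.y + offset.y)


def initial_search_point(mv0: MV, mv1: MV, offset0: MV, offset1: MV) -> Tuple[MV, MV]:
    """Derive the DMVR starting point as (MV0 + offset0, MV1 + offset1).

    mv0 / mv1 are the Merge candidate's MVs for reference list 0 / list 1.
    The offsets may be predefined or signaled; here they are plain arguments.
    """
    return add_offset(mv0, offset0), add_offset(mv1, offset1)


# Example with the predefined offsets mentioned in the text.
start0, start1 = initial_search_point(MV(10, -3), MV(-8, 5), MV(4, 0), MV(-4, 0))
```

In the MMVD case the same function applies, with the signaled MMVD offset passed in place of the predefined one.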

Fig. 26 is a flow diagram of an example method 2600 of video processing. Method 2600 includes, at 2602, determining, based on a predetermined rule, that at least one of a Decoder Motion Vector Refinement (DMVR) process, a bi-directional optical flow (BDOF) process, and a combined intra-inter prediction process is disabled for a conversion between a first block of the video and a bitstream representation of the first block of the video; and, at 2604, performing the conversion based on the determination.

In some examples, at least one of the DMVR process, the BDOF process, and the combined intra-inter prediction process is disabled when a size of the first block, including at least one of the width W, the height H, or W × H, satisfies one or more conditions.

In some examples, when W >= T1 and H >= T2, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 and T2 are integers.

In some examples, when W > T1 and H > T2, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 and T2 are integers.

In some examples, when W <= T1 and H <= T2, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 and T2 are integers.

In some examples, when W < T1 and H < T2, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 and T2 are integers.

In some examples, when W >= T1 or H >= T2, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 and T2 are integers.

In some examples, when W > T1 or H > T2, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 and T2 are integers.

In some examples, when W <= T1 or H <= T2, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 and T2 are integers.

In some examples, when W < T1 or H < T2, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 and T2 are integers.

In some examples, T1 = T2 = 16.

In some examples, T1 = T2 = 8.

In some examples, T1 = T2 = 128.

In some examples, when W × H >= T1, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 is an integer.

In some examples, when W × H > T1, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 is an integer.

In some examples, when W × H <= T1, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 is an integer.

In some examples, when W × H < T1, at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled, where T1 is an integer.

In some examples, T1 = 64.

In some examples, T1 = 128.
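The size-based conditions above are alternatives that differ only in the comparison operator, in whether width and height are combined with "and" or "or", and in whether the block area is tested instead. A minimal sketch of that family of checks follows; the function names and the `cmp` and `combine` parameters are hypothetical and chosen only for illustration.

```python
import operator

# Comparison operators named in the alternative examples.
_COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def disabled_by_dimensions(w: int, h: int, t1: int, t2: int,
                           cmp: str = ">=", combine: str = "and") -> bool:
    """Return True when the width/height rule says a tool is disabled.

    The width is compared with T1 and the height with T2 using the same
    operator; the two results are joined with "and" or "or".
    """
    op = _COMPARATORS[cmp]
    w_hit, h_hit = op(w, t1), op(h, t2)
    if combine == "and":
        return w_hit and h_hit
    if combine == "or":
        return w_hit or h_hit
    raise ValueError("combine must be 'and' or 'or'")


def disabled_by_area(w: int, h: int, t1: int, cmp: str = ">=") -> bool:
    """Return True when the area rule (W x H compared with T1) disables a tool."""
    return _COMPARATORS[cmp](w * h, t1)
```

For a 16x8 block with T1 = T2 = 16 and the ">=" operator, the "and" form leaves the tool enabled while the "or" form disables it, which shows why the choice between the alternatives matters for non-square blocks.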

In some examples, the DMVR process is disabled when the BDOF process and the combined intra-inter prediction process are applied.

In some examples, the DMVR process is disabled when the combined intra-inter prediction process is applied.

In some examples, the BDOF process is disabled when the DMVR process and the combined intra-inter prediction process are applied.

In some examples, the BDOF process is disabled when the combined intra-inter prediction process is applied.

In some examples, the combined intra-inter prediction process is disabled when the DMVR process and the BDOF process are applied.
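The tool-dependent rules above each switch one tool off when one or two other tools are applied. One way to picture them is as a small rule table, sketched below. The `Tools` type, the rule names, and the abbreviation `ciip` for the combined intra-inter prediction process are assumptions made for this illustration and do not come from the text.

```python
from typing import NamedTuple


class Tools(NamedTuple):
    """Which tools would be applied to the block before the rule is evaluated."""
    dmvr: bool
    bdof: bool
    ciip: bool  # combined intra-inter prediction (abbreviation assumed here)


# rule name -> (tool to disable, tools that must be applied for the rule to fire)
RULES = {
    "dmvr_off_if_bdof_and_ciip": ("dmvr", ("bdof", "ciip")),
    "dmvr_off_if_ciip": ("dmvr", ("ciip",)),
    "bdof_off_if_dmvr_and_ciip": ("bdof", ("dmvr", "ciip")),
    "bdof_off_if_ciip": ("bdof", ("ciip",)),
    "ciip_off_if_dmvr_and_bdof": ("ciip", ("dmvr", "bdof")),
}


def apply_rule(tools: Tools, rule: str) -> Tools:
    """Return the tool set after applying one of the example rules."""
    target, required = RULES[rule]
    if all(getattr(tools, name) for name in required):
        return tools._replace(**{target: False})
    return tools
```

Each rule is independent; an implementation would adopt one of them, not all at once.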

Fig. 27 is a flow diagram of an example method 2700 of video processing. The method 2700 includes, at 2702, deriving, for a conversion between a first block of the video and a bitstream representation of the first block of the video, a Motion Vector (MV) associated with the first block, the MV being refined by applying a decoder-side motion vector refinement (DMVR) process; and, at 2704, performing the conversion by using the refined MV in a deblocking process.

Fig. 28 is a flow diagram of an example method 2800 of video processing. The method 2800 includes, at 2802, calculating, for a conversion between a first block of video and a bitstream representation of the first block of video, an MV difference (dMV) between a refined Motion Vector (rMV) and an unrefined MV (nMV) associated with each basic block of the first block, where rMV is a motion vector refined by applying a decoder-side motion vector refinement (DMVR) process and nMV is a motion vector not refined by the DMVR process; and, at 2804, performing the conversion by using the calculated MV differences.

In some examples, a basic block has a width w and a height h, where w = h = 4, or w = h = 8, or w = h = 16.

In some examples, the MV difference dMV is derived as dMV = rMV - nMV or dMV = nMV - rMV.

In some examples, a refined MV (rMV') is calculated before the deblocking process as rMV' = dMV + nMV, and is used as the MV of the basic block in the subsequent deblocking process and temporal prediction process.

In some examples, a refined MV (rMV') is calculated before the deblocking process as rMV' = -(dMV + nMV), and is used as the MV of the basic block in the subsequent deblocking process and temporal prediction process.
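The bookkeeping for the first variant (dMV = rMV - nMV, then rMV' = dMV + nMV) can be sketched per basic block as below. The data layout, a dictionary keyed by the basic block's top-left position, and all function names are assumptions for illustration only.

```python
from typing import Dict, Tuple

MV = Tuple[int, int]        # (horizontal, vertical)
BlockPos = Tuple[int, int]  # top-left corner of a basic block (hypothetical key)


def mv_difference(rmv: MV, nmv: MV) -> MV:
    """dMV = rMV - nMV, computed per component."""
    return (rmv[0] - nmv[0], rmv[1] - nmv[1])


def rebuild_refined_mv(dmv: MV, nmv: MV) -> MV:
    """rMV' = dMV + nMV, the MV handed to deblocking and temporal prediction."""
    return (dmv[0] + nmv[0], dmv[1] + nmv[1])


def per_block_differences(refined: Dict[BlockPos, MV], nmv: MV) -> Dict[BlockPos, MV]:
    """Compute dMV for every basic block of one coding block.

    All basic blocks share the same unrefined MV; each has its own refined MV.
    """
    return {pos: mv_difference(rmv, nmv) for pos, rmv in refined.items()}


# Two basic blocks with different DMVR results.
refined_mvs = {(0, 0): (18, -7), (8, 0): (15, -5)}
unrefined = (16, -6)
diffs = per_block_differences(refined_mvs, unrefined)
```

Without clipping or quantization, `rebuild_refined_mv(diffs[pos], unrefined)` returns the original refined MV for every block, so only the small differences need to be kept alongside nMV.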

In some examples, the MV difference dMV has a horizontal component (dMVx) and a vertical component (dMVy), which are within a range in a conforming bitstream.

In some examples, dMVx and dMVy satisfy T1x <= dMVx <= T2x and T1y <= dMVy <= T2y, where T1x, T2x, T1y, and T2y are integers.

In some examples, T1x = T1y = -2^K and T2x = T2y = 2^K - 1, where K is an integer.

In some examples, K is 3 or 4.

In some examples, the search range in the DMVR process ensures that the MV difference dMV can satisfy the constraint.

In some examples, one or more of T1x, T2x, T1y, T2y, and K are signaled from the encoder to the decoder.

In some examples, one or more of T1x, T2x, T1y, T2y, and K depend on the search range of the DMVR process.

In some examples, one or more of T1x, T2x, T1y, T2y, and K depend on a profile and/or level and/or tier of a standard.

In some examples, the motion vector difference dMV is clipped with a function Clip3(Min, Max, x), which is defined as

$$\mathrm{Clip3}(Min, Max, x) = \begin{cases} Min, & x < Min \\ Max, & x > Max \\ x, & \text{otherwise} \end{cases}$$

In some examples, the horizontal component dMVx of dMV is set to Clip3(T1x, T2x, dMVx), and the vertical component dMVy of dMV is set to Clip3(T1y, T2y, dMVy).

In some examples, K is 3 or 4.
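A minimal sketch of the clipping step follows, assuming the conventional three-argument clamp for Clip3 and the symmetric power-of-two range T1x = T1y = -2^K, T2x = T2y = 2^K - 1 described earlier. The function names are illustrative.

```python
from typing import Tuple


def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the closed interval [lo, hi] (assumed definition of Clip3)."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def clip_dmv(dmv: Tuple[int, int], k: int) -> Tuple[int, int]:
    """Clip both components of dMV to [-2^K, 2^K - 1]."""
    lo, hi = -(1 << k), (1 << k) - 1
    return (clip3(lo, hi, dmv[0]), clip3(lo, hi, dmv[1]))
```

With K = 3 the range is [-8, 7], so each component fits in a 4-bit signed value; with K = 4 the range is [-16, 15] and fits in 5 bits.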

In some examples, the MV difference dMV is stored after the conversion.

In some examples, the MV difference dMV is quantized to dMV 'and dMV' is stored.

In some examples, the horizontal component dMVx' of dMV' is set to Shift(dMVx, Nx) and the vertical component dMVy' of dMV' is set to Shift(dMVy, Ny), Nx and Ny being integers, where Shift(x, n) is defined as:

Shift(x, n) = (x + offset0) >> n,

where offset0 is set to (1 << n) >> 1 or (1 << (n-1)), or offset0 is set to 0.
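The Shift function can be sketched directly from its definition. In the sketch below the `rounding` flag, a hypothetical parameter, selects between the rounding offset (1 << n) >> 1 and an offset of 0; Python's `>>` on negative integers is an arithmetic (floor) shift, which is assumed here to match the intended behavior.

```python
def shift(x: int, n: int, rounding: bool = True) -> int:
    """Shift(x, n) = (x + offset0) >> n.

    offset0 is (1 << n) >> 1 when rounding is requested, otherwise 0.
    (1 << n) >> 1 equals 1 << (n - 1) for n >= 1 and is 0 for n == 0,
    where 1 << (n - 1) would not be defined.
    """
    offset0 = (1 << n) >> 1 if rounding else 0
    return (x + offset0) >> n
```

For example, `shift(5, 1)` gives 3 and `shift(-5, 1)` gives -2, so with the rounding offset ties move toward positive infinity; with `rounding=False`, `shift(-5, 1)` gives -3 because the shift floors.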

In some examples, the horizontal component dMVx' of dMV' is set to SatShift(dMVx, Nx) and the vertical component dMVy' of dMV' is set to SatShift(dMVy, Ny), Nx and Ny being integers, where SatShift(x, n) is defined as:

$$\mathrm{SatShift}(x, n) = \begin{cases} (x + \mathit{offset0}) \gg n, & x \ge 0 \\ -\big((-x + \mathit{offset1}) \gg n\big), & x < 0 \end{cases}$$

where offset0 and/or offset1 is set to (1 << n) >> 1 or (1 << (n-1)), or offset0 and/or offset1 is set to 0.
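The following sketch assumes the sign-symmetric form of SatShift, in which non-negative inputs are shifted after adding offset0 and negative inputs have their magnitude shifted after adding offset1, with the sign restored afterwards. The default offsets and the function name are choices made for this illustration.

```python
from typing import Optional


def sat_shift(x: int, n: int,
              offset0: Optional[int] = None,
              offset1: Optional[int] = None) -> int:
    """Sign-symmetric right shift (assumed form of SatShift).

    x >= 0:  (x + offset0) >> n
    x <  0: -((-x + offset1) >> n)
    Offsets default to the rounding value (1 << n) >> 1; pass 0 to truncate.
    """
    default = (1 << n) >> 1
    offset0 = default if offset0 is None else offset0
    offset1 = default if offset1 is None else offset1
    if x >= 0:
        return (x + offset0) >> n
    return -((-x + offset1) >> n)
```

Unlike a plain arithmetic shift, this form treats positive and negative inputs symmetrically: `sat_shift(5, 1)` is 3 and `sat_shift(-5, 1)` is -3, while with both offsets set to 0 the results are 2 and -2 (truncation toward zero).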

In some examples, Nx = Ny = 1.

In some examples, Nx and/or Ny are signaled from an encoder to a decoder.

In some examples, Nx and/or Ny depend on the search range of the DMVR process.

In some examples, Nx and/or Ny depend on a profile and/or level and/or tier of a standard.

In some examples, the MV difference dMV is dequantized from dMV' before rMV' is derived.

In some examples, dMVx = dMVx' << Nx, and dMVy = dMVy' << Ny.
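Putting the storage path together, a quantize, store, dequantize, and rebuild round trip might look like the sketch below. It assumes the Shift variant with the rounding offset for quantization and rMV' = dMV + nMV for the rebuild; the function names are hypothetical.

```python
from typing import Tuple

MV = Tuple[int, int]


def shift(x: int, n: int) -> int:
    """Shift(x, n) with the rounding offset (1 << n) >> 1."""
    return (x + ((1 << n) >> 1)) >> n


def quantize_dmv(dmv: MV, nx: int, ny: int) -> MV:
    """dMV' = (Shift(dMVx, Nx), Shift(dMVy, Ny)); this is what gets stored."""
    return (shift(dmv[0], nx), shift(dmv[1], ny))


def dequantize_dmv(dmv_q: MV, nx: int, ny: int) -> MV:
    """dMVx = dMVx' << Nx, dMVy = dMVy' << Ny."""
    return (dmv_q[0] << nx, dmv_q[1] << ny)


def rebuild_refined_mv(dmv: MV, nmv: MV) -> MV:
    """rMV' = dMV + nMV."""
    return (dmv[0] + nmv[0], dmv[1] + nmv[1])


# Nx = Ny = 1: the stored difference has half the precision of the original.
stored = quantize_dmv((5, -6), 1, 1)
restored = dequantize_dmv(stored, 1, 1)
rmv_prime = rebuild_refined_mv(restored, (100, 40))
```

The round trip is lossy: the odd component 5 comes back as 6, while the even component -6 is preserved. That is why rMV' may differ slightly from the rMV produced by the DMVR process.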

In some examples, the MV differences dMV are clipped prior to quantization.

In some examples, the MV differences dMV are clipped after quantization.

In some examples, the refined MV rMV' before the deblocking process is used in the motion compensation process.

In some examples, the conversion generates a first block of the video from the bitstream representation.

In some examples, the conversion generates a bitstream representation from a first block of the video.

The disclosed and other solutions, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a combination of substances that affect a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.

A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (Field Programmable Gate Array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not require such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or potential claims, but rather as descriptions of features specific to particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.

Only some embodiments and examples are described and other embodiments, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims

1. A method of video processing, comprising:

deriving, for a conversion between a first block of a video and a bitstream of the first block of the video, an initial search point in a decoder-side motion vector refinement (DMVR) process to be applied during the conversion, based on one or more offsets and one or more Motion Vectors (MVs) of a Merge candidate associated with the first block of the video; and

performing the conversion based on the initial search point.

2. The method of claim 1, wherein the initial search point is derived as the one or more MVs of the Merge candidate with an offset added.

3. The method of claim 1, wherein, when the one or more MVs of the Merge candidate include a first MV (MV0) referencing reference list 0 and a second MV (MV1) referencing reference list 1, the initial search point is derived as MV0 + offset0 and MV1 + offset1, where offset0 is an offset corresponding to the first MV (MV0) and offset1 is an offset corresponding to the second MV (MV1).

4. The method according to claim 3, wherein offset0 and/or offset1 are predefined.

5. The method of claim 4, wherein offset0 = (4, 0) and offset1 = (-4, 0).

6. The method of claim 3, wherein offset0 and/or offset1 is signaled from an encoder to a decoder.

7. The method of claim 1, wherein the offset is signaled from an encoder to a decoder in a Merge with motion vector difference (MMVD) mode.

8. The method of claim 7, wherein the DMVR process is applied when the first block is coded in MMVD and/or MMVD skip mode.

9. The method of claim 8, wherein, when the first block is coded in MMVD and/or MMVD skip mode, the initial search point in the DMVR process is set to the MV of the MMVD mode, which is derived as the MV of the Merge candidate plus the signaled offset.

10. The method of claim 1, further comprising:

determining, based on a predetermined rule, that at least one of the decoder-side motion vector refinement (DMVR) process, a bi-directional optical flow (BDOF) process, and a combined intra-inter prediction process is disabled for the conversion between the first block of the video and the bitstream of the first block of the video; and

performing the conversion based on the determination.

11. The method of claim 10, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when a size of the first block, including at least one of a width W, a height H, or W × H, satisfies one or more conditions.

12. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W >= T1 and H >= T2, T1 and T2 being integers.

13. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W > T1 and H > T2, T1 and T2 being integers.

14. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W <= T1 and H <= T2, T1 and T2 being integers.

15. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W < T1 and H < T2, T1 and T2 being integers.

16. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W >= T1 or H >= T2, T1 and T2 being integers.

17. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W > T1 or H > T2, T1 and T2 being integers.

18. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W <= T1 or H <= T2, T1 and T2 being integers.

19. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W < T1 or H < T2, T1 and T2 being integers.

20. The method of claim 12, wherein T1 = T2 = 16.

21. The method of claim 12, wherein T1 = T2 = 8.

22. The method of claim 12, wherein T1 = T2 = 128.

23. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W × H >= T1, T1 being an integer.

24. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W × H > T1, T1 being an integer.

25. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W × H <= T1, T1 being an integer.

26. The method of claim 11, wherein at least one of the DMVR, BDOF, and combined intra-inter prediction processes is disabled when W × H < T1, T1 being an integer.

27. The method of claim 21, wherein T1 = 64.

28. The method of claim 21, wherein T1 = 128.

29. The method of claim 10, wherein the DMVR process is disabled when the BDOF process and the intra-inter prediction process are applied.

30. The method of claim 10, wherein the DMVR process is disabled when the intra-inter prediction process is applied.

31. The method of claim 10, wherein the BDOF process is disabled when the DMVR process and the intra-inter prediction process are applied.

32. The method of claim 10, wherein the BDOF process is disabled when the intra-inter prediction process is applied.

33. The method of claim 10, wherein the intra-inter prediction process is disabled when the DMVR and BDOF processes are applied.

34. The method of claim 1, further comprising:

deriving, for the conversion between the first block of the video and the bitstream of the first block of the video, a Motion Vector (MV) associated with the first block, the MV being refined by applying the decoder-side motion vector refinement (DMVR) process; and

performing the conversion by using the refined MV in a deblocking process.

35. The method of claim 1, further comprising:

calculating, for the conversion between the first block of the video and the bitstream of the first block of the video, an MV difference (dMV) between a refined Motion Vector (rMV) and a non-refined MV (nMV) associated with each basic block of the first block, rMV being a motion vector refined by applying the decoder-side motion vector refinement (DMVR) process, and nMV being a motion vector not refined by the DMVR process; and

performing the conversion by using the calculated MV differences.

36. The method of claim 35, wherein the basic block has a width w and a height h, wherein w = h = 4, or w = h = 8, or w = h = 16.

37. The method of claim 35, wherein the MV difference dMV is derived as dMV = rMV - nMV or dMV = nMV - rMV.

38. The method of claim 35, wherein a refined MV (rMV') is calculated before the deblocking process as rMV' = dMV + nMV, and is used as the MV of the basic block in the subsequent deblocking process and temporal prediction process.

39. The method of claim 35, wherein a refined MV (rMV') is calculated before the deblocking process as rMV' = -(dMV + nMV), and is used as the MV of the basic block in the subsequent deblocking process and temporal prediction process.

40. The method of claim 35, wherein the MV difference dMV has a horizontal component (dMVx) and a vertical component (dMVy), dMVx and dMVy being within a range in a conforming bitstream.

41. The method of claim 40, wherein dMVx and dMVy satisfy T1x <= dMVx <= T2x and T1y <= dMVy <= T2y, where T1x, T2x, T1y, and T2y are integers.

42. The method of claim 41, wherein T1x = T1y = -2^K and T2x = T2y = 2^K - 1, wherein K is an integer.

43. The method of claim 42, wherein K is 3 or 4.

44. The method of claim 39, wherein a search range in the DMVR process ensures that the MV differences dMV can satisfy a constraint.

45. The method of claim 40, wherein one or more of T1x, T2x, T1y, T2y, and K are signaled from an encoder to a decoder.

46. The method of claim 40, wherein one or more of T1x, T2x, T1y, T2y, and K depend on a search range of the DMVR process.

47. The method of claim 40, wherein one or more of T1x, T2x, T1y, T2y, and K depend on a profile and/or level and/or tier of a standard.

48. The method of claim 40, wherein the motion vector difference dMV is clipped using a function Clip3(Min, Max, x), the function Clip3(Min, Max, x) being defined as:

Clip3(Min, Max, x) = Min, if x < Min; Max, if x > Max; x, otherwise.

49. The method of claim 48, wherein a horizontal component dMVx of dMV is set to Clip3(T1x, T2x, dMVx), and a vertical component dMVy of dMV is set to Clip3(T1y, T2y, dMVy).

50. The method of claim 49, wherein T1x = T1y = -2^K and T2x = T2y = 2^K - 1, wherein K is an integer.

51. The method of claim 50, wherein K is 3 or 4.
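The clipping of dMV can be sketched as follows. This is an illustration only. It assumes the conventional clamp definition of Clip3 and assumes the bounds are T1 = -2^K and T2 = 2^K - 1, a signed range that fits in K + 1 bits. The function names are hypothetical.

```python
from typing import Tuple


def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the closed interval [lo, hi] (assumed Clip3 semantics)."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def clip_dmv(dmv_x: int, dmv_y: int, k: int) -> Tuple[int, int]:
    """Clip both components of dMV to [-2^K, 2^K - 1] (assumed bounds)."""
    t1 = -(1 << k)       # assumed lower bound T1x = T1y
    t2 = (1 << k) - 1    # assumed upper bound T2x = T2y
    return clip3(t1, t2, dmv_x), clip3(t1, t2, dmv_y)


print(clip_dmv(20, -3, 3))   # (7, -3): K = 3 gives the range [-8, 7]
print(clip_dmv(-9, 8, 3))    # (-8, 7)
print(clip_dmv(20, -3, 4))   # (15, -3): K = 4 gives the range [-16, 15]
```

With K = 3 or K = 4, each clipped component fits in 4 or 5 signed bits respectively, which bounds the storage needed per basic block.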

52. The method of claim 35, wherein the MV difference dMV is stored after the conversion.

53. The method of claim 35, wherein the MV difference dMV is quantized to dMV' and dMV' is stored.

54. The method of claim 53, wherein the horizontal component dMVx' of dMV' is set to Shift(dMVx, Nx) and the vertical component dMVy' of dMV' is set to Shift(dMVy, Ny), Nx and Ny being integers, where Shift(x, n) is defined as:

Shift(x, n) = (x + offset0) >> n,

where offset0 is set to (1 << n) >> 1 or (1 << (n - 1)), or offset0 is set to 0.

55. The method of claim 53, wherein the horizontal component dMVx' of dMV' is set to SatShift(dMVx, Nx) and the vertical component dMVy' of dMV' is set to SatShift(dMVy, Ny), Nx and Ny being integers, wherein SatShift(x, n) is defined as:

SatShift(x, n) = (x + offset0) >> n, if x >= 0; -((-x + offset1) >> n), if x < 0,

wherein offset0 and/or offset1 is set to (1 << n) >> 1 or (1 << (n - 1)), or offset0 and/or offset1 is set to 0.

56. The method of claim 54, wherein Nx = Ny = 1.

57. The method of claim 54, wherein Nx and/or Ny are signaled from an encoder to a decoder.

58. The method of claim 54, wherein Nx and/or Ny depend on a search range of the DMVR process.

59. The method of claim 54, wherein Nx and/or Ny depend on a profile and/or level and/or tier of a standard.

60. The method of claim 53, wherein the MV difference dMV is inverse quantized from dMV' prior to deriving rMV'.

61. The method of claim 60, wherein dMVx = dMVx' << Nx, and dMVy = dMVy' << Ny.
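The quantization and inverse quantization of one dMV component can be sketched as below. This is a hedged illustration. `shift` follows the Shift(x, n) formula with offset0 equal to either (1 << n) >> 1 or 0. `sat_shift` assumes a sign-symmetric definition, rounding the magnitude and then restoring the sign, with offset0 equal to offset1. All function and parameter names are hypothetical.

```python
def shift(x: int, n: int, rounding: bool = True) -> int:
    """Shift(x, n) = (x + offset0) >> n with offset0 = (1 << n) >> 1 or 0."""
    offset0 = (1 << n) >> 1 if rounding else 0
    return (x + offset0) >> n  # Python's >> is an arithmetic (floor) shift


def sat_shift(x: int, n: int, rounding: bool = True) -> int:
    """Assumed SatShift: shift the magnitude, then restore the sign."""
    offset = (1 << n) >> 1 if rounding else 0  # offset0 == offset1 here
    if x >= 0:
        return (x + offset) >> n
    return -((-x + offset) >> n)


def dequantize(q: int, n: int) -> int:
    """Inverse quantization: dMV = dMV' << N."""
    return q << n


print(shift(5, 1))                     # 3
print(shift(-3, 1))                    # -1 (rounds toward +infinity)
print(sat_shift(-3, 1))                # -2 (rounds away from zero)
print(dequantize(sat_shift(5, 1), 1))  # 6, not 5: quantization is lossy
```

The two variants differ only for negative inputs. With Nx = Ny = 1, each stored component loses one bit of precision, so the reconstructed rMV' may differ from rMV by one MV unit.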

62. The method of claim 53, wherein the MV difference dMV is clipped prior to quantization.

63. The method of claim 53, wherein the MV difference dMV is clipped after quantization.

64. The method of claim 39, wherein the refined MV rMV' prior to the deblocking process is used in a motion compensation process.

65. The method of any of claims 1-64, wherein the converting generates a first block of the video from the bitstream.

66. The method of any of claims 1-64, wherein the converting generates the bitstream from a first block of the video.

67. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1-66.

68. A non-transitory computer readable medium having stored thereon computer program code which, when executed by a processor, implements the method of any of claims 1-66.