WO2023222016A1 - Method and apparatus for complexity reduction of video coding using merge with mvd mode - Google Patents


Info

Publication number
WO2023222016A1
Authority
WO
WIPO (PCT)
Prior art keywords
mvd
candidates
mvs
base
current block
Prior art date
Application number
PCT/CN2023/094702
Other languages
French (fr)
Inventor
Shih-Chun Chiu
Chih-Wei Hsu
Ching-Yeh Chen
Tzu-Der Chuang
Yu-Wen Huang
Original Assignee
Mediatek Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mediatek Inc. filed Critical Mediatek Inc.
Priority to TW112118282A priority Critical patent/TW202410696A/en
Publication of WO2023222016A1 publication Critical patent/WO2023222016A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Definitions

  • the present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/342,642, filed on May 17, 2022.
  • the U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
  • the present invention relates to video coding system using MMVD (Merge mode Motion Vector Difference) coding tool.
  • MMVD Merge mode Motion Vector Difference
  • the present invention relates to the complexity reduction associated with MMVD.
  • VVC Versatile video coding
  • JVET Joint Video Experts Team
  • MPEG ISO/IEC Moving Picture Experts Group
  • ISO/IEC 23090-3 2021
  • Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021.
  • VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
  • HEVC High Efficiency Video Coding
  • Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
  • Intra Prediction the prediction data is derived based on previously coded video data in the current picture.
  • Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture(s) and motion data.
  • Switch 114 selects Intra Prediction 110 or Inter-Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues.
  • the prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120.
  • T Transform
  • Q Quantization
  • the transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data.
  • the bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area.
  • the side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130, are provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well.
  • the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues.
  • the residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data.
  • the reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
  • incoming video data undergoes a series of processing in the encoding system.
  • the reconstructed video data from REC 128 may be subject to various impairments due to a series of processing.
  • in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality.
  • deblocking filter (DF) may be used.
  • SAO Sample Adaptive Offset
  • ALF Adaptive Loop Filter
  • the loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream.
  • DF deblocking filter
  • SAO Sample Adaptive Offset
  • ALF Adaptive Loop Filter
  • Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134.
  • the system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H. 264 or VVC.
  • HEVC High Efficiency Video Coding
  • the decoder can use similar functional blocks or a portion of the same functional blocks as the encoder, except for Transform 118 and Quantization 120, since the decoder only needs Inverse Quantization 124 and Inverse Transform 126.
  • the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information) .
  • the Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140.
  • the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
  • an input picture is partitioned into non-overlapped square block regions referred to as CTUs (Coding Tree Units), similar to HEVC.
  • CTUs Coding Tree Units
  • Each CTU can be partitioned into one or multiple smaller size coding units (CUs) .
  • the resulting CU partitions can be in square or rectangular shapes.
  • VVC divides a CTU into prediction units (PUs) as a unit to apply prediction process, such as Inter prediction, Intra prediction, etc.
  • the VVC standard incorporates various new coding tools to further improve the coding efficiency over the HEVC standard.
  • among the various new coding tools, some coding tools relevant to the present invention are reviewed as follows. For example, the Merge with MVD Mode (MMVD) technique re-uses the same merge candidates as those in VVC, and a selected candidate can be further expanded by a motion vector expression method. It is desirable to develop techniques to reduce the complexity of MMVD.
  • MMVD Merge with MVD Mode
  • a method and apparatus for video coding using MMVD (Merge with MVD (Motion Vector Difference) ) mode are disclosed.
  • input data associated with a current block are received, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side.
  • Two or more base MVs are determined. If said two or more base MVs comprise B similar MVs (B ≥ 2), the TM (Template Matching) based reordering in the MMVD mode is performed.
  • the TM based reordering process comprises: determining a target base MV (Motion Vector) from the B similar MVs; determining a set of MVD (Motion Vector Difference) candidates by adding offsets to the target base MV; selecting top MVD candidates from the set of MVD candidates reordered according to TM (Template Matching) cost associated with the set of MVD candidates; dividing the top MVD candidates into B modified MVD candidate sets, wherein each of the B modified MVD candidate sets is associated with one of the B similar MVs; and encoding or decoding the current block by using motion information comprising the B modified MVD candidate sets.
  • TM Template Matching
  • the B is equal to 2 and the top MVD candidates correspond to 2*K MVD candidates, and wherein K is a positive integer.
  • K MVD candidates of the 2*K MVD candidates are assigned to a first similar MV and remaining K MVD candidates of the 2*K MVD candidates are assigned to a second similar MV.
  • the K MVD candidates of the 2*K MVD candidates correspond to top K MVD candidates of the 2*K MVD candidates.
  • the K MVD candidates of the 2*K MVD candidates correspond to odd-numbered or even-numbered K MVD candidates of the 2*K MVD candidates.
  • the B is equal to 3 or larger and the top MVD candidates correspond to B*K MVD candidates, and wherein K is a positive integer.
  • K MVD candidates of the B*K MVD candidates are assigned to each of the B similar MVs.
  • the B*K MVD candidates are evenly divided into B groups from beginning to end, and each group of the K MVD candidates corresponds to consecutive MVD candidates of the B*K MVD candidates.
  • the B*K MVD candidates are evenly divided into B groups in an interlaced fashion, and each group of the K MVD candidates corresponds to every B-th MVD candidates of the B*K MVD candidates.
  • the threshold is adaptively changed based on low-delay picture condition. In another embodiment, the threshold is adaptively changed based on MV length associated with said two or more base MVs.
  • whether to enable TM based reordering in the MMVD mode is controlled by one or more high-level flags.
  • said one or more high-level flags are shared with a DMVD (Decoder-Side Motion-Vector Derivation) flag.
  • the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled and the DMVD flag is enabled.
  • an additional high-level flag is used to control whether to enable the TM based reordering in the MMVD mode.
  • the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled and the additional high-level flag is enabled.
  • an additional high-level flag and a DMVD flag are used to control whether to enable the TM based reordering in the MMVD mode.
  • the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled, the DMVD flag is enabled and the additional high-level flag is enabled.
  • the target base MV is equal to one of the B similar MVs. In another embodiment, the target base MV is equal to a mid-point of the B similar MVs.
  • Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
  • Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
  • Fig. 2 illustrates an example of CPR (Current Picture Referencing) compensation, where blocks are predicted by corresponding blocks in the same picture.
  • CPR Current Picture Referencing
  • Fig. 3 illustrates an example of MMVD (Merge mode Motion Vector Difference) search process, where a current block in the current frame is processed by bi-direction prediction using a L0 reference frame and a L1 reference frame.
  • MMVD Merge mode Motion Vector Difference
  • Fig. 4 illustrates the offset distances in the horizontal and vertical directions for a L0 reference block and L1 reference block according to MMVD.
  • Fig. 5 illustrates an example of merge mode candidate derivation from spatial and temporal neighbouring blocks.
  • Fig. 6 illustrates a flowchart of an exemplary video coding system that utilizes TM based reordering in the MMVD mode according to an embodiment of the present invention.
  • Fig. 7 illustrates a flowchart of an exemplary video coding system that utilizes one or more high-level syntaxes to control whether to enable TM based reordering in the MMVD mode according to an embodiment of the present invention.
  • Motion Compensation, one of the key technologies in hybrid video coding, exploits the pixel correlation between adjacent pictures. It is generally assumed that, in a video sequence, the patterns corresponding to objects or background in a frame are displaced to form corresponding objects in the subsequent frame or correlated with other patterns within the current frame. With the estimation of such displacement (e.g. using block matching techniques), the pattern can be mostly reproduced without the need to re-code the pattern. Similarly, block matching and copy has also been tried to allow selecting the reference block from the same picture as the current block. It was observed to be inefficient when applying this concept to camera-captured videos. Part of the reason is that the texture pattern in a spatially neighbouring area may be similar to the current coding block, but usually with some gradual changes over space. It is thus difficult for a block to find an exact match within the same picture in a video captured by a camera. Accordingly, the improvement in coding performance is limited.
  • a new prediction mode i.e., the intra block copy (IBC) mode or called current picture referencing (CPR)
  • IBC intra block copy
  • CPR current picture referencing
  • a prediction unit PU
  • a displacement vector called block vector or BV
  • the prediction errors are then coded using transformation, quantization and entropy coding.
  • an example of CPR compensation is illustrated in Fig. 2, where block 212 is the corresponding block for block 210, and block 222 is the corresponding block for block 220.
  • the reference samples correspond to the reconstructed samples of the current decoded picture prior to in-loop filter operations, both deblocking and sample adaptive offset (SAO) filters in HEVC.
  • SAO sample adaptive offset
  • JCTVC-M0350 The very first version of CPR was proposed in JCTVC-M0350 (Budagavi et al., AHG8: Video coding using Intra motion compensation, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 13th Meeting: Incheon, KR, 18–26 Apr. 2013, Document: JCTVC-M0350) to the HEVC Range Extensions (RExt) development.
  • the CPR compensation was limited to be within a small local area, with only 1-D block vector and only for block size of 2Nx2N.
  • HEVC SCC Screen Content Coding
  • (BV_x, BV_y) is the luma block vector (the motion vector for CPR) for the current PU; nPbSw and nPbSh are the width and height of the current PU; (xPbs, yPbs) is the location of the top-left pixel of the current PU relative to the current picture; (xCbs, yCbs) is the location of the top-left pixel of the current CU relative to the current picture; and CtbSizeY is the size of the CTU.
  • OffsetX and offsetY are two adjusted offsets in two dimensions in consideration of chroma sample interpolation for the CPR mode.
  • offsetX = BVC_x & 0x7 ? 2 : 0
  • offsetY = BVC_y & 0x7 ? 2 : 0 (5)
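As a sketch of the adjusted-offset rule above (Python used only for illustration; the function name is hypothetical), the ternary expressions add a 2-sample margin whenever the chroma block vector is not aligned to a full sample:

```python
def cpr_chroma_offsets(bvc_x: int, bvc_y: int) -> tuple:
    """Adjusted offsets per the rule above: a 2-sample margin is needed
    when a chroma block vector component (1/8-pel units) has a nonzero
    fractional part, to leave room for chroma sample interpolation."""
    offset_x = 2 if (bvc_x & 0x7) else 0
    offset_y = 2 if (bvc_y & 0x7) else 0
    return offset_x, offset_y
```

Here `bvc_x & 0x7` isolates the fractional 1/8-pel bits, so any non-integer chroma displacement triggers the margin.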
  • (BVC_x, BVC_y) is the chroma block vector, in 1/8-pel resolution in HEVC.
  • the reference block for CPR must be within the same tile/slice boundary.
  • MMVD Merge with MVD Mode
  • the MMVD technique was proposed in JVET-J0024.
  • MMVD is used for either skip or merge modes with a proposed motion vector expression method.
  • MMVD re-uses the same merge candidates as those in VVC.
  • a candidate can be selected, and is further expanded by the proposed motion vector expression method.
  • MMVD provides a new motion vector expression with simplified signalling.
  • the expression method includes prediction direction information, starting point (also referred to as a base in this disclosure), motion magnitude (also referred to as a distance in this disclosure), and motion direction.
  • Fig. 3 illustrates an example of the MMVD search process, where a current block 312 in the current frame 310 is processed by bi-direction prediction using a L0 reference frame 320 and a L1 reference frame 330.
  • a pixel location 350 is projected to pixel location 352 in L0 reference frame 320 and pixel location 354 in L1 reference frame 330.
  • updated locations will be searched by adding offsets in selected directions. For example, the updated locations correspond to locations along line 342 or 344 in the horizontal direction at distances s, 2s or 3s.
  • Prediction direction information indicates a prediction direction among L0, L1, and L0 and L1 predictions.
  • the proposed method can generate bi-prediction candidates from merge candidates with uni-prediction by using a mirroring technique. For example, if a merge candidate is uni-prediction with L1, a reference index for L0 is decided by searching for a reference picture in list 0 which is mirrored with the reference picture for list 1. If there is no corresponding picture, the nearest reference picture to the current picture is used. The L0 MV is derived by scaling L1's MV, and the scaling factor is calculated from the POC distance.
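The POC-distance scaling in the mirroring technique above can be sketched as follows (Python for illustration; the function name is hypothetical, and real codecs use fixed-point rather than floating-point scaling):

```python
def mirror_l1_to_l0(mv_l1, poc_cur, poc_l1, poc_l0):
    """Derive an L0 MV by scaling the L1 MV with the ratio of POC
    distances, as in the mirroring technique described above.
    mv_l1 is an (x, y) tuple of integer MV components."""
    scale = (poc_cur - poc_l0) / (poc_cur - poc_l1)
    return (round(mv_l1[0] * scale), round(mv_l1[1] * scale))
```

With a mirrored reference (equal POC distances on opposite sides), the scale is -1 and the L0 MV is simply the negated L1 MV.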
  • MMVD after a merge candidate is selected, it is further expanded or refined by the signalled MVDs information.
  • the further information includes a merge candidate flag, an index to specify motion magnitude, and an index for indication of the motion direction.
  • one of the first two candidates in the merge list is selected to be used as an MV basis.
  • the MMVD candidate flag is signalled to specify which one is used between the first and second merge candidates.
  • the initial MVs (i.e., merge candidates) selected from the merge candidate list are also referred as bases or base MVs in this disclosure. After searching the set of locations, a selected MV candidate is referred as an expanded MV candidate in this disclosure.
  • the index with value 0 is signalled as the MMVD prediction direction. Otherwise, the index with value 1 is signalled. After sending the first bit, the remaining prediction direction is signalled based on the pre-defined priority order of MMVD prediction directions. The priority order is L0/L1 prediction, L0 prediction and L1 prediction. If the prediction direction of the merge candidate is L1, signalling ‘0’ indicates the MMVD prediction direction as L1, signalling ‘10’ indicates the MMVD prediction direction as L0 and L1, and signalling ‘11’ indicates the MMVD prediction direction as L0. If the L0 and L1 prediction lists are the same, the MMVD prediction direction information is not signalled.
  • Base candidate index as shown in Table 1, defines the starting point.
  • Base candidate index indicates the best candidate among candidates in the list as follows.
  • Distance index specifies motion magnitude information and indicates the pre-defined offset from the starting points (412 and 422) for a L0 reference block 410 and L1 reference block 420 as shown in Fig. 4.
  • an offset is added to either the horizontal component or the vertical component of the starting MV, where small circles in different styles correspond to different offsets from the centre.
  • the relation between the distance index and the pre-defined offset is specified in Table 2.
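Table 2 itself is not reproduced in this extract. Assuming the distance table of the VVC specification ({1/4, 1/2, 1, 2, 4, 8, 16, 32} in units of luma samples), a hypothetical lookup might be:

```python
# Distance offsets in luma samples, assumed to follow the VVC MMVD table.
MMVD_DISTANCES = [1 / 4, 1 / 2, 1, 2, 4, 8, 16, 32]

def distance_from_index(distance_idx: int) -> float:
    """Map a signalled distance index to its pre-defined MVD magnitude."""
    return MMVD_DISTANCES[distance_idx]
```

The table doubles at each step, so a 3-bit index spans sub-pel refinement up to a 32-sample offset.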
  • Direction index represents the direction of the MVD relative to the starting point.
  • the direction index can represent the four directions as shown in Table 3. It is noted that the meaning of the MVD sign could vary according to the information of the starting MVs.
  • the starting MVs are a uni-prediction MV or bi-prediction MVs with both lists pointing to the same side of the current picture (i.e. POCs of the two references are both larger than the POC of the current picture, or both smaller than the POC of the current picture)
  • the sign in Table 3 specifies the sign of the MV offset added to the starting MV.
  • otherwise, when the two reference pictures are on opposite sides of the current picture, if the difference of POC in list 0 is greater than that in list 1, the sign in Table 3 specifies the sign of the MV offset added to the list0 MV component of the starting MV and the sign for the list1 MV has an opposite value; if the difference of POC in list 1 is greater than that in list 0, the sign in Table 3 specifies the sign of the MV offset added to the list1 MV component of the starting MV and the sign for the list0 MV has an opposite value.
  • Multi-hypothesis prediction is proposed to improve the existing prediction modes in inter pictures, including uni-prediction of advanced motion vector prediction (AMVP) mode, skip and merge mode, and intra mode.
  • the general concept is to combine an existing prediction mode with an extra merge indexed prediction.
  • the merge indexed prediction is performed in a manner the same as that for the regular merge mode, where a merge index is signalled to acquire motion information for the motion compensated prediction.
  • the final prediction is the weighted average of the merge indexed prediction and the prediction generated by the existing prediction mode, where different weights are applied depending on the combinations.
  • JVET-K1030 (Chih-Wei Hsu, et al., Description of Core Experiment 10: Combined and multi-hypothesis prediction, Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 11th Meeting: Ljubljana, SI, 10–18 July 2018, Document: JVET-K1030), or JVET-L0100 (Man-Shu Chiang, et al., CE10.1.1: Multi-hypothesis prediction for improving AMVP mode, skip or merge mode, and intra mode, Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 12th Meeting: Macao, CN, 3–12 Oct. 2018, Document: JVET-L0100).
  • Pairwise average candidates are generated by averaging predefined pairs of candidates in the current merge candidate list, and the predefined pairs are defined as {(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)}, where the numbers denote the merge indices into the merge candidate list.
  • the averaged motion vectors are calculated separately for each reference list. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures; if only one motion vector is available, use the one directly; if no motion vector is available, treat this list as invalid.
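The per-list averaging rule above can be sketched as follows (Python for illustration; names are hypothetical, and a simple floor average stands in for the codec's exact rounding):

```python
# Predefined pairs of merge indices, as listed above.
PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]

def pairwise_average(cands):
    """cands: list of per-candidate dicts {list_idx: (mvx, mvy) or None}.
    For each predefined pair: average when both list MVs exist, reuse
    the single available MV, or mark the list invalid (None)."""
    out = []
    for i, j in PAIRS:
        if i >= len(cands) or j >= len(cands):
            continue  # pair references a merge index beyond the list
        avg = {}
        for lst in (0, 1):
            a, b = cands[i].get(lst), cands[j].get(lst)
            if a and b:
                avg[lst] = ((a[0] + b[0]) // 2, (a[1] + b[1]) // 2)
            elif a or b:
                avg[lst] = a or b
            else:
                avg[lst] = None
        out.append(avg)
    return out
```

Note the averaging is done per reference list even when the two MVs point to different reference pictures, exactly as stated above.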
  • HEVC has the Skip, and Merge mode.
  • Skip and Merge modes obtain the motion information from spatially neighbouring blocks (spatial candidates) or a temporal co-located block (temporal candidate).
  • spatial candidates spatially neighbouring blocks
  • temporal co-located block (temporal candidate)
  • the residual signal is forced to be zero and not coded.
  • a candidate index is signalled to indicate which candidate among the candidate set is used for merging.
  • Each merged PU reuses the MV, prediction direction, and reference picture index of the selected candidate.
  • up to four spatial MV candidates are derived from A0, A1, B0 and B1, and one temporal MV candidate is derived from TBR or TCTR (TBR is used first; if TBR is not available, TCTR is used instead).
  • TBR is used first; if TBR is not available, TCTR is used instead
  • if any of the four spatial MV candidates is not available, the position B2 is then used to derive another MV candidate as a replacement.
  • removing redundancy (pruning) is applied to remove redundant MV candidates.
  • the encoder selects one final candidate within the candidate set for Skip or Merge modes based on the rate-distortion optimization (RDO) decision, and transmits the index to the decoder.
  • RDO rate-distortion optimization
  • the skip and merge mode may refer to both skip and merge modes.
  • MMVD with template matching: for each base, K MVD candidates are selected from a total of S*D combinations of S steps and D directions. The selection is based on the TM cost. In other words, the set of MVD candidates is reordered according to the TM cost associated with the set of MVD candidates, and the first K MVD candidates (also called the top K MVD candidates) are selected. However, if the two base MVs are close to each other, the selected MVD candidates may be similar as well, which may be redundant. The following proposed methods can reduce such redundancy by considering the difference between the base MVs and adaptively changing the MVD candidates for each base MV.
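The per-base top-K selection just described can be sketched as follows (Python for illustration; the function name is hypothetical and the TM cost is supplied by the caller as a black box):

```python
def select_top_k(base_mv, offsets, tm_cost, k):
    """Build candidate MVs around `base_mv` from the S*D step/direction
    offsets, reorder them by template matching cost, and keep the top k.
    `tm_cost` maps an MV to its TM cost (lower is better)."""
    cands = [(base_mv[0] + dx, base_mv[1] + dy) for dx, dy in offsets]
    cands.sort(key=tm_cost)  # stable sort: ties keep construction order
    return cands[:k]
```

Because `sort` is stable, candidates with equal TM cost retain their original step/direction order, which keeps the selection deterministic.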
  • 2*K MVD candidates, e.g. {mvd0, mvd1, ..., mvdK-1, mvdK, ..., mvd2K-1}
  • the first K selected MVD candidates, e.g. {mvd0, mvd1, ..., mvdK-1}, are for one base MV
  • the rest K selected MVD candidates, e.g. {mvdK, mvdK+1, ..., mvd2K-1}, are for the other base MV.
  • a syntax can be used to indicate which base MV is used (i.e., base 0 or base 1) .
  • two similar base MVs are guaranteed to have totally different MVD candidates (i.e., candidate set {mvd0, mvd1, ..., mvdK-1} being very different from candidate set {mvdK, mvdK+1, ..., mvd2K-1}).
  • 2*K MVD candidates, e.g. {mvd0, mvd1, ..., mvdK-1, mvdK, ..., mvd2K-1}
  • K is a positive integer.
  • denote these 2*K MVD candidates by mvd0, mvd1, mvd2, ..., mvd2K-1.
  • the even-indexed K MVD candidates (i.e., mvd0, mvd2, mvd4, ..., mvd2K-2) are for one base MV, while the rest K candidates (i.e., mvd1, mvd3, mvd5, ..., mvd2K-1) are for the other base MV.
  • two similar base MVs are guaranteed to have totally different MVD candidates (i.e., candidate set {mvd0, mvd2, mvd4, ..., mvd2K-2} being very different from candidate set {mvd1, mvd3, mvd5, ..., mvd2K-1}).
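The two splitting schemes for B = 2 (first-half/second-half, and even-/odd-indexed) can be sketched together (Python for illustration; function names are hypothetical):

```python
def split_consecutive(cands):
    """First half of the 2*K reordered candidates to base 0,
    second half to base 1."""
    k = len(cands) // 2
    return cands[:k], cands[k:]

def split_interleaved(cands):
    """Even-indexed candidates to base 0, odd-indexed to base 1."""
    return cands[0::2], cands[1::2]
```

Either way the two candidate sets are disjoint, so the two similar bases never share an MVD candidate; the interleaved split additionally gives both bases candidates of comparable TM rank.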
  • the foregoing proposed methods can be extended to multiple base MVs.
  • there are B similar base MVs where B is equal to 2 or larger than 2.
  • a set of MVD candidates can be derived based on one base MV (referred to as a target base MV) and the target base MV is derived from the B similar base MVs.
  • the set of MVD candidates are reordered based on the TM cost associated with the set of MVD candidates.
  • top MVD candidates are selected from the reordered set of MVD candidates and assigned to the B similar base MVs according to an embodiment of the present invention.
  • the base MV i.e., the target base MV
  • the base MV used for MVD candidate selection is always the base MV with the same index (e.g. always use base MV 0) .
  • the base used for MVD candidate selection is the merged base MV, where the merged base MV is derived by using the B similar base MVs (e.g. the midpoint of the B base MVs) .
  • the merged base MV can be used to replace all the B similar base MVs in MMVD.
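The merged-base embodiment (midpoint of the B similar base MVs) can be sketched as follows (Python for illustration; the function name is hypothetical, and float division stands in for a codec's fixed-point average):

```python
def merged_base_mv(bases):
    """Component-wise midpoint of B similar base MVs, each an (x, y)
    tuple. This merged MV can replace all B bases in MMVD."""
    b = len(bases)
    return (sum(mv[0] for mv in bases) / b,
            sum(mv[1] for mv in bases) / b)
```

For B = 2 this is simply the midpoint of the two base MVs, as stated above.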
  • the top MVD candidates can be assigned to the B similar base MVs in various manners.
  • the top MVD candidates are divided into a first half (i.e., the first K MVD candidates of the top MVD candidates) and a second half (i.e., the remaining K MVD candidates of the top MVD candidates) in one example.
  • the top MVD candidates are divided into interleaved groups (e.g. a first group consisting of even-numbered MVD candidates and a second group consisting of odd-numbered MVD candidates) .
  • the similar concept can be extended to the case for B equal to 3 or larger than 3.
  • the top MVD candidates selected may consist of B*K MVD candidates.
  • the first K MVD candidates of the top MVD candidates are assigned to the first base MV of the B similar base MVs
  • the second K MVD candidates of the top MVD candidates are assigned to the second base MV of the B similar base MVs
  • the third K MVD candidates of the top MVD candidates are assigned to the third base MV of the B similar base MVs, and so on.
  • the K MVD candidates assigned to each of the B similar base MVs are consecutive MVD candidates in the top MVD candidates.
  • the top MVD candidates are divided into B groups in an interleaved fashion.
  • {mvd0, mvdB, mvd2B, ..., mvd(K-1)B} are assigned to the first base MV of the B similar base MVs
  • {mvd1, mvd1+B, mvd1+2B, ..., mvd1+(K-1)B} are assigned to the second base MV of the B similar base MVs
  • {mvd2, mvd2+B, mvd2+2B, ..., mvd2+(K-1)B} are assigned to the third base MV of the B similar base MVs, and so on.
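Both group-assignment schemes for general B (consecutive runs of K, or every B-th candidate) reduce to simple slicing (Python for illustration; the function name is hypothetical):

```python
def assign_groups(cands, b, interleaved=True):
    """Divide B*K reordered MVD candidates into B groups of K:
    interleaved (group j takes candidates j, j+B, j+2B, ...) or
    consecutive (group j takes candidates j*K .. (j+1)*K - 1)."""
    k = len(cands) // b
    if interleaved:
        return [cands[j::b] for j in range(b)]
    return [cands[j * k:(j + 1) * k] for j in range(b)]
```

For B = 2 this degenerates to the even/odd and first-half/second-half splits described earlier.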
  • the motion vector difference between the two base MVs is calculated.
  • the two base MVs are considered similar if and only if the difference is smaller than a pre-defined constant threshold.
  • the threshold is adaptively changed based on low-delay picture condition.
  • the threshold is adaptively changed based on base MV length.
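The similarity test above can be sketched as follows (Python for illustration; the function name and the choice of L1 distance are assumptions, since the disclosure does not fix the distance metric):

```python
def bases_similar(mv0, mv1, threshold):
    """Two base MVs are considered similar iff their difference is
    smaller than the threshold. The threshold may be a pre-defined
    constant or adapted (e.g., to low-delay pictures or MV length)."""
    diff = abs(mv0[0] - mv1[0]) + abs(mv0[1] - mv1[1])  # L1 distance
    return diff < threshold
```

A caller adapting the threshold would simply compute it first (e.g., larger for long base MVs) and pass it in.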
  • TM-based re-ordering in MMVD can be controlled by one or multiple high-level flags.
  • the control flag is shared with decoder-side motion-vector derivation (DMVD) flag, which controls all DMVD-related tools.
  • DMVD decoder-side motion-vector derivation
  • TM-based re-ordering is enabled in the MMVD mode if and only if MMVD is enabled and DMVD is enabled.
  • one additional high-level flag is added to control the TM-based re-ordering in MMVD.
  • TM-based re-ordering is enabled in MMVD if and only if MMVD is enabled and the additional flag is enabled.
  • the additional flag being enabled refers to the case that the additional flag indicates TM-based re-ordering being enabled.
  • one additional high-level flag is added to control the TM-based re-ordering in the MMVD mode accompanied with the existing DMVD flag.
  • TM-based re-ordering is enabled in the MMVD mode if and only if MMVD is enabled, DMVD is enabled (as indicated by the DMVD flag) , and the additional flag is enabled.
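The three signalling variants above differ only in which flags gate the tool; a combined sketch (Python for illustration; names are hypothetical):

```python
def tm_reorder_enabled(mmvd_on, dmvd_on, extra_on, use_dmvd, use_extra):
    """TM-based re-ordering is enabled iff MMVD is enabled and every
    flag required by the chosen signalling scheme is enabled:
    use_dmvd selects the shared-DMVD-flag variant, use_extra the
    additional-flag variant, and both together the combined variant."""
    enabled = mmvd_on
    if use_dmvd:
        enabled = enabled and dmvd_on
    if use_extra:
        enabled = enabled and extra_on
    return enabled
```

So, for example, the combined variant requires all three flags to be on before any TM-based re-ordering is performed.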
  • any of the MMVD methods described above can be implemented in encoders and/or decoders.
  • the process of any of the proposed methods e.g. determining whether any base MVs being similar, performing TM-based reordering for MMVD candidates, determining the target base MV, etc.
  • an inter coding module of an encoder e.g. Inter Pred. 112 in Fig. 1A
  • a motion compensation module e.g., MC 152 in Fig. 1B
  • a merge candidate derivation module in the encoder or the decoder.
  • any of the proposed methods can be implemented as a circuit coupled to the inter coding module of an encoder and/or the motion compensation module or merge candidate derivation module of a decoder.
  • while the Inter-Pred. 112 and MC 152 are shown as individual processing units to support the MMVD methods, they may correspond to executable software or firmware codes stored on media, such as a hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).
  • Fig. 6 illustrates a flowchart of an exemplary video coding system that utilizes TM based reordering in the MMVD mode according to an embodiment of the present invention.
  • the steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side.
  • the steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart.
  • input data associated with a current block are received in step 610, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or prediction residual data associated with the current block to be decoded at a decoder side.
  • Two or more base MVs are determined in step 620. Whether said two or more base MVs comprise B similar MVs is checked in step 630, where B is an integer equal to or greater than 2. If said two or more base MVs comprise B similar MVs, steps 640 to 680 are performed. Otherwise (i.e., said two or more base MVs do not comprise B similar MVs) , steps 640 to 680 are skipped.
  • a target base MV (Motion Vector) is determined from the B similar MVs.
  • a set of MVD (Motion Vector Difference) candidates are determined by adding offsets to the target base MV.
  • top MVD candidates are selected from the set of MVD candidates reordered according to TM (Template Matching) cost associated with the set of MVD candidates.
  • the top MVD candidates are divided into B modified MVD candidate sets, wherein each of the B modified MVD candidate sets is associated with one of the B similar MVs.
  • the current block is encoded or decoded by using motion information comprising the B modified MVD candidate sets.
  • Fig. 7 illustrates a flowchart of an exemplary video coding system that utilizes one or more high-level syntaxes to control whether to enable TM based reordering in the MMVD mode according to an embodiment of the present invention.
  • input data associated with a current block are received in step 710, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or prediction residual data associated with the current block to be decoded at a decoder side.
  • One or more base MVs are determined in step 720. Whether one or more high-level flags indicate that TM (Template Matching) based reordering in the MMVD mode is enabled for the current block is checked in step 740.
  • if the TM based reordering is indicated as enabled (i.e., the “Yes” path) , steps 750 to 780 are performed. Otherwise (i.e., the “No” path) , steps 750 to 780 are skipped.
  • a set of MVD (Motion Vector Difference) candidates are determined by adding offsets to said one or more base MVs.
  • the set of MVD candidates is reordered according to TM (Template Matching) cost associated with the set of MVD candidates to form a set of reordered MVD candidates.
  • in step 770, one final MVD candidate is selected from the set of reordered MVD candidates or a subset of the reordered MVD candidates.
  • in step 780, the current block is encoded or decoded by using motion information comprising the final MVD candidate.
  • Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both.
  • an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein.
  • An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein.
  • the invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA) .
  • These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention.
  • the software code or firmware code may be developed in different programming languages and different formats or styles.
  • the software code may also be compiled for different target platforms.
  • different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
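The Fig. 6 flow recited in the bullets above (steps 620 through 680, for the B = 2 case) can be sketched as follows. This is a non-normative illustration: the TM cost function, the offset set, and the similarity threshold are hypothetical stand-ins, not the definitions used by the disclosure.

```python
# Illustrative sketch of the Fig. 6 flow for B = 2 base MVs; tm_cost and
# the offsets are hypothetical stand-ins for the template-matching machinery.

def mmvd_reorder_for_similar_bases(base_mvs, tm_cost, offsets, K=4, thr=1):
    """If the two base MVs are similar, build one MVD candidate set from a
    target base MV, reorder it by TM cost, and split the top 2*K candidates."""
    (x0, y0), (x1, y1) = base_mvs[0], base_mvs[1]
    # Step 630: similarity check (component-wise difference below a threshold).
    if max(abs(x0 - x1), abs(y0 - y1)) >= thr:
        return None  # not similar: steps 640 to 680 are skipped
    # Step 640: determine a target base MV (here: simply the first similar MV).
    target = base_mvs[0]
    # Step 650: MVD candidates = target base MV plus offsets.
    cands = [(target[0] + dx, target[1] + dy) for dx, dy in offsets]
    # Step 660: reorder by TM cost and keep the top 2*K candidates.
    top = sorted(cands, key=tm_cost)[: 2 * K]
    # Step 670: divide into B = 2 modified MVD candidate sets.
    return {0: top[:K], 1: top[K:]}

# Toy usage: the lambda stands in for a real template-matching cost.
offsets = [(s * dx, s * dy) for s in (1, 2, 4)
           for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
sets = mmvd_reorder_for_similar_bases(
    [(8, 8), (8, 8)],
    tm_cost=lambda mv: abs(mv[0] - 9) + abs(mv[1] - 8),
    offsets=offsets, K=2)
```

The two resulting candidate sets are disjoint by construction, which is the redundancy-reduction property the method aims for.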


Abstract

A method and apparatus for video coding using MMVD mode. According to this method, two or more base MVs are determined. Whether the base MVs comprise B similar MVs is checked. If the base MVs comprise B similar MVs, a target base MV is determined from the B similar MVs. A set of MVD candidates are determined by adding offsets to the target base MV. Top MVD candidates are selected from the set of MVD candidates reordered according to TM (Template Matching) cost associated with the set of MVD candidates. The top MVD candidates are divided into B modified MVD candidate sets, where each of the B modified MVD candidate sets is associated with one of the B similar MVs. The current block is then encoded or decoded by using motion information comprising the B modified MVD candidate sets.

Description

METHOD AND APPARATUS FOR COMPLEXITY REDUCTION OF VIDEO CODING USING MERGE WITH MVD MODE
CROSS REFERENCE TO RELATED APPLICATIONS
The present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/342,642, filed on May 17, 2022. The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
The present invention relates to video coding system using MMVD (Merge mode Motion Vector Difference) coding tool. In particular, the present invention relates to the complexity reduction associated with MMVD.
BACKGROUND
Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) . The standard has been published as an ISO standard: ISO/IEC 23090-3: 2021, Information technology -Coded representation of immersive media -Part 3: Versatile video coding, published Feb. 2021. VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing. For Intra Prediction, the prediction data is derived based on previously coded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture (s) and motion data. Switch 114 selects Intra Prediction 110 or Inter-Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to the underlying image area. The side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130 is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
As shown in Fig. 1A, incoming video data undergo a series of processing steps in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to this series of processing steps. Accordingly, in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, a deblocking filter (DF) , Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H.264 or VVC.
The decoder, as shown in Fig. 1B, can use similar or a portion of the same functional blocks as the encoder, except for Transform 118 and Quantization 120, since the decoder only needs Inverse Quantization 124 and Inverse Transform 126. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information) . The Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
According to VVC, an input picture is partitioned into non-overlapped square block regions referred to as CTUs (Coding Tree Units) , similar to HEVC. Each CTU can be partitioned into one or multiple smaller size coding units (CUs) . The resulting CU partitions can be in square or rectangular shapes. Also, VVC divides a CTU into prediction units (PUs) as a unit to apply a prediction process, such as Inter prediction, Intra prediction, etc.
The VVC standard incorporates various new coding tools to further improve the coding efficiency over the HEVC standard. Among various new coding tools, some coding tools relevant to the present invention are reviewed as follows. For example, Merge with MVD Mode (MMVD) technique re-uses the same merge candidates as those in VVC and a selected candidate can be further expanded by a motion vector expression method. It is desirable to develop techniques to reduce the complexity of MMVD.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for video coding using MMVD (Merge with MVD (Motion Vector Difference) ) mode are disclosed. According to the method, input data associated with a current block are received, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side. Two or more base MVs (Motion Vectors) are determined. If said two or more base MVs comprise B similar MVs (B ≥2) , the TM (Template Matching) based reordering in the MMVD mode is performed. The TM based reordering process comprises: determining a target base MV (Motion Vector) from the B similar MVs; determining a set of MVD (Motion Vector Difference) candidates by adding offsets to the target base MV; selecting top MVD candidates from the set of MVD candidates reordered according to TM (Template Matching) cost associated with the set of MVD candidates; dividing the top MVD candidates into B modified MVD candidate sets, wherein each of the B modified MVD candidate sets is associated with one of the B similar MVs; and encoding or decoding the current block by using motion information comprising the B modified MVD candidate sets.
In one embodiment, the B is equal to 2 and the top MVD candidates correspond to 2*K MVD candidates, and wherein K is a positive integer. In one embodiment, K MVD candidates of the 2*K MVD candidates are assigned to a first similar MV and remaining K MVD candidates of the 2*K MVD candidates are assigned to a second similar MV. In one embodiment, the K MVD candidates of the 2*K MVD candidates correspond to top K MVD candidates of the 2*K MVD candidates. In another embodiment, the K MVD candidates of the 2*K MVD candidates correspond to odd-numbered or even-numbered K MVD candidates of the 2*K MVD candidates.
In one embodiment, the B is equal to 3 or larger and the top MVD candidates correspond to B*K MVD candidates, and wherein K is a positive integer. In one embodiment, K MVD candidates of the B*K MVD candidates are assigned to each of the B similar MVs. In one embodiment, the B*K MVD candidates are evenly divided into B groups from beginning to end, and each group of the K MVD candidates corresponds to consecutive MVD candidates of the B*K  MVD candidates. In another embodiment, the B*K MVD candidates are evenly divided into B groups in an interlaced fashion, and each group of the K MVD candidates corresponds to every B-th MVD candidates of the B*K MVD candidates.
In one embodiment, when difference between two base MVs is smaller than a threshold, the two base MVs are counted as similar MVs. In one embodiment, the threshold is adaptively changed based on low-delay picture condition. In another embodiment, the threshold is adaptively changed based on MV length associated with said two or more base MVs.
In one embodiment, whether to enable TM based reordering in the MMVD mode is controlled by one or more high-level flags. In one embodiment, said one or more high-level flags are shared with a DMVD (Decoder-Side Motion-Vector Derivation) flag. In one embodiment, the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled and the DMVD flag is enabled. In one embodiment, an additional high-level flag is used to control whether to enable the TM based reordering in the MMVD mode. In one embodiment, the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled and the additional high-level flag is enabled. In one embodiment, an additional high-level flag and a DMVD flag are used to control whether to enable the TM based reordering in the MMVD mode. In one embodiment, the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled, the DMVD flag is enabled and the additional high-level flag is enabled.
In one embodiment, the target base MV is equal to one of the B similar MVs. In another embodiment, the target base MV is equal to a mid-point of the B similar MVs.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
Fig. 2 illustrates an example of CPR (Current Picture Referencing) compensation, where blocks are predicted by corresponding blocks in the same picture.
Fig. 3 illustrates an example of MMVD (Merge mode Motion Vector Difference) search process, where a current block in the current frame is processed by bi-direction prediction using a L0 reference frame and a L1 reference frame.
Fig. 4 illustrates the offset distances in the horizontal and vertical directions for a L0 reference block and L1 reference block according to MMVD.
Fig. 5 illustrates an example of merge mode candidate derivation from spatial and temporal neighbouring blocks.
Fig. 6 illustrates a flowchart of an exemplary video coding system that utilizes TM based reordering in the MMVD mode according to an embodiment of the present invention.
Fig. 7 illustrates a flowchart of an exemplary video coding system that utilizes one or more high-level syntaxes to control whether to enable TM based reordering in the MMVD mode according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.
Current Picture Referencing
Motion Compensation, one of the key technologies in hybrid video coding, explores the pixel correlation between adjacent pictures. It is generally assumed that, in a video sequence, the patterns corresponding to objects or background in a frame are displaced to form corresponding objects in the subsequent frame or correlated with other patterns within the current frame. With the estimation of such displacement (e.g. using block matching techniques) , the pattern can be mostly reproduced without the need to re-code the pattern. Similarly, block matching and copy has also been tried to allow selecting the reference block from the same picture as the current block. It was observed to be inefficient when applying this concept to camera-captured videos. Part of the reason is that the texture pattern in a spatially neighbouring area may be similar to the current coding block, but usually with some gradual changes over the space. It is difficult for a block to find an exact match within the same picture in a video captured by a camera. Accordingly, the improvement in coding performance is limited.
However, the situation for spatial correlation among pixels within the same picture is different for screen contents. For a typical video with texts and graphics, there are usually repetitive patterns within the same picture. Hence, intra (picture) block compensation has been observed to be very effective. A new prediction mode, i.e. the intra block copy (IBC) mode, also called current picture referencing (CPR) , has been introduced for screen content coding to utilize this characteristic. In the CPR mode, a prediction unit (PU) is predicted from a previously reconstructed block within the same picture. Further, a displacement vector (called a block vector or BV) is used to indicate the relative displacement from the position of the current block to that of the reference block. The prediction errors are then coded using transformation, quantization and entropy coding. An example of CPR compensation is illustrated in Fig. 2, where block 212 is a corresponding block for block 210, and block 222 is a corresponding block for block 220. In this technique, the reference samples correspond to the reconstructed samples of the current decoded picture prior to in-loop filter operations, both deblocking and sample adaptive offset (SAO) filters in HEVC.
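The copy operation described above can be illustrated with a minimal sketch, assuming the reconstructed samples of the current picture are available as a plain 2-D array and ignoring the validity constraints on the block vector (which are discussed below):

```python
# Minimal illustration of CPR/IBC prediction: a block is predicted by
# copying previously reconstructed samples of the same picture at the
# displacement given by the block vector (bv_x, bv_y).

def cpr_predict(recon, x0, y0, bw, bh, bv_x, bv_y):
    """recon: 2-D list of reconstructed samples (before in-loop filtering);
    (x0, y0): top-left of the current block; bw x bh: block size."""
    return [[recon[y0 + bv_y + j][x0 + bv_x + i] for i in range(bw)]
            for j in range(bh)]

# Toy usage: copy a 2x2 block from two samples up and to the left.
recon = [[y * 4 + x for x in range(4)] for y in range(4)]
pred = cpr_predict(recon, 2, 2, 2, 2, -2, -2)
```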
The very first version of CPR was proposed in JCTVC-M0350 (Budagavi et al., AHG8: Video coding using Intra motion compensation, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 13th Meeting: Incheon, KR, 18–26 Apr. 2013, Document: JCTVC-M0350) to the HEVC Range Extensions (RExt) development. In this version, the CPR compensation was limited to be within a small local area, with only 1-D block vector and only for block size of 2Nx2N. Later, a more advanced CPR design has been developed during the standardization of HEVC SCC (Screen Content Coding) .
When CPR is used, only part of the current picture can be used as the reference picture. A few bitstream conformance constraints are imposed to regulate the valid MV value referring to the current picture. First, one of the following two must be true:
BV_x + offsetX + nPbSw + xPbs - xCbs <= 0     (1)
BV_y + offsetY + nPbSh + yPbs - yCbs <= 0     (2)
Second, the following WPP condition must be true:
(xPbs + BV_x + offsetX + nPbSw - 1) / CtbSizeY - xCbs / CtbSizeY <=
yCbs / CtbSizeY - (yPbs + BV_y + offsetY + nPbSh - 1) / CtbSizeY     (3)
In equations (1) through (3) , (BV_x, BV_y) is the luma block vector (the motion vector for CPR) for the current PU; nPbSw and nPbSh are the width and height of the current PU; (xPbs, yPbs) is the location of the top-left pixel of the current PU relative to the current picture; (xCbs, yCbs) is the location of the top-left pixel of the current CU relative to the current picture; and CtbSizeY is the size of the CTU. offsetX and offsetY are two adjusted offsets in two dimensions in consideration of chroma sample interpolation for the CPR mode.
offsetX = BVC_x & 0x7 ? 2 : 0     (4)
offsetY = BVC_y & 0x7 ? 2 : 0     (5)
(BVC_x, BVC_y) is the chroma block vector, in 1/8-pel resolution in HEVC.
Third, the reference block for CPR must be within the same tile/slice boundary.
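Equations (1) through (5) can be collected into one validity check, sketched below. This is an illustrative reading of the constraints, not a normative implementation: variable names follow the text, the tile/slice boundary check is omitted, and integer division is implemented as C-style truncation toward zero.

```python
# Hedged sketch of the CPR bitstream-conformance checks in equations (1)-(5);
# the tile/slice boundary constraint (the "Third" condition) is omitted.

def cpr_bv_valid(bv_x, bv_y, bvc_x, bvc_y, n_pb_sw, n_pb_sh,
                 x_pbs, y_pbs, x_cbs, y_cbs, ctb_size_y):
    def idiv(a, b):
        return int(a / b)  # truncate toward zero, as in C-style arithmetic
    # Equations (4)-(5): chroma-interpolation offsets.
    offset_x = 2 if (bvc_x & 0x7) else 0
    offset_y = 2 if (bvc_y & 0x7) else 0
    # Equations (1)-(2): at least one of the two must hold.
    cond1 = (bv_x + offset_x + n_pb_sw + x_pbs - x_cbs <= 0 or
             bv_y + offset_y + n_pb_sh + y_pbs - y_cbs <= 0)
    # Equation (3): the WPP condition.
    cond2 = (idiv(x_pbs + bv_x + offset_x + n_pb_sw - 1, ctb_size_y)
             - idiv(x_cbs, ctb_size_y)
             <= idiv(y_cbs, ctb_size_y)
             - idiv(y_pbs + bv_y + offset_y + n_pb_sh - 1, ctb_size_y))
    return cond1 and cond2

# Toy usage: a 16x16 PU at (16, 0) referencing the block directly to its left.
ok = cpr_bv_valid(-16, 0, -128, 0, 16, 16, 16, 0, 16, 0, 64)
```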
Merge with MVD Mode (MMVD) technique
The MMVD technique is proposed in JVET-J0024. MMVD is used for either skip or merge modes with a proposed motion vector expression method. MMVD re-uses the same merge candidates as those in VVC. Among the merge candidates, a candidate can be selected, and is further expanded by the proposed motion vector expression method. MMVD provides a new motion vector expression with simplified signalling. The expression method includes prediction direction information, starting point (also referred to as a base in this disclosure) , motion magnitude (also referred to as a distance in this disclosure) , and motion direction. Fig. 3 illustrates an example of the MMVD search process, where a current block 312 in the current frame 310 is processed by bi-directional prediction using a L0 reference frame 320 and a L1 reference frame 330. A pixel location 350 is projected to pixel location 352 in the L0 reference frame 320 and pixel location 354 in the L1 reference frame 330. According to the MMVD search process, updated locations will be searched by adding offsets in selected directions. For example, the updated locations correspond to locations along line 342 or 344 in the horizontal direction with distances of s, 2s or 3s.
This proposed technique uses a merge candidate list as is. However, only candidates of the default merge type (i.e., MRG_TYPE_DEFAULT_N) are considered for MMVD’s expansion. Prediction direction information indicates a prediction direction among L0, L1, and L0 and L1 predictions. In a B slice, the proposed method can generate bi-prediction candidates from merge candidates with uni-prediction by using a mirroring technique. For example, if a merge candidate is uni-prediction with L1, a reference index of L0 is decided by searching a reference picture in list 0 which is mirrored with the reference picture for list 1. If there is no corresponding picture, the nearest reference picture to the current picture is used. L0’s MV is derived by scaling L1’s MV and the scaling factor is calculated by POC distance.
In MMVD, after a merge candidate is selected, it is further expanded or refined by the signalled MVD information. The further information includes a merge candidate flag, an index to specify the motion magnitude, and an index for indication of the motion direction. In the MMVD mode, one of the first two candidates in the merge list is selected to be used as an MV basis. The MMVD candidate flag is signalled to specify which one is used between the first and second merge candidates. The initial MVs (i.e., merge candidates) selected from the merge candidate list are also referred to as bases or base MVs in this disclosure. After searching the set of locations, a selected MV candidate is referred to as an expanded MV candidate in this disclosure.
If the prediction direction of the MMVD candidate is the same as that of the original merge candidate, the index with value 0 is signalled as the MMVD prediction direction. Otherwise, the index with value 1 is signalled. After sending the first bit, the remaining prediction direction is signalled based on the pre-defined priority order of the MMVD prediction direction. The priority order is L0/L1 prediction, L0 prediction and L1 prediction. If the prediction direction of the merge candidate is L1, signalling ‘0’ indicates the MMVD prediction direction as L1, signalling ‘10’ indicates the MMVD prediction direction as L0 and L1, and signalling ‘11’ indicates the MMVD prediction direction as L0. If the L0 and L1 prediction lists are the same, MMVD’s prediction direction information is not signalled.
Base candidate index, as shown in Table 1, defines the starting point. Base candidate index indicates the best candidate among candidates in the list as follows.
Table 1. Base candidate IDX

Base candidate IDX    0          1          2          3
N-th MVP              1st MVP    2nd MVP    3rd MVP    4th MVP
Distance index specifies motion magnitude information and indicates the pre-defined offset from the starting points (412 and 422) for a L0 reference block 410 and L1 reference block 420 as shown in Fig. 4. In Fig. 4, an offset is added to either the horizontal component or the vertical component of the starting MV, where small circles in different styles correspond to different offsets from the centre. The relation between the distance index and pre-defined offset is specified in Table 2.
Table 2. Distance IDX

Distance IDX      0         1         2        3        4        5        6         7
Pixel distance    1/4-pel   1/2-pel   1-pel    2-pel    4-pel    8-pel    16-pel    32-pel
Direction index represents the direction of the MVD relative to the starting point. The direction index can represent the four directions as shown in Table 3. It is noted that the meaning of the MVD sign can vary according to the information of the starting MVs. When the starting MVs are a uni-prediction MV or bi-prediction MVs with both lists pointing to the same side of the current picture (i.e. POCs of the two references both larger than the POC of the current picture, or both smaller than the POC of the current picture) , the sign in Table 3 specifies the sign of the MV offset added to the starting MV. When the starting MVs are bi-prediction MVs with the two MVs pointing to different sides of the current picture (i.e. the POC of one reference larger than the POC of the current picture, and the POC of the other reference smaller than the POC of the current picture) , and the difference of POC in list 0 is greater than the one in list 1, the sign in Table 3 specifies the sign of the MV offset added to the list0 MV component of the starting MV and the sign for the list1 MV has the opposite value. Otherwise, if the difference of POC in list 1 is greater than that in list 0, the sign in Table 3 specifies the sign of the MV offset added to the list1 MV component of the starting MV and the sign for the list0 MV has the opposite value.
Table 3. Direction IDX

Direction IDX    00     01     10     11
x-axis           +      −      N/A    N/A
y-axis           N/A    N/A    +      −
To reduce the encoder complexity, block restriction is applied. If either width or height of a CU is less than 4, MMVD is not performed.
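The expansion of a base MV by a distance index and a direction index can be sketched as below. The MV units (1/16-pel) and the simple additive sign handling are illustrative assumptions; the full sign-mirroring rules for bi-prediction described above are not reproduced here.

```python
# Sketch of MMVD candidate expansion from a base MV using the distance and
# direction tables (Tables 2 and 3); MVs are assumed to be in 1/16-pel units.

QUARTER_PEL = 4  # 1/4-pel expressed in 1/16-pel units (assumption)
DISTANCES = [QUARTER_PEL * (1 << i) for i in range(8)]   # 1/4-pel ... 32-pel
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]          # +x, -x, +y, -y

def mmvd_expand(base_mv, distance_idx, direction_idx):
    """Return the expanded MV: base MV plus the signalled offset."""
    dx, dy = DIRECTIONS[direction_idx]
    d = DISTANCES[distance_idx]
    return (base_mv[0] + dx * d, base_mv[1] + dy * d)

# Toy usage: distance index 2 (1-pel) in the +x direction.
expanded = mmvd_expand((0, 0), 2, 0)
```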
Multi-Hypothesis Prediction (MH) Technique
Multi-hypothesis prediction is proposed to improve the existing prediction modes in inter pictures, including uni-prediction of advanced motion vector prediction (AMVP) mode, skip and merge mode, and intra mode. The general concept is to combine an existing prediction mode with an extra merge indexed prediction. The merge indexed prediction is performed in the same manner as that for the regular merge mode, where a merge index is signalled to acquire motion information for the motion compensated prediction. The final prediction is the weighted average of the merge indexed prediction and the prediction generated by the existing prediction mode, where different weights are applied depending on the combinations. Detailed information can be found in JVET-K1030 (Chih-Wei Hsu, et al., Description of Core Experiment 10: Combined and multi-hypothesis prediction, Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 11th Meeting: Ljubljana, SI, 10–18 July 2018, Document: JVET-K1030) , or JVET-L0100 (Man-Shu Chiang, et al., CE10.1.1: Multi-hypothesis prediction for improving AMVP mode, skip or merge mode, and intra mode, Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 12th Meeting: Macao, CN, 3–12 Oct. 2018, Document: JVET-L0100) .
Pairwise Averaged Merge Candidates
Pairwise average candidates are generated by averaging predefined pairs of candidates in the current merge candidate list, and the predefined pairs are defined as { (0, 1) , (0, 2) , (1, 2) , (0, 3) , (1, 3) , (2, 3) } , where the numbers denote the merge indices into the merge candidate list. The averaged motion vectors are calculated separately for each reference list. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures; if only one motion vector is available, it is used directly; if no motion vector is available, the list is treated as invalid.
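The per-list averaging rule above can be sketched as follows. Candidates are reduced to plain per-list MV tuples (None when a list is unavailable), and the averaging uses simple floor division rather than the exact rounding of a real codec, so this is an illustration of the rule, not a conforming derivation.

```python
# Sketch of pairwise-averaged merge candidate derivation; each merge
# candidate is (mv_L0, mv_L1), with None marking an unavailable list.

PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]

def pairwise_average(merge_list):
    out = []
    for i, j in PAIRS:
        if i >= len(merge_list) or j >= len(merge_list):
            continue  # pair references candidates beyond the list
        cand = []
        for lx in (0, 1):  # reference lists L0 and L1
            mv_i, mv_j = merge_list[i][lx], merge_list[j][lx]
            if mv_i is not None and mv_j is not None:
                # Average even if the two MVs point to different references.
                cand.append(((mv_i[0] + mv_j[0]) // 2,
                             (mv_i[1] + mv_j[1]) // 2))
            elif mv_i is not None:
                cand.append(mv_i)      # only one MV available: use it directly
            elif mv_j is not None:
                cand.append(mv_j)
            else:
                cand.append(None)      # no MV: the list is treated as invalid
        out.append(tuple(cand))
    return out

# Toy usage with two merge candidates.
avg = pairwise_average([((4, 4), (0, 0)), ((8, 0), None)])
```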
Merge Mode
To increase the coding efficiency of motion vector (MV) coding in HEVC, HEVC has the Skip and Merge modes. Skip and Merge modes obtain the motion information from spatially neighbouring blocks (spatial candidates) or a temporal co-located block (temporal candidate) . When a PU is coded in Skip or Merge mode, no motion information is coded; instead, only the index of the selected candidate is coded. For Skip mode, the residual signal is forced to be zero and not coded. In HEVC, if a particular block is encoded as Skip or Merge, a candidate index is signalled to indicate which candidate among the candidate set is used for merging. Each merged PU reuses the MV, prediction direction, and reference picture index of the selected candidate.
For Merge mode in HM-4.0 in HEVC, as shown in Fig. 5, up to four spatial MV candidates are derived from A0, A1, B0 and B1, and one temporal MV candidate is derived from TBR or TCTR (TBR is used first, if TBR is not available, TCTR is used instead) . Note that if any of the four spatial MV candidates is not available, the position B2 is then used to derive another MV candidate as a replacement. After the derivation process of the four spatial MV candidates and one temporal MV candidate, removing redundancy (pruning) is applied to remove redundant MV candidates. If after removing redundancy (pruning) , the number of available MV candidates is smaller than five, three types of additional candidates are derived and added to the candidate set (candidate list) . The encoder selects one final candidate within the candidate set for Skip or Merge modes based on the rate-distortion optimization (RDO) decision, and transmits the index to the decoder.
Hereafter, we will denote the skip and merge mode as “merge mode” . In other words, when the “merge mode” is mentioned in the following specification, the “merge mode” may refer to both skip and merge modes.
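The merge list construction described above can be summarized in a simplified sketch. Candidate motion data are reduced to plain tuples, and the derivation of the three types of additional candidates is left out, so this only illustrates the ordering and pruning logic.

```python
# Simplified sketch of HM-4.0 merge candidate list construction: spatial
# candidates, B2 as a replacement, a temporal candidate, then pruning.

def build_merge_list(spatial, b2, temporal, max_cands=5):
    """spatial: candidates from A0, A1, B0, B1 (None if unavailable);
    b2: replacement candidate from position B2; temporal: TBR or TCTR."""
    cands = [c for c in spatial if c is not None]
    if len(cands) < 4 and b2 is not None:
        cands.append(b2)           # B2 replaces a missing spatial candidate
    if temporal is not None:
        cands.append(temporal)
    pruned = []
    for c in cands:                # redundancy removal (pruning)
        if c not in pruned:
            pruned.append(c)
    return pruned[:max_cands]

# Toy usage: A1 duplicates A0, B0 is unavailable, temporal duplicates A0.
merge_list = build_merge_list([(1, 0), (1, 0), None, (2, 2)],
                              b2=(3, 3), temporal=(1, 0))
```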
MVD Candidate Redundancy Reduction in MMVD with Template Matching
In some designs of MMVD with template matching (TM) , for each base, K MVD candidates are selected from a total of S*D combinations of S steps and D directions. The selection is based on the TM cost. In other words, the set of MVD candidates is reordered according to the TM cost associated with the set of MVD candidates, and the first K MVD candidates (also called top K MVD candidates) are selected. However, if two base MVs are close to each other, the selected MVD candidates may be similar as well, which may be redundant. The following proposed methods can reduce such redundancy by considering the difference between the base MVs and adaptively changing the MVD candidates for each base MV.
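The baseline selection described above — K candidates chosen from the S*D step/direction combinations by TM cost — can be sketched as follows. The `tm_cost` function is a stand-in assumption; a real codec would compute it from template distortion against the reconstructed neighbourhood.

```python
# Illustrative sketch of top-K MVD candidate selection by TM cost.
def select_top_k(base, steps, directions, tm_cost, k):
    """base: (mvx, mvy); steps: list of magnitudes; directions: list of
    unit offsets (dx, dy); tm_cost: callable returning a cost per MV."""
    cands = [(base[0] + s * dx, base[1] + s * dy)
             for s in steps for dx, dy in directions]   # S*D combinations
    return sorted(cands, key=tm_cost)[:k]               # reorder, keep top K
```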
In one method, if two base MVs are similar, 2*K MVD candidates (e.g. {mvd0, mvd1, …, mvdK-1, mvdK, …, mvd2K-1} ) are selected using one base MV according to the TM cost, where K is a positive integer. The first K selected MVD candidates (e.g. {mvd0, mvd1, …, mvdK-1} ) are for one base MV, while the remaining K selected MVD candidates (e.g. {mvdK, mvdK+1, …, mvd2K-1} ) are for the other base MV. For example, a syntax element can be used to indicate which base MV is used (i.e., base 0 or base 1) . After this process, two similar base MVs are guaranteed to have totally different MVD candidates (i.e., candidate set {mvd0, mvd1, …, mvdK-1} is very different from candidate set {mvdK, mvdK+1, …, mvd2K-1} ) .
In another method, if two base MVs are similar, 2*K MVD candidates (e.g. {mvd0, mvd1, …, mvdK-1, mvdK, …, mvd2K-1} ) are selected using one base MV according to the TM cost, where K is a positive integer. Denote these 2*K MVD candidates by mvd0, mvd1, mvd2, …, mvd2K-1. The even-indexed K MVD candidates (i.e., mvd0, mvd2, mvd4, …, mvd2K-2) are for one base MV, while the remaining K candidates (i.e., mvd1, mvd3, mvd5, …, mvd2K-1) are for the other base MV. Again, after this process, two similar base MVs are guaranteed to have totally different MVD candidates (i.e., candidate set {mvd0, mvd2, mvd4, …, mvd2K-2} is very different from candidate set {mvd1, mvd3, mvd5, …, mvd2K-1} ) .
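The two assignment strategies above for B equal to 2 — first-half/second-half versus even/odd interleaving of the 2*K TM-reordered candidates — can be sketched as follows. The MVD values themselves are placeholders; only the splitting logic is illustrated.

```python
# Illustrative sketch of the two 2*K splitting strategies for two similar
# base MVs: mvds is the TM-reordered candidate list, best cost first.
def split_halves(mvds, k):
    """First K candidates for one base MV, remaining K for the other."""
    return mvds[:k], mvds[k:2 * k]

def split_interleaved(mvds, k):
    """Even-indexed candidates for one base MV, odd-indexed for the other."""
    return mvds[0:2 * k:2], mvds[1:2 * k:2]
```

Either way, the two candidate sets are disjoint, so similar base MVs never share an MVD candidate.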
The foregoing proposed methods can be extended to multiple base MVs. For example, suppose there are B similar base MVs, where B is equal to 2 or larger. A set of MVD candidates can be derived based on one base MV (referred to as a target base MV) , and the target base MV is derived from the B similar base MVs. The set of MVD candidates is reordered based on the TM cost associated with the set of MVD candidates. Then, top MVD candidates are selected from the reordered set of MVD candidates and assigned to the B similar base MVs according to an embodiment of the present invention. For example, in one embodiment, B*K MVD candidates (i.e., the top MVD candidates) are selected using one base MV (referred to as the target base MV) according to the TM cost for all base MVs, instead of selecting K MVD candidates using each base MV individually. In one method, the base MV (i.e., the target base MV) used for MVD candidate selection is always the base MV with the same index (e.g. always base MV 0) . In another method, the base MV used for MVD candidate selection is a merged base MV, where the merged base MV is derived from the B similar base MVs (e.g. the midpoint of the B base MVs) . Furthermore, the merged base MV can be used to replace all of the B similar base MVs in MMVD.
The top MVD candidates can be assigned to the B similar base MVs in various manners. In the case of B equal to 2, in one example the top MVD candidates are divided into a first half (i.e., the first K MVD candidates of the top MVD candidates) and a second half (i.e., the remaining K MVD candidates of the top MVD candidates) . In another example, the top MVD candidates are divided into interleaved groups (e.g. a first group consisting of the even-numbered MVD candidates and a second group consisting of the odd-numbered MVD candidates) . A similar concept can be extended to the case of B equal to 3 or larger. For example, the top MVD candidates selected may consist of B*K MVD candidates. The first K MVD candidates of the top MVD candidates are assigned to the first base MV of the B similar base MVs, the second K MVD candidates are assigned to the second base MV, the third K MVD candidates are assigned to the third base MV, and so on. In this example, the K MVD candidates assigned to each of the B similar base MVs are consecutive MVD candidates in the top MVD candidates. In another example, the top MVD candidates are divided into B groups in an interleaved fashion. For example, {mvd0, mvdB, mvd2B, …, mvd(K-1)B} are assigned to the first base MV of the B similar base MVs, {mvd1, mvd(1+B), mvd(1+2B), …, mvd(1+(K-1)B)} are assigned to the second base MV, {mvd2, mvd(2+B), mvd(2+2B), …, mvd(2+(K-1)B)} are assigned to the third base MV, and so on. In this case, each group includes every B-th MVD candidate of the top MVD candidates (i.e., {mvdi, mvd(i+B), mvd(i+2B), …, mvd(i+(K-1)B)} , i=0, …, B-1) .
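The B-way generalization above can be sketched as follows: the B*K TM-reordered top candidates are divided into B groups, either as consecutive runs of K or in an interleaved fashion where group i collects every B-th candidate starting at index i. The candidate values are placeholders; only the grouping is illustrated.

```python
# Illustrative sketch of dividing B*K top MVD candidates among B similar
# base MVs; mvds is the TM-reordered list, best cost first.
def group_consecutive(mvds, b, k):
    """Group i gets the i-th run of K consecutive candidates."""
    return [mvds[i * k:(i + 1) * k] for i in range(b)]

def group_interleaved(mvds, b, k):
    """Group i gets candidates i, i+B, i+2B, ..., i+(K-1)B."""
    return [mvds[i:b * k:b] for i in range(b)]
```

The interleaved variant spreads the lowest-cost candidates evenly across all B base MVs instead of giving them all to the first one.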
For checking whether two base MVs are similar or not, the motion vector difference between the two base MVs is calculated. In one method, the two base MVs are considered similar if and only if the difference is smaller than a pre-defined constant threshold. In another method, the threshold is adaptively changed based on the low-delay picture condition. In another method, the threshold is adaptively changed based on the base MV length.
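The similarity test above can be sketched as follows. The distance metric (L1 here), the threshold value, and the low-delay adaptation rule are all illustrative assumptions; the text leaves these choices open.

```python
# Illustrative sketch of the base-MV similarity check with an adaptive
# threshold; metric, default threshold, and adaptation are assumptions.
def are_similar(mv0, mv1, base_threshold=4, low_delay=False):
    """Return True when the MV difference is below the (adapted) threshold."""
    diff = abs(mv0[0] - mv1[0]) + abs(mv0[1] - mv1[1])   # L1 distance (assumed)
    threshold = base_threshold
    if low_delay:
        threshold //= 2        # hypothetical low-delay tightening
    return diff < threshold
```

A length-based adaptation could similarly scale `threshold` with `max(abs(c) for c in mv0 + mv1)` before the comparison.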
For the flexibility of coding tools, TM-based re-ordering in MMVD can be controlled by one or multiple high-level flags. In one embodiment, the control flag is shared with the decoder-side motion-vector derivation (DMVD) flag, which controls all DMVD-related tools. In other words, TM-based re-ordering is enabled in the MMVD mode if and only if MMVD is enabled and DMVD is enabled. In another embodiment, one additional high-level flag is added to control the TM-based re-ordering in MMVD. In other words, TM-based re-ordering is enabled in MMVD if and only if MMVD is enabled and the additional flag is enabled. The additional flag being enabled refers to the case that the additional flag indicates TM-based re-ordering being enabled. In another embodiment, one additional high-level flag is added to control the TM-based re-ordering in the MMVD mode together with the existing DMVD flag. In other words, TM-based re-ordering is enabled in the MMVD mode if and only if MMVD is enabled, DMVD is enabled (as indicated by the DMVD flag) , and the additional flag is enabled.
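The three gating embodiments above reduce to a conjunction of whichever flags a given embodiment defines, as in this sketch; the function and flag names are illustrative, not actual bitstream syntax elements.

```python
# Illustrative sketch of the high-level-flag gating for TM-based
# reordering in MMVD; pass only the flags a given embodiment defines.
def tm_reorder_enabled(mmvd_on, dmvd_on=None, extra_on=None):
    """TM-based reordering applies iff MMVD is on and every defined
    controlling flag (shared DMVD flag and/or additional flag) is on."""
    enabled = mmvd_on
    if dmvd_on is not None:     # embodiment sharing the DMVD flag
        enabled = enabled and dmvd_on
    if extra_on is not None:    # embodiment with an additional flag
        enabled = enabled and extra_on
    return enabled
```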
Any of the MMVD methods described above can be implemented in encoders and/or decoders. For example, the process of any of the proposed methods (e.g. determining whether any base MVs are similar, performing TM-based reordering for MMVD candidates, determining the target base MV, etc. ) can be implemented in an inter coding module of an encoder (e.g. Inter Pred. 112 in Fig. 1A) , a motion compensation module (e.g. MC 152 in Fig. 1B) , or a merge candidate derivation module in the encoder or the decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the inter coding module of the encoder and/or the motion compensation module or merge candidate derivation module of the decoder. While Inter Pred. 112 and MC 152 are shown as individual processing units to support the MMVD methods, they may correspond to executable software or firmware codes stored on a medium, such as a hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. a DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array) ) .
Fig. 6 illustrates a flowchart of another exemplary video coding system that utilizes modified search locations for MMVD according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based on hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to this method, input data associated with a current block are received in step 610, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or prediction residual data associated with the current block to be decoded at a decoder side. Two or more base MVs (Motion Vectors) are determined in step 620. Whether said two or more base MVs comprise B similar MVs is checked in step 630, where B is an integer equal to or greater than 2. If said two or more base MVs comprise B similar MVs, steps 640 to 680 are performed. Otherwise (i.e., said two or more base MVs do not comprise B similar MVs) , steps 640 to 680 are skipped. In step 640, a target base MV (Motion Vector) is determined from the B similar MVs. In step 650, a set of MVD (Motion Vector Difference) candidates is determined by adding offsets to the target base MV. In step 660, top MVD candidates are selected from the set of MVD candidates reordered according to the TM (Template Matching) cost associated with the set of MVD candidates. In step 670, the top MVD candidates are divided into B modified MVD candidate sets, wherein each of the B modified MVD candidate sets is associated with one of the B similar MVs. In step 680, the current block is encoded or decoded by using motion information comprising the B modified MVD candidate sets.
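Steps 650 through 670 of the flow described above can be combined into one end-to-end sketch. The `tm_cost` callable and the consecutive split are illustrative assumptions; the interleaved split described earlier would work equally well.

```python
# Illustrative end-to-end sketch of steps 650-670: form MVD candidates,
# reorder by TM cost, keep the top B*K, and divide them among B base MVs.
def mmvd_candidates_for_similar_bases(target_base, offsets, tm_cost, b, k):
    # Step 650: add offsets to the target base MV to form MVD candidates.
    cands = [(target_base[0] + ox, target_base[1] + oy) for ox, oy in offsets]
    # Step 660: reorder by TM cost and keep the top B*K candidates.
    top = sorted(cands, key=tm_cost)[:b * k]
    # Step 670: divide into B modified candidate sets (consecutive split).
    return [top[i * k:(i + 1) * k] for i in range(b)]
```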
Fig. 7 illustrates a flowchart of an exemplary video coding system that utilizes one or more high-level syntax elements to control whether to enable TM based reordering in the MMVD mode according to an embodiment of the present invention. According to this method, input data associated with a current block are received in step 710, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or prediction residual data associated with the current block to be decoded at a decoder side. One or more base MVs (Motion Vectors) are determined in step 720. Whether said one or more high-level flags indicate that TM (Template Matching) based reordering in the MMVD mode is enabled for the current block is checked in step 740. If said one or more high-level flags indicate that TM based reordering in the MMVD mode is enabled for the current block (i.e., the “Yes” path) , steps 750 to 780 are performed. Otherwise (i.e., the “No” path) , steps 750 to 780 are skipped. In step 750, a set of MVD (Motion Vector Difference) candidates is determined by adding offsets to said one or more base MVs. In step 760, the set of MVD candidates is reordered according to the TM (Template Matching) cost associated with the set of MVD candidates to form a set of reordered MVD candidates. In step 770, one final MVD candidate is selected from the set of reordered MVD candidates or a subset of the reordered MVD candidates. In step 780, the current block is encoded or decoded by using motion information comprising the final MVD candidate.
The flowcharts shown are intended to illustrate examples of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced without such specific details.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA) . These processors can be configured to perform particular tasks according to the invention by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (23)

  1. A method of video coding using MMVD (Merge with MVD (Motion Vector Difference) ) mode, the method comprising:
    receiving input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side;
    determining two or more base MVs (Motion Vectors) ; and
    in response to said two or more base MVs comprising B similar MVs, performing TM (Template Matching) based reordering in the MMVD mode, wherein B is an integer equal to or greater than 2 and said TM based reordering in the MMVD mode comprises:
    determining a target base MV (Motion Vector) from the B similar MVs;
    determining a set of MVD (Motion Vector Difference) candidates by adding offsets to the target base MV;
    selecting top MVD candidates from the set of MVD candidates reordered according to TM (Template Matching) cost associated with the set of MVD candidates;
    dividing the top MVD candidates into B modified MVD candidate sets, wherein each of the B modified MVD candidate sets is associated with one of the B similar MVs; and
    encoding or decoding the current block by using motion information comprising the B modified MVD candidate sets.
  2. The method of Claim 1, wherein the B is equal to 2 and the top MVD candidates correspond to 2*K MVD candidates, and wherein K is a positive integer.
  3. The method of Claim 2, wherein K MVD candidates of the 2*K MVD candidates are assigned to a first similar MV and remaining K MVD candidates of the 2*K MVD candidates are assigned to a second similar MV.
  4. The method of Claim 3, wherein the K MVD candidates of the 2*K MVD candidates correspond to top K MVD candidates of the 2*K MVD candidates.
  5. The method of Claim 3, wherein the K MVD candidates of the 2*K MVD candidates correspond to odd-numbered or even-numbered K MVD candidates of the 2*K MVD candidates.
  6. The method of Claim 1, wherein the B is equal to 3 or larger and the top MVD candidates correspond to B*K MVD candidates, and wherein K is a positive integer.
  7. The method of Claim 6, wherein K MVD candidates of the B*K MVD candidates are assigned to each of the B similar MVs.
  8. The method of Claim 7, wherein the B*K MVD candidates are evenly divided into B groups from beginning to end, and each group of the K MVD candidates corresponds to consecutive MVD candidates of the B*K MVD candidates.
  9. The method of Claim 7, wherein the B*K MVD candidates are evenly divided into B groups in an interleaved fashion, and each group of the K MVD candidates corresponds to every B-th MVD candidates of the B*K MVD candidates.
  10. The method of Claim 1, wherein when a difference between two base MVs is smaller than a threshold, the two base MVs are counted as similar MVs.
  11. The method of Claim 10, wherein the threshold is adaptively changed based on low-delay picture condition.
  12. The method of Claim 10, wherein the threshold is adaptively changed based on MV length associated with said two or more base MVs.
  13. The method of Claim 1, wherein the target base MV is equal to one of the B similar MVs.
  14. The method of Claim 1, wherein the target base MV is equal to a mid-point of the B similar MVs.
  15. An apparatus for video coding using MMVD (Merge with MVD (Motion Vector Difference) ) mode, the apparatus comprising one or more electronics or processors arranged to:
    receive input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or prediction residual data associated with the current block to be decoded at a decoder side;
    determine two or more base MVs (Motion Vectors) ; and
    in response to said two or more base MVs comprising B similar MVs, perform TM (Template Matching) based reordering in the MMVD mode, wherein B is an integer equal to or greater than 2 and the TM based reordering in the MMVD mode comprises:
    determine a target base MV (Motion Vector) from the B similar MVs;
    determine a set of MVD (Motion Vector Difference) candidates by adding offsets to the target base MV;
    select top MVD candidates from the set of MVD candidates reordered according to TM (Template Matching) cost associated with the set of MVD candidates;
    divide the top MVD candidates into B modified MVD candidate sets, wherein each of the B modified MVD candidate sets is associated with one of the B similar MVs; and
    encode or decode the current block by using motion information comprising the B modified MVD candidate sets.
  16. A method of video coding using MMVD (Merge with MVD (Motion Vector Difference) ) mode, the method comprising:
    receiving input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side;
    determining one or more base MVs (Motion Vectors) ;
    signalling or parsing one or more high-level flags; and
    in response to said one or more high-level flags indicating TM (Template Matching) based reordering in the MMVD mode is enabled for the current block, performing the TM-based reordering in the MMVD mode, wherein the TM-based reordering comprises:
    determining a set of MVD (Motion Vector Difference) candidates by adding offsets to said one or more base MVs;
    reordering the set of MVD candidates according to TM (Template Matching) cost associated with the set of MVD candidates to form a set of reordered MVD candidates;
    selecting one final MVD candidate from the set of reordered MVD candidates or a subset of reordered MVD candidates; and
    encoding or decoding the current block by using motion information comprising the final MVD candidate.
  17. The method of Claim 16, wherein said one or more high-level flags are shared with a DMVD (Decoder-Side Motion-Vector Derivation) flag.
  18. The method of Claim 17, wherein the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled and the DMVD flag is enabled.
  19. The method of Claim 16, wherein an additional high-level flag is used to control whether to enable the TM based reordering in the MMVD mode.
  20. The method of Claim 19, wherein the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled and the additional high-level flag is enabled.
  21. The method of Claim 16, wherein an additional high-level flag and a DMVD (Decoder-Side Motion-Vector Derivation) flag are used to control whether to enable the TM based reordering in the MMVD mode.
  22. The method of Claim 21, wherein the TM based reordering is enabled in the MMVD mode if and only if the MMVD mode is enabled, the DMVD flag is enabled, and the additional high-level flag is enabled.
  23. An apparatus for video coding using MMVD (Merge with MVD (Motion Vector Difference) ) mode, the apparatus comprising one or more electronics or processors arranged to:
    receive input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side;
    determine one or more base MVs (Motion Vectors) ;
    signal or parse one or more high-level flags; and
    in response to said one or more high-level flags indicating TM (Template Matching) based reordering in the MMVD mode is enabled for the current block, perform the TM-based reordering in the MMVD mode, wherein the TM-based reordering comprises:
    determine a set of MVD (Motion Vector Difference) candidates by adding offsets to said one or more base MVs;
    reorder the set of MVD candidates according to TM (Template Matching) cost associated with the set of MVD candidates to form a set of reordered MVD candidates;
    select one final MVD candidate from the set of reordered MVD candidates or a subset of reordered MVD candidates; and
    encode or decode the current block by using motion information comprising the final MVD candidate.
PCT/CN2023/094702 2022-05-17 2023-05-17 Method and apparatus for complexity reduction of video coding using merge with mvd mode WO2023222016A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112118282A TW202410696A (en) 2022-05-17 2023-05-17 Method and apparatus for complexity reduction of video coding using merge with mvd mode

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263342642P 2022-05-17 2022-05-17
US63/342,642 2022-05-17

Publications (1)

Publication Number Publication Date
WO2023222016A1 true WO2023222016A1 (en) 2023-11-23

Family

ID=88834683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/094702 WO2023222016A1 (en) 2022-05-17 2023-05-17 Method and apparatus for complexity reduction of video coding using merge with mvd mode

Country Status (2)

Country Link
TW (1) TW202410696A (en)
WO (1) WO2023222016A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170116879A (en) * 2016-04-12 2017-10-20 인하대학교 산학협력단 Method and apparatus for coding and decoding a video using pattern matched motion vector derivation
CN110662053A (en) * 2018-06-29 2020-01-07 北京字节跳动网络技术有限公司 Look-up table size
US20210006824A1 (en) * 2018-01-08 2021-01-07 Samsung Electronics Co., Ltd. Encoding and decoding method for motion information, and encoding and decoding device for motion information
CN112889278A (en) * 2018-09-17 2021-06-01 三星电子株式会社 Method for encoding and decoding motion information and apparatus for encoding and decoding motion information
CN114208168A (en) * 2019-06-13 2022-03-18 Lg电子株式会社 Inter prediction in video or image coding systems


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
D. JIANG (DAHUATECH), J. LIN (DAHUATECH), F. ZENG (DAHUATECH), C. FANG (DAHUA): "Non-CE4:Modefied MVD derivation method for bidirectional MMVD using the magnitude of MV", 128. MPEG MEETING; 20191007 - 20191011; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), 24 September 2019 (2019-09-24), XP030206254 *



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23806967

Country of ref document: EP

Kind code of ref document: A1