WO2023143325A1 - Method and apparatus for video coding using merge mode with MVD
- Publication number
- WO2023143325A1 (application PCT/CN2023/072978)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- merge
- search
- base
- mvs
- modified
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/57—Motion estimation characterised by a search window with variable size or shape
Definitions
- the present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/304,010 filed on January 28, 2022.
- the U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
- the present invention relates to a video coding system using the MMVD (Merge mode Motion Vector Difference) coding tool.
- MMVD Merge mode Motion Vector Difference
- the present invention relates to the design of search locations to enhance the performance associated with MMVD.
- VVC Versatile video coding
- JVET Joint Video Experts Team
- MPEG ISO/IEC Moving Picture Experts Group
- ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021.
- VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
- HEVC High Efficiency Video Coding
- Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
- for Intra Prediction, the prediction data is derived based on previously coded video data in the current picture.
- Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture(s) and motion data.
- Switch 114 selects Intra Prediction 110 or Inter-Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues.
- the prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120.
- T Transform
- Q Quantization
- the transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data.
- the bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area.
- the side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130 is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well.
- the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues.
- the residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data.
- the reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
- incoming video data undergoes a series of processing in the encoding system.
- the reconstructed video data from REC 128 may be subject to various impairments due to a series of processing.
- in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality.
- deblocking filter (DF) may be used.
- SAO Sample Adaptive Offset
- ALF Adaptive Loop Filter
- the loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream.
- DF deblocking filter
- Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134.
- the system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H.264 or VVC.
- the decoder can use similar functional blocks, or a portion of the same functional blocks, as the encoder except for Transform 118 and Quantization 120, since the decoder only needs Inverse Quantization 124 and Inverse Transform 126.
- the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information).
- the Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140.
- the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
- an input picture is partitioned into non-overlapped square block regions referred to as CTUs (Coding Tree Units), similar to HEVC.
- CTUs Coding Tree Units
- Each CTU can be partitioned into one or multiple smaller size coding units (CUs).
- the resulting CU partitions can be in square or rectangular shapes.
- VVC divides a CTU into prediction units (PUs) as a unit to apply prediction process, such as Inter prediction, Intra prediction, etc.
- the VVC standard incorporates various new coding tools to further improve the coding efficiency over the HEVC standard.
- among the various new coding tools, some coding tools relevant to the present invention are reviewed as follows. For example, the Merge with MVD Mode (MMVD) technique re-uses the same merge candidates as those in VVC, and a selected candidate can be further expanded by a motion vector expression method. It is desirable to develop techniques to further improve MMVD.
- MMVD Merge with MVD Mode
- a method and apparatus for video coding using MMVD (Merge with MVD (Motion Vector Difference)) mode are disclosed.
- input data associated with a current block are received, where the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side.
- Two or more base merge MVs (Motion Vectors) from a merge list are determined for the current block.
- a modified expanded merge candidate is determined for at least one of said two or more base merge MVs using a modified set of search locations if said at least one of said two or more base merge MVs is close to another base MV (Motion Vector) of said two or more base merge MVs, where at least one search location is different between a nominal set of search locations and the modified set of search locations, and where the nominal set of search locations comprises one or more defined directions at a set of nominal distances around a target base merge MV.
- the current block is encoded or decoded using motion information comprising the modified expanded merge candidate.
- said one or more defined directions correspond to a horizontal direction, a vertical direction, or both.
- the modified set of search locations comprises modified search locations in a non-horizontal and non-vertical direction.
- the modified set of search locations comprises modified search locations having at least one distance different from the set of nominal distances.
- the modified set of search locations corresponds to a set of modified distances normalized from the set of nominal distances according to a length of said at least one of said two or more base merge MVs.
- the modified set of search locations comprises modified search locations in a non-horizontal and non-vertical direction and having at least one distance different from the set of nominal distances.
- a common base merge MV is derived from the B base merge MVs and the modified set of search locations are applied to the common base merge MV, and wherein B is an integer greater than 1.
- the modified set of search locations comprises at least one direction in addition to the horizontal direction and the vertical direction.
- the modified set of search locations comprises B sets of search directions.
- the common base merge MV corresponds to a mid-point of said B base merge MVs.
- the common base merge MV corresponds to one of said B base merge MVs having a smallest base index.
- search directions for the second base merge MV are dependent on the first base merge MV.
- the modified set of search locations for the second base merge MV comprises at least one non-horizontal and non-vertical search direction pointing away from the first base merge MV.
- the modified set of search locations for the second base merge MV comprises two non-horizontal and non-vertical search directions, one horizontal search direction and one vertical search direction, all pointing away from the first base merge MV.
- the modified set of search locations for the first base merge MV uses modified search directions parallel to and perpendicular to a line respectively, wherein the line connects the first base merge MV and the second base merge MV.
- the modified set of search locations for the second base merge MV uses rotated search directions, wherein the rotated search directions are formed by rotating the modified search directions.
- an offset is added to one of two base merge MVs to generate a new base merge MV so that the distance is large enough.
- one of two base merge MVs is replaced by another base merge MV.
- Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
- Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
- Fig. 2 illustrates an example of CPR (Current Picture Referencing) compensation, where blocks are predicted by corresponding blocks in the same picture.
- CPR Current Picture Referencing
- Fig. 3 illustrates an example of the MMVD (Merge mode Motion Vector Difference) search process, where a current block in the current frame is processed by bi-direction prediction using an L0 reference frame and an L1 reference frame.
- Fig. 4 illustrates the offset distances in the horizontal and vertical directions for a L0 reference block 410 and L1 reference block according to MMVD.
- Fig. 5 illustrates an example of merge mode candidate derivation from spatial and temporal neighbouring blocks.
- Fig. 6A illustrates an example of modified search locations for one of two bases according to an embodiment of the present invention when the two bases are close in the horizontal direction, where the modified search locations include slant search directions downward.
- Fig. 6B illustrates an example of modified search locations for one of two bases according to an embodiment of the present invention when the two bases are close in the vertical direction, where the modified search locations include slant search directions toward right.
- Fig. 7A illustrates an example of search locations for four bases with three being close to each other according to the conventional MMVD.
- Fig. 7B illustrates an example of search locations for four bases with three being close to each other according to an embodiment of the present invention, where the MMVD search locations are based on a common base derived from the three closely located bases.
- Fig. 8A illustrates an example of search locations according to the conventional MMVD, where the search is performed along the vertical and horizontal directions from the bases b0 and b1 respectively.
- Fig. 8B illustrates an example of search locations according to an embodiment of the present invention, where the search locations of the second base are dependent on a relative location with respect to the first base.
- Fig. 9A illustrates an example of search locations according to the conventional MMVD, where the search is performed along the vertical and horizontal directions from the bases b0 and b1 respectively.
- Fig. 9B illustrates an example of search locations according to an embodiment of the present invention, where the search directions for the first base are parallel to the direction of (b1 – b0) and perpendicular to the direction of (b1 – b0) and the search directions for the second base are rotated from those of the first base.
- Fig. 10 illustrates a flowchart of another exemplary video coding system that utilizes modified search location for MMVD according to an embodiment of the present invention.
- Motion Compensation, one of the key technologies in hybrid video coding, explores the pixel correlation between adjacent pictures. It is generally assumed that, in a video sequence, the patterns corresponding to objects or background in a frame are displaced to form corresponding objects in the subsequent frame or correlated with other patterns within the current frame. With the estimation of such displacement (e.g. using block matching techniques), the pattern can be mostly reproduced without the need to re-code the pattern. Similarly, block matching and copy has also been tried to allow selecting the reference block from the same picture as the current block. It was observed to be inefficient when applying this concept to camera-captured videos. Part of the reason is that the textural pattern in a spatially neighbouring area may be similar to the current coding block, but usually with some gradual changes over space. It is difficult for a block to find an exact match within the same picture in a video captured by a camera. Accordingly, the improvement in coding performance is limited.
- a new prediction mode, i.e., the intra block copy (IBC) mode, also called current picture referencing (CPR)
- IBC intra block copy
- CPR current picture referencing
- a prediction unit PU
- a displacement vector called block vector or BV
- the prediction errors are then coded using transformation, quantization and entropy coding.
- An example of CPR compensation is illustrated in Fig. 2, where block 212 is a corresponding block for block 210, and block 222 is a corresponding block for block 220.
- the reference samples correspond to the reconstructed samples of the current decoded picture prior to in-loop filter operations, both deblocking and sample adaptive offset (SAO) filters in HEVC.
- SAO sample adaptive offset
- The very first version of CPR was proposed in JCTVC-M0350 (Budagavi et al., AHG8: Video coding using Intra motion compensation, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 13th Meeting: Incheon, KR, 18–26 Apr. 2013, Document: JCTVC-M0350) to the HEVC Range Extensions (RExt) development.
- the CPR compensation was limited to be within a small local area, with only 1-D block vector and only for block size of 2Nx2N.
- HEVC SCC Screen Content Coding
- (BV_x, BV_y) is the luma block vector (the motion vector for CPR) for the current PU; nPbSw and nPbSh are the width and height of the current PU; (xPbs, yPbs) is the location of the top-left pixel of the current PU relative to the current picture; (xCbs, yCbs) is the location of the top-left pixel of the current CU relative to the current picture; and CtbSizeY is the size of the CTU.
- offsetX and offsetY are two adjusted offsets in two dimensions in consideration of chroma sample interpolation for the CPR mode.
- (BVC_x, BVC_y) is the chroma block vector, in 1/8-pel resolution in HEVC.
- the reference block for CPR must be within the same tile/slice boundary.
- The MMVD technique is proposed in JVET-J0024.
- MMVD is used for either skip or merge modes with a proposed motion vector expression method.
- MMVD re-uses the same merge candidates as those in VVC.
- a candidate can be selected, and is further expanded by the proposed motion vector expression method.
- MMVD provides a new motion vector expression with simplified signalling.
- the expression method includes prediction direction information, starting point (also referred to as a base in this disclosure), motion magnitude (also referred to as a distance in this disclosure), and motion direction.
- Fig. 3 illustrates an example of the MMVD search process, where a current block 312 in the current frame 310 is processed by bi-direction prediction using an L0 reference frame 320 and an L1 reference frame 330.
- a pixel location 350 is projected to pixel location 352 in L0 reference frame 320 and pixel location 354 in L1 reference frame 330.
- updated locations will be searched by adding offsets in selected directions. For example, the updated locations correspond to locations along line 342 or 344 in the horizontal direction at distances of s, 2s or 3s.
- Prediction direction information indicates a prediction direction among L0, L1, and L0 and L1 predictions.
- the proposed method can generate bi-prediction candidates from merge candidates with uni-prediction by using a mirroring technique. For example, if a merge candidate is uni-prediction with L1, a reference index of L0 is decided by searching for a reference picture in list 0 that is mirrored with the reference picture for list 1. If there is no corresponding picture, the nearest reference picture to the current picture is used. The L0 MV is derived by scaling the L1 MV, and the scaling factor is calculated from the POC distance.
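- as a minimal sketch of the POC-based scaling described above, the following assumes the scaling factor is simply the ratio of the two POC distances and uses exact fractions. The function name `scale_mv_by_poc` and its parameters are hypothetical; practical codecs use fixed-point arithmetic with rounding and clipping, which is not modelled here.

```python
from fractions import Fraction


def scale_mv_by_poc(mv, poc_cur, poc_src_ref, poc_dst_ref):
    """Scale `mv` (current picture -> source reference) so that it points
    to the destination reference, using the ratio of POC distances.

    Assumption: plain ratio scaling with exact fractions, no rounding.
    """
    src_dist = poc_src_ref - poc_cur
    dst_dist = poc_dst_ref - poc_cur
    if src_dist == 0:
        raise ValueError("source reference must differ from the current picture")
    factor = Fraction(dst_dist, src_dist)
    return (mv[0] * factor, mv[1] * factor)


# The current picture has POC 8 and the L1 MV points to POC 12.
# The mirrored L0 reference is POC 4, so the MV is flipped.
l1_mv = (8, -4)
l0_mv = scale_mv_by_poc(l1_mv, poc_cur=8, poc_src_ref=12, poc_dst_ref=4)
```

- when the mirrored picture is unavailable and a nearer reference is used instead, the same function yields a proportionally shorter MV.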
- in MMVD, after a merge candidate is selected, it is further expanded or refined by the signalled MVD information.
- the further information includes a merge candidate flag, an index to specify motion magnitude, and an index for indication of the motion direction.
- one of the first two candidates in the merge list is selected to be used as an MV basis.
- the MMVD candidate flag is signalled to specify which one is used between the first and second merge candidates.
- the initial MVs (i.e., merge candidates) selected from the merge candidate list are also referred to as bases in this disclosure. After searching the set of locations, a selected MV candidate is referred to as an expanded MV candidate in this disclosure.
- if the prediction direction of the MMVD candidate is the same as that of the original merge candidate, the index with value 0 is signalled as the MMVD prediction direction. Otherwise, the index with value 1 is signalled. After sending the first bit, the remaining prediction direction is signalled based on the pre-defined priority order of MMVD prediction directions. The priority order is L0/L1 prediction, L0 prediction and L1 prediction. If the prediction direction of the merge candidate is L1, signalling ‘0’ indicates the MMVD prediction direction as L1, signalling ‘10’ indicates the MMVD prediction direction as L0 and L1, and signalling ‘11’ indicates the MMVD prediction direction as L0. If the L0 and L1 prediction lists are the same, the MMVD prediction direction information is not signalled.
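- the codeword assignment above can be sketched as a small lookup. This is an illustrative reading of the text, assuming that '0' always maps to the direction of the merge candidate and that the two remaining directions take '10' and '11' in the stated priority order; the labels "L0L1", "L0", "L1" and the function name are hypothetical.

```python
# Priority order stated in the text: L0/L1 prediction, L0 prediction, L1 prediction.
PRIORITY = ("L0L1", "L0", "L1")


def direction_codewords(merge_dir):
    """Map each codeword to an MMVD prediction direction.

    '0' keeps the direction of the merge candidate; the two remaining
    directions are assigned '10' and '11' following PRIORITY.
    """
    if merge_dir not in PRIORITY:
        raise ValueError("unknown prediction direction: %r" % (merge_dir,))
    remaining = [d for d in PRIORITY if d != merge_dir]
    return {"0": merge_dir, "10": remaining[0], "11": remaining[1]}


# Reproduces the L1 example given in the text.
table_for_l1 = direction_codewords("L1")
```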
- the base candidate index, as shown in Table 1, defines the starting point.
- Base candidate index indicates the best candidate among candidates in the list as follows.
- Distance index specifies motion magnitude information and indicates the pre-defined offset from the starting points (412 and 422) for a L0 reference block 410 and L1 reference block 420 as shown in Fig. 4.
- an offset is added to either the horizontal component or the vertical component of the starting MV, where small circles in different styles correspond to different offsets from the centre.
- The relation between the distance index and pre-defined offset is specified in Table 2.
- Direction index represents the direction of the MVD relative to the starting point.
- the direction index can represent one of the four directions as shown in Table 3. It is noted that the meaning of the MVD sign can vary according to the information of the starting MVs.
- when the starting MV is a uni-prediction MV, or the starting MVs are bi-prediction MVs with both lists pointing to the same side of the current picture, the sign in Table 3 specifies the sign of the MV offset added to the starting MV.
- when the starting MVs are bi-prediction MVs with the two MVs pointing to different sides of the current picture (i.e. the POC of one reference is larger than the POC of the current picture, and the POC of the other reference is smaller than the POC of the current picture), and the POC difference in list 0 is greater than that in list 1, the sign in Table 3 specifies the sign of the MV offset added to the list0 MV component of the starting MV, and the sign for the list1 MV has the opposite value. Otherwise, if the POC difference in list 1 is greater than that in list 0, the sign in Table 3 specifies the sign of the MV offset added to the list1 MV component of the starting MV, and the sign for the list0 MV has the opposite value.
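- setting aside the bi-prediction sign handling, the nominal search locations around a single base can be sketched as below. The distance values {1/4, 1/2, 1, 2, 4, 8, 16, 32} are the ones quoted in this document; the ordering of the four directions and the name `nominal_search_locations` are assumptions made for illustration only.

```python
from fractions import Fraction

# Distances quoted in this document: 1/4, 1/2, 1, 2, 4, 8, 16, 32.
NOMINAL_DISTANCES = [Fraction(1, 4) * 2**k for k in range(8)]
# Four nominal directions: +x, -x, +y, -y (index order is an assumption).
NOMINAL_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def nominal_search_locations(base):
    """Return every base + distance * direction combination."""
    return [
        (base[0] + dist * dx, base[1] + dist * dy)
        for dist in NOMINAL_DISTANCES
        for (dx, dy) in NOMINAL_DIRECTIONS
    ]


locations = nominal_search_locations((0, 0))
```

- with 8 distances and 4 directions, each base yields 32 candidate locations, which is why two nearby bases can produce many near-duplicate candidates.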
- Multi-hypothesis prediction is proposed to improve the existing prediction modes in inter pictures, including uni-prediction of advanced motion vector prediction (AMVP) mode, skip and merge mode, and intra mode.
- the general concept is to combine an existing prediction mode with an extra merge indexed prediction.
- the merge indexed prediction is performed in a manner the same as that for the regular merge mode, where a merge index is signalled to acquire motion information for the motion compensated prediction.
- the final prediction is the weighted average of the merge indexed prediction and the prediction generated by the existing prediction mode, where different weights are applied depending on the combinations.
- JVET-K1030 (Chih-Wei Hsu, et al., Description of Core Experiment 10: Combined and multi-hypothesis prediction, Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 11th Meeting: Ljubljana, SI, 10–18 July 2018, Document: JVET-K1030), or JVET-L0100 (Man-Shu Chiang, et al., CE10.1.1: Multi-hypothesis prediction for improving AMVP mode, skip or merge mode, and intra mode, Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 12th Meeting: Macao, CN, 3–12 Oct. 2018, Document: JVET-L0100).
- Pairwise average candidates are generated by averaging predefined pairs of candidates in the current merge candidate list, and the predefined pairs are defined as {(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)}, where the numbers denote the merge indices to the merge candidate list.
- the averaged motion vectors are calculated separately for each reference list. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures; if only one motion vector is available, use the one directly; if no motion vector is available, treat this list as invalid.
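- the per-list averaging rule can be sketched as follows. The data layout (a dict with "L0" and "L1" entries holding an MV tuple or None) and the function names are hypothetical, and plain floating-point averaging stands in for the integer rounding a real codec would apply.

```python
# Predefined pairs of merge indices, as listed in the text.
PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def average_list_mv(mv_a, mv_b):
    """Average two MVs of one reference list; None marks 'not available'."""
    if mv_a is not None and mv_b is not None:
        return ((mv_a[0] + mv_b[0]) / 2, (mv_a[1] + mv_b[1]) / 2)
    # Only one available: use it directly. None available: list is invalid.
    return mv_a if mv_a is not None else mv_b


def pairwise_average(merge_list):
    """Build pairwise average candidates from the current merge list."""
    candidates = []
    for i, j in PAIRS:
        if j >= len(merge_list):
            continue  # the pair refers to a candidate that does not exist
        a, b = merge_list[i], merge_list[j]
        candidates.append(
            {lst: average_list_mv(a[lst], b[lst]) for lst in ("L0", "L1")}
        )
    return candidates


merge_list = [
    {"L0": (4, 8), "L1": None},
    {"L0": (0, 0), "L1": (2, 2)},
]
averaged = pairwise_average(merge_list)
```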
- HEVC has the Skip and Merge modes.
- Skip and Merge modes obtain the motion information from spatially neighbouring blocks (spatial candidates) or a temporal co-located block (temporal candidate).
- spatial candidates spatially neighbouring blocks
- temporal candidate temporal co-located block
- the residual signal is forced to be zero and not coded.
- a candidate index is signalled to indicate which candidate among the candidate set is used for merging.
- Each merged PU reuses the MV, prediction direction, and reference picture index of the selected candidate.
- up to four spatial MV candidates are derived from A0, A1, B0 and B1, and one temporal MV candidate is derived from T_BR or T_CTR (T_BR is used first; if T_BR is not available, T_CTR is used instead).
- the position B2 is then used to derive another MV candidate as a replacement.
- removing redundancy (pruning) is applied to remove redundant MV candidates.
- the encoder selects one final candidate within the candidate set for Skip or Merge modes based on the rate-distortion optimization (RDO) decision, and transmits the index to the decoder.
- RDO rate-distortion optimization
- the skip and merge mode may refer to both skip and merge modes.
- MVD candidates are generated using the same combinations of distances and directions. However, if there are two bases close to each other, applying the same MVD to the two bases will result in two similar motion vectors, which may be redundant.
- the present invention discloses methods to reduce such similar candidates by considering the difference between the bases and adaptively changing distances, directions, or both for each base.
- the set of search locations according to the conventional MMVD is referred to as a nominal set of search locations in this disclosure.
- the set of search distances according to the conventional MMVD is referred to as a nominal set of search distances in this disclosure.
- the first method is illustrated in Figs. 6A-B.
- the search locations 610 on the left side correspond to the conventional MMVD search.
- in Fig. 6A, the search locations (shown as “x” on the left side of Fig. 6A) associated with base b1 and enclosed by the ellipses 612 may be redundant since they are located close to the search locations (shown as circles on the left side of Fig. 6A) associated with base b0.
- the search direction is diagonal and the new search locations 620 are indicated by the ellipses 622.
- other directions, e.g. the diagonal direction, are searched instead of the horizontal directions for b1.
- the search locations 630 on the left side correspond to the conventional MMVD search.
- the search locations shown as “x” in the left side of Fig.
- the search locations 640 according to one embodiment of the present invention are shown, where the search direction is diagonal and the new search locations are indicated by the ellipses 642. If neither the x nor the y difference is small enough, no changes are made since there are probably no redundant candidates.
- the present invention modifies the nominal search locations to avoid redundancy in the search location.
- embodiments of the present invention use a modified set of search locations for MMVD. While the conventional MMVD always searches in the horizontal direction and the vertical direction, the embodiments as shown in Fig. 6A and Fig. 6B use a modified set of search locations comprising a non-horizontal and non-vertical direction.
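- one possible reading of the first method is sketched below. The threshold test, the choice of which nominal directions are replaced, the exact slant vectors, and the function name are all assumptions, since the text only states that slanted directions replace redundant ones when the bases are close in x or in y. The y axis is assumed to point downward, as in image coordinates.

```python
NOMINAL_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def directions_for_second_base(b0, b1, threshold):
    """Pick search directions for b1 depending on how close it is to b0.

    Assumed rule: a small x difference swaps the horizontal pair for two
    downward slants (compare Fig. 6A); a small y difference swaps the
    vertical pair for two rightward slants (compare Fig. 6B).
    """
    dx = abs(b1[0] - b0[0])
    dy = abs(b1[1] - b0[1])
    if dx < threshold:
        return [(1, 1), (-1, 1), (0, 1), (0, -1)]
    if dy < threshold:
        return [(1, 0), (-1, 0), (1, 1), (1, -1)]
    # Neither difference is small: keep the nominal set unchanged.
    return list(NOMINAL_DIRECTIONS)


close_in_x = directions_for_second_base((0, 0), (1, 10), threshold=2)
close_in_y = directions_for_second_base((0, 0), (10, 1), threshold=2)
far_apart = directions_for_second_base((0, 0), (10, 10), threshold=2)
```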
- if B of the bases are close enough, where B is an integer larger than one, a common base bc is defined using these B bases, and B different direction sets are searched based on the common base instead of searching one direction set based on B different bases.
- the common base can be the base corresponding to the smallest base index or the midpoint of the B bases.
- Fig. 7A illustrates the search locations based on the conventional MMVD, where the search locations (indicated by contour 710) associated with bases b0, b1 and b2 are concentrated near the center of the search location cluster 710 while the search locations (indicated by contour 720) associated with base b3 are well separated from search location cluster 710.
- Fig. 7B illustrates the search locations according to an embodiment of the present invention. In Fig. 7B, the common base bc 732 is the midpoint of the three bases (i.e., b0, b1 and b2).
- the search locations (indicated by contour 730) according to the present invention spread out to cover a larger area.
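- a minimal sketch of the common-base idea follows. The two options for the common base (midpoint, or the base with the smallest index) come from the text; how the B direction sets differ is not specified, so rotating the nominal cross by equal angular steps is purely an assumption, as are the function names.

```python
import math


def common_base(bases, mode="midpoint"):
    """Derive one common base from B closely located bases."""
    if mode == "smallest_index":
        return bases[0]
    count = len(bases)
    return (
        sum(b[0] for b in bases) / count,
        sum(b[1] for b in bases) / count,
    )


def direction_sets(num_sets):
    """Return num_sets sets of four unit directions.

    Assumption: set k is the nominal cross rotated by k * 90 / num_sets
    degrees, so the sets together cover the plane evenly.
    """
    sets = []
    for k in range(num_sets):
        phi = k * (math.pi / 2) / num_sets
        sets.append(
            [
                (math.cos(phi + q * math.pi / 2), math.sin(phi + q * math.pi / 2))
                for q in range(4)
            ]
        )
    return sets


bases = [(0, 0), (2, 0), (1, 3)]
bc = common_base(bases)
sets_for_bc = direction_sets(len(bases))
```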
- Fig. 8A illustrates an example of search locations according to the conventional MMVD, where the search is performed along the vertical and horizontal directions from the bases b0 and b1 respectively.
- base b1 lies in the first quadrant with respect to b0.
- search candidates of b1 are generated in the first quadrant along the directions (810, 812, 814 and 816) pointing away from base b0 as shown in Fig. 8B to prevent redundant candidates.
- the four search directions (810, 812, 814 and 816) include two non-horizontal and non-vertical search directions, one horizontal search direction and one vertical search direction, all pointing away from the first base merge MV. As shown in Fig. 8B, the search locations for b1 are well separated from those for b0.
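- the quadrant-dependent directions can be sketched as below. The text fixes only the mix (one horizontal, one vertical, two slanted, all pointing away from b0); the particular slant vectors (2, 1) and (1, 2), the tie handling when a component difference is zero, and the function names are assumptions. The vectors are not normalized to unit length.

```python
def sign(value):
    """Return +1 for non-negative values and -1 otherwise (assumed tie rule)."""
    return 1 if value >= 0 else -1


def away_directions(b0, b1):
    """Four search directions for b1 lying in the quadrant that b1
    occupies relative to b0, so that all of them point away from b0."""
    sx = sign(b1[0] - b0[0])
    sy = sign(b1[1] - b0[1])
    return [(sx, 0), (2 * sx, sy), (sx, 2 * sy), (0, sy)]


first_quadrant = away_directions((0, 0), (3, 2))
second_quadrant = away_directions((0, 0), (-3, 2))
```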
- search candidates of b 0 are generated along the directions parallel to or perpendicular to the direction of b 1 –b 0 .
- Fig. 9A illustrates an example of search locations according to the conventional MMVD, where the search is performed along the vertical and horizontal directions from the bases b 0 and b 1 respectively.
- search direction 910 is parallel to the direction of b 1 –b 0 and search direction 920 is perpendicular to the direction of b 1 –b 0 .
- the directions (930 and 940) of b 1 are determined by rotating the directions of b 0 .
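The parallel/perpendicular and rotated direction sets can be sketched as below. This is an illustration under assumptions: directions are unit vectors, and the rotation angle used for b 1 is a free parameter (45 degrees in the usage line), since the text does not state the angle. Function names are hypothetical.

```python
import math


def parallel_perpendicular_dirs(b0, b1):
    """Directions for b0: along +/-(b1 - b0) and along the two perpendiculars."""
    dx, dy = b1[0] - b0[0], b1[1] - b0[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        # Identical bases: fall back to the conventional axis-aligned directions.
        return [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    ux, uy = dx / norm, dy / norm
    return [(ux, uy), (-ux, -uy), (-uy, ux), (uy, -ux)]


def rotate(dirs, angle_deg):
    """Rotate every direction counter-clockwise by angle_deg degrees."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(x * c - y * s, x * s + y * c) for x, y in dirs]


dirs_b0 = parallel_perpendicular_dirs((0, 0), (3, 4))
dirs_b1 = rotate(dirs_b0, 45)  # assumed rotation angle
```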
- the distance is one of the values in the distance table (Table 2): {1/4, 1/2, 1, 2, 4, 8, 16, 32}.
- one basis generates the candidates following the original distance table, but the other basis generates the candidates using a new distance table, such as {3/4, 3/2, 3, 6, 12, 24, 48, 96}, to prevent repetitive candidates.
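The effect of giving the second base its own distance table can be checked with a short Python sketch. The two tables are taken from the text; the example bases, the axis-aligned directions and the helper names are assumptions made for illustration.

```python
from fractions import Fraction as F

DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
ORIGINAL_TABLE = [F(1, 4), F(1, 2), F(1), F(2), F(4), F(8), F(16), F(32)]
# New table from the text: 3/4, 3/2, 3, 6, 12, 24, 48, 96.
ALTERNATIVE_TABLE = [3 * d for d in ORIGINAL_TABLE]


def candidates(base, table):
    """Set of MMVD candidate positions: base + distance * direction."""
    bx, by = base
    return {(bx + d * dx, by + d * dy) for d in table for dx, dy in DIRS}


def count_duplicates(b0, b1, table0, table1):
    """Number of candidate positions shared by the two bases."""
    return len(candidates(b0, table0) & candidates(b1, table1))


b0 = (F(0), F(0))
b1 = (F(3, 4), F(0))  # example base close to b0 (assumption)
same_table = count_duplicates(b0, b1, ORIGINAL_TABLE, ORIGINAL_TABLE)
new_table = count_duplicates(b0, b1, ORIGINAL_TABLE, ALTERNATIVE_TABLE)
```

For this pair of bases, sharing the original table produces overlapping candidates while the alternative table produces none. The alternative table does not guarantee disjoint candidates for every possible pair of bases; it only makes collisions less likely.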
- the distance table for a base is normalized by a factor related to the length of the base (i.e., the magnitude of the MV) .
- the underlying assumption is that a base with larger length tends to have larger MVD. Therefore, the distance table should be changed accordingly.
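One way to read the length-based normalization is sketched below in Python. The scaling rule (table multiplied by the MV magnitude divided by a reference length, never shrinking below the original table) and the `reference_length` parameter are assumptions; the text only says the factor is related to the length of the base.

```python
import math

DISTANCE_TABLE = [0.25, 0.5, 1, 2, 4, 8, 16, 32]


def normalized_table(base_mv, reference_length=4.0):
    """Scale the distance table by a factor that grows with the base MV magnitude."""
    length = math.hypot(base_mv[0], base_mv[1])
    # Assumed rule: never scale below the original table.
    factor = max(1.0, length / reference_length)
    return [d * factor for d in DISTANCE_TABLE]


short_base_table = normalized_table((0, 0))  # factor 1: original table
long_base_table = normalized_table((0, 8))   # factor 2: every distance doubled
```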
- the proposed methods that adaptively change the directions can be combined with the proposed methods that adaptively change the distances.
- adding constraints to the base generation process can also prevent redundant MMVD candidates.
- the constraint requires that the distance between any two bases be large enough (e.g. greater than a threshold). If two existing bases do not satisfy the constraint, one of them should be removed or replaced by another MVP that satisfies the constraint. In another embodiment, if the distance between two bases is small enough in one direction, an offset is added to one or both of them to keep them apart.
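Both constrained base-generation variants can be sketched in Python. This is a hedged illustration: the Euclidean distance measure, the greedy selection order and the function names `select_bases` and `separate_along_axis` are assumptions, not details taken from the text.

```python
import math


def select_bases(mvp_candidates, num_bases, threshold):
    """Greedily keep MVPs that are farther than `threshold` from every base kept so far."""
    kept = []
    for mv in mvp_candidates:
        if all(math.dist(mv, base) > threshold for base in kept):
            kept.append(mv)
        if len(kept) == num_bases:
            break
    return kept


def separate_along_axis(b0, b1, min_gap, axis=0):
    """If b1 is closer than `min_gap` to b0 along one axis, offset b1 away from b0."""
    moved = list(b1)
    gap = b1[axis] - b0[axis]
    if abs(gap) < min_gap:
        moved[axis] = b0[axis] + (min_gap if gap >= 0 else -min_gap)
    return tuple(moved)


# (1, 0) is too close to (0, 0), so the next sufficiently distant MVP replaces it.
bases = select_bases([(0, 0), (1, 0), (10, 0), (10, 1)], num_bases=2, threshold=4)
shifted = separate_along_axis((0, 0), (1, 5), min_gap=4)
```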
- any of the MMVD methods described above can be implemented in encoders and/or decoders.
- any of the proposed methods can be implemented in an inter coding module of an encoder (e.g. Inter Pred. 112 in Fig. 1A), or in a motion compensation module (e.g. MC 152 in Fig. 1B) or a merge candidate derivation module of a decoder.
- any of the proposed methods can be implemented as a circuit coupled to the inter coding module of an encoder and/or to the motion compensation module or the merge candidate derivation module of a decoder.
- While the Inter-Pred. 112 and MC 152 are shown as individual processing units to support the MMVD methods, they may correspond to executable software or firmware code stored on a medium, such as a hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).
- Fig. 10 illustrates a flowchart of another exemplary video coding system that utilizes modified search location for MMVD according to an embodiment of the present invention.
- the steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side.
- the steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart.
- input data associated with a current block are received in step 1010, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side.
- Two or more base merge MVs are determined from a merge list for the current block in step 1020.
- a modified expanded merge candidate is determined for at least one of said two or more base merge MVs using a modified set of search locations if said at least one of said two or more base merge MVs is close to another base MV of said two or more base merge MVs, wherein at least one search location is different between the nominal set of search locations and the modified set of search locations, and wherein the nominal set of search locations comprises one or more defined directions at a set of nominal distances around a target base merge MV.
- the current block is encoded or decoded using motion information comprising the modified expanded merge candidate in step 1040.
- Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both.
- an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein.
- An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein.
- the invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA) .
- These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention.
- the software code or firmware code may be developed in different programming languages and different formats or styles.
- the software code may also be compiled for different target platforms.
- different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Disclosed are a method and an apparatus for video coding using MMVD (merge with MVD) mode. According to the method, two or more base merge MVs from a merge list are determined for the current block. A modified expanded merge candidate is determined for at least one of the two or more base merge MVs using a modified set of search locations if said at least one of the two or more base merge MVs is close to another base MV of the two or more base merge MVs, wherein at least one search location is different between the nominal set of search locations and the modified set of search locations, and the nominal set of search locations comprises one or more defined directions at a set of nominal distances around a target base merge MV. The current block is encoded or decoded using motion information comprising the modified expanded merge candidate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW112102681A TWI822567B (zh) | 2022-01-28 | 2023-01-19 | Method and apparatus for video coding using merge mode with motion vector difference |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263304010P | 2022-01-28 | 2022-01-28 | |
US63/304,010 | 2022-01-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023143325A1 (fr) | 2023-08-03 |
Family
ID=87470738
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/072978 WO2023143325A1 (fr) | 2022-01-28 | 2023-01-18 | Method and apparatus for video coding using merge with MVD mode |
Country Status (2)
Country | Link |
---|---|
TW (1) | TWI822567B (fr) |
WO (1) | WO2023143325A1 (fr) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
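CN113170191A (zh) * | 2018-11-16 | 2021-07-23 | 联发科技股份有限公司 | Method and apparatus of improved merge with motion vector difference for video coding |
CN113228643A (zh) * | 2018-12-28 | 2021-08-06 | 韩国电子通信研究院 | Image encoding/decoding method and device, and recording medium for storing a bitstream |
CN113366854A (zh) * | 2019-02-09 | 2021-09-07 | 腾讯美国有限责任公司 | Video encoding and decoding method and apparatus |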
US20210337209A1 (en) * | 2019-01-04 | 2021-10-28 | Lg Electronics Inc. | Method and apparatus for decoding image on basis of prediction based on mmvd in image coding system |
-
2023
- 2023-01-18 WO PCT/CN2023/072978 patent/WO2023143325A1/fr unknown
- 2023-01-19 TW TW112102681A patent/TWI822567B/zh active
Also Published As
Publication number | Publication date |
---|---|
TWI822567B (zh) | 2023-11-11 |
TW202337216A (zh) | 2023-09-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23746206 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |