WO2023131298A1 - Boundary matching for video coding


Info

Publication number
WO2023131298A1
Authority
WO
WIPO (PCT)
Prior art keywords
candidate, coding modes, candidate coding, group, modes
Application number
PCT/CN2023/071007
Other languages
French (fr)
Inventor
Man-Shu CHIANG
Chun-Chia Chen
Chih-Wei Hsu
Tzu-Der Chuang
Ching-Yeh Chen
Shih-Ta Hsiang
Yu-Wen Huang
Original Assignee
Mediatek Inc.
Application filed by Mediatek Inc.
Priority to TW112100602A (published as TW202337207A)
Publication of WO2023131298A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: using adaptive coding
    • H04N 19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103: Selection of coding mode or of prediction mode
    • H04N 19/134: characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146: Data rate or code amount at the encoder output
    • H04N 19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N 19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: the unit being an image region, e.g. an object
    • H04N 19/176: the unit being a block, e.g. a macroblock

Definitions

  • the present disclosure relates generally to video coding.
  • the present disclosure relates to ordering of candidate coding modes based on boundary matching.
  • High-Efficiency Video Coding (HEVC) is an international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC).
  • HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture.
  • the basic unit for compression, termed a coding unit (CU), is a 2Nx2N square block of pixels, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached.
  • Each CU contains one or multiple prediction units (PUs) .
  • VVC: Versatile Video Coding
  • HDR: high dynamic range
  • motion parameters consisting of motion vectors, reference picture indices and reference picture list usage index, and additional information are used for inter-predicted sample generation.
  • the motion parameter can be signalled in an explicit or implicit manner.
  • when a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta, and no reference picture index.
  • a merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules introduced in VVC.
  • the merge mode can be applied to any inter-predicted CU.
  • the alternative to merge mode is the explicit transmission of motion parameters, where motion vector, corresponding reference picture index for each reference picture list and reference picture list usage flag and other needed information are signalled explicitly per each CU.
  • VVC includes several new and refined inter prediction coding tools listed as follows:
  • MMVD: merge mode with motion vector difference (MVD)
  • SMVD: symmetric MVD
  • AMVR: adaptive motion vector resolution
  • Some embodiments provide a method for using costs to select a candidate coding mode to encode or decode a current block.
  • a video coder receives data for a block of pixels to be encoded or decoded as the current block of a current picture of a video.
  • the video coder identifies multiple candidate coding modes applicable to the current block.
  • the video coder identifies a first group of candidate coding modes that is a subset of the plurality of candidate coding modes.
  • the first group of candidate coding modes may be the highest priority candidate coding modes identified based on cost.
  • the number of candidate coding modes in the first group is less than the number of candidate coding modes in the plurality of candidate coding modes.
  • the video coder selects a candidate coding mode in the first group of candidate coding modes.
  • the video coder encodes or decodes the current block by using the selected candidate coding mode.
  • the plurality of candidate coding modes includes merge candidates of the current block.
  • the merge candidates of the current block may include (i) merge candidates that use combined inter and intra predictions and/or (ii) merge candidates that use affine transform motion compensation prediction.
  • the plurality of candidate coding modes includes candidate coding modes that correspond to different combinations of distances and offsets for refining motion information.
  • the plurality of candidate coding modes includes candidate coding modes that correspond to different linear models for deriving predictors of chroma samples of the current block based on luma samples of the current block.
  • the plurality of candidate coding modes includes candidate coding modes that correspond to different candidate weights for combining inter predictions of different directions.
  • the first group of candidate coding modes are highest priority candidate coding modes identified based on costs of the plurality of candidate coding modes.
  • the encoder may index the candidate coding modes or assign codewords to the candidate coding modes in the identified group of candidate coding modes according to the priorities of the candidate coding modes.
  • the cost of a candidate coding mode is a boundary matching cost computed by comparing (i) reconstructed samples neighboring the current block and (ii) predicted samples of the current block along boundaries of the current block that are generated according to the candidate coding mode.
  • each of the plurality of candidate coding modes is assigned to one of a plurality of groups of candidate coding modes.
  • each candidate coding mode in the plurality of candidate coding modes is associated with an original candidate index, wherein each candidate coding mode is assigned to one of K groups of candidate coding modes based on a result of the original index modulo K or a result of the original index divided by K.
  • the candidate coding modes that correspond to spatial merge candidates are assigned to a same group of candidate coding modes.
  • the candidate coding modes that correspond to spatial merge candidates are assigned to different groups of candidate coding modes.
  • candidate coding modes that are merge candidates with motion differences smaller than a threshold are assigned to a same group.
  • candidate coding modes that are merge candidates with motion differences greater than a threshold are assigned to a same group.
  • the encoder identifies a group of candidate coding modes having a lowest representative cost among the plurality of groups of candidate coding modes and signals an index selecting one candidate coding mode from the identified group of candidate coding modes.
  • the representative cost of the identified group may be a mean, a maximum, or a minimum of the costs (e.g., boundary matching costs) of the candidate coding modes of the identified group.
  • the encoder signals an index selecting a group of candidate coding modes and identifies a candidate coding mode from the selected group of candidate coding modes based on costs (e.g., boundary matching costs) of the candidate coding modes of the selected group.
  • FIG. 1 illustrates reconstructed neighboring samples and predicted samples of the current block used for boundary matching.
  • FIG. 2 shows positions of spatial merge candidates.
  • FIG. 3 illustrates motion vector scaling for temporal merge candidate.
  • FIG. 4 illustrates candidate positions for the temporal merge candidate for the current block.
  • FIG. 5 conceptually illustrates Merge Mode with Motion Vector Difference (MMVD) candidates and their corresponding offsets.
  • FIG. 6 illustrates an example video encoder that may implement candidate coding mode selection based on boundary matching cost.
  • FIG. 7 illustrates portions of the video encoder that implement candidate coding mode selection based on boundary matching costs.
  • FIG. 8 conceptually illustrates a process for using boundary matching costs to select a candidate coding mode to encode the current block.
  • FIG. 9 illustrates an example video decoder that may implement candidate coding mode selection based on boundary matching cost.
  • FIG. 10 illustrates portions of the video decoder that implement candidate coding mode selection based on boundary matching costs.
  • FIG. 11 conceptually illustrates a process for using boundary matching costs to select a candidate coding mode to decode the current block.
  • FIG. 12 conceptually illustrates an electronic system with which some embodiments of the present disclosure are implemented.
  • the merge candidate list may include new merge candidates such as pair-wise average merge candidates, HMVP merge candidates, etc.
  • An index of the best merge candidate is encoded/decoded to indicate the selected merge candidate for the current block.
  • the number of merge candidates in a merge candidate list is limited to a pre-defined number, so not all merge candidates can be added into the merge candidate list. As the number of merge candidates in a merge candidate list increases, the codeword length of the index of the best merge candidate also increases.
  • Some embodiments of the disclosure provide a scheme for adaptively reordering the candidate modes.
  • the video encoder /decoder calculates costs for each candidate mode, which can be a merge candidate and/or a candidate mode of another tool.
  • the video coder determines the priority order of the candidate modes according to the costs. (In some embodiments, the candidate modes with smaller costs get higher priority. In some other embodiments, candidate modes with smaller costs get lower priority. )
  • the candidate modes are then reordered according to the priority order.
  • the video coder uses a reduced, reordered candidate mode set that includes only the first k candidate modes with higher priorities (k ≤ number of all possible candidate modes). Since the number of candidate modes in the candidate mode set is reduced, the syntax for indicating the selected candidate mode is also reduced.
  • An index may be used to refer to the selected candidate mode after reordering, such that a smaller index value may refer to a candidate mode with a higher priority. (In other words, the value of index refers to the index of candidate mode in the list of candidates, and after applying reordering, the value of index refers to the reordered index of candidate mode. )
  • shorter codewords are thereby used for encoding/decoding. The candidate mode with the highest priority may be implicitly set as the coding mode for the current block.
  • the video coder determines the priority order of the candidate modes in the candidate mode set based on boundary matching. For each candidate mode, a boundary matching cost for coding the current block using the candidate mode is calculated, and the priority of the candidate mode in the reordered candidate mode list is determined based on the candidate mode’s boundary matching cost.
  • a boundary matching cost for a candidate mode refers to the discontinuity measurement (including top boundary matching and/or left boundary matching) between the current prediction (the predicted samples or the predictor of the current block) , generated from the candidate mode, and the neighboring reconstruction (the reconstructed samples within one or more neighboring blocks) .
  • Top boundary matching means the comparison between the current top predicted samples and the neighboring top reconstructed samples
  • left boundary matching means the comparison between the current left predicted samples and the neighboring left reconstructed samples.
  • FIG. 1 illustrates the reconstructed neighboring samples and predicted samples used in boundary matching.
  • pred(x, 0) are predictor samples along the top boundary of the current block;
  • reco(x, -1) are reconstructed neighboring samples along the top boundary;
  • pred(0, y) are predictor samples along the left boundary;
  • reco(-1, y) are reconstructed neighboring samples along the left boundary.
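  • As a concrete illustration, the sketch below computes a simple SAD-based boundary matching cost from one line of predicted boundary samples and one line of neighboring reconstructed samples. The exact discontinuity measure (e.g., weighted or second-order differences, or more lines per boundary) is an embodiment choice; the function and array names here are illustrative only.

```python
import numpy as np

def boundary_matching_cost(pred, reco_top, reco_left):
    """Illustrative boundary matching cost: sum of absolute differences
    between the boundary predicted samples of the current block and the
    adjacent reconstructed samples of the neighboring blocks.

    pred:      (H, W) predicted samples of the current block
    reco_top:  (W,)   reconstructed samples reco(x, -1) above the block
    reco_left: (H,)   reconstructed samples reco(-1, y) left of the block
    """
    top = np.abs(pred[0, :].astype(np.int64) - reco_top.astype(np.int64)).sum()
    left = np.abs(pred[:, 0].astype(np.int64) - reco_left.astype(np.int64)).sum()
    return int(top + left)
```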
  • a pre-defined subset of the current prediction is used to calculate the boundary matching cost. For example, n line(s) of the top boundary within the current block and/or m line(s) of the left boundary within the current block can be used. In some of these embodiments, n2 line(s) of the top neighboring reconstruction and/or m2 line(s) of the left neighboring reconstruction can be used for boundary matching.
  • n or n2 can be any positive integer such as 1, 2, 3, 4, etc.
  • m or m2 can be any positive integer such as 1, 2, 3, 4, etc.
  • n and/or m vary with block width, height, or area.
  • the threshold may be 64, 128, or 256; when area > threshold, m may be increased to 2 from 1, or increased to 4 from 1 or 2.
  • the threshold may be 1, 2, or 4; when height > threshold × width, m may be increased to 2 from 1, or increased to 4 from 1 or 2.
  • when n becomes larger, the threshold may be 64, 128, or 256; when area > threshold, n is increased to 2 from 1, or increased to 4 from 1 or 2.
  • n and/or m can be defined in a video coding standard or depend on the signaling or parsing from the coded video syntax at CU/CB, PU/PB, TU/TB, CTU/CTB, tile, slice level, picture level, SPS level, and/or PPS level.
  • in some embodiments, top boundary matching is not used and/or only left boundary matching is used. (The neighboring reconstructed samples across CTU rows are thereby not used.)
  • left boundary matching is not used and/or only top boundary matching is used.
  • when the current block is taller (height > threshold × width), only left boundary matching is used.
  • when the current block is wider (width > threshold × height), only top boundary matching is used.
  • top-left neighboring reconstructed samples (reco(-1, -1)) can be used for boundary matching.
  • in that case, the boundary matching cost is extended with an additional term based on these top-left samples.
  • when calculating the boundary matching cost, the current prediction can be further added with residuals to reconstruct the samples of the current block, and the reconstructed samples of the current block are then used for calculating the boundary matching cost.
  • the residuals are generated by recovering the DC and/or all AC coefficients, or any subset of the AC coefficients, after the transform process.
  • the transform process can use any pre-defined transform kernel for secondary transform and/or primary transform.
  • the transform kernel for secondary transform refers to Low Frequency Non-Separable Transform (LFNST) transform kernel.
  • the transform kernel for primary transform refers to DCT2 and/or any transform kernel for Multiple Transform Selection (MTS) such as DST7.
  • the transform kernel refers to the real transform applied in the transform module for the current block.
  • Some embodiments of the disclosure provide a scheme to reorder and/or reduce merge candidates.
  • the index (index_best_merge) of the best merge candidate refers to the priority order based on boundary matching costs.
  • only the first k merge candidates (with higher priorities) can be the candidate modes.
  • the boundary matching costs of the merge candidate are calculated as ⁇ cost_cand0, cost_cand1, cost_cand2, ... ⁇ , such that cost_cand0 refers to the boundary matching cost for cand0, cost_cand1 refers to the boundary matching cost for cand1, cost_cand2 refers to the boundary matching cost for cand2, etc.
  • the video coder then reorders ⁇ cand0, cand1, cand2, ... ⁇ based on boundary matching costs.
  • index_best_merge 0 may refer to cand5 (the merge candidate with the smallest cost is signaled with the shortest codeword);
  • index_best_merge 1 may refer to cand4;
  • index_best_merge 2 may refer to cand3, etc.
  • the candidate mode set includes only the first k merge candidates (according to ordering by boundary matching costs), rather than a number of candidates determined by a constant (e.g., MaxNumMergeCand in VVC, which may be 6).
  • k may be 4, and the signaling of each merge candidate may be: index_best_merge 0 referring to cand5 with codeword 0 (the merge candidate with the smallest cost is signaled with the shortest codeword); index_best_merge 1 referring to cand4 with codeword 10; index_best_merge 2 referring to cand3 with codeword 110; index_best_merge 3 referring to cand2 with codeword 111.
  • the order of the merge candidates may be the same as the original list without reordering: index_best_merge 0 referring to cand0 having codeword 0; index_best_merge 1 referring to cand1 having codeword 10; index_best_merge 2 referring to cand2 having codeword 110; index_best_merge 3 referring to cand3 having codeword 111, etc.
  • in some other embodiments, the merge candidate with the largest cost is signaled with the shortest codeword.
  • in that case, the reordered merge candidates are formed as {cand5, cand4, cand3, cand2, cand1, cand0}, such that index_best_merge 0 refers to cand5 with codeword 0 (the largest cost having the shortest codeword); index_best_merge 1 refers to cand4 with codeword 10; index_best_merge 2 refers to cand3 with codeword 110, etc.
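  • The reordering described above can be summarized by the following sketch, which sorts candidates by boundary matching cost (smallest cost first, per the first embodiment), keeps only the first k, and assigns truncated unary codewords; all names are illustrative, and the opposite sort order applies to the largest-cost-first embodiment.

```python
def reorder_and_reduce(costs, k):
    """costs[i] is the boundary matching cost of original candidate i.
    Returns the reduced, reordered list of original candidate indices:
    entry 0 (the shortest codeword) is the candidate with the smallest cost."""
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    return order[:k]

def truncated_unary(index, k):
    """Truncated unary codeword for an index in [0, k-1]."""
    return "1" * index + ("0" if index < k - 1 else "")

# Example: cand5 has the smallest cost, so index_best_merge 0 maps to cand5.
costs = [60, 50, 40, 30, 20, 10]
reduced = reorder_and_reduce(costs, k=4)               # [5, 4, 3, 2]
codewords = [truncated_unary(i, 4) for i in range(4)]  # ['0', '10', '110', '111']
```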
  • whether to use the reduced and reordered merge candidate list (reordered based on boundary matching cost and limited to k candidates) or the original merge candidate list (without reordering) is determined based on a predefined rule (e.g., implicitly depending on block width, block height, or block area, or explicitly depending on one or more flags at CU/CB, PU/PB, TU/TB, CTU/CTB, tile, slice level, picture level, SPS level, and/or PPS level).
  • for the index_best_merge variable, a smaller value is coded with a shorter codeword.
  • the index_best_merge variable may be coded with truncated unary codewords.
  • in some embodiments, reordering is applied to only a subset of the merge candidate list.
  • the subset may refer to the original first n candidates such as cand0, cand1, and cand2, such that index_best_merge 0/1/2 refers to the priority order based on boundary matching and index_best_merge 3/4/5 refers to original cand 3/4/5.
  • the subset refers to the original last n candidates such as cand3, cand4, and cand5.
  • index_best_merge 3/4/5 refers to the priority order based on boundary matching and index_best_merge 0/1/2 refers to original cand 0/1/2.
  • the subset may refer to spatial merge candidates.
  • in some embodiments, the best merge candidate is inferred to be the merge candidate with the smallest boundary matching cost among all merge candidates. In some other embodiments, the best merge candidate is inferred to be the merge candidate with the largest boundary matching cost among all merge candidates. In some of these embodiments, index_best_merge is not signaled/parsed by the encoder/decoder and can be inferred as 0.
  • the merge candidates are split into several groups.
  • a boundary matching cost is calculated for each group.
  • a promising group is implicitly identified as the group with the highest priority. If more than one merge candidate is included in the identified promising group, a reduced merge index is signaled/parsed to indicate a merge candidate from the promising group. Since the number of merge candidates in each group is less than the number of merge candidates in the original merge candidate list, the reduced merge index takes fewer codeword bits than the original merge index.
  • the number of merge candidates in each group is “N/k” and the grouping rule depends on the merge index and/or merge type.
  • the merge candidates with the same value of “merge index %k” are in the same group.
  • the merge candidates with the same value of “merge index /k” are in the same group.
  • spatial merge candidates are in the same group.
  • k varies with block width, block height, and/or block area.
  • the merge candidates with small motion difference are in the same group.
  • the motion difference includes mv difference and/or reference picture difference.
  • An example of calculating the MV difference (denoted as mv_diff) between candidate 0 and candidate 1 is:
  • mv_diff = abs(mv0_x - mv1_x) + abs(mv0_y - mv1_y), where (mv0_x, mv0_y) and (mv1_x, mv1_y) denote the motion vectors of candidate 0 and candidate 1.
  • Motion difference is small if the reference pictures are the same and/or mv difference is smaller than a pre-defined threshold.
  • in some embodiments, when a group contains more than one merge candidate, the cost for this group is the average of the costs of all merge candidates in this group.
  • in some embodiments, the representative cost for this group is the mean, maximum, or minimum of the costs of all merge candidates in this group.
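  • A minimal sketch of the implicit group selection described above, assuming grouping by "merge index %k" and a mean representative cost; the grouping rule and the representative statistic (mean, maximum, or minimum) are embodiment choices, and all names are illustrative.

```python
def select_promising_group(costs, k):
    """Split candidates into k groups by (merge index % k), compute a mean
    representative cost per group, and return the original candidate indices
    of the group with the lowest representative cost. Assumes k <= len(costs).
    A reduced index then selects one candidate inside this group."""
    groups = [[] for _ in range(k)]
    for idx in range(len(costs)):
        groups[idx % k].append(idx)
    return min(groups, key=lambda g: sum(costs[i] for i in g) / len(g))
```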
  • the merge candidates are partitioned into several subsets, and the selection of a subset is explicitly signaled to the decoder. Furthermore, the selection of a merge candidate in the subset of merge candidates is made by using boundary matching costs.
  • the subset partitioning rule can depend on the merge index and/or merge type. For example, the merge candidates with the same value of “merge index %k” are in the same subset. For another example, the merge candidates with the same value of “merge index /k” are in the same subset (k may vary with block width, block height, and/or block area). For another example, spatial merge candidates are in different subsets. For another example, the merge candidates with large motion difference are in the same subset.
  • the motion difference includes mv difference and/or reference picture difference.
  • An example of calculating the MV difference (denoted as mv_diff) between candidate 0 and candidate 1 is:
  • mv_diff = abs(mv0_x - mv1_x) + abs(mv0_y - mv1_y), where (mv0_x, mv0_y) and (mv1_x, mv1_y) denote the motion vectors of candidate 0 and candidate 1.
  • Motion difference is large if the reference pictures are different and/or mv difference is larger than a pre-defined threshold.
  • the merge candidates of a CU include one or more of the following candidates: (1) spatial merge candidates or spatial MVP from spatially neighbouring CUs, (2) temporal MVP from collocated CUs, (3) history-based MVP from a FIFO table, (4) pairwise average MVP, and/or (5) zero MVs.
  • in some embodiments, the merge candidates in this section refer to the merge candidates for combined inter and intra prediction (CIIP).
  • in that case, the predicted samples within the current block are generated by the CIIP process.
  • in some embodiments, the merge candidates in this section refer to subblock merging candidates such as affine merge candidates.
  • the predicted samples within the current block are generated by an affine process, e.g., block-based affine transform motion compensation prediction.
  • a maximum of four merge candidates are selected among candidates located in the positions around the CU as shown in FIG. 2, which shows positions of spatial merge candidates.
  • the order of derivation is B0, A0, B1, A1 and B2.
  • position B2 is considered only when one or more CUs at positions B0, A0, B1, A1 are not available (e.g., belonging to another slice or tile) or are intra coded.
  • after the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check.
  • a candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information. Only the following pairs are considered: (A1, B1), (A1, A0), (A1, B2), (B1, B0), (B1, B2).
  • a scaled motion vector is derived based on co-located CU belonging to the collocated reference picture.
  • the reference picture list and the reference index to be used for derivation of the co-located CU are explicitly signaled in the slice header.
  • the scaled motion vector for temporal merge candidate is obtained as illustrated by the dotted line in FIG. 3, which illustrates motion vector scaling for temporal merge candidate.
  • the scaled motion vector is scaled from the motion vector of the co-located CU using the picture order count (POC) distances, tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture and td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture.
  • the reference picture index of temporal merge candidate is set equal to zero.
  • the position for the temporal candidate is selected between candidates C0 and C1, as depicted in FIG. 4, which illustrates candidate positions for the temporal merge candidate for a current block. If the CU at position C0 is not available, is intra coded, or is outside of the current row of CTUs, position C1 is used for the temporal merge candidate. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
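  • Conceptually, the POC-distance scaling described above can be sketched as below; the VVC standard performs this scaling in fixed-point arithmetic with clipping, so the floating-point form here is only an approximation for illustration.

```python
def scale_temporal_mv(mv_col, tb, td):
    """Scale the co-located CU's motion vector by the ratio of POC
    distances tb/td, where tb is the POC difference between the current
    picture's reference picture and the current picture, and td is the POC
    difference between the co-located picture's reference picture and the
    co-located picture."""
    mvx, mvy = mv_col
    return (round(mvx * tb / td), round(mvy * tb / td))
```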
  • the history-based MVP (HMVP) merge candidates are added to the merge list after the spatial MVP and TMVP candidates.
  • the motion information of a previously coded block is stored in a table and used as MVP for the current CU.
  • the table with multiple HMVP candidates is maintained during the encoding/decoding process.
  • the table is reset (emptied) when a new CTU row is encountered. Whenever there is a non-subblock inter-coded CU, the associated motion information is added to the last entry of the table as a new HMVP candidate.
  • the HMVP table size S is set to 6, which indicates that up to 6 history-based MVP (HMVP) candidates may be added to the table.
  • a constrained first-in-first-out (FIFO) rule is utilized when updating the table.
  • HMVP candidates could be used in the merge candidate list construction process.
  • the latest several HMVP candidates in the table are checked in order and inserted into the candidate list after the TMVP candidate. A redundancy check is applied comparing the HMVP candidates to the spatial or temporal merge candidates.
  • the last two entries in the table are redundancy-checked against the A1 and B1 spatial candidates, respectively. Once the total number of available merge candidates reaches the maximally allowed number of merge candidates minus 1, the merge candidate list construction process from HMVP is terminated.
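  • The constrained FIFO update of the HMVP table can be sketched as follows (an illustrative reading of the rule: an identical existing entry is removed before the new candidate is appended as the most recent entry, and the oldest entry is evicted when the table is full).

```python
def update_hmvp_table(table, new_cand, max_size=6):
    """Constrained FIFO update of the HMVP table after coding a
    non-subblock inter CU. 'table' is ordered oldest-first."""
    if new_cand in table:
        table.remove(new_cand)      # redundancy check: drop the identical entry
    elif len(table) >= max_size:
        table.pop(0)                # table full: evict the oldest (first-in) entry
    table.append(new_cand)          # append as the most recent entry
    return table
```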
  • Pairwise average candidates are generated by averaging predefined pairs of candidates in the existing merge candidate list, using the first two merge candidates.
  • the first merge candidate is defined as p0Cand and the second merge candidate is defined as p1Cand.
  • the averaged motion vectors are calculated according to the availability of the motion vectors of p0Cand and p1Cand separately for each reference list. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures, and the reference picture of the averaged candidate is set to the one of p0Cand.
  • if the half-pel interpolation filter indices of p0Cand and p1Cand are different, it is set to 0.
  • the zero MVPs are inserted at the end until the maximum merge candidate number is reached.
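  • A sketch of the per-list averaging used for the pairwise average candidate is shown below; the exact rounding of the average is specified by the standard, so the simple rounded shift here is illustrative only.

```python
def pairwise_average_mv(mv0, mv1):
    """Component-wise average of two motion vectors, applied per reference
    list when both p0Cand and p1Cand have a motion vector in that list."""
    return ((mv0[0] + mv1[0] + 1) >> 1, (mv0[1] + mv1[1] + 1) >> 1)
```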
  • when a CU is coded in merge mode, if the CU contains at least 64 luma samples (that is, CU width times CU height is equal to or larger than 64), and if both CU width and CU height are less than 128 luma samples, an additional flag may be signaled to indicate whether the combined inter/intra prediction (CIIP) mode is applied to the current CU.
  • the CIIP prediction combines an inter prediction signal with an intra prediction signal.
  • the inter prediction signal in the CIIP mode, P_inter, is derived using the same inter prediction process applied to regular merge mode, and the intra prediction signal P_intra is derived following the regular intra prediction process with the planar mode or one or more intra prediction modes derived from a pre-defined mechanism.
  • the pre-defined mechanism is based on the neighboring reference regions (template) of the current block.
  • the intra prediction mode of a CU is implicitly derived by a neighboring template at both encoder and decoder, instead of being signalled as the exact intra prediction mode bits to the decoder.
  • the prediction samples of the template are generated using the reference samples of the template for each candidate mode.
  • a cost is calculated as the SATD between the prediction and the reconstruction samples of the template.
  • the intra prediction mode with the minimum cost and/or some intra prediction modes with the smaller costs are selected and used for intra prediction of the CU.
  • the candidate modes may be all MPMs and/or any subset of MPMs, 67 intra prediction modes as in VVC or extended to 131 intra prediction modes.
  • the intra and inter prediction signals are combined using weighted averaging, where the weight value is calculated depending on the coding modes of the top and left neighbouring blocks.
  • the CIIP prediction P_CIIP is formed as follows (wt is the weight value): P_CIIP = ((4 - wt) * P_inter + wt * P_intra + 2) >> 2
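  • The weight derivation and the weighted average above can be sketched as follows, with wt depending on how many of the top and left neighboring blocks are intra coded (the VVC design); array names are illustrative.

```python
import numpy as np

def ciip_predict(p_inter, p_intra, top_is_intra, left_is_intra):
    """CIIP combination: wt = 3 if both neighbors are intra coded,
    2 if exactly one is, and 1 if neither is."""
    n_intra = int(top_is_intra) + int(left_is_intra)
    wt = {2: 3, 1: 2, 0: 1}[n_intra]
    return ((4 - wt) * p_inter.astype(np.int32)
            + wt * p_intra.astype(np.int32) + 2) >> 2
```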
  • An object in a video may have different types of motion, including translation motions, zoom in/out motions, rotation motions, perspective motions and the other irregular motions.
  • a block-based affine transform motion compensation prediction is used to account for these various types of motion.
  • VVC provides a block-based affine transform motion compensation prediction.
  • the affine motion field of the current block can be described by the motion information of two control points (e.g., at the top-left and top-right corners of the block) (4-parameter) or the motion information of three control points (e.g., at the top-left, top-right, and bottom-left corners of the block) (6-parameter).
  • for the 4-parameter affine motion model, the motion vector at sample location (x, y) in a block is derived as: mv_x = ((mv1x - mv0x) / W) * x - ((mv1y - mv0y) / W) * y + mv0x and mv_y = ((mv1y - mv0y) / W) * x + ((mv1x - mv0x) / W) * y + mv0y
  • for the 6-parameter affine motion model, the motion vector at sample location (x, y) in a block is derived as: mv_x = ((mv1x - mv0x) / W) * x + ((mv2x - mv0x) / H) * y + mv0x and mv_y = ((mv1y - mv0y) / W) * x + ((mv2y - mv0y) / H) * y + mv0y, where (mv0x, mv0y), (mv1x, mv1y), and (mv2x, mv2y) are the MVs of the top-left, top-right, and bottom-left control points, and W and H are the block width and height.
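  • The following sketch derives the motion vector at a sample position from the control-point MVs for both models; in practice VVC evaluates this per 4x4 subblock in fixed-point arithmetic, so the floating-point form is for illustration.

```python
def affine_mv(x, y, cpmvs, w, h):
    """Motion vector at sample (x, y). cpmvs holds (mv0, mv1) for the
    4-parameter model (top-left, top-right control points) or
    (mv0, mv1, mv2) for the 6-parameter model (plus bottom-left);
    w and h are the block width and height."""
    mv0, mv1 = cpmvs[0], cpmvs[1]
    ax = (mv1[0] - mv0[0]) / w
    ay = (mv1[1] - mv0[1]) / w
    if len(cpmvs) == 2:        # 4-parameter: rotation/zoom share parameters
        bx, by = -ay, ax
    else:                      # 6-parameter: independent vertical gradient
        mv2 = cpmvs[2]
        bx = (mv2[0] - mv0[0]) / h
        by = (mv2[1] - mv0[1]) / h
    return (ax * x + bx * y + mv0[0], ay * x + by * y + mv0[1])
```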
  • Affine merge mode or AF_MERGE mode can be applied for CUs with both width and height larger than or equal to 8.
  • the motion vectors at the control points (CPMVs) of the current CU are generated based on the motion information of the spatial neighboring CUs.
  • the following three types of CPMV candidates are used to form the affine merge candidate list: (1) inherited affine merge candidates that are extrapolated from the CPMVs of the neighbouring CUs; (2) constructed affine merge candidates (CPMVPs) that are derived using the translational MVs of the neighbouring CUs; (3) zero MVs.
  • in VVC, there are at most two inherited affine candidates, which are derived from the affine motion models of the neighboring blocks: one from the left neighboring CUs (left predictor) and one from the above neighboring CUs (above predictor).
  • for the left predictor, the scan order is A0 -> A1.
  • for the above predictor, the scan order is B0 -> B1 -> B2.
  • Only the first inherited candidate from each side is selected. No pruning check is performed between two inherited candidates.
  • when a neighboring affine CU is identified, its control point motion vectors are used to derive the CPMVP candidate in the affine merge list of the current CU.
  • if the neighbouring affine CU is coded with the 6-parameter affine model, the three CPMVs of the current CU are calculated according to the motion vectors of the top-left, top-right, and bottom-left corners of the neighbouring affine CU.
  • if the neighbouring affine CU is coded with the 4-parameter affine model, the two CPMVs of the current CU are calculated according to the motion vectors of the top-left and top-right corners of the neighbouring affine CU.
  • a constructed affine candidate is constructed by combining the neighbor translational motion information of each control point.
  • the motion information for the control points is derived from the spatial neighbors (A0, A1, A2, B0, B1, B2, B3) and temporal neighbors of the current block.
  • for CPMV1, the B2 -> B3 -> A2 blocks are checked and the MV of the first available block is used.
  • for CPMV2, the B1 -> B0 blocks are checked, and for CPMV3, the A1 -> A0 blocks are checked.
  • TMVP is used as CPMV4 if it is available.
  • after the MVs of the four control points are attained, affine merge candidates are constructed based on that motion information.
  • the following combinations of control point MVs are used to construct candidates, in order: {CPMV1, CPMV2, CPMV3}, {CPMV1, CPMV2, CPMV4}, {CPMV1, CPMV3, CPMV4}, {CPMV2, CPMV3, CPMV4}, {CPMV1, CPMV2}, {CPMV1, CPMV3}.
  • the combination of 3 CPMVs constructs a 6-parameter affine merge candidate and the combination of 2 CPMVs constructs a 4-parameter affine merge candidate.
  • if the reference indices of the control points are different, the related combination of control point MVs is discarded.
  • merge mode with motion vector differences (MMVD) is introduced in VVC.
  • a MMVD flag may be signaled after sending a skip flag and merge flag to specify whether MMVD mode is used for a CU. If MMVD mode is used, a selected merge candidate is refined by MVD information.
  • the MVD information includes a merge candidate flag, a distance index to specify motion magnitude, and an index for indication of motion direction.
  • the merge candidate flag is signaled to specify which of the first two merge candidates is to be used as a starting MV.
  • the distance index is used to specify motion magnitude information by indicating a pre-defined offset from the starting MV.
  • the offset may be added to either horizontal component or vertical component of the starting MV.
  • An example mapping from the distance index to the pre-defined offset is specified in Table III-1 below:
  • Table III-1: distance index 0, 1, 2, 3, 4, 5, 6, 7 corresponds to an offset of 1/4, 1/2, 1, 2, 4, 8, 16, 32 (in units of luma samples), respectively.
  • the direction index represents the direction of the MVD relative to the starting point.
  • the direction index can represent one of the four directions, as shown in Table III-2.
  • Table III-2: direction index 00 denotes a positive offset on the x-axis, 01 a negative offset on the x-axis, 10 a positive offset on the y-axis, and 11 a negative offset on the y-axis.
  • MVD sign may vary according to the information of the starting MV.
  • when the starting MV is a uni-prediction MV or a bi-prediction MV with both lists pointing to the same side of the current picture (i.e., the picture order counts (POCs) of the two reference pictures are both larger than the POC of the current picture, or are both smaller than the POC of the current picture), the sign in Table III-2 specifies the sign of the MV offset added to the starting MV.
  • when the starting MVs are bi-prediction MVs with the two MVs pointing to different sides of the current picture (i.e., the POC of one reference picture is larger than the POC of the current picture while the POC of the other reference picture is smaller), the sign in Table III-2 specifies the sign of the MV offset added to the list-0 MV component of the starting MV, and the sign for the list-1 MV has the opposite value.
  • a predefined offset (MmvdOffset) of a MMVD candidate is derived from, or expressed as, a distance value (MmvdDistance) and a directional sign (MmvdSign or MmvdDirection).
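  • Putting the two tables together, the offset derivation can be sketched as below (ignoring the full-pel MMVD variant and the sign flipping for bi-prediction with reference pictures on different sides).

```python
MMVD_DISTANCES = [1/4, 1/2, 1, 2, 4, 8, 16, 32]    # Table III-1, in luma samples
MMVD_SIGNS = [(+1, 0), (-1, 0), (0, +1), (0, -1)]  # Table III-2, (x, y) direction

def mmvd_offset(distance_idx, direction_idx):
    """Derive MmvdOffset as the distance value times the directional sign."""
    d = MMVD_DISTANCES[distance_idx]
    sx, sy = MMVD_SIGNS[direction_idx]
    return (d * sx, d * sy)
```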
  • FIG. 5 conceptually illustrates MMVD candidates and their corresponding offsets.
  • the figure illustrates a merge candidate 510 as the starting MV and several MMVD candidates in the vertical direction and in the horizontal direction.
  • each MMVD candidate is derived by applying an offset to the starting MV 510.
  • the MMVD candidate 522 is derived by adding an offset of 2 to the horizontal component of the merge candidate 510;
  • the MMVD candidate 524 is derived by adding an offset of -1 to the vertical component of the merge candidate 510.
  • MMVD candidates with offsets in the horizontal direction, such as the MMVD candidate 522, are referred to as horizontal MMVD candidates;
  • MMVD candidates with offsets in the vertical direction, such as the MMVD candidate 524, are referred to as vertical MMVD candidates.
  • in some embodiments, the candidate mode reduction/reordering scheme is used to reorder MMVD candidates to improve the signaling of certain syntax elements.
  • the following is the syntax table of MMVD in the VVC standard.
  • the syntax element mmvd_cand_flag [x0] [y0] specifies whether the first (0) or the second (1) candidate in the merging candidate list is used with the motion vector difference derived from the syntax elements mmvd_distance_idx [x0] [y0] and mmvd_direction_idx [x0] [y0] .
  • the array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
  • the syntax element mmvd_distance_idx [x0] [y0] specifies the index used to derive the variable MmvdDistance [x0] [y0] as specified below.
  • the array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
  • the syntax element mmvd_direction_idx [x0] [y0] specifies the index used to derive the variable MmvdSign [x0] [y0] as specified below.
  • the array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
  • the signaling of the syntax element mmvd_cand_flag can be improved by using the reduced/reordered candidate mode list. Specifically, a boundary matching cost is calculated for each MMVD mode: MMVD mode 0, in which MMVD uses the first candidate in the merging candidate list; and MMVD mode 1, in which MMVD uses the second candidate in the merging candidate list.
  • for example, if the cost for MMVD mode 0 is greater than the cost for MMVD mode 1, then mmvd_cand_flag being equal to 0 refers to MMVD mode 1 and mmvd_cand_flag being equal to 1 refers to MMVD mode 0.
  • mmvd_cand_flag is implicit and the first or second candidate (which has a smallest cost) in the merging candidate list is used for MMVD.
  • mmvd_cand_flag is implicit and the first or second candidate (which has a largest cost) in the merging candidate list is used for MMVD.
  • the signaling of syntax elements mmvd_cand_flag, mmvd_distance_idx, and mmvd_direction_idx can be improved by using reduced/reordered candidate mode list.
  • a joint indication (denoted as MMVD_joint_idx) is signaled/parsed/assigned to specify the selected combination of mmvd_cand_flag, mmvd_distance_idx, and mmvd_direction_idx.
  • MMVD_joint_idx ranges from 0 to the value (e.g., 63) calculated as “number of MMVD candidates” (e.g., 2) × “number of MMVD distances” (e.g., 8) × “number of MMVD directions” (e.g., 4) minus one.
  • boundary matching cost is calculated for each MMVD combination.
  • MMVD_joint_idx may be used to select a MMVD combination based on the reordering. For example, in some embodiments, if the cost for MMVD combination 0 > cost for MMVD combination 1 > ..., MMVD_joint_idx being equal to 0 refers to MMVD combination 63 and MMVD_joint_idx being equal to 63 refers to MMVD combination 0. In some embodiments, MMVD_joint_idx is implicit and the MMVD combination that has a smallest cost is used for MMVD. In some embodiments, the number of MMVD combinations is reduced and the MMVD combinations with smaller costs are kept in the candidate mode set. The codewords for signaling/parsing MMVD_joint_idx are thereby reduced.
  • MMVD_joint_idx may be used to select a MMVD combination based on the reordering. In some embodiments, if the cost for MMVD combination 0 < the cost for MMVD combination 1 < ..., then MMVD_joint_idx being equal to 0 refers to MMVD combination 63 and MMVD_joint_idx being equal to 63 refers to MMVD combination 0. In some embodiments, MMVD_joint_idx is implicit and the MMVD combination with the largest cost is used for MMVD.
  • the number of MMVD combinations is reduced and the MMVD combinations with larger costs are kept in the candidate mode set.
  • the codewords for signaling/parsing MMVD_joint_idx are thereby reduced. A similar method can be used to improve the signaling of mmvd_distance_idx and/or mmvd_direction_idx.
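  • One possible factorization of the joint index into its (candidate, distance, direction) triplet is sketched below; the packing order of the three components is a design choice, not mandated by the description above.

```python
NUM_CANDS, NUM_DISTANCES, NUM_DIRECTIONS = 2, 8, 4   # 2 * 8 * 4 = 64 combinations

def joint_to_triplet(mmvd_joint_idx):
    """Map MMVD_joint_idx in [0, 63] to (mmvd_cand, distance_idx, direction_idx)."""
    direction = mmvd_joint_idx % NUM_DIRECTIONS
    distance = (mmvd_joint_idx // NUM_DIRECTIONS) % NUM_DISTANCES
    cand = mmvd_joint_idx // (NUM_DIRECTIONS * NUM_DISTANCES)
    return cand, distance, direction
```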
  • in bi-prediction with CU-level weight (BCW), the bi-prediction signal is generated as a weighted average of two prediction signals: P_bi-pred = ((8 - w) * P_0 + w * P_1 + 4) >> 3
  • P_0 represents the pixel values predicted by the L0 MV (or L0 prediction).
  • P_1 represents the pixel values predicted by the L1 MV (or L1 prediction).
  • P_bi-pred is the weighted average of P_0 and P_1 according to the weight w.
  • the possible values for w include {-2, 3, 4, 5, 10}; these are also referred to as BCW candidate weights.
  • in some cases, the possible values for w include only {3, 4, 5}.
  • the LC-RDO stage may employ an interleaving search pattern for finding the best value for the BCW weighting parameter w.
  • more weights can be supported as follows: for merge mode, weights are extended from {-2, 3, 4, 5, 10} to {-4, -3, -2, -1, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12} or any subset of the above. When negative bi-predicted weights are not supported, weights for merge mode are extended from {-2, 3, 4, 5, 10} to {1, 2, 3, 4, 5, 6, 7}. In addition, the negative bi-predicted weights for non-merge mode are replaced with positive weights; that is, the weights {-2, 10} are replaced with {1, 7}.
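  • The BCW weighted average itself is a one-liner; the sketch below follows the formula above and works for any of the weight sets listed, with array names illustrative.

```python
import numpy as np

def bcw_blend(p0, p1, w):
    """BCW bi-prediction: P = ((8 - w) * P0 + w * P1 + 4) >> 3,
    where w is one of the BCW candidate weights (e.g., {-2, 3, 4, 5, 10})."""
    return ((8 - w) * p0.astype(np.int32) + w * p1.astype(np.int32) + 4) >> 3
```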
  • the proposed candidate mode reduction/reordering scheme is used to reorder BCW candidates to improve the signaling of certain syntax elements, e.g., bcw_idx.
  • the following is the syntax table of BCW in the VVC standard.
  • the syntax element bcw_idx [x0] [y0] specifies the weight index of bi-prediction with CU weights.
  • the array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
  • the weight w is determined in one of two ways: 1) for a non-merge CU, the weight index is signalled after the motion vector difference; 2) for a merge CU, the weight index is inferred from neighbouring blocks based on the merge candidate index.
  • the boundary matching cost of each BCW candidate weights is computed, and the different BCW candidate weights are reordered according to the costs.
  • the cost for a pre-defined candidate can be reduced.
  • the pre-defined candidate refers to the candidate inferred from one or more neighbouring blocks.
  • the candidate weights include (only) the two neighboring bi-predicted weights (i.e., the inferred weight ±1) and the inherited (inferred) bi-predicted weight.
  • the bcw_idx being equal to 0 refers to the BCW candidate weight with the smallest cost and bcw_idx being equal to 4 refers to the BCW candidate weight with the largest cost.
  • bcw_idx is implicit and the BCW candidate weight having a smallest cost is used. In some embodiments, the bcw_idx being equal to 0 refers to the BCW candidate weight with the largest cost and bcw_idx being equal to 4 refers to the BCW candidate weight with the smallest cost. In some embodiments, bcw_idx is implicit and the BCW candidate weight having a largest cost is used.
  • only the first k BCW weights can be the candidate weights (the original number of BCW weights is 5) in a candidate mode set.
  • the number of BCW weights in the candidate mode set is k (e.g., 3), and the signaling of each BCW weight is shown below: bcw_idx 0 refers to the BCW candidate with the highest priority, using codeword 0; bcw_idx 1 refers to the BCW candidate with the second highest priority, using codeword 10; bcw_idx 2 refers to the BCW candidate with the third highest priority, using codeword 11.
  • whether to use the reduced and reordered BCW candidates (reordered based on boundary matching cost and limited to k candidates) or the original BCW candidates (without reordering) is determined based on a predefined rule (e.g., implicitly depending on block width, block height, or block area, or explicitly depending on one or more flags at CU/CB, PU/PB, TU/TB, CTU/CTB, tile, slice level, picture level, SPS level, and/or PPS level).
  • in some embodiments, the BCW weights are split into several groups. Next, a boundary matching cost is calculated for each group. Then, a promising group is implicitly selected to be the group with the highest priority. If more than one BCW weight is included in the promising group, a reduced BCW weight index is signaled/parsed to indicate a BCW weight from the promising group. Since the number of BCW weights in each group is less than the number of BCW weights in the original BCW weight set, the reduced BCW weight index may take fewer codeword bits than the original BCW weight index.
  • the number of BCW weights in each group is “N/k” and the grouping rule depends on the BCW weight index and/or BCW weight values.
  • the BCW weights with the same value of “BCW weight index %k” are in the same group.
  • the BCW weights with the same value of “BCW weight index /k” are in the same group.
  • BCW weights with similar values are in the same group (e.g., the BCW weights with negative values are in one group and/or the BCW weights with positive values are in another group).
  • k (the number of candidates in the group) varies with block width, block height, and/or block area.
  • in some embodiments, when a group contains more than one BCW weight, the cost for this group is the average of the costs of all BCW weights in this group. In some embodiments, the cost for this group is the mean, maximum, or minimum of those costs. In some embodiments, the BCW weights are partitioned into several subsets, and the selection of a subset is explicitly signaled to the decoder; the selection of a candidate within the subset is then made by using boundary matching costs.
  • the BCW weights are partitioned into multiple subsets.
  • in some embodiments, BCW weights having similar values are partitioned into different subsets, such that the BCW weights of the same subset have large differences from each other.
  • for example, the weights having negative values may be partitioned into different groups.
  • for example, weights with weight differences larger than a pre-defined threshold are partitioned into the same group.
  • the pre-defined threshold can be fixed or explicitly decided. This may help to improve the hit rate of boundary matching.
  • Cross Component Linear Model (CCLM) or Linear Model (LM) mode is a cross component prediction mode in which the chroma components of a block are predicted from the collocated reconstructed luma samples by linear models.
  • the parameters (e.g., scale and offset) of the linear model are derived from already reconstructed luma and chroma samples that are adjacent to the block.
  • the CCLM mode makes use of inter-channel dependencies to predict the chroma samples from reconstructed luma samples. This prediction is carried out using a linear model of the form: P(i, j) = α · rec′_L(i, j) + β
  • P(i, j) represents the predicted chroma samples in a CU (or the predicted chroma samples of the current CU) and rec′_L(i, j) represents the down-sampled reconstructed luma samples of the same CU (or the corresponding reconstructed luma samples of the current CU).
  • the CCLM model parameters α (scaling parameter) and β (offset parameter) are derived based on at most four neighboring chroma samples and their corresponding down-sampled luma samples.
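  • A floating-point sketch of the four-point min/max derivation is shown below: the two smallest-luma and two largest-luma neighbor pairs are averaged and a line is fitted through the two averaged points. The standard derives α and β with integer arithmetic and lookup tables, so this is illustrative only.

```python
def derive_cclm_params(luma, chroma):
    """luma/chroma hold the (up to four) selected neighboring sample pairs.
    Returns (alpha, beta) for P(i, j) = alpha * rec_L'(i, j) + beta."""
    order = sorted(range(len(luma)), key=lambda i: luma[i])
    lo, hi = order[:2], order[-2:]
    l_min = sum(luma[i] for i in lo) / 2
    l_max = sum(luma[i] for i in hi) / 2
    c_min = sum(chroma[i] for i in lo) / 2
    c_max = sum(chroma[i] for i in hi) / 2
    alpha = (c_max - c_min) / max(l_max - l_min, 1)  # guard against a flat template
    beta = c_min - alpha * l_min
    return alpha, beta
```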
  • in LM_A mode (also denoted as LM-T mode or INTRA_T_CCLM), only the above template is used to calculate the linear model coefficients.
  • in LM_L mode (also denoted as LM-L mode or INTRA_L_CCLM), only the left template is used to calculate the linear model coefficients.
  • in LM_LA mode (also denoted as LM-LT mode or INTRA_LT_CCLM), both the left and above templates are used to calculate the linear model coefficients.
  • the proposed methods can apply to any cross-component mode and are not limited to LM modes.
  • a cross-component mode uses the information of the first color component (e.g., Y) to predict the second/third color components (e.g., Cb/Cr).
  • in some embodiments, the proposed candidate mode reduction/reordering scheme is used to reorder LM candidates to improve the signaling of certain syntax elements, e.g., cclm_mode_idx.
  • the following is the syntax table of LM in the VVC standard.
  • the syntax element cclm_mode_idx specifies which one of the LM candidate modes (e.g., INTRA_LT_CCLM, INTRA_L_CCLM, and INTRA_T_CCLM) is applied. For example, boundary matching costs are calculated for the different LM candidate modes. After reordering of the different LM candidate modes according to the boundary matching costs, cclm_mode_idx may be used to select an LM candidate mode based on the reordering.
  • the cclm_mode_idx being equal to 0 refers to the LM candidate mode with the smallest cost and cclm_mode_idx being equal to 2 refers to the LM candidate mode with the largest cost.
  • cclm_mode_idx is implicit and the LM candidate mode having a smallest cost is used.
  • in some embodiments, only the first k LM candidate modes can be the candidate LM modes (the original number of LM candidate modes is 3).
  • for example, k may be 2, and the signaling of the LM candidate modes is as follows: cclm_mode_idx 0 refers to the candidate LM mode with the smallest cost, using codeword 0; cclm_mode_idx 1 refers to the candidate LM mode with the second smallest cost, using codeword 1.
  • cclm_mode_idx is implicit and the LM candidate mode having a largest cost is used.
  • in that case, cclm_mode_idx 0 refers to the candidate LM mode with the largest cost, using codeword 0;
  • cclm_mode_idx 1 refers to the candidate LM mode with the second largest cost, using codeword 1.
  • the reduced/reordered candidate modes list may also be used in other variations of CCLM.
  • the variations of CCLM here mean that some optional modes can be selected when the block indication refers to using one of the cross-component modes (e.g., CCLM_LT, MMLM_LT, CCLM_L, CCLM_T, MMLM_L, MMLM_T, and/or an intra prediction mode which is not one of the traditional DC, planar, and angular modes) for the current block.
  • another example of a cross-component mode is the convolutional cross-component mode (CCCM).
  • the cost for a pre-defined candidate can be reduced.
  • the pre-defined candidate may refer to the candidate inferred from one or more neighbouring blocks and/or a popular or common mode such as CCLM_LT, MMLM_LT, CCCM_LT.
  • the candidate modes include (only) the related popular modes and the popular mode.
  • the related popular modes are CCCM_L and/or CCCM_T and the popular mode is CCCM_LT.
  • the related popular modes are CCLM_L and/or CCLM_T and the popular mode is CCLM_LT.
  • the related popular modes are MMLM_L and/or MMLM_T and the popular mode is MMLM_LT.
  • the popular mode indicates whether the used reference is from L, T, or LT. If the used reference is from LT, the related popular modes are all or a subset of the variations of LT, such as CCLM_LT, MMLM_LT, and CCCM_LT.
  • in some embodiments, the candidate modes correspond to linear models that are derived by using neighboring reconstructed samples of Cb and Cr as the inputs X and Y (i.e., using reconstructed Cb samples to derive or predict Cr samples, or vice versa).
  • the candidate modes correspond to linear models that are derived from multiple collocated luma blocks for predicting or deriving chroma components.
  • a candidate mode in the reduced/reorder candidate list may correspond to a multi-model linear model (MMLM) mode, such that different linear models may be selected from multiple linear models for predicting chroma components of different regions or groups of pixels. Similar to CCLM, MMLM can have MMLM_LT, MMLM_L, and/or MMLM_T.
  • the LM candidate modes in the reduced/reordered candidate mode list may also include any LM extensions/variations, such that the number of candidate modes in the list may also increase. With an increased number of LM candidate modes, the coding performance improvement from using the reduced/reordered candidate mode list based on boundary matching cost becomes more significant.
  • in multi-hypothesis prediction (MHP), one or more hypotheses of prediction are combined with the existing hypothesis of prediction to form the final (resulting) prediction of the current block.
  • MHP can be applied to many inter modes such as merge, subblock merge, inter AMVP and/or affine.
  • the final prediction may be accumulated iteratively with each additional hypothesis of prediction signal. An example of the accumulation is: p_(n+1) = (1 - alpha_(n+1)) * p_n + alpha_(n+1) * h_(n+1)
  • the resulting prediction signal is obtained as the last p_n (i.e., the p_n having the largest index n).
  • p_0 is the first (existing) prediction of the current block. For example, if MHP is applied to a merge candidate, p_0 is indicated by the existing merge index.
  • the additional prediction is denoted as h and is further combined with the previously-accumulated prediction by a weighting alpha. Therefore, for each additional hypothesis of prediction, a weighting index is signaled/parsed to indicate the weighting, and/or for each hypothesis of prediction an inter index is signaled/parsed to indicate the motion candidate (used to generate the predicted samples for this hypothesis).
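  • The iterative accumulation can be sketched as below, following the formula above; hypotheses is an ordered list of (h, alpha) pairs, one per additional hypothesis, and all names are illustrative.

```python
def mhp_accumulate(p0, hypotheses):
    """Iteratively blend each additional hypothesis h with the previously
    accumulated prediction: p_{n+1} = (1 - alpha_{n+1}) * p_n + alpha_{n+1} * h_{n+1}.
    The final accumulated p is the resulting prediction of the current block."""
    p = p0
    for h, alpha in hypotheses:
        p = (1 - alpha) * p + alpha * h
    return p
```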
  • boundary matching costs are calculated for each MHP candidate weight for each additional hypothesis of prediction. For example, assume that there are two candidate weights for a hypothesis of prediction:
  • Step 0: for an additional hypothesis of prediction, cost_w0 and cost_w1 are calculated as the boundary matching costs of candidate weights w0 and w1, respectively.
  • Step 1: for that hypothesis of prediction, the candidate weights are reordered depending on the costs.
  • in some embodiments, the candidate weight with a smaller cost gets a higher priority. If cost_w0 > cost_w1, the weight index 0 refers to w1 and the weight index 1 refers to w0. Otherwise, reordering is not used and the weight indices 0 and 1 refer to w0 and w1, as in the original order. In some embodiments, the candidate weight with a larger cost gets a higher priority. If cost_w0 < cost_w1, the weight index 0 refers to w1 and the weight index 1 refers to w0. Otherwise, reordering is not used and the weight indices 0 and 1 refer to w0 and w1, as in the original order.
  • the candidate weight with a smallest cost is used for the current additional hypothesis of prediction.
  • the weight index for the current additional hypothesis of prediction is inferred in this case.
  • the weights for each hypothesis or any subset of hypotheses are implicitly set according to the costs.
  • the candidate weight with a largest cost is used for the current additional hypothesis of prediction.
  • the weight index for the current additional hypothesis of prediction is inferred in this case.
  • Steps 0 and 1 may be repeated for each additional hypothesis of prediction to obtain the meaning of each weight index for each additional hypothesis of prediction.
  • Step 0: for an additional hypothesis of prediction, calculate the costs of each candidate weight.
  • Step 1: for that hypothesis of prediction, the candidate weights are reordered depending on the costs.
  • the candidate weight with a smaller cost gets a higher priority.
  • the candidate weight with a larger cost gets a higher priority.
• only the first k candidate weights (those with higher priorities) remain as candidate weights.
  • the candidate mode set includes only the first k candidate weights.
• the number of candidate weights in the candidate mode set is k (e.g., 2) and the signaling of each candidate weight is as follows: weight_idx 0 refers to the candidate with the highest priority, with codeword 0; weight_idx 1 refers to the candidate with the second highest priority, with codeword 1. Steps 0 and 1 may be repeated for each additional hypothesis of prediction to derive the meaning of each weight index for each additional hypothesis of prediction, as sketched below.
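A minimal Python sketch of Steps 0 and 1 for one additional hypothesis, assuming the boundary matching costs have already been computed; the function and variable names are illustrative only.

```python
def reorder_candidate_weights(cand_weights, costs, k=None, prefer_smaller_cost=True):
    """Reorder candidate weights by boundary matching cost.

    cand_weights : candidate weights w0, w1, ... for one hypothesis
    costs        : boundary matching cost of each candidate weight
    k            : optional; keep only the first k highest-priority weights
    Returns the reordered list; position 0 gets the shortest codeword.
    """
    order = sorted(range(len(cand_weights)), key=lambda i: costs[i],
                   reverse=not prefer_smaller_cost)
    reordered = [cand_weights[i] for i in order]
    return reordered if k is None else reordered[:k]

# Example: with cost_w0 > cost_w1, weight index 0 now refers to w1.
weights = reorder_candidate_weights(["w0", "w1"], costs=[10, 7])
assert weights == ["w1", "w0"]
```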
• the MHP weights are split into several groups. Next, a boundary matching cost is calculated for each group. Then, the promising group is implicitly decided to be the group with the highest priority. If more than one MHP weight is included in the promising group, a reduced MHP weight index is signaled/parsed to indicate an MHP weight from the promising group. Since the number of MHP weights in each group is less than the number of MHP weights in the original MHP weight set, the reduced MHP weight index requires shorter codewords than the original MHP weight index.
  • the number of MHP weights in each group is “N/k” and the grouping rule depends on the MHP weight index and/or MHP weight values.
  • the MHP weights with the same value of “MHP weight index %k” are in the same group.
  • the MHP weights with the same value of “MHP weight index /k” are in the same group.
• the MHP weights with similar values are in the same group (e.g., the MHP weights with negative values are in one group and/or the MHP weights with positive values are in another group).
  • k varies with block width, block height, and/or block area.
• the cost for this group is the average of the costs from all candidates in this group. In some embodiments, when a group contains more than one candidate, the cost for this group is the mean, maximum, or minimum of the costs from all candidates in this group, as in the sketch below.
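The grouping rules and group costs above may be sketched as follows; the modulo/division grouping and the mean/max/min group-cost rules come from the preceding bullets, while the function names are hypothetical.

```python
import statistics

def group_by_index(num_weights, k, rule="mod"):
    """Assign MHP weight indices to k groups by 'index % k' or 'index / k'."""
    groups = {}
    for idx in range(num_weights):
        g = idx % k if rule == "mod" else idx // k
        groups.setdefault(g, []).append(idx)
    return groups

def group_cost(member_costs, rule="mean"):
    """Representative cost of a group: mean, maximum, or minimum."""
    if rule == "mean":
        return statistics.mean(member_costs)
    return max(member_costs) if rule == "max" else min(member_costs)
```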
• the MHP weights are partitioned into several subsets, and the subset selection is explicitly signaled to the decoder. The selection within the subset is made using boundary matching costs.
• the weights whose values differ greatly from each other are partitioned into the same subset. That is, the weights with similar values are partitioned into different subsets. This may help to improve the hit rate of boundary matching.
• the weights with negative values are partitioned into different groups.
• those weights with a weight difference larger than a pre-defined threshold are partitioned into the same group (see the sketch after this list).
• the pre-defined threshold can be fixed or explicitly decided.
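One possible (hypothetical) greedy partition that places weights whose pairwise differences exceed the threshold into the same subset, so that similar weights end up in different subsets; this is a sketch under that assumption, not the disclosed partitioning rule.

```python
def partition_by_difference(weights, threshold):
    """Place mutually dissimilar weights (difference > threshold) in the
    same subset; similar weights then fall into different subsets."""
    subsets = []
    for w in sorted(weights):
        for s in subsets:
            if all(abs(w - u) > threshold for u in s):
                s.append(w)
                break
        else:
            subsets.append([w])
    return subsets

# Example: weights of similar magnitude land in different subsets.
print(partition_by_difference([-2, -1, 1, 2], threshold=1))
# [[-2, 1], [-1, 2]]
```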
• the current prediction (used in calculating the boundary matching cost) for a candidate weight is the combined prediction (the weighted average of the prediction from a motion candidate and the previously accumulated prediction).
  • the motion candidate is indicated by a signaled/parsed motion index.
• an example is shown in the following figure, in which the proposed methods are applied to reorder the signalling of candidate weights for h2 and there are 4 candidate weights (cand 0 to 3) for h2.
  • the cost for cand n is calculated as the weighted average of previously accumulated prediction and h2’s prediction. (Weighting is from cand n. )
• boundary matching costs are calculated for each MHP motion candidate. Take MHP for merge mode as an example in the following sub-embodiments. (MHP can be applied to other inter modes such as inter AMVP and/or affine; when inter AMVP or affine is used, “merge” in the following example is replaced with the name of that inter mode.)
  • the boundary matching costs for each motion candidate are calculated.
• the candidate modes include cand 0 to 4. Originally, the index 0 (shorter codewords) refers to cand 0 and the index 4 (longer codewords) refers to cand 4. With the proposed method, the meaning of the index follows the priority: if the priority order (based on the boundary matching costs) specifies that cand 4 has the highest priority, the index 0 is mapped to cand 4.
  • the boundary matching costs for each motion candidate are calculated.
• the candidate modes include cand 0 to 4, where the index 0 uses shorter codewords and the index 4 uses longer codewords.
• the meaning of the index follows the priority: if the priority order (based on the boundary matching costs) specifies that cand 4 has the highest priority, the index is not signaled/parsed and the selected motion candidate is inferred as the one with the highest priority.
  • the current prediction (used in calculating boundary matching cost) for a motion candidate is the motion compensation result generated by that motion candidate.
• the current prediction (used in calculating the boundary matching cost) for a motion candidate is the combined prediction (the weighted average of the prediction from a motion candidate and the existing prediction (p0)). (The weight is indicated by a signaled/parsed weight index.)
• the current prediction (used in calculating the boundary matching cost) for a motion candidate is the combined prediction (the weighted average of the prediction from a motion candidate and the previously accumulated prediction).
  • Weight is indicated by a signaled/parsed weight index.
  • the cost for cand n is calculated as the weighted average of p0’s prediction, h1’s prediction, and the prediction from cand n.
• current prediction for candidate 0: weighted average of the prediction of h1 and the prediction from candidate 0.
• current prediction for candidate 1: weighted average of the prediction of h1 and the prediction from candidate 1.
• current prediction for candidate 0: weighted average of the prediction of p0 and the prediction from candidate 0.
• current prediction for candidate 1: weighted average of the prediction of p0 and the prediction from candidate 1.
  • boundary matching costs are calculated for each MHP combination (motion candidate and weight) for each hypothesis of prediction.
• take MHP for merge mode as an example in the following sub-embodiments. (MHP can be applied to other inter modes such as inter AMVP and/or affine; when inter AMVP or affine is used, “merge” in the following example is replaced with the name of that inter mode.)
• one combination refers to a motion candidate and a weight. If there are m motion candidates and n weights for each motion candidate, the number of combinations is m*n.
• the current prediction (used in calculating the boundary matching cost) for a combination is the combined prediction (the weighted average of the prediction from a motion candidate and the existing prediction (p_0)).
• the merge index and the weight for indicating the additional hypothesis of prediction are jointly decided.
• the combination with the highest priority is the selected MHP combination (no need to signal/parse the merge index and the weight for indicating the additional hypothesis of prediction).
• a joint index is signalled/parsed to decide the MHP combination. The number of additional hypotheses can be fixed in this embodiment. A sketch of this joint decision follows.
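A sketch of the joint decision over motion-candidate/weight combinations; cost_fn stands in for the boundary matching cost of the combined prediction and is an assumption, not disclosed syntax.

```python
def rank_mhp_combinations(motion_cands, cand_weights, cost_fn):
    """Rank all (motion candidate, weight) combinations by boundary
    matching cost; with m motion candidates and n weights there are
    m * n combinations."""
    combos = [(mc, w) for mc in motion_cands for w in cand_weights]
    combos.sort(key=lambda c: cost_fn(c[0], c[1]))
    # combos[0] is the highest-priority combination (may be inferred),
    # or a joint index into this ranked list is signaled/parsed.
    return combos
```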
  • boundary matching costs are calculated for each MHP motion candidate.
• take MHP for merge mode as an example.
• the merge index (which is used to indicate a motion candidate for each hypothesis) may be inferred. For example, depending on the order of merge candidates in the merge candidate list, merge candidate 0 is used for hypothesis 0, merge candidate 1 for hypothesis 1, etc. For another example, depending on the costs, the merge candidate with a smaller cost is used first. For another example, depending on the predefined number of merge candidates: if the number of hypotheses is 4, 4 merge candidates from the merge candidate list are used as the motion candidates for the hypotheses; the first 4 merge candidates may be used, or any 4 merge candidates from the merge candidate list may be used.
• the weights (which are used to combine hypotheses of prediction) are implicit depending on the costs. The MHP resulting prediction is formed by blending the hypotheses of prediction with cost-derived weights, as described below.
• a fixed number of hypotheses of prediction is used. That is, a fixed number of hypotheses are blended, with the matching costs used implicitly as weights.
  • the weight for the motion candidate (or can be said as for the hypothesis) with higher priority is larger than the weight for the motion candidate with lower priority.
  • the current prediction (used in calculating boundary matching cost) for a motion candidate is the motion compensation result generated by that motion candidate.
• the first n motion candidates in the merge candidate list are used for generating the hypotheses of predictions. With this proposed method, no merge index is signalled for MHP. In some embodiments, all motion candidates in the merge candidate list are used for generating the hypotheses of predictions.
• the cost for each motion candidate is normalized to an interval [MIN_VALUE, MAX_VALUE] first.
• the MAX_VALUE is pre-defined, such as the number of hypotheses of prediction.
• the MIN_VALUE is pre-defined, such as 0.
• (MAX_VALUE − the normalized cost) can be the weight for a motion candidate, or the normalized cost itself can be the weight for a motion candidate.
• the weights are scaled values of the costs or scaled values of the multiplicative inverse of the costs; a sketch of this normalization is given below.
  • the weight and merge index are implicit with the proposed method.
  • the generation of current prediction for this method can reference any other proposed methods in this invention.
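The cost normalization described above can be sketched as follows, assuming smaller costs should yield larger implicit blending weights; the function name and defaults are illustrative only.

```python
def implicit_weights(costs, min_value=0.0, max_value=None):
    """Normalize costs to [min_value, max_value]; use
    (max_value - normalized cost) as the implicit weight so that a
    motion candidate with a smaller cost gets a larger weight."""
    if max_value is None:
        max_value = float(len(costs))  # e.g., the number of hypotheses
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0
    norm = [min_value + (c - lo) * (max_value - min_value) / span
            for c in costs]
    return [max_value - n for n in norm]
```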
• the proposed scheme is applied to a subset of all additional hypotheses of prediction. (That is, the above steps 0 and 1 are repeated for the subset of all additional hypotheses of prediction.) For example, only the candidate weights of the first additional hypothesis of prediction (which is combined with the existing hypothesis of prediction) are reordered with the proposed scheme.
• the subset is pre-defined in the video coding standard. In some embodiments, the subset depends on the current block width, height, or area. For example, for a block with a block area larger (or smaller) than a threshold, the subset includes more hypotheses of prediction.
  • the reordering results from the subset can be reused for the remaining additional hypotheses of prediction.
• for example, for a hypothesis in the subset, the weight indices 0 and 1 refer to w1 and w0, respectively.
• for the remaining additional hypotheses, the weight indices 0 and 1 also refer to w1 and w0, respectively.
• the proposed scheme is applied to a subset of all candidate weights for an additional hypothesis of prediction. That is, the above steps 0 and 1 are repeated for the subset of all candidate weights for an additional hypothesis of prediction. Take the number of candidate weights (for an additional hypothesis of prediction) equal to 4 as an example: for an additional hypothesis of prediction, only the first (or last) two candidate weights are reordered with the proposed scheme.
  • the subset is pre-defined in the video coding standard. In some embodiments, the subset depends on the current block width, height, or area. For example, for a block with the block area larger (or smaller) than a threshold, the subset includes more candidate weights.
• a hypothesis of prediction can be the prediction signal from a uni-prediction or bi-prediction motion compensation result.
  • the proposed reordering scheme for different tools can be unified.
  • the proposed reordering scheme for MHP, LM, BCW, MMVD, and/or merge candidates is unified with the same rule of calculating boundary matching costs.
  • the proposed methods in this invention can be enabled and/or disabled according to implicit rules (e.g. block width, height, or area) or according to explicit rules (e.g., based on syntax on block, tile, slice, picture, SPS, or PPS level) .
  • the proposed reordering is applied when the block area is smaller than a threshold. Any combination of the proposed methods in this invention can be applied.
  • any of the foregoing proposed methods can be implemented in encoders and/or decoders.
• any of the proposed methods can be implemented in an intra/inter coding module of an encoder, and/or in a motion compensation module or a merge candidate derivation module of a decoder.
• any of the proposed methods can be implemented as a circuit coupled to the intra/inter coding module of an encoder, and/or to the motion compensation module or merge candidate derivation module of a decoder.
  • FIG. 6 illustrates an example video encoder 600 that may implement candidate coding mode selection based on boundary matching cost.
  • the video encoder 600 receives input video signal from a video source 605 and encodes the signal into bitstream 695.
• the video encoder 600 has several components or modules for encoding the signal from the video source 605, at least including some components selected from a transform module 610, a quantization module 611, an inverse quantization module 614, an inverse transform module 615, an intra-picture estimation module 620, an intra-prediction module 625, a motion compensation module 630, a motion estimation module 635, an in-loop filter 645, a reconstructed picture buffer 650, a MV buffer 665, a MV prediction module 675, and an entropy encoder 690.
  • the motion compensation module 630 and the motion estimation module 635 are part of an inter-prediction module 640.
  • the modules 610 –690 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 610 –690 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 610 –690 are illustrated as being separate modules, some of the modules can be combined into a single module.
  • the video source 605 provides a raw video signal that presents pixel data of each video frame without compression.
  • a subtractor 608 computes the difference between the raw video pixel data of the video source 605 and the predicted pixel data 613 from the motion compensation module 630 or intra-prediction module 625.
  • the transform module 610 converts the difference (or the residual pixel data or residual signal 608) into transform coefficients (e.g., by performing Discrete Cosine Transform, or DCT) .
  • the quantization module 611 quantizes the transform coefficients into quantized data (or quantized coefficients) 612, which is encoded into the bitstream 695 by the entropy encoder 690.
  • the inverse quantization module 614 de-quantizes the quantized data (or quantized coefficients) 612 to obtain transform coefficients, and the inverse transform module 615 performs inverse transform on the transform coefficients to produce reconstructed residual 619.
  • the reconstructed residual 619 is added with the predicted pixel data 613 to produce reconstructed pixel data 617.
  • the reconstructed pixel data 617 is temporarily stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction.
  • the reconstructed pixels are filtered by the in-loop filter 645 and stored in the reconstructed picture buffer 650.
  • the reconstructed picture buffer 650 is a storage external to the video encoder 600.
  • the reconstructed picture buffer 650 is a storage internal to the video encoder 600.
  • the intra-picture estimation module 620 performs intra-prediction based on the reconstructed pixel data 617 to produce intra prediction data.
  • the intra-prediction data is provided to the entropy encoder 690 to be encoded into bitstream 695.
  • the intra-prediction data is also used by the intra-prediction module 625 to produce the predicted pixel data 613.
  • the motion estimation module 635 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 650. These MVs are provided to the motion compensation module 630 to produce predicted pixel data.
  • the video encoder 600 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 695.
• the MV prediction module 675 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation.
  • the MV prediction module 675 retrieves reference MVs from previous video frames from the MV buffer 665.
  • the video encoder 600 stores the MVs generated for the current video frame in the MV buffer 665 as reference MVs for generating predicted MVs.
  • the MV prediction module 675 uses the reference MVs to create the predicted MVs.
  • the predicted MVs can be computed by spatial MV prediction or temporal MV prediction.
• the difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame, i.e., the residual motion data, is encoded into the bitstream 695 by the entropy encoder 690.
  • the entropy encoder 690 encodes various parameters and data into the bitstream 695 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
• the entropy encoder 690 encodes various header elements and flags, along with the quantized transform coefficients 612 and the residual motion data, as syntax elements into the bitstream 695.
  • the bitstream 695 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.
  • the in-loop filter 645 performs filtering or smoothing operations on the reconstructed pixel data 617 to reduce the artifacts of coding, particularly at boundaries of pixel blocks.
  • the filtering operation performed includes sample adaptive offset (SAO) .
  • the filtering operations include adaptive loop filter (ALF) .
  • FIG. 7 illustrates portions of the video encoder 600 that implement candidate coding mode selection based on boundary matching costs. Specifically, the figure illustrates the components of the inter-prediction module 640 of the video encoder 600 that calculate boundary matching costs of all or any subset of the different candidate coding modes and select a candidate coding mode based on group assignment and the computed costs.
• the inter-prediction module 640 includes various boundary prediction modules 710. These boundary prediction modules 710 generate predictor samples 715 for the current block along the boundary of the current block for various candidate coding modes.
• the boundary predictor samples 715 of each candidate coding mode are compared with neighboring samples 725 of the current block (retrieved from the reconstructed picture buffer 650) by a boundary matching cost calculator 730 to compute the boundary matching cost of each of all or any subset of the candidate coding modes.
  • Section I above describes calculating boundary matching costs based on neighboring samples and boundary predictor samples.
• a group assignment and selection module 740 assigns the different candidate coding modes into different groups and selects one of the groups.
• a candidate selection module 750 then selects one candidate coding mode from the selected group.
  • the inter-prediction module 640 then uses the selected candidate coding mode to perform motion compensation.
• the group assignment module 740 assigns a certain number of candidate coding modes having the lowest boundary matching costs to form one lowest-cost group, and the candidate selection module 750 selects a candidate coding mode from this lowest-cost group.
• the group assignment module 740 assigns a certain number of candidate coding modes to form a group with a pre-defined rule (which may or may not depend on the costs), and the candidate selection module 750 selects a candidate coding mode from the group depending on the costs.
  • the selected candidate coding mode is the candidate mode with the highest priority in the group.
  • the identity of the selected group is sent to the entropy encoder 690 to be signaled in the bitstream 695. In some embodiments, the identity of the selected group is to be implicitly determined and not signaled in the bitstream. In some embodiments, the selection of the group is determined based on the computed boundary matching costs of the different groups, e.g., the group selection module 740 may select a group having a lowest representative cost.
  • the identity of the selected candidate coding mode within the selected group is provided to the entropy encoder 690 to be signaled in the bitstream 695.
  • the candidate coding modes within a group are reordered according to the computed boundary matching costs such that the lowest cost (or the highest cost) candidate will be signaled using the shortest codeword.
• the identity of the selected candidate coding mode is to be implicitly determined and not signaled in the bitstream, e.g., by selecting the candidate coding mode with the lowest boundary matching cost within the group. The selection of a group of candidate coding modes and the selection of a candidate coding mode from the group are described in Sections II-VI above.
• the boundary prediction modules 710 include modules for various different candidate coding modes. These candidate coding modes may use luma and chroma samples stored in the reconstructed picture buffer 650, motion information from the MV buffer 665, and/or incoming luma and chroma samples from the video source 605 to generate the boundary predictor samples 715 (e.g., by deriving merge candidates, deriving MMVD candidates, using motion information to fetch reference samples, generating linear models, etc.).
• the various candidate coding modes include those that correspond to various merge candidates of the current block. Merge candidates of various types (e.g., spatial, temporal, affine, CIIP, etc.) as candidate coding modes are described in Section II above.
  • FIG. 8 conceptually illustrates a process 800 for using boundary matching costs to select a candidate coding mode to encode the current block.
• one or more processing units (e.g., a processor) of a computing device implementing the encoder 600 perform the process 800 by executing instructions stored in a computer readable medium.
  • an electronic apparatus implementing the encoder 600 performs the process 800.
  • the encoder receives (at block 810) data to be encoded as a current block of pixels in a current picture.
  • the encoder identifies (at block 820) a plurality of candidate coding modes applicable to the current block.
  • the plurality of candidate coding modes includes merge candidates of the current block.
  • the merge candidates of the current block may include (i) merge candidates that use CIIP and/or (ii) merge candidates that use affine transform motion compensation prediction.
  • the plurality of candidate coding modes includes candidate coding modes that correspond to different MMVD combinations of distances and offsets for refining motion information.
  • the plurality of candidate coding modes includes candidate coding modes that correspond to different linear models for deriving predictors of chroma samples of the current block based on luma samples of the current block. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different candidate BCW weights for combining inter predictions of different directions.
  • the encoder identifies (at block 830) a first group of candidate coding modes that is a subset of the plurality of candidate coding modes, such that the number of candidate coding modes in the first group is less than the number of candidate coding modes in the plurality of candidate coding modes.
  • the first group of candidate coding modes are highest priority candidate coding modes identified based on costs of the plurality of candidate coding modes.
  • the encoder may index the candidate coding modes or assign codewords to the candidate coding modes in the identified group of candidate coding modes according to the priorities of the candidate coding modes.
  • the cost of a candidate coding mode is a boundary matching cost computed by comparing (i) reconstructed samples neighboring the current block and (ii) predicted samples of the current block along boundaries of the current block that are generated according to the candidate coding mode. Section I above describes calculating boundary matching costs based on neighboring samples and boundary predictor samples.
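As one hedged illustration of such a cost (the precise formula of Section I is not reproduced here), a simple sum of absolute differences between the predicted boundary samples and the adjacent reconstructed neighbors could look like the following Python sketch; the function and argument names are hypothetical.

```python
import numpy as np

def boundary_matching_cost(pred, reco_top, reco_left):
    """Compare the top/left predicted boundary samples of the current
    block with the adjacent reconstructed neighboring samples.

    pred      : H x W predicted samples generated by a candidate mode
    reco_top  : reconstructed row just above the block (length W)
    reco_left : reconstructed column just left of the block (length H)
    """
    top = np.abs(pred[0, :].astype(np.int64) - reco_top).sum()
    left = np.abs(pred[:, 0].astype(np.int64) - reco_left).sum()
    return int(top + left)
```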
  • each of the plurality of candidate coding modes is assigned to one of a plurality of groups of candidate coding modes.
  • each candidate coding mode in the plurality of candidate coding modes is associated with an original candidate index, wherein each candidate coding mode is assigned to one of K groups of candidate coding modes based on a result of the original index modulo K or a result of the original index divided by K.
  • the candidate coding modes that correspond to spatial merge candidates are assigned to a same group of candidate coding modes.
  • the candidate coding modes that correspond to spatial merge candidates are assigned to different groups of candidate coding modes.
• candidate coding modes that are merge candidates with motion differences smaller than a threshold are assigned to a same group.
• candidate coding modes that are merge candidates with motion differences greater than a threshold are assigned to a same group.
  • the encoder identifies a group of candidate coding modes having a lowest representative cost among the plurality of groups of candidate coding modes and signals an index selecting one candidate coding mode from the identified group of candidate coding modes.
  • the representative cost of the identified group may be a mean, a maximum, or a minimum of the costs (e.g., boundary matching costs) of the candidate coding modes of the identified group.
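Combining the modulo-K group assignment with the lowest-representative-cost group selection described above, a hypothetical encoder-side sketch (assuming K does not exceed the number of candidate modes):

```python
def select_group_and_candidate(costs, k, rep_rule="min"):
    """Assign candidate modes to K groups by 'original index % K',
    pick the group with the lowest representative cost, then the
    lowest-cost candidate within that group."""
    groups = {g: [i for i in range(len(costs)) if i % k == g]
              for g in range(k)}

    def rep(members):
        vals = [costs[i] for i in members]
        if rep_rule == "mean":
            return sum(vals) / len(vals)
        return max(vals) if rep_rule == "max" else min(vals)

    best_group = min(groups, key=lambda g: rep(groups[g]))
    best_cand = min(groups[best_group], key=lambda i: costs[i])
    return best_group, best_cand  # an index within the group may be signaled
```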
  • the encoder signals an index selecting a group of candidate coding modes and identifies a candidate coding mode from the selected group of candidate coding modes based on costs (e.g., boundary matching costs) of the candidate coding modes of the selected group.
  • the encoder selects (at block 840) a candidate coding mode in the first group of candidate coding modes.
  • the selected candidate coding mode in the first group of candidate coding modes may be selected based on cost.
• the encoder encodes (at block 850) the current block by using the selected candidate coding mode. Specifically, the encoder constructs a predictor of the current block according to the selected candidate coding mode and uses the predictor to encode the current block.
• an encoder may signal (or generate) one or more syntax elements in a bitstream, such that a decoder may parse said one or more syntax elements from the bitstream.
  • FIG. 9 illustrates an example video decoder 900 that may implement candidate coding mode selection based on boundary matching cost.
  • the video decoder 900 is an image-decoding or video-decoding circuit that receives a bitstream 995 and decodes the content of the bitstream into pixel data of video frames for display.
  • the video decoder 900 has several components or modules for decoding the bitstream 995, including some components selected from an inverse quantization module 911, an inverse transform module 910, an intra-prediction module 925, a motion compensation module 930, an in-loop filter 945, a decoded picture buffer 950, a MV buffer 965, a MV prediction module 975, and a parser 990.
  • the motion compensation module 930 is part of an inter-prediction module 940.
  • the modules 910 –990 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device. In some embodiments, the modules 910 –990 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 910 –990 are illustrated as being separate modules, some of the modules can be combined into a single module.
  • the parser 990 receives the bitstream 995 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard.
• the parsed syntax elements include various header elements, flags, as well as quantized data (or quantized coefficients) 912.
  • the parser 990 parses out the various syntax elements by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
  • the inverse quantization module 911 de-quantizes the quantized data (or quantized coefficients) 912 to obtain transform coefficients, and the inverse transform module 910 performs inverse transform on the transform coefficients 916 to produce reconstructed residual signal 919.
  • the reconstructed residual signal 919 is added with predicted pixel data 913 from the intra-prediction module 925 or the motion compensation module 930 to produce decoded pixel data 917.
• the decoded pixel data is filtered by the in-loop filter 945 and stored in the decoded picture buffer 950.
  • the decoded picture buffer 950 is a storage external to the video decoder 900.
  • the decoded picture buffer 950 is a storage internal to the video decoder 900.
• the intra-prediction module 925 receives intra-prediction data from the bitstream 995 and, according to this data, produces the predicted pixel data 913 from the decoded pixel data 917 stored in the decoded picture buffer 950.
  • the decoded pixel data 917 is also stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction.
  • the content of the decoded picture buffer 950 is used for display.
  • a display device 955 either retrieves the content of the decoded picture buffer 950 for display directly, or retrieves the content of the decoded picture buffer to a display buffer.
  • the display device receives pixel values from the decoded picture buffer 950 through a pixel transport.
  • the motion compensation module 930 produces predicted pixel data 913 from the decoded pixel data 917 stored in the decoded picture buffer 950 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bitstream 995 with predicted MVs received from the MV prediction module 975.
  • the MV prediction module 975 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation.
  • the MV prediction module 975 retrieves the reference MVs of previous video frames from the MV buffer 965.
  • the video decoder 900 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 965 as reference MVs for producing predicted MVs.
  • the in-loop filter 945 performs filtering or smoothing operations on the decoded pixel data 917 to reduce the artifacts of coding, particularly at boundaries of pixel blocks.
  • the filtering operation performed includes sample adaptive offset (SAO) .
  • the filtering operations include adaptive loop filter (ALF) .
  • FIG. 10 illustrates portions of the video decoder 900 that implement candidate coding mode selection based on boundary matching costs. Specifically, the figure illustrates the components of the inter-prediction module 940 of the video decoder 900 that calculate boundary matching costs of all or any subset of the different candidate coding modes and select a candidate coding mode based on group assignment and the computed costs.
• the inter-prediction module 940 includes various boundary prediction modules 1010. These boundary prediction modules 1010 generate predictor samples 1015 for the current block along the boundary of the current block for various candidate coding modes.
• the boundary predictor samples 1015 of each candidate coding mode are compared with neighboring samples 1025 of the current block (retrieved from the decoded picture buffer 950) by a boundary matching cost calculator 1030 to compute the boundary matching cost of each of all or any subset of the candidate coding modes.
  • Section I above describes calculating boundary matching costs based on neighboring samples and boundary predictor samples.
• a group assignment and selection module 1040 assigns the different candidate coding modes into different groups and selects one of the groups.
• a candidate selection module 1050 then selects one candidate coding mode from the selected group.
  • the inter-prediction module 940 then uses the selected candidate coding mode to perform motion compensation.
• the group assignment module 1040 assigns a certain number of candidate coding modes having the lowest boundary matching costs to form one lowest-cost group, and the candidate selection module 1050 selects a candidate coding mode from this lowest-cost group.
• the group assignment module 1040 assigns a certain number of candidate coding modes to form a group with a pre-defined rule (which may or may not depend on the costs), and the candidate selection module 1050 selects a candidate coding mode from the group depending on the costs.
  • the selected candidate coding mode is the candidate mode with the highest priority in the group.
• the entropy decoder 990 parses the bitstream 995 for the identity of the selected group and provides the identity of the selected group to the group assignment and selection module 1040.
  • the identity of the selected group is implicitly determined and not signaled in the bitstream.
  • the selection of the group is determined based on the computed boundary matching costs of the different groups, e.g., the group selection module 1040 may select a group having a lowest representative cost.
• the entropy decoder 990 parses the bitstream 995 for the identity of the selected candidate coding mode within the selected group and provides the identity of the selected candidate coding mode to the candidate selection module 1050.
  • the candidate coding modes within a group are reordered according to the computed boundary matching costs such that the lowest cost (or the highest cost) candidate will be signaled using the shortest codeword.
  • the identity of the selected candidate coding mode is to be implicitly determined and not signaled in the bitstream, e.g., by selecting the candidate coding mode with the lowest boundary matching cost within the group.
• the boundary prediction modules 1010 include modules for various different candidate coding modes. These candidate coding modes may use luma and chroma samples stored in the decoded picture buffer 950 and motion information from the MV buffer 965 to generate the boundary predictor samples 1015 (e.g., by deriving merge candidates, deriving MMVD candidates, using motion information to fetch reference samples, generating linear models, etc.).
  • the candidate coding modes include those that correspond to various merge candidates of the current block.
• Merge candidates of various types (e.g., spatial, temporal, affine, CIIP, etc.) as candidate coding modes are described in Section II above.
  • Different MMVD distances and directions as candidate coding modes are described in Section III above.
  • Different BCW weights as candidate coding modes are described in Section IV above.
• Linear models of various types (e.g., LM-L, LM-T, LM-LT) as candidate coding modes are described above.
  • FIG. 11 conceptually illustrates a process 1100 for using boundary matching costs to select a candidate coding mode to decode the current block.
• one or more processing units (e.g., a processor) of a computing device implementing the decoder 900 perform the process 1100 by executing instructions stored in a computer readable medium.
  • an electronic apparatus implementing the decoder 900 performs the process 1100.
  • the decoder receives (at block 1110) data to be decoded as a current block of pixels in a current picture.
  • the decoder identifies (at block 1120) a plurality of candidate coding modes applicable to the current block.
  • the plurality of candidate coding modes includes merge candidates of the current block.
  • the merge candidates of the current block may include (i) merge candidates that use combined inter and intra prediction (CIIP) and/or (ii) merge candidates that use affine transform motion compensation prediction.
  • the plurality of candidate coding modes includes candidate coding modes that correspond to different MMVD combinations of distances and offsets for refining motion information.
  • the plurality of candidate coding modes includes candidate coding modes that correspond to different linear models (LMs) for deriving predictors of chroma samples of the current block based on luma samples of the current block.
  • the plurality of candidate coding modes includes candidate coding modes that correspond to different candidate BCW weights for combining inter predictions of different directions.
  • the decoder identifies (at block 1130) a first group of candidate coding modes that is a subset of the plurality of candidate coding modes.
  • the number of candidate coding modes in the first group is less than the number of candidate coding modes in the plurality of candidate coding modes. (i.e., the first group is a subset of the plurality of candidate coding modes. )
  • the first group of candidate coding modes are highest priority candidate coding modes identified based on costs of the plurality of candidate coding modes.
  • the decoder may index the candidate coding modes or assign codewords to the candidate coding modes in the identified group of candidate coding modes according to the priorities of the candidate coding modes.
  • the cost of a candidate coding mode is a boundary matching cost computed by comparing (i) reconstructed samples neighboring the current block and (ii) predicted samples of the current block along boundaries of the current block that are generated according to the candidate coding mode. Section I above describes calculating boundary matching costs based on neighboring samples and boundary predictor samples.
  • each of the plurality of candidate coding modes is assigned to one of a plurality of groups of candidate coding modes.
  • each candidate coding mode in the plurality of candidate coding modes is associated with an original candidate index, wherein each candidate coding mode is assigned to one of K groups of candidate coding modes based on a result of the original index modulo K or a result of the original index divided by K.
  • the candidate coding modes that correspond to spatial merge candidates are assigned to a same group of candidate coding modes.
  • the candidate coding modes that correspond to spatial merge candidates are assigned to different groups of candidate coding modes.
• candidate coding modes that are merge candidates with motion differences smaller than a threshold are assigned to a same group.
• candidate coding modes that are merge candidates with motion differences greater than a threshold are assigned to a same group.
• the decoder identifies a group of candidate coding modes having a lowest representative cost among the plurality of groups of candidate coding modes and parses an index selecting one candidate coding mode from the identified group of candidate coding modes.
  • the representative cost of the identified group may be a mean, a maximum, or a minimum of the costs (e.g., boundary matching costs) of the candidate coding modes of the identified group.
• the decoder parses an index selecting a group of candidate coding modes and identifies a candidate coding mode from the selected group of candidate coding modes based on costs (e.g., boundary matching costs) of the candidate coding modes of the selected group.
  • the decoder selects (at block 1140) a candidate coding mode in the first group of candidate coding modes.
  • the selected candidate coding mode in the first group of candidate coding modes may be selected based on cost.
• the decoder decodes (at block 1150) the current block by using the selected candidate coding mode to reconstruct the current block. Specifically, the decoder constructs a predictor of the current block according to the selected candidate coding mode and uses the predictor to reconstruct the current block. The decoder may then provide the reconstructed current block for display as part of the reconstructed current picture.
• software instructions are recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more computational or processing units (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions.
  • Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random-access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs) , electrically erasable programmable read-only memories (EEPROMs) , etc.
  • the computer readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.
  • the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor.
  • multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions.
  • multiple software inventions can also be implemented as separate programs.
  • any combination of separate programs that together implement a software invention described here is within the scope of the present disclosure.
  • the software programs when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
  • FIG. 12 conceptually illustrates an electronic system 1200 with which some embodiments of the present disclosure are implemented.
  • the electronic system 1200 may be a computer (e.g., a desktop computer, personal computer, tablet computer, etc. ) , phone, PDA, or any other sort of electronic device.
  • Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media.
  • Electronic system 1200 includes a bus 1205, processing unit (s) 1210, a graphics-processing unit (GPU) 1215, a system memory 1220, a network 1225, a read-only memory 1230, a permanent storage device 1235, input devices 1240, and output devices 1245.
  • the bus 1205 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1200.
  • the bus 1205 communicatively connects the processing unit (s) 1210 with the GPU 1215, the read-only memory 1230, the system memory 1220, and the permanent storage device 1235.
  • the processing unit (s) 1210 retrieves instructions to execute and data to process in order to execute the processes of the present disclosure.
  • the processing unit (s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 1215.
  • the GPU 1215 can offload various computations or complement the image processing provided by the processing unit (s) 1210.
  • the read-only-memory (ROM) 1230 stores static data and instructions that are used by the processing unit (s) 1210 and other modules of the electronic system.
  • the permanent storage device 1235 is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1200 is off. Some embodiments of the present disclosure use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1235.
• the system memory 1220 is a read-and-write memory device. However, unlike storage device 1235, the system memory 1220 is a volatile read-and-write memory, such as a random-access memory.
  • the system memory 1220 stores some of the instructions and data that the processor uses at runtime.
  • processes in accordance with the present disclosure are stored in the system memory 1220, the permanent storage device 1235, and/or the read-only memory 1230.
  • the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit (s) 1210 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.
  • the bus 1205 also connects to the input and output devices 1240 and 1245.
  • the input devices 1240 enable the user to communicate information and select commands to the electronic system.
  • the input devices 1240 include alphanumeric keyboards and pointing devices (also called “cursor control devices” ) , cameras (e.g., webcams) , microphones or similar devices for receiving voice commands, etc.
  • the output devices 1245 display images generated by the electronic system or otherwise output data.
  • the output devices 1245 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD) , as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.
  • bus 1205 also couples electronic system 1200 to a network 1225 through a network adapter (not shown) .
  • the computer can be a part of a network of computers (such as a local area network ( “LAN” ) , a wide area network ( “WAN” ) , or an Intranet, or a network of networks, such as the Internet. Any or all components of electronic system 1200 may be used in conjunction with the present disclosure.
  • Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media) .
• computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW), etc.
  • the computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
• some computational operations are performed by integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.
• some embodiments execute software stored in programmable logic devices (PLDs), read only memory (ROM), or random access memory (RAM) devices.
  • the terms “computer” , “server” , “processor” , and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people.
  • display or displaying means displaying on an electronic device.
  • the terms “computer readable medium, ” “computer readable media, ” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
  • any two components so associated can also be viewed as being “operably connected” , or “operably coupled” , to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable” , to each other to achieve the desired functionality.
  • operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.

Abstract

A video coder receives data for a block of pixels to be encoded or decoded as the current block of a current picture of a video. The video coder identifies multiple candidate coding modes applicable to the current block. The video coder identifies a first group of candidate coding modes that is a subset of the plurality of candidate coding modes. The first group of candidate coding modes may be the highest priority candidate coding modes identified based on cost. The number of candidate coding modes in the first group is less than the number of candidate coding modes in the plurality of candidate coding modes. The video coder selects a candidate coding mode in the first group of candidate coding modes. The video coder encodes or decodes the current block by using the selected candidate coding mode.

Description

BOUNDARY MATCHING FOR VIDEO CODING
CROSS REFERENCE TO RELATED PATENT APPLICATION (S)
The present disclosure is part of a non-provisional application that claims the priority benefit of U.S. Provisional Patent Application No. 63/297,252, filed on 7 January 2022. Contents of the above-listed application are herein incorporated by reference.
TECHNICAL FIELD
The present disclosure relates generally to video coding. In particular, the present disclosure relates to ordering of candidate coding modes based on boundary matching.
BACKGROUND
Unless otherwise indicated herein, approaches described in this section are not prior art to the claims listed below and are not admitted as prior art by inclusion in this section.
High-Efficiency Video Coding (HEVC) is an international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC) . HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture. The basic unit for compression, termed coding unit (CU) , is a 2Nx2N square block of pixels, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached. Each CU contains one or multiple prediction units (PUs) .
Versatile Video Coding (VVC) is a codec designed to meet upcoming needs in videoconferencing, over-the-top streaming, mobile telephony, etc. VVC is meant to be very versatile and address all the video needs from low resolution and low bitrates to high resolution and high bitrates, high dynamic range (HDR) , 360 omnidirectional, etc.
For each inter-predicted CU, motion parameters consisting of motion vectors, reference picture indices and reference picture list usage index, and additional information are used for inter-predicted sample generation. The motion parameter can be signalled in an explicit or implicit manner. When a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta or reference picture index. A merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules introduced in VVC. The merge mode can be applied to any inter-predicted CU. The alternative to merge mode is the explicit transmission of motion parameters, where motion vector, corresponding reference picture index for each reference picture list and reference picture list usage flag and other needed information are signalled explicitly per each CU.
Beyond the inter coding features in HEVC, VVC includes several new and refined inter prediction coding tools listed as follows:
- Extended merge prediction
- Merge mode with MVD (MMVD)
- Symmetric MVD (SMVD) signalling
- Affine motion compensated prediction
- Subblock-based temporal motion vector prediction (SbTMVP)
- Adaptive motion vector resolution (AMVR)
- Motion field storage: 1/16th luma sample MV storage and 8x8 motion field compression
- Bi-prediction with CU-level weight (BCW)
- Bi-directional optical flow (BDOF)
- Decoder side motion vector refinement (DMVR)
- Geometric partitioning mode (GPM)
- Combined inter and intra prediction (CIIP)
SUMMARY
The following summary is illustrative only and is not intended to be limiting in any way. That is, the following summary is provided to introduce concepts, highlights, benefits and advantages of the novel and non-obvious techniques described herein. Select and not all implementations are further described below in the detailed description. Thus, the following summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter.
Some embodiments provide a method for using costs to select a candidate coding mode to encode or decode a current block. A video coder receives data for a block of pixels to be encoded or decoded as the current block of a current picture of a video. The video coder identifies multiple candidate coding modes applicable to the current block. The video coder identifies a first group of candidate coding modes that is a subset of the plurality of candidate coding modes. The first group of candidate coding modes may be the highest priority candidate coding modes identified based on cost. The number of candidate coding modes in the first group is less than the number of candidate coding modes in the plurality of candidate coding modes. The video coder selects a candidate coding mode in the first group of candidate coding modes. the video coder encodes or decodes the current block by using the selected candidate coding mode.
In some embodiments, the plurality of candidate coding modes includes merge candidates of the current block. The merge candidates of the current block may include (i) merge candidates that use combined inter and intra predictions and/or (ii) merge candidates that use affine transform motion compensation prediction. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different combinations of distances and offsets for refining motion information. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different linear models for deriving predictors of chroma samples of the current block based on luma samples of the current block. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different candidate weights for combining inter predictions of different directions.
In some embodiments, the first group of candidate coding modes are highest priority candidate coding modes identified based on costs of the plurality of candidate coding modes. The encoder may index the candidate coding modes or assign codewords to the candidate coding modes in the identified group of candidate coding modes according to the priorities of the candidate coding modes. In some embodiments, the cost of a candidate coding mode is a boundary matching cost computed by comparing (i) reconstructed samples neighboring the current block and (ii) predicted samples of the current block along boundaries of the current block that are generated according to the candidate coding mode.
In some embodiments, each of the plurality of candidate coding modes is assigned to one of a plurality of groups of candidate coding modes. For example, in some embodiment, each candidate coding mode in the plurality of candidate coding modes is associated with an original candidate index, wherein each candidate coding mode is assigned to one of K groups of candidate coding modes based on a result of the original index modulo K or a result of the original index divided by K. In some embodiments, the candidate coding modes that correspond to spatial merge candidates are assigned to a same group of candidate coding modes. In some embodiments, the candidate coding modes that correspond to spatial merge candidates are assigned to different groups of candidate coding modes. In some embodiments, candidates coding modes that are merge candidates with motion differences smaller than a threshold are assigned to a same group. In some embodiments, candidates coding modes that are merge candidates with motion differences greater than a threshold are assigned to a same group. In some embodiments, the encoder identifies a group of candidate coding modes having a lowest representative cost among the plurality of groups of candidate coding modes and signals an index selecting one candidate coding mode from the identified group of candidate coding modes. The representative cost of the identified group may be a mean, a maximum, or a minimum of the costs (e.g., boundary matching costs) of the candidate coding modes of the identified group. In some embodiments, the encoder signals an index selecting a group of candidate coding modes and identifies a candidate coding mode from the selected group of candidate coding modes based on costs (e.g., boundary matching costs) of the  candidate coding modes of the selected group.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in and constitute a part of the present disclosure. The drawings illustrate implementations of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. It is appreciable that the drawings are not necessarily to scale, as some components may be shown out of proportion to their actual size in order to clearly illustrate the concepts of the present disclosure.
FIG. 1 illustrates reconstructed neighboring samples and predicted samples of the current block used for boundary matching.
FIG. 2 shows positions of spatial merge candidates.
FIG. 3 illustrates motion vector scaling for temporal merge candidate.
FIG. 4 illustrates candidate positions for the temporal merge candidate for the current block.
FIG. 5 conceptually illustrates Merge Mode with Motion Vector Difference (MMVD) candidates and their corresponding offsets.
FIG. 6 illustrates an example video encoder that may implement candidate coding mode selection based on boundary matching cost.
FIG. 7 illustrates portions of the video encoder that implement candidate coding mode selection based on boundary matching costs.
FIG. 8 conceptually illustrates a process for using boundary matching costs to select a candidate coding mode to encode the current block.
FIG. 9 illustrates an example video decoder that may implement candidate coding mode selection based on boundary matching cost.
FIG. 10 illustrates portions of the video decoder that implement candidate coding mode selection based on boundary matching costs.
FIG. 11 conceptually illustrates a process for using boundary matching costs to select a candidate coding mode to decode the current block.
FIG. 12 conceptually illustrates an electronic system with which some embodiments of the present disclosure are implemented.
DETAILED DESCRIPTION
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. Any variations, derivatives and/or extensions based on teachings described herein are within the protective scope of the present disclosure. In some instances, well-known methods, procedures, components, and/or circuitry pertaining to one or more example implementations disclosed herein may be described at a relatively high level without detail, in order to avoid unnecessarily obscuring aspects of the teachings of the present disclosure.
With improvements in video coding, more coding tools are developed. However, the coding gains of these various coding tools are not additive. This is because (1) not all new coding modes can be candidate modes for a block, especially when syntax overhead is considered, and (2) as the number of candidate modes for a block increases, longer codewords are required for indicating a coding mode among the multiple candidate modes.
For example, after HEVC, new merge candidates such as pair-wise average merge candidates, HMVP merge candidates, etc., are proposed to be added into the merge candidate list. An index of the best merge candidate is encoded/decoded to indicate the selected merge candidate for the current block. However, the number of merge candidates in a merge candidate list is limited to a pre-defined number, so not all merge candidates can be added into the merge candidate list. As the number of merge candidates in a merge candidate list increases, the codeword length of the index of the best merge candidate also increases.
Some embodiments of the disclosure provide a scheme for adaptively reordering the candidate modes. According to the scheme, the video encoder/decoder calculates a cost for each candidate mode, which can be a merge candidate and/or a candidate mode of another tool. The video coder determines the priority order of the candidate modes according to the costs. (In some embodiments, the candidate modes with smaller costs get higher priority. In some other embodiments, candidate modes with smaller costs get lower priority.) The candidate modes are then reordered according to the priority order.
In some embodiments, the video coder uses a reduced, reordered candidate mode set that includes only the first k candidate modes with the highest priorities (k < the number of all possible candidate modes). Since the number of candidate modes in the candidate mode set is reduced, the syntax for indicating the selected candidate mode is also reduced. An index may be used to refer to the selected candidate mode after reordering, such that a smaller index value may refer to a candidate mode with a higher priority. (In other words, the value of the index refers to the index of the candidate mode in the list of candidates, and after applying reordering, the value of the index refers to the reordered index of the candidate mode.) In some embodiments, for candidate modes with higher priorities, shorter codewords are used for encoding/decoding. The candidate mode with the highest priority may be implicitly set as the coding mode for the current block.
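As a minimal illustration of the reordering and reduction described above (a sketch only; the function and variable names are placeholders, and the cost function is assumed to be a boundary matching cost as described in Section I below):

    def reorder_candidates(candidates, cost_fn, k):
        # Sort candidate modes by ascending cost: under the first embodiment
        # above, a smaller cost means a higher priority.
        ranked = sorted(candidates, key=cost_fn)
        # Keep only the first k candidates; the signaled index then refers to
        # a position in this reduced, reordered list, so smaller index values
        # point to higher-priority candidate modes.
        return ranked[:k]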
In some embodiments, the video coder determines the priority order of the candidate modes in the candidate mode set based on boundary matching. For each candidate mode, a boundary matching cost for coding the current block using the candidate mode is calculated, and the priority of the candidate mode in the reordered candidate mode list is determined based on the candidate mode’s boundary matching cost.
I. Boundary Matching Cost
A boundary matching cost for a candidate mode refers to the discontinuity measurement (including top boundary matching and/or left boundary matching) between the current prediction (the predicted samples or the predictor of the current block) , generated from the candidate mode, and the neighboring reconstruction (the reconstructed samples within one or more neighboring blocks) . Top boundary matching means the comparison between the current top predicted samples and the neighboring top reconstructed samples, and left boundary matching means the comparison between the current left predicted samples and the neighboring left reconstructed samples.
FIG. 1 illustrates the reconstructed neighboring samples and predicted samples used in boundary matching. As illustrated, pred_{x, 0} are predictor samples along the top boundary and reco_{x, −1} are reconstructed neighboring samples along the top boundary; pred_{0, y} are predictor samples along the left boundary and reco_{−1, y} are reconstructed neighboring samples along the left boundary.
In some embodiments, a pre-defined subset of the current prediction is used to calculate the boundary matching cost. For example, n line(s) of the top boundary within the current block and/or m line(s) of the left boundary within the current block can be used. In some of these embodiments, n2 line(s) of the top neighboring reconstruction and/or m2 line(s) of the left neighboring reconstruction can be used for boundary matching.
An example boundary matching cost is calculated with n = 2, m = 2, n2 = 2, m2 = 2, according to the following:
cost = ∑_{x=0}^{W−1} (|a·pred_{x, 0} − b·pred_{x, 1} − c·reco_{x, −1}| + |d·reco_{x, −1} − e·reco_{x, −2} − f·pred_{x, 0}|) + ∑_{y=0}^{H−1} (|g·pred_{0, y} − h·pred_{1, y} − i·reco_{−1, y}| + |j·reco_{−1, y} − k·reco_{−2, y} − l·pred_{0, y}|)
In this example, two lines of predictor samples and two lines of reconstructed neighboring samples along the top and left boundaries are used to compute the difference measure (or similarity measure), where W and H are the width and height of the current block. The weights (a, b, c, d, e, f, g, h, i, j, k, l) can be any positive integers, e.g., a = 2, b = 1, c = 1, d = 2, e = 1, f = 1, g = 2, h = 1, i = 1, j = 2, k = 1, l = 1. Another example of calculating a boundary matching cost with n = 2, m = 2, n2 = 1, m2 = 1:
cost = ∑_{x=0}^{W−1} |a·pred_{x, 0} − b·pred_{x, 1} − c·reco_{x, −1}| + ∑_{y=0}^{H−1} |g·pred_{0, y} − h·pred_{1, y} − i·reco_{−1, y}|
where the weights (a, b, c, g, h, i) can be any positive integers, e.g., a = 2, b = 1, c = 1, g = 2, h = 1, i = 1. Another example of calculating a boundary matching cost with n = 1, m = 1, n2 = 2, m2 = 2:
cost = ∑_{x=0}^{W−1} |d·reco_{x, −1} − e·reco_{x, −2} − f·pred_{x, 0}| + ∑_{y=0}^{H−1} |j·reco_{−1, y} − k·reco_{−2, y} − l·pred_{0, y}|
where the weights (d, e, f, j, k, l) can be any positive integers, e.g., d = 2, e = 1, f = 1, j = 2, k = 1, l = 1. Another example of calculating a boundary matching cost with n = 1, m = 1, n2 = 1, m2 = 1:
cost = ∑_{x=0}^{W−1} |a·pred_{x, 0} − c·reco_{x, −1}| + ∑_{y=0}^{H−1} |g·pred_{0, y} − i·reco_{−1, y}|
where the weights (a, c, g, i) can be any positive integers, e.g., a = 1, c = 1, g = 1, i = 1. Another example of calculating a boundary matching cost with n = 2, m = 1, n2 = 2, m2 = 1:
cost = ∑_{x=0}^{W−1} (|a·pred_{x, 0} − b·pred_{x, 1} − c·reco_{x, −1}| + |d·reco_{x, −1} − e·reco_{x, −2} − f·pred_{x, 0}|) + ∑_{y=0}^{H−1} |g·pred_{0, y} − i·reco_{−1, y}|
where the weights (a, b, c, d, e, f, g, i) can be any positive integers, e.g., a = 2, b = 1, c = 1, d = 2, e = 1, f = 1, g = 1, i = 1. Another example of calculating a boundary matching cost with n = 1, m = 2, n2 = 1, m2 = 2:
cost = ∑_{x=0}^{W−1} |a·pred_{x, 0} − c·reco_{x, −1}| + ∑_{y=0}^{H−1} (|g·pred_{0, y} − h·pred_{1, y} − i·reco_{−1, y}| + |j·reco_{−1, y} − k·reco_{−2, y} − l·pred_{0, y}|)
where the weights (a, c, g, h, i, j, k, l) can be any positive integers, e.g., a = 1, c = 1, g = 2, h = 1, i = 1, j = 2, k = 1, l = 1. To generalize, n or n2 can be any positive integer such as 1, 2, 3, 4, etc., and m or m2 can be any positive integer such as 1, 2, 3, 4, etc. In some embodiments, n and/or m vary with block width, height, or area. For example, for a larger block (area > threshold, where the threshold may be 64, 128, or 256), m becomes larger: m may be increased to 2 from 1, or increased to 4 from 1 or 2. For another example, for a taller block (height > threshold * width, where the threshold may be 1, 2, or 4), m becomes larger and/or n becomes smaller: m may be increased to 2 from 1, or increased to 4 from 1 or 2. For another example, for a larger block (area > threshold, where the threshold may be 64, 128, or 256), n becomes larger: n is increased to 2 from 1, or increased to 4 from 1 or 2. For another example, for a wider block (width > threshold * height, where the threshold may be 1, 2, or 4), n becomes larger and/or m becomes smaller: n is increased to 2 from 1, or increased to 4 from 1 or 2.
For another example, n and/or m can be defined in a video coding standard or depend on the signaling or parsing of the coded video syntax at CU/CB, PU/PB, TU/TB, CTU/CTB, tile, slice, picture, SPS, and/or PPS level.
In some embodiments, when the current block is located at the top boundary within a CTU row, top boundary matching is not used and/or only left boundary matching is used. (The neighboring reconstructed samples across CTU rows are not used.) In some embodiments, when the current block is located at the left boundary within a CTU, left boundary matching is not used and/or only top boundary matching is used. In some embodiments, when the current block is taller (height > threshold * width), only left boundary matching is used. In some embodiments, when the current block is wider (width > threshold * height), only top boundary matching is used.
In some embodiments, the top-left neighboring reconstructed sample (reco_{−1, −1}) can also be used for boundary matching. For example, the following term is added to the boundary matching cost:
|reco_{−1, −1} − pred_{0, 0}|
In some embodiments, when calculating the boundary matching cost, residuals can be added to the current prediction (the predictor of the current block) to reconstruct the samples of the current block, and the reconstructed samples of the current block are used for calculating the boundary matching cost. For example, the residuals are generated by recovering the DC and/or all AC coefficients, or any subset of AC coefficients, after the transform process. The transform process can use any pre-defined transform kernel for the secondary transform and/or the primary transform. For example, the transform kernel for the secondary transform refers to a Low Frequency Non-Separable Transform (LFNST) transform kernel. For example, the transform kernel for the primary transform refers to DCT2 and/or any transform kernel for Multiple Transform Selection (MTS) such as DST7. For example, the transform kernel refers to the actual transform applied in the transform module for the current block.
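As a concrete illustration of the boundary matching cost with n = m = n2 = m2 = 2 (a minimal sketch assuming the equation given above; the array layout, in which top[r] holds the reconstructed row at vertical offset −(r+1) and left[:, c] holds the reconstructed column at horizontal offset −(c+1), is an assumption of this sketch):

    import numpy as np

    def boundary_matching_cost(pred, top, left, w=(2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1)):
        # pred: H x W predictor of the current block
        # top:  2 x W reconstructed rows above the block
        # left: H x 2 reconstructed columns to the left of the block
        # w:    the weights (a, b, c, d, e, f, g, h, i, j, k, l)
        a, b, c, d, e, f, g, h, i, j, k, l = w
        cost_top = (np.abs(a * pred[0, :] - b * pred[1, :] - c * top[0, :]).sum()
                    + np.abs(d * top[0, :] - e * top[1, :] - f * pred[0, :]).sum())
        cost_left = (np.abs(g * pred[:, 0] - h * pred[:, 1] - i * left[:, 0]).sum()
                     + np.abs(j * left[:, 0] - k * left[:, 1] - l * pred[:, 0]).sum())
        return cost_top + cost_left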
II. Merge Candidates as Candidate Modes
Some embodiments of the disclosure provide a scheme to reorder and/or reduce merge candidates. Instead of having the index of the best merge candidate referring to the order of merge candidates in the merge candidate list, in some embodiments, the index (index_best_merge) of the best merge candidate refers to the priority order based on boundary matching costs. In some embodiments, only the first k merge candidates (with higher priorities) can be the candidate modes.
For example, for a merge candidate list {cand0, cand1, cand2, cand3, cand4, cand5, …} , the boundary matching costs of the merge candidate are calculated as {cost_cand0, cost_cand1, cost_cand2, …} , such that cost_cand0 refers to the boundary matching cost for cand0, cost_cand1 refers to the boundary matching cost for cand1, cost_cand2 refers to the boundary matching cost for cand2, etc. The video coder then reorders {cand0, cand1, cand2, …} based on boundary matching costs. For example, if cost_cand0 > cost_cand1 >cost_cand2 > cost_cand3 > cost_cand4 > cost_cand5, the reordered merge candidates are formed as {cand5, cand4, cand3, cand2, cand1, cand0} , and index_best_merge 0 may refer to cand5 (The merge candidate with the smallest cost is signaled with the shortest codewords. ) , index_best_merge 1 may refer to cand4, index_best_merge 2 may refer to cand3, etc.
In some embodiments, the candidate mode set includes only the first k merge candidates (according to ordering by boundary matching costs), rather than a number determined by a constant (e.g., MaxNumMergeCand in VVC, which may be 6). For example, in some embodiments, k may be 4, and the signaling of each merge candidate may include: index_best_merge 0 referring to cand5 having codeword 0 (the merge candidate with the smallest cost is signaled with the shortest codeword); index_best_merge 1 referring to cand4 having codeword 10; index_best_merge 2 referring to cand3 having codeword 110; index_best_merge 3 referring to cand2 having codeword 111. Otherwise, for example, if cost_cand0 < cost_cand1 < cost_cand2 < cost_cand3 < cost_cand4 < cost_cand5, the order of the merge candidates may be the same as the original list without reordering: index_best_merge 0 referring to cand0 having codeword 0; index_best_merge 1 referring to cand1 having codeword 10; index_best_merge 2 referring to cand2 having codeword 110; index_best_merge 3 referring to cand3 having codeword 111, etc.
In some embodiments, the merge candidate with the largest cost is signaled with the shortest codewords. For another example, if cost_cand0 < cost_cand1 < cost_cand2 < cost_cand3 < cost_cand4 < cost_cand5, the reordered merge candidates are formed as {cand5, cand4, cand3, cand2, cand1, cand0}, such that index_best_merge 0 refers to cand5 having codeword 0 (the largest cost having the shortest codeword); index_best_merge 1 refers to cand4 having codeword 10; index_best_merge 2 refers to cand3 having codeword 110, etc.
In some sub-embodiments, whether to use the reduced and reordered merge candidate list (reordered based on boundary matching cost and limited to k candidates) or the original merge candidate list (without reordering) is determined based on a predefined rule (e.g., implicitly depending on block width, block height, or block area, or explicitly depending on one or more flags at CU/CB, PU/PB, TU/TB, CTU/CTB, tile, slice level, picture level, SPS level, and/or PPS level). When the reduced and reordered merge candidate list is used, the entropy coding contexts for signaling may be different from those used with the original merge candidate list.
In some embodiments, for the index_best_merge variable, a smaller value is coded with a shorter codeword. The index_best_merge variable may be coded with truncated unary codewords, as illustrated below.
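As an illustration of truncated unary coding of index_best_merge with k candidates (a hypothetical helper; the actual binarization and entropy coding contexts are determined by the codec):

    def truncated_unary(index, k):
        # index 0 -> '0', 1 -> '10', 2 -> '110', ..., k-1 -> '1' * (k-1);
        # smaller index values (higher priorities) get shorter codewords.
        return '1' * index + ('0' if index < k - 1 else '')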
In some embodiments, reordering is applied to only a subset of the merge candidate list. For example, the subset may refer to the original first n candidates such as cand0, cand1, and cand2, such that index_best_merge 0/1/2 refers to the priority order based on boundary matching and index_best_merge 3/4/5 refers to the original cand 3/4/5. For another example, the subset refers to the original last n candidates such as cand3, cand4, and cand5. Then, index_best_merge 3/4/5 refers to the priority order based on boundary matching and index_best_merge 0/1/2 refers to the original cand 0/1/2. For another example, the subset may refer to the spatial merge candidates.
In some embodiments, the best merge candidate is inferred to be the merge candidate with the smallest boundary matching cost among all merge candidates. In some embodiments, the best merge candidate is inferred to be the merge candidate with the largest boundary matching cost among all merge candidates. In these embodiments, index_best_merge is not signaled/parsed by the encoder/decoder and can be inferred as 0.
In some embodiments, the merge candidates are split into several groups. A boundary matching cost is calculated for each group. A promising group is implicitly identified as the group with the highest priority. If more than one merge candidate is included in the identified promising group, a reduced merge index is signaled/parsed to indicate a merge candidate from the promising group. Since the number of merge candidates in each group is less than the number of merge candidates in the original merge candidate list, the reduced merge index requires fewer codeword bits than the original merge index.
In some embodiments, if k groups are used and the number of merge candidates in the original merge candidate list is N, the number of merge candidates in each group is “N/k” and the grouping rule depends on the merge index and/or merge type. For example, the merge candidates with the same value of “merge index %k” (remainder from modulo operator) are in the same group. For another example, the merge candidates with the same value of “merge index /k” (quotient from division operator) are in the same group. For another example, spatial merge candidates are in the same group. For another example, k varies with block width, block height, and/or block area. For another example, the merge candidates with small motion difference are in the same group. The motion difference includes mv difference and/or reference picture difference. An example of calculating mv difference (denoted as mv_diff) between candidate 0 and candidate 1 is:
mv_diff = |mvx_cand0 − mvx_cand1| + |mvy_cand0 − mvy_cand1|
Motion difference is small if the reference pictures are the same and/or mv difference is smaller than a pre-defined threshold.
In some embodiments, when a group contains more than one merge candidate, the cost for this group (the group's representative cost) is the average of the costs of all merge candidates in this group. In some sub-embodiments, when a group contains more than one merge candidate, the representative cost for this group is the mean, maximum, or minimum of the costs of all merge candidates in this group.
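As an illustration of the implicit group selection described above (a minimal sketch assuming the "merge index % k" grouping rule; the aggregation function agg models the representative cost and may be min, max, or a mean):

    def select_group(costs, k, agg=min):
        # Split candidate indices into k groups by original merge index % k.
        groups = [[i for i in range(len(costs)) if i % k == g] for g in range(k)]
        # Representative cost of each group, aggregated from its members.
        rep = [agg(costs[i] for i in grp) for grp in groups]
        # The promising group is the one with the lowest representative cost;
        # only a reduced index within this group then needs to be signaled.
        return groups[rep.index(min(rep))]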
In some embodiments, the merge candidates are partitioned into several subsets, and the selection of a subset is explicitly signaled to the decoder. The selection of a merge candidate within the signaled subset is then made by using boundary matching costs. The subset partitioning rule can depend on the merge index and/or merge type. For example, the merge candidates with the same value of "merge index % k" are in the same subset. For another example, the merge candidates with the same value of "merge index / k" are in the same subset (k may vary with block width, block height, and/or block area). For another example, spatial merge candidates are in different subsets. For another example, the merge candidates with large motion differences are in the same subset. The motion difference includes MV difference and/or reference picture difference. An example of calculating the MV difference (denoted as mv_diff) between candidate 0 and candidate 1 is:
mv_diff = |mvx_cand0 − mvx_cand1| + |mvy_cand0 − mvy_cand1|
Motion difference is large if the reference pictures are different and/or mv difference is larger than a pre-defined threshold.
In some embodiments, the merge candidates of a CU include one or more of the following candidates: (1) spatial merge candidates or spatial MVPs from spatially neighbouring CUs, (2) temporal MVP from collocated CUs, (3) history-based MVP from a FIFO table, (4) pairwise average MVP, and/or (5) zero MVs.
In some embodiments, the merge candidates in this section refer to the merge candidates for combined inter and intra prediction (CIIP). The predicted samples within the current block are generated by the CIIP process. In some embodiments, the merge candidates in this section refer to subblock merging candidates such as affine merge candidates. The predicted samples within the current block are then generated by an affine process, e.g., block-based affine transform motion compensation prediction.
a. Spatial Merge Candidates
A maximum of four merge candidates are selected among candidates located in the positions around the CU shown in FIG. 2, which shows the positions of the spatial merge candidates. The order of derivation is B0, A0, B1, A1 and B2. Position B2 is considered only when one or more of the CUs at positions B0, A0, B1, A1 are not available (e.g., because they belong to another slice or tile) or are intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with the same motion information are excluded from the list, so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. A candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information. Only the following pairs are considered: (A1, B1), (A1, A0), (A1, B2), (B1, B0), (B1, B2).
b. Temporal Merge Candidates
Only one temporal merge candidate is added to the merge candidate list. Particularly, in the derivation of this temporal merge candidate, a scaled motion vector is derived based on the co-located CU belonging to the collocated reference picture. The reference picture list and the reference index to be used for derivation of the co-located CU are explicitly signaled in the slice header. The scaled motion vector for the temporal merge candidate is obtained as illustrated by the dotted line in FIG. 3, which illustrates motion vector scaling for the temporal merge candidate. The scaled motion vector is scaled from the motion vector of the co-located CU using the picture order count (POC) distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal merge candidate is set equal to zero. The position for the temporal candidate is selected between candidates C0 and C1, as depicted in FIG. 4, which illustrates candidate positions for the temporal merge candidate for a current block. If a CU at position C0 is not available, is intra coded, or is outside of the current row of CTUs, position C1 is used for the temporal merge candidate. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
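As an illustration of the POC-distance scaling described above (a floating-point sketch; the standard uses a fixed-point approximation with clipping, which is omitted here):

    def scale_temporal_mv(mv_col, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
        # tb: POC distance between the current picture and its reference picture.
        # td: POC distance between the co-located picture and its reference picture.
        tb = poc_cur - poc_cur_ref
        td = poc_col - poc_col_ref
        mvx, mvy = mv_col
        return (mvx * tb / td, mvy * tb / td)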
c. History-Based Merge Candidates
The history-based MVP (HMVP) merge candidates are added to merge list after the spatial MVP and TMVP. In this method, the motion information of a previously coded block is stored in a table and used as MVP for the current CU. The table with multiple HMVP candidates is maintained during the encoding/decoding process. The table is reset (emptied) when a new CTU row is encountered. Whenever there is a non-subblock inter-coded CU, the associated motion information is added to the last entry of the table as a new HMVP candidate.
The HMVP table size S is set to 6, which indicates that up to 6 history-based MVP (HMVP) candidates may be added to the table. When inserting a new motion candidate into the table, a constrained first-in-first-out (FIFO) rule is utilized wherein a redundancy check is first applied to find whether an identical HMVP exists in the table. If found, the identical HMVP is removed from the table and all the HMVP candidates after it are moved forward, and the new candidate is inserted as the last entry of the table.
HMVP candidates may be used in the merge candidate list construction process. The latest several HMVP candidates in the table are checked in order and inserted into the candidate list after the TMVP candidate. A redundancy check is applied comparing the HMVP candidates against the spatial or temporal merge candidates.
In some embodiments, to reduce the number of redundancy check operations, the last two entries in the table are redundancy checked against the A1 and B1 spatial candidates, respectively. Once the total number of available merge candidates reaches the maximally allowed number of merge candidates minus 1, the merge candidate list construction process from HMVP is terminated.
d. Pair-wise average merge candidates
Pairwise average candidates are generated by averaging predefined pairs of candidates in the existing merge candidate list, using the first two merge candidates. Let the first merge candidate be denoted p0Cand and the second merge candidate be denoted p1Cand. The averaged motion vectors are calculated according to the availability of the motion vectors of p0Cand and p1Cand separately for each reference list. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures, and the reference picture is set to the one of p0Cand or p1Cand (e.g., set to the one for p0Cand or set to the one for p1Cand); if only one motion vector is available, that one is used directly; if no motion vector is available, the list is kept invalid. Also, if the half-pel interpolation filter indices of p0Cand and p1Cand are different, the index is set to 0. In some embodiments, when the merge list is not full after pairwise average merge candidates are added, zero MVPs are inserted at the end until the maximum merge candidate number is reached.
e. Combined inter and intra prediction (CIIP)
When a CU is coded in merge mode, if the CU contains at least 64 luma samples (that is, CU width times CU height is equal to or larger than 64), and if both CU width and CU height are less than 128 luma samples, an additional flag may be signaled to indicate whether the combined inter/intra prediction (CIIP) mode is applied to the current CU. The CIIP prediction combines an inter prediction signal with an intra prediction signal. The inter prediction signal in the CIIP mode P_inter is derived using the same inter prediction process applied to regular merge mode, and the intra prediction signal P_intra is derived following the regular intra prediction process with the planar mode or one or more intra prediction modes derived from a pre-defined mechanism. For example, in the following, the pre-defined mechanism is based on the neighboring reference regions (template) of the current block. The intra prediction mode of a CU is implicitly derived from a neighboring template at both encoder and decoder, instead of being signalled as exact intra prediction mode bits to the decoder. The prediction samples of the template are generated using the reference samples of the template for each candidate mode. A cost is calculated as the SATD between the prediction and the reconstructed samples of the template. The intra prediction mode with the minimum cost and/or some intra prediction modes with smaller costs are selected and used for intra prediction of the CU. The candidate modes may be all MPMs and/or any subset of MPMs, the 67 intra prediction modes as in VVC, or extended to 131 intra prediction modes. The intra and inter prediction signals are combined using weighted averaging, where the weight value wt is calculated depending on the coding modes of the top and left neighbouring blocks. The CIIP prediction P_CIIP is formed as follows:
P_CIIP = ((4 − wt) * P_inter + wt * P_intra + 2) >> 2
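As an illustration of the weighted combination above (a minimal sketch assuming integer sample values; the derivation of wt from the neighboring coding modes is omitted):

    def ciip_predict(p_inter, p_intra, wt):
        # Weighted average of the inter and intra prediction samples,
        # with rounding offset 2 and a right shift by 2.
        return ((4 - wt) * p_inter + wt * p_intra + 2) >> 2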
f. Affine Merge Candidates
An object in a video may have different types of motion, including translational motion, zoom in/out motion, rotational motion, perspective motion, and other irregular motions. In some embodiments, a block-based affine transform motion compensation prediction, as provided by VVC, is used to account for these various types of motion. Specifically, the affine motion field of the current block can be described by the motion information of two control points (at, e.g., the top-left and top-right corners of the block) (4-parameter) or the motion information of three control points (at, e.g., the top-left, top-right, and bottom-left corners of the block) (6-parameter). For the 4-parameter affine motion model, the motion vector at sample location (x, y) in a block is derived as:
mv_x = ((mv_1x − mv_0x) / W) · x − ((mv_1y − mv_0y) / W) · y + mv_0x
mv_y = ((mv_1y − mv_0y) / W) · x + ((mv_1x − mv_0x) / W) · y + mv_0y
For the 6-parameter affine motion model, the motion vector at sample location (x, y) in a block is derived as:
mv_x = ((mv_1x − mv_0x) / W) · x + ((mv_2x − mv_0x) / H) · y + mv_0x
mv_y = ((mv_1y − mv_0y) / W) · x + ((mv_2y − mv_0y) / H) · y + mv_0y
where (mv_0x, mv_0y) is the motion vector of the top-left corner control point, (mv_1x, mv_1y) is the motion vector of the top-right corner control point, (mv_2x, mv_2y) is the motion vector of the bottom-left corner control point, and W and H are the width and height of the current block.
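As an illustration of the two affine models above (a floating-point sketch of the equations; an actual codec evaluates the model per subblock in fixed-point arithmetic):

    def affine_mv(x, y, cpmv, W, H=None):
        # cpmv holds (mv_0x, mv_0y), (mv_1x, mv_1y) and, for the 6-parameter
        # model, (mv_2x, mv_2y); W and H are the block width and height.
        (mv0x, mv0y), (mv1x, mv1y) = cpmv[0], cpmv[1]
        if len(cpmv) == 2:  # 4-parameter model
            mvx = (mv1x - mv0x) / W * x - (mv1y - mv0y) / W * y + mv0x
            mvy = (mv1y - mv0y) / W * x + (mv1x - mv0x) / W * y + mv0y
        else:               # 6-parameter model
            mv2x, mv2y = cpmv[2]
            mvx = (mv1x - mv0x) / W * x + (mv2x - mv0x) / H * y + mv0x
            mvy = (mv1y - mv0y) / W * x + (mv2y - mv0y) / H * y + mv0y
        return mvx, mvy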
Affine merge mode, or AF_MERGE mode, can be applied for CUs with both width and height larger than or equal to 8. In this mode, the motion vectors at the control points (CPMVs) of the current CU are generated based on the motion information of the spatially neighboring CUs. There can be up to five CPMVP candidates and an index is signalled to indicate the one to be used for the current CU. The following three types of CPMV candidates are used to form the affine merge candidate list: (1) inherited affine merge candidates that are extrapolated from the CPMVs of the neighbouring CUs; (2) constructed affine merge candidate CPMVPs that are derived using the translational MVs of the neighbouring CUs; (3) zero MVs.
In VVC, there are at most two inherited affine candidates, which are derived from the affine motion models of the neighboring blocks: one from the left neighboring CUs (left predictor) and one from the above neighboring CUs (above predictor). For the left predictor, the scan order is A0->A1, and for the above predictor, the scan order is B0->B1->B2. Only the first inherited candidate from each side is selected. No pruning check is performed between the two inherited candidates. When a neighboring affine CU is identified, its control point motion vectors are used to derive the CPMVP candidate in the affine merge list of the current CU. When the neighbouring affine CU is coded in the 6-parameter affine mode, the three CPMVs of the current CU are calculated according to the motion vectors of the top-left, top-right, and bottom-left corners of the neighbouring affine CU. When the neighbouring affine CU is coded with the 4-parameter affine model, the two CPMVs of the current CU are calculated according to the motion vectors of the top-left and top-right corners of the neighbouring affine CU.
A constructed affine candidate is constructed by combining the translational motion information of the neighbors of each control point. The motion information for the control points is derived from the spatial neighbors (A0, A1, A2, B0, B1, B2, B3) and the temporal neighbor of the current block. CPMV_k (k = 1, 2, 3, 4) represents the MV of the k-th control point. For CPMV_1, the B2->B3->A2 blocks are checked and the MV of the first available block is used. For CPMV_2, the B1->B0 blocks are checked, and for CPMV_3, the A1->A0 blocks are checked. The TMVP is used as CPMV_4 if it is available.
After the MVs of the four control points are obtained, affine merge candidates are constructed based on that motion information. The following combinations of control point MVs are used for construction, in order: {CPMV_1, CPMV_2, CPMV_3}, {CPMV_1, CPMV_2, CPMV_4}, {CPMV_1, CPMV_3, CPMV_4}, {CPMV_2, CPMV_3, CPMV_4}, {CPMV_1, CPMV_2}, {CPMV_1, CPMV_3}. A combination of 3 CPMVs constructs a 6-parameter affine merge candidate and a combination of 2 CPMVs constructs a 4-parameter affine merge candidate. To avoid a motion scaling process, if the reference indices of the control points are different, the related combination of control point MVs is discarded. After the inherited affine merge candidates and constructed affine merge candidates are checked, if the candidate list is still not full, zero MVs are inserted at the end of the list.
III. MMVD as candidate modes
Merge Mode with Motion Vector Difference (MMVD) is a coding tool introduced in the Versatile Video Coding (VVC) standard. Unlike regular merge mode, in which the implicitly derived motion information is directly used for prediction sample generation of the current CU, in MMVD the derived motion information is further refined by a motion vector difference (MVD). MMVD also extends the list of candidates for merge mode by adding additional MMVD candidates based on predefined offsets (also referred to as MMVD offsets).
An MMVD flag may be signaled after sending a skip flag and merge flag to specify whether MMVD mode is used for a CU. If MMVD mode is used, a selected merge candidate is refined by MVD information. The MVD information includes a merge candidate flag, a distance index to specify the motion magnitude, and an index for indication of the motion direction. The merge candidate flag is signaled to specify which of the first two merge candidates is to be used as the starting MV.
The distance index is used to specify motion magnitude information by indicating a pre-defined offset from the starting MV. The offset may be added to either the horizontal component or the vertical component of the starting MV. An example mapping from the distance index to the pre-defined offset is specified in Table III-1 below:
Table III-1. Distance Index
Distance index: 0, 1, 2, 3, 4, 5, 6, 7
Offset (in unit of luma sample): 1/4, 1/2, 1, 2, 4, 8, 16, 32
The direction index represents the direction of the MVD relative to the starting point. The direction index can represent one of the four directions as shown in Table III-2.
Table III-2. Sign of MV offset specified by direction index
Direction index: 00, 01, 10, 11
x-axis sign: +, −, N/A, N/A
y-axis sign: N/A, N/A, +, −
It is noted that the meaning of the MVD sign may vary according to the information of the starting MV. When the starting MV is a uni-prediction MV or a bi-prediction MV with both lists pointing to the same side of the current picture (i.e., the picture order counts, or POCs, of the two reference pictures are both larger than the POC of the current picture, or are both smaller than the POC of the current picture), the sign in Table III-2 specifies the sign of the MV offset added to the starting MV. When the starting MV is a bi-prediction MV with the two MVs pointing to different sides of the current picture (i.e., the POC of one reference is larger than the POC of the current picture, and the POC of the other reference is smaller than the POC of the current picture), each sign in Table III-2 specifies the sign of the MV offset added to the list0 MV component of the starting MV, and the sign for the list1 MV has the opposite value. In some embodiments, the predefined offset (MmvdOffset) of an MMVD candidate is derived from or expressed as a distance value (MmvdDistance) and a directional sign (MmvdSign or MmvdDirection).
FIG. 5 conceptually illustrates MMVD candidates and their corresponding offsets. The figure illustrates a merge candidate 510 as the starting MV and several MMVD candidates in the vertical and horizontal directions. Each of the MMVD candidates is derived by applying an offset to the starting MV 510. For example, the MMVD candidate 522 is derived by adding an offset of 2 to the horizontal component of the merge candidate 510, and the MMVD candidate 524 is derived by adding an offset of −1 to the vertical component of the merge candidate 510. MMVD candidates with offsets in the horizontal direction, such as the MMVD candidate 522, are referred to as horizontal MMVD candidates. MMVD candidates with offsets in the vertical direction, such as the MMVD candidate 524, are referred to as vertical MMVD candidates.
In some embodiments, the candidate mode reduction/reordering scheme is used to reorder MMVD candidates to improve the signaling of certain syntax elements. The following is the syntax table of MMVD in the VVC standard.
if (mmvd_merge_flag [x0] [y0] = = 1) {
    if (MaxNumMergeCand > 1)
        mmvd_cand_flag [x0] [y0]    ae (v)
    mmvd_distance_idx [x0] [y0]    ae (v)
    mmvd_direction_idx [x0] [y0]    ae (v)
}
The syntax element mmvd_cand_flag [x0] [y0] specifies whether the first (0) or the second (1) candidate in the merging candidate list is used with the motion vector difference derived from the syntax elements mmvd_distance_idx [x0] [y0] and mmvd_direction_idx [x0] [y0] . The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture. When mmvd_cand_flag [x0] [y0] is not present, it is inferred to be equal to 0. 
The syntax element mmvd_distance_idx [x0] [y0] specifies the index used to derive the variable MmvdDistance [x0] [y0] as specified below. The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture. 
mmvd_distance_idx [x0] [y0]: 0, 1, 2, 3, 4, 5, 6, 7
MmvdDistance [x0] [y0]: 1, 2, 4, 8, 16, 32, 64, 128 (in units of 1/4 luma sample; when full-sample MMVD is enabled, these values are multiplied by 4)
The syntax element mmvd_direction_idx [x0] [y0] specifies the index used to derive the variable MmvdSign [x0] [y0] as specified below. The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
mmvd_direction_idx [x0] [y0]  MmvdSign [x0] [y0] [0]  MmvdSign [x0] [y0] [1] 
0 +1 0
1 -1 0
2 0 +1
3 0 -1
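As an illustration of deriving the MMVD offset from the distance and direction indices (a sketch transcribing Tables III-1 and III-2 above; the left shift to 1/16-sample MV precision is an assumption of this sketch):

    MMVD_DISTANCE = [1, 2, 4, 8, 16, 32, 64, 128]      # in 1/4-sample units
    MMVD_SIGN = [(+1, 0), (-1, 0), (0, +1), (0, -1)]   # (x, y) sign per direction index

    def mmvd_offset(distance_idx, direction_idx):
        # Convert the 1/4-sample distance to 1/16-sample MV precision and
        # apply the directional sign to the x or y component.
        d = MMVD_DISTANCE[distance_idx] << 2
        sx, sy = MMVD_SIGN[direction_idx]
        return (d * sx, d * sy)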
In some embodiments, the syntax element mmvd_cand_flag can be improved by using the reduced/reordered candidate mode list. Specifically, a boundary matching cost is calculated for each MMVD mode: MMVD mode 0, in which MMVD uses the first candidate in the merging candidate list as the starting MV, and MMVD mode 1, in which MMVD uses the second candidate in the merging candidate list as the starting MV. In some embodiments, after reordering according to the costs, if the cost for MMVD mode 0 > the cost for MMVD mode 1, mmvd_cand_flag being equal to 0 refers to MMVD mode 1 and mmvd_cand_flag being equal to 1 refers to MMVD mode 0. Alternatively, mmvd_cand_flag is implicit and whichever of the first or second candidate in the merging candidate list has the smallest cost is used for MMVD.
In some embodiments, after reordering according to the boundary matching costs, if the cost for MMVD mode 0 < the cost for MMVD mode 1, then mmvd_cand_flag being equal to 0 refers to MMVD mode 1 and mmvd_cand_flag being equal to 1 refers to MMVD mode 0. Alternatively, mmvd_cand_flag is implicit and whichever of the first or second candidate in the merging candidate list has the largest cost is used for MMVD.
In some embodiments, the signaling of the syntax elements mmvd_cand_flag, mmvd_distance_idx, and mmvd_direction_idx can be improved by using the reduced/reordered candidate mode list. A joint indication (denoted as MMVD_joint_idx) is signaled/parsed/assigned to specify the selected combination of mmvd_cand_flag, mmvd_distance_idx, and mmvd_direction_idx. MMVD_joint_idx ranges from 0 to the value (e.g., 63) calculated as "number of MMVD candidates" (e.g., 2) * "number of MMVD distances" (e.g., 8) * "number of MMVD directions" (e.g., 4) minus one.
In some embodiments, boundary matching cost is calculated for each MMVD combination. After reordering of the different MMVD combinations according to the costs, MMVD_joint_idx may be used to select a MMVD combination based on the reordering. For example, in some embodiments, if the cost for MMVD combination 0 > cost for MMVD combination 1 > …, MMVD_joint_idx being equal to 0 refers to MMVD combination 63 and MMVD_joint_idx being equal to 63 refers to MMVD combination 0. In some embodiments, MMVD_joint_idx is implicit and the MMVD combination that has a smallest cost is used for MMVD. In some embodiments, the number of MMVD combinations is reduced and the MMVD combinations with smaller costs are kept in the candidate mode set. The codewords for signaling/parsing MMVD_joint_idx are thereby reduced.
In some embodiments, after the reordering of the different MMVD combinations according to the costs, MMVD_joint_idx may be used to select an MMVD combination based on the reordering. In some embodiments, if the cost for MMVD combination 0 < the cost for MMVD combination 1 < …, then MMVD_joint_idx being equal to 0 refers to MMVD combination 63 and MMVD_joint_idx being equal to 63 refers to MMVD combination 0. In some embodiments, MMVD_joint_idx is implicit and the MMVD combination with the largest cost is used for MMVD. In some embodiments, the number of MMVD combinations is reduced and the MMVD combinations with larger costs are kept in the candidate mode set. The codewords for signaling/parsing MMVD_joint_idx are thereby reduced. A similar method can be used to improve the signaling of mmvd_distance_idx and/or mmvd_direction_idx.
IV. BCW weights as Candidate Modes
Bi-prediction with CU-level Weight (BCW) is a coding tool that is used to enhance bidirectional prediction. BCW allows applying different weights to the L0 prediction and the L1 prediction before combining them to produce the bi-prediction for the CU. For a CU to be coded with BCW, one weighting parameter w is signaled for both the L0 and L1 predictions, such that the bi-prediction result P_bi-pred is computed based on w according to the following:
P_bi-pred = ((8 − w) * P_0 + w * P_1 + 4) >> 3
P_0 represents the pixel values predicted by the L0 MV (or L0 prediction). P_1 represents the pixel values predicted by the L1 MV (or L1 prediction). P_bi-pred is the weighted average of P_0 and P_1 according to w. For low-delay pictures, i.e., pictures using reference frames with small picture order counts (POCs), the possible values for w include {−2, 3, 4, 5, 10}; these are also referred to as BCW candidate weights. For non-low-delay pictures, the possible values for w (BCW candidate weights) include {3, 4, 5}. In some embodiments, to find the best w for coding the current CU, instead of searching all possible values of w for all candidate bi-prediction MV positions, the LC-RDO stage may employ an interleaving search pattern for finding the best value of the BCW weighting parameter w. More weights can be supported as follows. For example, for merge mode, the weights are extended from {−2, 3, 4, 5, 10} to {−4, −3, −2, −1, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12} or any subset of the above. When negative bi-predicted weights are not supported, the weights for merge mode are extended from {−2, 3, 4, 5, 10} to {1, 2, 3, 4, 5, 6, 7}. In addition, the negative bi-predicted weights for non-merge mode are replaced with positive weights, that is, the weights {−2, 10} are replaced with {1, 7}.
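As an illustration of the BCW combination above (a minimal sketch assuming integer sample values; w is one of the BCW candidate weights):

    def bcw_predict(p0, p1, w):
        # Weighted average of the L0 and L1 predictions, with rounding
        # offset 4 and a right shift by 3 (division by 8).
        return ((8 - w) * p0 + w * p1 + 4) >> 3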
In some embodiments, the proposed candidate mode reduction/reordering scheme is used to reorder BCW candidates to improve the signaling of certain syntax elements, e.g., bcw_idx. The following is the syntax table of BCW in the VVC standard.
bcw_idx [x0] [y0]    ae (v)
The syntax element bcw_idx [x0] [y0] specifies the weight index of bi-prediction with CU weights. The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture. For each bi-predicted CU, the weight w is determined in one of two ways: 1) for a non-merge CU, the weight index is signalled after the motion vector difference; 2) for a merge CU, the weight index is inferred from neighbouring blocks based on the merge candidate index.
bcw_idx [x0] [y0]: 0, 1, 2, 3, 4
w: 4, 5, 3, 10, −2
To reduce and reorder the BCW candidates, the boundary matching cost of each BCW candidate weight is computed, and the different BCW candidate weights are reordered according to the costs. In one embodiment, the cost for a pre-defined candidate can be reduced. For example, the pre-defined candidate refers to the candidate inferred from one or more neighbouring blocks. In another embodiment, the candidate weights include (only) the two neighboring bi-predicted weights (i.e., ±1) and the inherited (inferred) bi-predicted weight. In some embodiments, bcw_idx being equal to 0 refers to the BCW candidate weight with the smallest cost and bcw_idx being equal to 4 refers to the BCW candidate weight with the largest cost. In some embodiments, bcw_idx is implicit and the BCW candidate weight having the smallest cost is used. In some embodiments, bcw_idx being equal to 0 refers to the BCW candidate weight with the largest cost and bcw_idx being equal to 4 refers to the BCW candidate weight with the smallest cost. In some embodiments, bcw_idx is implicit and the BCW candidate weight having the largest cost is used.
In some embodiments, only the first k BCW weights (with higher priorities because of their costs) can be the candidate weights (the original number of BCW weights being 5) in a candidate mode set. In some embodiments, the number of BCW weights in the candidate mode set is k (e.g., 3), and the signaling of each BCW weight is as follows: bcw_idx 0 refers to the BCW candidate with the highest priority using codeword 0; bcw_idx 1 refers to the BCW candidate with the second highest priority using codeword 10; bcw_idx 2 refers to the BCW candidate with the third highest priority using codeword 11.
In some sub-embodiments, whether to use the reduced and reordered BCW candidates (reordered based on boundary matching cost and limited to k candidates) or the original BCW candidates (without reordering) is determined based on a predefined rule (e.g., implicitly depending on block width, block height, or block area, or explicitly depending on one or more flags at CU/CB, PU/PB, TU/TB, CTU/CTB, tile, slice level, picture level, SPS level, and/or PPS level). When the reduced and reordered BCW candidate list is used, the entropy coding contexts for signaling may be different from those used with the original BCW candidate list.
In some embodiments, the BCW weights are split into several groups. Next, a boundary matching cost is calculated for each group. Then, a promising group is implicitly selected as the group with the highest priority. If more than one BCW weight is included in the promising group, a reduced BCW weight index is signaled/parsed to indicate a BCW weight from the promising group. Since the number of BCW weights in each group is less than the number of BCW weights in the original BCW weight set, the reduced BCW weight index may take fewer codeword bits than the original BCW weight index.
In some embodiments, if k groups are used and the number of BCW weights in the original BCW weight set is N, the number of BCW weights in each group is "N/k" and the grouping rule depends on the BCW weight index and/or the BCW weight values. For example, the BCW weights with the same value of "BCW weight index % k" are in the same group. For another example, the BCW weights with the same value of "BCW weight index / k" are in the same group. For another example, BCW weights with similar values are in the same group (e.g., the BCW weights with negative values are in one group and/or the BCW weights with positive values are in another group). In some embodiments, k (the number of groups) varies with block width, block height, and/or block area.
In some embodiments, when a group contains more than one BCW weight, the cost for this group is the average of the costs of all BCW weights in this group. In some embodiments, when a group contains more than one BCW weight, the cost for this group is the mean, maximum, or minimum of the costs of all BCW weights in this group. In some embodiments, the BCW weights are partitioned into several subsets, and the selection of a subset is explicitly signaled to the decoder, while the selection of a BCW weight within the subset is made by using boundary matching costs.
In some embodiments, the BCW weights are partitioned into multiple subsets. In some embodiments, BCW weights having similar values are partitioned into different subsets, such that the BCW weights of a same subset have large differences with each other. For example, in some embodiments, the weights having negative values may be partitioned into different groups. For another example, in some embodiments, the weights with weight differences larger than a pre-defined threshold are partitioned into the same group. The pre-defined threshold can be fixed or explicitly decided. This may help to improve the hit rate of boundary matching.
V. Linear Models as Candidate Modes
Cross Component Linear Model (CCLM) or Linear Model (LM) mode is a cross component prediction mode in which the chroma components of a block are predicted from the collocated reconstructed luma samples by linear models. The parameters (e.g., scale and offset) of the linear model are derived from already reconstructed luma and chroma samples that are adjacent to the block. For example, in VVC, the CCLM mode makes use of inter-channel dependencies to predict the chroma samples from reconstructed luma samples. This prediction is carried out using a linear model in the form of:
P(i, j) = α · rec′_L(i, j) + β
P(i, j) represents the predicted chroma samples in a CU (or the predicted chroma samples of the current CU) and rec′_L(i, j) represents the down-sampled reconstructed luma samples of the same CU (or the corresponding reconstructed luma samples of the current CU).
The CCLM model parameters α (scaling parameter) and β (offset parameter) are derived based on at most four neighboring chroma samples and their corresponding down-sampled luma samples. In LM_A mode (also denoted as LM-T mode or INTRA_T_CCLM), only the above or top-neighboring template is used to calculate the linear model coefficients. In LM_L mode (also denoted as LM-L mode or INTRA_L_CCLM), only the left template is used to calculate the linear model coefficients. In LM-LA mode (also denoted as LM-LT mode or INTRA_LT_CCLM), both the left and above templates are used to calculate the linear model coefficients. The proposed methods can apply to any cross-component mode and are not limited to LM modes. For example, a cross-component mode uses the information of a first color component (e.g., Y) to predict the second/third color components (e.g., Cb/Cr).
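As an illustration of the linear model prediction above (a simplified sketch; the least-squares fit stands in for the standard derivation, which uses at most four neighboring sample pairs):

    import numpy as np

    def derive_lm_params(luma_nb, chroma_nb):
        # Fit alpha (scale) and beta (offset) from neighboring reconstructed
        # luma/chroma sample pairs of the selected template (LT, L, or T).
        alpha, beta = np.polyfit(luma_nb, chroma_nb, 1)
        return alpha, beta

    def cclm_predict(rec_luma_ds, alpha, beta):
        # P(i, j) = alpha * rec'_L(i, j) + beta, applied to the down-sampled
        # reconstructed luma samples of the current CU.
        return alpha * rec_luma_ds + beta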
In some embodiments, the proposed candidate mode reduction/reordering scheme is used to reorder LM candidates to improve the signaling of certain syntax elements, e.g., cclm_mode_idx. The following is the syntax table of LM in the VVC standard.
if (cclm_mode_flag)   
cclm_mode_idx ae (v) 
The syntax element cclm_mode_idx specifies which one of the LM candidate modes (e.g., INTRA_LT_CCLM, INTRA_L_CCLM and INTRA_T_CCLM) is applied. For example, boundary matching costs are calculated for the different LM candidate modes. After reordering of the different LM candidate modes according to the boundary matching costs, cclm_mode_idx may be used to select an LM candidate mode based on the reordering. In some embodiments, cclm_mode_idx being equal to 0 refers to the LM candidate mode with the smallest cost and cclm_mode_idx being equal to 2 refers to the LM candidate mode with the largest cost. In some embodiments, cclm_mode_idx is implicit and the LM candidate mode having the smallest cost is used.
In some embodiments, only the first k LM candidate modes (with higher priorities because of their costs) can be the candidate LM modes (the original number of LM candidate modes being 3). In some embodiments, k = 2 and the signaling of the LM candidate modes is as follows: cclm_mode_idx 0 refers to the candidate LM mode with the smallest cost using codeword 0; cclm_mode_idx 1 refers to the candidate LM mode with the second smallest cost using codeword 1.
In some embodiments, after reordering according to the costs, with k = 2, cclm_mode_idx being equal to 0 refers to the LM candidate mode with the largest cost and cclm_mode_idx being equal to 1 refers to the LM candidate mode with the smallest cost. In some embodiments, cclm_mode_idx is implicit and the LM candidate mode having the largest cost is used. In some of these embodiments, cclm_mode_idx 0 refers to the candidate LM mode with the largest cost using codeword 0; cclm_mode_idx 1 refers to the candidate LM mode with the second largest cost using codeword 1.
In addition to applying the reduced/reordered candidate mode list to conventional CCLM modes, in which the chroma components (Cb or Cr) are predicted from the luma component, the reduced/reordered candidate mode list may also be used in other variations of CCLM. The variations of CCLM here mean that some optional modes can be selected when the block indication refers to using one of the cross-component modes (e.g., CCLM_LT, MMLM_LT, CCLM_L, CCLM_T, MMLM_L, MMLM_T, and/or an intra prediction mode which is not one of the traditional DC, planar, and angular modes) for the current block.
The following is an example of having the convolutional cross-component mode (CCCM) as an optional mode. When this optional mode is applied to the current block, cross-component information with a model including a non-linear term is used to generate the chroma prediction. The optional mode may follow the template selection of CCLM, so the CCCM family includes CCCM_LT, CCCM_L, and/or CCCM_T.
In some embodiments, the cost for a pre-defined candidate can be reduced. The pre-defined candidate may refer to the candidate inferred from one or more neighbouring blocks and/or a popular or common mode such as CCLM_LT, MMLM_LT, or CCCM_LT. In some embodiments, the candidate modes include (only) the related popular modes and the popular mode. In one example, the related popular modes are CCCM_L and/or CCCM_T and the popular mode is CCCM_LT. For example, the related popular modes are CCLM_L and/or CCLM_T and the popular mode is CCLM_LT. In another example, the related popular modes are MMLM_L and/or MMLM_T and the popular mode is MMLM_LT. In another example, the popular mode indicates whether the used reference is from L, T, or LT. If the used reference is from LT, the related popular modes are all or a subset of the variations of LT such as CCLM_LT, MMLM_LT, and CCCM_LT.
In some embodiments, the candidate modes correspond to linear models that are derived by using neighboring reconstructed samples of Cb and Cr as the inputs X and Y, (i.e., using reconstructed Cb samples to derive or predict Cr samples or vice versa) . In some embodiments, the candidate modes correspond to linear models that are derived from multiple collocated luma blocks for predicting or deriving chroma components. In some embodiments, a candidate mode in the reduced/reorder candidate list may correspond to a multi-model linear model (MMLM) mode, such that different linear models may be selected from multiple linear models for predicting chroma components of different regions or groups of pixels. Similar to CCLM, MMLM can have MMLM_LT, MMLM_L, and/or MMLM_T.
The LM candidate modes in the reduced/reordered candidate mode list may also include any LM extensions/variations, such that the number of candidate modes in the list may also increase. With an increased number of LM candidate modes, the coding performance improvement from using the reduced/reordered candidate mode list based on boundary matching cost becomes more significant.
VI. MHP as candidate modes
For a block with multi-hypothesis prediction (MHP) applied, one or more hypotheses of prediction (prediction signals) are combined with the existing hypothesis of prediction to form the final (resulting) prediction of the current block. MHP can be applied to many inter modes such as merge, subblock merge, inter AMVP and/or affine. The final prediction is accumulated iteratively with each additional hypothesis of prediction signal, as shown below:
p_{n+1} = (1 − α_{n+1}) · p_n + α_{n+1} · h_{n+1}
The resulting prediction signal is obtained as the last p_n (i.e., the p_n having the largest index n). p_0 is the first (existing) prediction of the current block. For example, if MHP is applied to a merge candidate, p_0 is indicated by the existing merge index. Each additional prediction is denoted h and is further combined with the previously accumulated prediction by a weighting α. Therefore, for each additional hypothesis of prediction, a weighting index is signaled/parsed to indicate the weighting, and/or for each hypothesis of prediction an inter index is signaled/parsed to indicate the motion candidate (used to generate the predicted samples for this hypothesis).
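As an illustration of the iterative accumulation above (a minimal sketch; hypotheses is assumed to be a list of (weight, prediction) pairs in signaling order):

    def mhp_predict(p0, hypotheses):
        # p0 is the first (existing) prediction; each additional hypothesis h
        # is blended into the accumulated prediction using its weighting alpha.
        p = p0
        for alpha, h in hypotheses:
            p = (1 - alpha) * p + alpha * h
        return p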
The proposed reordering scheme makes the signalling of MHP weights and/or the signalling of motion candidates more efficient. In some embodiments, boundary matching costs are calculated for each MHP candidate weight for each additional hypothesis of prediction. For example, assume that there are two candidate weights for a hypothesis of prediction:
Step 0: for an additional hypothesis of prediction, cost_w0 and cost_w1 are calculated as the costs of the first and second candidate weights, respectively;
Step 1: for that hypothesis of prediction, the candidate weights are reordered depending on the costs.
In some sub-embodiments, the candidate weight with a smaller cost gets a higher priority. If cost_w0 > cost_w1, the weight index 0 refers to w1 and the weight index 1 refers to w0. Otherwise, reordering is not used and the weight indices 0 and 1 refer to w0 and w1, as original. In some embodiments, the candidate weight with a larger cost gets a higher priority. If cost_w0 < cost_w1, the weight index 0 refers to w1 and the weight index 1 refers to w0. Otherwise, reordering is not used and the weight indices 0 and 1 refer to w0 and w1, as original.
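A minimal sketch of the two-weight reordering rule above, assuming plain scalar costs; the names cost_w0/cost_w1 and w0/w1 follow the example in the text.

```python
def reorder_weight_indices(cost_w0, cost_w1, smaller_first=True):
    """Map weight indices to candidate weights based on boundary matching
    costs. With smaller_first=True the lower-cost weight gets index 0 (the
    shorter codeword); otherwise the higher-cost weight does. Returns a
    list where position i is the weight referred to by index i.
    """
    if smaller_first:
        swap = cost_w0 > cost_w1
    else:
        swap = cost_w0 < cost_w1
    return ["w1", "w0"] if swap else ["w0", "w1"]

# cost_w0 > cost_w1, so index 0 now refers to w1.
print(reorder_weight_indices(cost_w0=12.0, cost_w1=7.0))  # ['w1', 'w0']
```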
In some embodiments, the candidate weight with the smallest cost is used for the current additional hypothesis of prediction. The weight index for the current additional hypothesis of prediction is inferred in this case. In some embodiments, the weights for each hypothesis or any subset of hypotheses are implicitly set according to the costs. One possible way is that the weights are scaled values of the costs or scaled values of the multiplicative inverses of the costs. (For example, if cost = 2, the multiplicative inverse of the cost is 1/2.) In some sub-embodiments, the candidate weight with the largest cost is used for the current additional hypothesis of prediction. The weight index for the current additional hypothesis of prediction is inferred in this case.
Steps 0 and 1 may be repeated for each additional hypothesis of prediction to obtain the meaning of each weight index for each additional hypothesis of prediction. In some embodiments, assume that there are t candidate weights for a hypothesis of prediction (t can be any positive integer; take t = 5 in the following examples):
Step 0: for an additional hypothesis of prediction, calculate the cost of each candidate weight.
Step 1: for that hypothesis of prediction, reorder the candidate weights depending on the costs.
For example, the candidate weight with a smaller cost gets a higher priority. For another example, the candidate weight with a larger cost gets a higher priority. Only the first k candidate weights (with higher priorities) remain as candidate weights; the candidate mode set includes only the first k candidate weights. In some embodiments, the number of candidate weights in the candidate mode set is k (e.g., 2) and the signaling of each candidate weight is as follows: weight_idx 0 refers to the candidate with the highest priority, with codeword 0; weight_idx 1 refers to the candidate with the second highest priority, with codeword 1. Steps 0 and 1 may be repeated for each additional hypothesis of prediction to obtain the meaning of each weight index for each additional hypothesis of prediction.
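The following sketch illustrates keeping only the first k of t candidate weights after cost-based reordering; the concrete cost values are illustrative.

```python
def prune_candidate_weights(costs, k, smaller_first=True):
    """Reorder t candidate weights by boundary matching cost and keep only
    the first k as the candidate mode set. Returns the indices of the kept
    weights in priority order (weight_idx 0 = highest priority).
    costs: list of t boundary matching costs, one per candidate weight.
    """
    order = sorted(range(len(costs)), key=lambda i: costs[i],
                   reverse=not smaller_first)
    return order[:k]

# t = 5 candidate weights, keep the k = 2 with the smallest costs.
costs = [30.0, 12.0, 45.0, 9.0, 27.0]
print(prune_candidate_weights(costs, k=2))  # [3, 1]
```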
In some embodiments, the MHP weights are split into several groups. Next, a boundary matching cost is calculated for each group. Then, the promising group is implicitly decided to be the group with the highest priority. If more than one MHP weight is included in the promising group, a reduced MHP weight index is signaled/parsed to indicate an MHP weight from the promising group. Since the number of MHP weights in each group is less than the number of MHP weights in the original MHP weight set, the reduced MHP weight index takes a shorter codeword than the original MHP weight index.
In some embodiments, if k groups are used and the number of MHP weights in the original MHP weight set is N, the number of MHP weights in each group is “N/k” and the grouping rule depends on the MHP weight index and/or the MHP weight values. For example, the MHP weights with the same value of “MHP weight index % k” are in the same group. For another example, the MHP weights with the same value of “MHP weight index / k” are in the same group. For another example, MHP weights with similar values are in the same group (e.g., the MHP weights with negative values are in one group and/or the MHP weights with positive values are in another group). For another example, k varies with block width, block height, and/or block area. In some embodiments, when a group contains more than one MHP weight, the cost for this group is the average of the costs from all MHP weights in this group. In some embodiments, when a group contains more than one MHP weight, the cost for this group is the mean, maximum, or minimum of the costs from all MHP weights in this group.
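A sketch of the “MHP weight index % k” grouping and the representative group cost described above; the cost values and the choice of k are illustrative assumptions.

```python
def group_weights_by_index(num_weights, k):
    """Assign each MHP weight to one of k groups by 'weight index % k'."""
    groups = [[] for _ in range(k)]
    for idx in range(num_weights):
        groups[idx % k].append(idx)
    return groups

def group_cost(member_costs, rule="average"):
    """Representative cost of a group derived from its members' costs."""
    if rule == "average":
        return sum(member_costs) / len(member_costs)
    return max(member_costs) if rule == "maximum" else min(member_costs)

# N = 8 MHP weights split into k = 2 groups; the promising group is the
# one with the lowest representative cost.
costs = [14.0, 9.0, 22.0, 5.0, 18.0, 7.0, 11.0, 30.0]
groups = group_weights_by_index(len(costs), k=2)
promising = min(groups, key=lambda g: group_cost([costs[i] for i in g]))
print(promising)  # [1, 3, 5, 7]
```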
In some embodiments, the MHP weights are partitioned into several subsets, the subset selection is explicitly signaled to the decoder, and the selection within the subset is made by using boundary matching costs. In some embodiments, weights whose values differ greatly from each other are partitioned into the same subset. That is, weights with similar values are partitioned into different subsets. This may help to improve the hit rate of boundary matching. For example, the weights with negative values are partitioned into different groups. For another example, weights whose difference is larger than a pre-defined threshold are partitioned into the same group. The pre-defined threshold can be fixed or explicitly decided.
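One possible way to place similar-valued weights into different subsets is to sort the weights by value and deal them out round-robin, so each subset contains values spaced far apart. This particular heuristic is an illustrative assumption; the text above only requires that similar values fall into different subsets.

```python
def partition_dissimilar(weights, num_subsets):
    """Partition candidate weights so that weights with similar values land
    in different subsets: sort by value, then deal round-robin. Returns a
    list of subsets, each a list of weight indices.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    subsets = [[] for _ in range(num_subsets)]
    for rank, idx in enumerate(order):
        subsets[rank % num_subsets].append(idx)
    return subsets

# The similar weights -2 and -1 end up in different subsets.
weights = [-2, -1, 1, 2, 4, 8]
print(partition_dissimilar(weights, num_subsets=2))  # [[0, 2, 4], [1, 3, 5]]
```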
In some embodiments, for each hypothesis, the current prediction (used in calculating the boundary matching cost) for a candidate weight is the combined prediction (the weighted average of the prediction from a motion candidate and the previously accumulated prediction). (The motion candidate is indicated by a signaled/parsed motion index.) An example is shown in the following figure. When applying the proposed methods to reorder the signalling of candidate weights for h2, with 4 candidate weights (cand 0 to 3) for h2, the cost for cand n is calculated from the weighted average of the previously accumulated prediction and h2’s prediction (with the weighting from cand n).
When applying reordering to the signalling of candidate weights for h1, with 4 candidate weights (cand 0 to 3): for h1, the current prediction for candidate 0 is the weighted average of p0’s prediction and h1’s prediction, with the weighting from cand 0; the current prediction for candidate 1 is the weighted average of p0’s prediction and h1’s prediction, with the weighting from cand 1.
In some embodiments, boundary matching costs are calculated for each MHP motion candidate. Take MHP for merge mode as an example in the following sub-embodiments. (MHP can be applied to other inter modes such as inter AMVP and/or affine; when inter AMVP or affine is used, “merge” in the following example is replaced with the naming of that inter mode.) In some embodiments, the boundary matching costs for each motion candidate are calculated. For example, the candidate modes include cand 0 to 4. Originally, the index 0 (shorter codeword) refers to cand 0 and the index 4 (longer codeword) refers to cand 4. With the proposed method, the meaning of the index follows the priority. If the priority order (based on the boundary matching costs) specifies that cand 4 has the highest priority, the index 0 is mapped to cand 4.
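The index remapping described above can be sketched as a sort by cost; the candidate count and cost values are illustrative.

```python
def index_to_candidate(costs):
    """Map each signaled index to a motion candidate: the candidate with
    the smallest boundary matching cost gets index 0 (shortest codeword),
    the next smallest gets index 1, and so on.
    """
    return sorted(range(len(costs)), key=lambda c: costs[c])

# cand 4 has the smallest cost, so index 0 is mapped to cand 4.
costs = [25.0, 18.0, 30.0, 21.0, 10.0]
print(index_to_candidate(costs))  # [4, 1, 3, 0, 2]
```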
In some embodiments, the boundary matching costs for each motion candidate are calculated. For example, the candidate modes include cand 0 to 4. Originally, the index 0 (shorter codeword) refers to cand 0 and the index 4 (longer codeword) refers to cand 4. With the proposed method, the meaning of the index follows the priority. If the priority order (based on the boundary matching costs) specifies that cand 4 has the highest priority, the index is not signaled/parsed and the selected motion candidate is inferred as the one with the highest priority.
In some embodiments, for each hypothesis, the current prediction (used in calculating boundary matching cost) for a motion candidate is the motion compensation result generated by that motion candidate.
An example of combining the above two sub-embodiments is shown below. For example, if the number of additional hypotheses is equal to 2, the first three motion candidates with the higher priorities are used to form the resulting prediction of the current MHP block. For another example, if the number of additional hypotheses is equal to 2, the existing hypothesis is kept as original, the two motion candidates with the higher priorities are used to form the predictions of the additional hypotheses, and the resulting prediction is formed from the existing and additional hypotheses.
In some embodiments, the current prediction (used in calculating the boundary matching cost) for a motion candidate is the combined prediction (the weighted average of the prediction from a motion candidate and the existing prediction (p0)). (The weight is indicated by a signaled/parsed weight index.)
In some embodiments, for each hypothesis, the current prediction (used in calculating the boundary matching cost) for a motion candidate is the combined prediction (the weighted average of the prediction from a motion candidate and the previously accumulated prediction). (The weight is indicated by a signaled/parsed weight index.) An example is shown in the following figure. When applying the proposed methods to reorder the signalling of motion candidates for h2, with 4 motion candidates (cand 0 to 3) for h2, the cost for cand n is calculated from the weighted average of p0’s prediction, h1’s prediction, and the prediction from cand n. In other words, for h2: the current prediction for candidate 0 is the weighted average of h1’s prediction and the prediction from candidate 0; the current prediction for candidate 1 is the weighted average of h1’s prediction and the prediction from candidate 1.
When applying the proposed methods to reorder the signalling of motion candidates for h1, with 4 motion candidates (cand 0 to 3) for h1: the current prediction for candidate 0 is the weighted average of p0’s prediction and the prediction from candidate 0; the current prediction for candidate 1 is the weighted average of p0’s prediction and the prediction from candidate 1.
In some embodiments, boundary matching costs are calculated for each MHP combination (motion candidate and weight) for each hypothesis of prediction. Take MHP for merge mode as an example in the following sub-embodiments. (MHP can be applied to other inter modes such as inter AMVP and/or affine; when inter AMVP or affine is used, “merge” in the following example is replaced with the naming of that inter mode.)
In some embodiments, one combination refers to a motion candidate and a weight. If there are m motion candidates and n weights for each motion candidate, the number of combinations is m*n. In some sub-embodiments, the current prediction (used in calculating the boundary matching cost) for a combination is the combined prediction (the weighted average of the prediction from a motion candidate and the existing prediction (p_0)). With this method, the merge index and the weight for indicating the additional hypothesis of prediction are jointly decided. In one example, the combination with the highest priority is the selected MHP combination (there is no need to signal/parse the merge index and the weight for indicating the additional hypothesis of prediction). For another example, a joint index is signalled/parsed to decide the MHP combination. The number of additional hypotheses can be fixed in this embodiment.
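A sketch of the joint decision over all m*n (motion candidate, weight) combinations follows. The cost callable stands in for the boundary matching cost of Section I, and all concrete values are illustrative assumptions.

```python
import numpy as np

def best_combination(motion_preds, weights, existing_pred, boundary_cost):
    """Jointly decide the motion candidate and weight: evaluate all m * n
    (candidate, weight) combinations and return the pair whose combined
    prediction has the smallest boundary matching cost.
    boundary_cost: any callable scoring a prediction block.
    """
    best, best_cost = None, float("inf")
    for m, pred in enumerate(motion_preds):     # m motion candidates
        for n, w in enumerate(weights):         # n weights per candidate
            combined = (1.0 - w) * existing_pred + w * pred
            cost = boundary_cost(combined)
            if cost < best_cost:
                best, best_cost = (m, n), cost
    return best

# Usage with a toy cost that compares against a stand-in boundary target.
p0 = np.full((4, 4), 100.0)
preds = [np.full((4, 4), v) for v in (90.0, 130.0)]
target = np.full((4, 4), 110.0)
cost_fn = lambda p: float(np.abs(p - target).sum())
print(best_combination(preds, [0.25, 0.5], p0, cost_fn))  # (1, 0)
```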
In some embodiments, boundary matching costs are calculated for each MHP motion candidate. Take MHP for merge mode as an example. The merge index (which is used to indicate a motion candidate for each hypothesis) may be inferred. For example, depending on the order of merge candidates in the merge candidate list: merge candidate 0 for hypothesis 0, merge candidate 1 for hypothesis 1, etc. For another example, depending on the costs: the merge candidate with a smaller cost is used first. For another example, depending on the predefined number of merge candidates: if the number of hypotheses is 4, use 4 merge candidates from the merge candidate list as the motion candidates for the hypotheses; the first 4 merge candidates may be used, or any 4 merge candidates from the merge candidate list may be used. The weights (which are used to combine the hypotheses of prediction) are implicit, depending on the costs. The MHP resulting prediction is formed by
(weight0) * (hypothesis0) + (weight1) * (hypothesis1) + (weight2) * (hypothesis2) +…
In some embodiments, a fixed number of hypotheses of prediction is used. That is, a fixed number of hypotheses are blended, and the matching costs are implicitly used as weights. In some embodiments, the weight for the motion candidate (or, equivalently, for the hypothesis) with a higher priority is larger than the weight for the motion candidate with a lower priority. In some embodiments, the current prediction (used in calculating the boundary matching cost) for a motion candidate is the motion compensation result generated by that motion candidate. In some embodiments, the first n motion candidates in the merge candidate list are used for generating the hypotheses of prediction. With this proposed method, no merge index is signalled for MHP. In some embodiments, all motion candidates in the merge candidate list are used for generating the hypotheses of prediction. (The weights can decide whether a motion candidate is useful: if its weight is zero, the motion candidate is actually not used.) With this proposed method, no merge index is signalled for MHP. In some embodiments, the weight for the motion candidate with a higher priority is larger than the weight for the motion candidate with a lower priority. For example, the weights follow the ratios of the costs for different motion candidates. If there are two motion candidates and cost_cand0 = 2*cost_cand1, then weight_cand0 = 2*weight_cand1 or weight_cand0 = 1/2*weight_cand1.
For another example, the cost for each motion candidate is first normalized to an interval [MIN_VALUE, MAX_VALUE]. The MAX_VALUE is pre-defined, such as the number of hypotheses of prediction. The MIN_VALUE is pre-defined, such as 0. For example, (MAX_VALUE − the normalized cost) can be the weight for a motion candidate, or the normalized cost itself can be the weight.
For another example, the weights are scaled values of the costs or scaled values of the multiplicative inverses of the costs. (For example, if cost = 2, the multiplicative inverse of the cost is 1/2.) A scaled value means a scaling factor times the original value; if the scaling factor is 1, no scaling is applied.
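The implicit weight derivation from costs can be sketched as follows. The final normalization of the weights so they sum to 1 is an illustrative assumption beyond the scaling described above.

```python
def implicit_weights(costs, scale=1.0, use_inverse=True):
    """Derive per-hypothesis blending weights implicitly from boundary
    matching costs. With use_inverse=True a smaller cost yields a larger
    weight (weights are scaled values of the multiplicative inverses of
    the costs); otherwise the costs themselves are scaled. The weights are
    then normalized to sum to 1 (an illustrative assumption).
    """
    values = [scale / c if use_inverse else scale * c for c in costs]
    total = sum(values)
    return [v / total for v in values]

# cost_cand0 = 2 * cost_cand1, so weight_cand0 = 1/2 * weight_cand1.
print(implicit_weights([2.0, 1.0]))  # [0.333..., 0.666...]
```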
In some embodiments, the weight and merge index are implicit with the proposed method. The generation of the current prediction for this method can reference any other proposed method in this invention. In some embodiments, the proposed scheme is applied to a subset of all additional hypotheses of prediction. (That is, the above steps 0 and 1 are repeated for the subset of all additional hypotheses of prediction.) For example, only the candidate weights of the first additional hypothesis of prediction (which is combined with the existing hypothesis of prediction) are reordered with the proposed scheme. In some embodiments, the subset is pre-defined in the video coding standard. In some embodiments, the subset depends on the current block width, height, or area. For example, for a block with a block area larger (or smaller) than a threshold, the subset includes more hypotheses of prediction.
In some embodiments, the reordering results from the subset can be reused for the remaining additional hypotheses of prediction. For example, based on the reordering results from the first hypothesis of prediction, the weight indices 0 and 1 refer to w1 and w0, respectively. For the following hypotheses of prediction, the weight indices 0 and 1 also refer to w1 and w0, respectively.
In some embodiments, the proposed scheme is applied to a subset of all candidate weights for an additional hypothesis of prediction. That is, the above steps 0 and 1 are repeated for the subset of all candidate weights for an additional hypothesis of prediction. Take the number of candidate weights (for an additional hypothesis of prediction) equal to 4 as an example. For an additional hypothesis of prediction, only the first (or last) two candidate weights are reordered with the proposed scheme. In some embodiments, the subset is pre-defined in the video coding standard. In some embodiments, the subset depends on the current block width, height, or area. For example, for a block with a block area larger (or smaller) than a threshold, the subset includes more candidate weights.
In some embodiments, a hypothesis of prediction can be the prediction signal from a uni-prediction or bi-prediction motion compensation result. The proposed reordering scheme for different tools (not limited to those tools in the following example) can be unified. For example, the proposed reordering scheme for MHP, LM, BCW, MMVD, and/or merge candidates is unified with the same rule of calculating boundary matching costs.
The proposed methods in this invention can be enabled and/or disabled according to implicit rules (e.g. block width, height, or area) or according to explicit rules (e.g., based on syntax on block, tile, slice, picture, SPS, or PPS level) . For example, the proposed reordering is applied when the block area is smaller than a threshold. Any combination of the proposed methods in this invention can be applied.
Any of the foregoing proposed methods can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in an intra/inter coding module of an encoder, a motion compensation module, a merge candidate derivation module of a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the intra/inter coding module of an encoder and/or motion compensation module, a merge candidate derivation module of the decoder.
VII. Example Video Encoder
FIG. 6 illustrates an example video encoder 600 that may implement candidate coding mode selection based on boundary matching cost. As illustrated, the video encoder 600 receives an input video signal from a video source 605 and encodes the signal into a bitstream 695. The video encoder 600 has several components or modules for encoding the signal from the video source 605, at least including some components selected from a transform module 610, a quantization module 611, an inverse quantization module 614, an inverse transform module 615, an intra-picture estimation module 620, an intra-prediction module 625, a motion compensation module 630, a motion estimation module 635, an in-loop filter 645, a reconstructed picture buffer 650, a MV buffer 665, a MV prediction module 675, and an entropy encoder 690. The motion compensation module 630 and the motion estimation module 635 are part of an inter-prediction module 640.
In some embodiments, the modules 610 –690 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 610 –690 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 610 –690 are illustrated as being separate modules, some of the modules can be combined into a single module.
The video source 605 provides a raw video signal that presents pixel data of each video frame without compression. A subtractor 608 computes the difference between the raw video pixel data of the video source 605 and the predicted pixel data 613 from the motion compensation module 630 or intra-prediction module 625. The transform module 610 converts the difference (or the residual pixel data or residual signal 608) into transform coefficients (e.g., by performing Discrete Cosine Transform, or DCT) . The quantization module 611 quantizes the transform coefficients into quantized data (or quantized coefficients) 612, which is encoded into the bitstream 695 by the entropy encoder 690.
The inverse quantization module 614 de-quantizes the quantized data (or quantized coefficients) 612 to obtain transform coefficients, and the inverse transform module 615 performs inverse transform on the transform coefficients to produce reconstructed residual 619. The reconstructed residual 619 is added with the predicted pixel data 613 to produce reconstructed pixel data 617. In some embodiments, the reconstructed pixel data 617 is temporarily stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction. The reconstructed pixels are filtered by the in-loop filter 645 and stored in the reconstructed picture buffer 650. In some embodiments, the reconstructed picture buffer 650 is a storage external to the video encoder 600. In some embodiments, the reconstructed picture buffer 650 is a storage internal to the video encoder 600.
The intra-picture estimation module 620 performs intra-prediction based on the reconstructed pixel data 617 to produce intra prediction data. The intra-prediction data is provided to the entropy encoder 690 to be encoded into bitstream 695. The intra-prediction data is also used by the intra-prediction module 625 to  produce the predicted pixel data 613.
The motion estimation module 635 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 650. These MVs are provided to the motion compensation module 630 to produce predicted pixel data.
Instead of encoding the complete actual MVs in the bitstream, the video encoder 600 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 695.
The MV prediction module 675 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 675 retrieves reference MVs of previous video frames from the MV buffer 665. The video encoder 600 stores the MVs generated for the current video frame in the MV buffer 665 as reference MVs for generating predicted MVs.
The MV prediction module 675 uses the reference MVs to create the predicted MVs. The predicted MVs can be computed by spatial MV prediction or temporal MV prediction. The difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame (residual motion data) are encoded into the bitstream 695 by the entropy encoder 690.
The entropy encoder 690 encodes various parameters and data into the bitstream 695 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding. The entropy encoder 690 encodes various header elements, flags, along with the quantized transform coefficients 612, and the residual motion data as syntax elements into the bitstream 695. The bitstream 695 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.
The in-loop filter 645 performs filtering or smoothing operations on the reconstructed pixel data 617 to reduce the artifacts of coding, particularly at boundaries of pixel blocks. In some embodiments, the filtering operations performed include sample adaptive offset (SAO). In some embodiments, the filtering operations include adaptive loop filter (ALF).
FIG. 7 illustrates portions of the video encoder 600 that implement candidate coding mode selection based on boundary matching costs. Specifically, the figure illustrates the components of the inter-prediction module 640 of the video encoder 600 that calculate boundary matching costs of all or any subset of the different candidate coding modes and select a candidate coding mode based on group assignment and the computed costs.
The inter-prediction module 640 includes various boundary prediction modules 710. These boundary prediction modules 710 generate predictor samples 715 for the current block along the boundary of the current block for various candidate coding modes. The boundary predictor samples 715 of each candidate coding mode are compared with neighboring samples 725 of the current block (retrieved from the reconstructed picture buffer 650) by a boundary matching cost calculator 730 to compute the boundary matching cost of each of all or any subset of the candidate coding modes. Section I above describes calculating boundary matching costs based on neighboring samples and boundary predictor samples.
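For illustration, the operation of the cost calculator 730 may be sketched as below. Plain SAD along the top and left boundaries is a simplifying assumption; the precise cost function of Section I (which may also involve gradient-like terms) is not reproduced here.

```python
import numpy as np

def boundary_matching_cost(pred, top_neighbors, left_neighbors):
    """Sketch of a boundary matching cost: sum of absolute differences
    between the current block's boundary predictor samples and the
    reconstructed samples just above and to the left of the block.
    pred:           H x W predicted samples of the current block
    top_neighbors:  W reconstructed samples in the row above the block
    left_neighbors: H reconstructed samples in the column left of the block
    """
    top_cost = np.abs(pred[0, :] - top_neighbors).sum()
    left_cost = np.abs(pred[:, 0] - left_neighbors).sum()
    return float(top_cost + left_cost)

# Usage: cost of one candidate coding mode's prediction.
pred = np.arange(16, dtype=np.float64).reshape(4, 4)
cost = boundary_matching_cost(pred, np.full(4, 2.0), np.full(4, 5.0))
```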
A group assignment and selection module 740 assigns the different candidate coding modes into different groups and selects one of the groups. A candidate selection module 750 then selects one candidate coding mode from the selected group. The inter-prediction module 640 then uses the selected candidate coding mode to perform motion compensation. In some embodiments, the group assignment module 740 assigns a certain number of candidate coding modes having the lowest boundary matching costs to form one lowest-cost group and the candidate selection module 750 selects a candidate coding mode from this lowest-cost group. In some embodiments, the group assignment module 740 assigns a certain number of candidate coding modes to form a group with a pre-defined rule (which may or may not depend on the costs) and the candidate selection module 750 selects a candidate coding mode from the group depending on the costs. For example, the selected candidate coding mode is the candidate mode with the highest priority in the group.
In some embodiments, the identity of the selected group is sent to the entropy encoder 690 to be signaled  in the bitstream 695. In some embodiments, the identity of the selected group is to be implicitly determined and not signaled in the bitstream. In some embodiments, the selection of the group is determined based on the computed boundary matching costs of the different groups, e.g., the group selection module 740 may select a group having a lowest representative cost.
In some embodiments, the identity of the selected candidate coding mode within the selected group is provided to the entropy encoder 690 to be signaled in the bitstream 695. In some embodiments, the candidate coding modes within a group are reordered according to the computed boundary matching costs such that the lowest cost (or the highest cost) candidate will be signaled using the shortest codeword. In some embodiments, the identity of the selected candidate coding mode is to be implicitly determined and not signaled in the bitstream, e.g., by selecting the candidate coding mode with the lowest boundary matching cost within the group. The selection of a group of candidate coding modes and the selection of a candidate coding mode from the group is described in Sections II-VI above.
The boundary prediction modules 710 include modules for various different candidate coding modes. These candidate coding modes may use luma and chroma samples stored in the reconstructed buffer 650, motion information from MV buffer 665, and/or incoming luma and chroma samples from the video source 605 to generate the boundary predictor samples 715 (by e.g., deriving merge candidates, deriving MMVD candidates, using motion information to fetch reference samples, generating linear models, etc. ) In some embodiments, the various candidate coding modes include those that correspond to various merge candidates of the current block. Merge candidates of various types (e.g., spatial, temporal, affine, CIIP, etc., ) as candidate coding modes are described in Section II above. Different MMVD distances and directions as candidate coding modes are described in Section III above. Different BCW weights as candidate coding modes are described in Section IV above. Linear models of various types (e.g., LM-L, LM-T, LM-LT) as candidate coding modes are described in Section V above.
FIG. 8 conceptually illustrates a process 800 for using boundary matching costs to select a candidate coding mode to encode the current block. In some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the encoder 600 performs the process 800 by executing instructions stored in a computer readable medium. In some embodiments, an electronic apparatus implementing the encoder 600 performs the process 800.
The encoder receives (at block 810) data to be encoded as a current block of pixels in a current picture. The encoder identifies (at block 820) a plurality of candidate coding modes applicable to the current block. In some embodiments, the plurality of candidate coding modes includes merge candidates of the current block. The merge candidates of the current block may include (i) merge candidates that use CIIP and/or (ii) merge candidates that use affine transform motion compensation prediction. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different MMVD combinations of distances and offsets for refining motion information. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different linear models for deriving predictors of chroma samples of the current block based on luma samples of the current block. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different candidate BCW weights for combining inter predictions of different directions.
The encoder identifies (at block 830) a first group of candidate coding modes that is a subset of the plurality of candidate coding modes, such that the number of candidate coding modes in the first group is less than the number of candidate coding modes in the plurality of candidate coding modes.
In some embodiments, the first group of candidate coding modes are highest priority candidate coding modes identified based on costs of the plurality of candidate coding modes. The encoder may index the candidate coding modes or assign codewords to the candidate coding modes in the identified group of candidate coding modes according to the priorities of the candidate coding modes. In some embodiments, the cost of a candidate coding mode is a boundary matching cost computed by comparing (i) reconstructed samples neighboring the current block and (ii) predicted samples of the current block along boundaries of the current  block that are generated according to the candidate coding mode. Section I above describes calculating boundary matching costs based on neighboring samples and boundary predictor samples.
In some embodiments, each of the plurality of candidate coding modes is assigned to one of a plurality of groups of candidate coding modes. For example, in some embodiments, each candidate coding mode in the plurality of candidate coding modes is associated with an original candidate index, wherein each candidate coding mode is assigned to one of K groups of candidate coding modes based on a result of the original index modulo K or a result of the original index divided by K. In some embodiments, the candidate coding modes that correspond to spatial merge candidates are assigned to a same group of candidate coding modes. In some embodiments, the candidate coding modes that correspond to spatial merge candidates are assigned to different groups of candidate coding modes. In some embodiments, candidate coding modes that are merge candidates with motion differences smaller than a threshold are assigned to a same group. In some embodiments, candidate coding modes that are merge candidates with motion differences greater than a threshold are assigned to a same group. In some embodiments, the encoder identifies a group of candidate coding modes having a lowest representative cost among the plurality of groups of candidate coding modes and signals an index selecting one candidate coding mode from the identified group of candidate coding modes. The representative cost of the identified group may be a mean, a maximum, or a minimum of the costs (e.g., boundary matching costs) of the candidate coding modes of the identified group. In some embodiments, the encoder signals an index selecting a group of candidate coding modes and identifies a candidate coding mode from the selected group of candidate coding modes based on costs (e.g., boundary matching costs) of the candidate coding modes of the selected group. The selection of a group of candidate coding modes and the selection of a candidate coding mode from the group is described in Sections II-VI above.
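The modulo-K grouping with representative-cost group selection described above can be sketched as follows. Using the minimum as the representative cost and then inferring the lowest-cost mode within the group are illustrative choices; as noted above, the in-group selection may instead be signaled as an index.

```python
def select_mode_two_stage(costs, k):
    """Assign each candidate coding mode to one of k groups by 'original
    index modulo k', pick the group with the lowest representative cost
    (minimum here; mean or maximum are alternatives), then pick the
    lowest-cost mode inside that group. Returns (group_id, mode_index).
    """
    groups = {g: [i for i in range(len(costs)) if i % k == g]
              for g in range(k)}
    best_group = min(groups,
                     key=lambda g: min(costs[i] for i in groups[g]))
    best_mode = min(groups[best_group], key=lambda i: costs[i])
    return best_group, best_mode

# Six candidate modes, two groups: group 1 = {1, 3, 5} wins, and mode 1
# is selected within it.
costs = [17.0, 6.0, 25.0, 9.0, 14.0, 11.0]
print(select_mode_two_stage(costs, k=2))  # (1, 1)
```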
The encoder selects (at block 840) a candidate coding mode in the first group of candidate coding modes. The selected candidate coding mode in the first group of candidate coding modes may be selected based on cost.
The encoder encodes (at block 850) the current block by using the selected candidate coding mode. Specifically, the encoder constructs a predictor of the current block according to the selected candidate coding mode and uses the predictor to encode the current block.
VIII. Example Video Decoder
In some embodiments, an encoder may signal (or generate) one or more syntax element in a bitstream, such that a decoder may parse said one or more syntax element from the bitstream.
FIG. 9 illustrates an example video decoder 900 that may implement candidate coding mode selection based on boundary matching cost. As illustrated, the video decoder 900 is an image-decoding or video-decoding circuit that receives a bitstream 995 and decodes the content of the bitstream into pixel data of video frames for display. The video decoder 900 has several components or modules for decoding the bitstream 995, including some components selected from an inverse quantization module 911, an inverse transform module 910, an intra-prediction module 925, a motion compensation module 930, an in-loop filter 945, a decoded picture buffer 950, a MV buffer 965, a MV prediction module 975, and a parser 990. The motion compensation module 930 is part of an inter-prediction module 940.
In some embodiments, the modules 910 –990 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device. In some embodiments, the modules 910 –990 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 910 –990 are illustrated as being separate modules, some of the modules can be combined into a single module.
The parser 990 (or entropy decoder) receives the bitstream 995 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard. The parsed syntax element includes various header elements, flags, as well as quantized data (or quantized coefficients) 912. The parser 990 parses out the various syntax elements by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
The inverse quantization module 911 de-quantizes the quantized data (or quantized coefficients) 912 to obtain transform coefficients, and the inverse transform module 910 performs inverse transform on the transform coefficients 916 to produce reconstructed residual signal 919. The reconstructed residual signal 919 is added with predicted pixel data 913 from the intra-prediction module 925 or the motion compensation module 930 to produce decoded pixel data 917. The decoded pixel data is filtered by the in-loop filter 945 and stored in the decoded picture buffer 950. In some embodiments, the decoded picture buffer 950 is a storage external to the video decoder 900. In some embodiments, the decoded picture buffer 950 is a storage internal to the video decoder 900.
The intra-prediction module 925 receives intra-prediction data from bitstream 995 and according to which, produces the predicted pixel data 913 from the decoded pixel data 917 stored in the decoded picture buffer 950. In some embodiments, the decoded pixel data 917 is also stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction.
In some embodiments, the content of the decoded picture buffer 950 is used for display. A display device 955 either retrieves the content of the decoded picture buffer 950 for display directly, or retrieves the content of the decoded picture buffer to a display buffer. In some embodiments, the display device receives pixel values from the decoded picture buffer 950 through a pixel transport.
The motion compensation module 930 produces predicted pixel data 913 from the decoded pixel data 917 stored in the decoded picture buffer 950 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bitstream 995 with predicted MVs received from the MV prediction module 975.
The MV prediction module 975 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 975 retrieves the reference MVs of previous video frames from the MV buffer 965. The video decoder 900 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 965 as reference MVs for producing predicted MVs.
The in-loop filter 945 performs filtering or smoothing operations on the decoded pixel data 917 to reduce the artifacts of coding, particularly at boundaries of pixel blocks. In some embodiments, the filtering operations performed include sample adaptive offset (SAO). In some embodiments, the filtering operations include adaptive loop filter (ALF).
FIG. 10 illustrates portions of the video decoder 900 that implement candidate coding mode selection based on boundary matching costs. Specifically, the figure illustrates the components of the inter-prediction module 940 of the video decoder 900 that calculate boundary matching costs of all or any subset of the different candidate coding modes and select a candidate coding mode based on group assignment and the computed costs.
The inter-prediction module 940 includes various boundary prediction modules 1010. These boundary prediction modules 1010 generate predictor samples 1015 for the current block along the boundary of the current block for various candidate coding modes. The boundary predictor samples 1015 of each candidate coding mode are compared with neighboring samples 1025 of the current block (retrieved from the decoded picture buffer 950) by a boundary matching cost calculator 1030 to compute the boundary matching cost of each of all or any subset of the candidate coding modes. Section I above describes calculating boundary matching costs based on neighboring samples and boundary predictor samples.
A group assignment and selection module 1040 assigns the different candidate coding modes into different groups and selects one of the groups. A candidate selection module 1050 then selects one candidate coding mode from the selected group. The inter-prediction module 940 then uses the selected candidate coding mode to perform motion compensation. In some embodiments, the group assignment module 1040 assigns a certain number of candidate coding modes having the lowest boundary matching costs to form one lowest-cost group and the candidate selection module 1050 selects a candidate coding mode from this lowest-cost group. In some embodiments, the group assignment module 1040 assigns a certain number of candidate coding modes to form a group with a pre-defined rule (which may or may not depend on the costs) and the candidate selection module 1050 selects a candidate coding mode from the group depending on the costs. For example, the selected candidate coding mode is the candidate mode with the highest priority in the group.
In some embodiments, the entropy decoder 990 parses the bitstream 995 for the identity of the selected group and provides the identity of the selected group to the group assignment and selection module 1040. In some embodiments, the identity of the selected group is implicitly determined and not signaled in the bitstream. In some embodiments, the selection of the group is determined based on the computed boundary matching costs of the different groups, e.g., the group selection module 1040 may select a group having a lowest representative cost. In some embodiments, the entropy decoder 990 parses the bitstream 995 for the identity of the selected candidate coding mode within the selected group and provides the identity of the selected candidate coding mode to the candidate selection module 1050.
In some embodiments, the candidate coding modes within a group are reordered according to the computed boundary matching costs such that the lowest cost (or the highest cost) candidate will be signaled using the shortest codeword. In some embodiments, the identity of the selected candidate coding mode is to be implicitly determined and not signaled in the bitstream, e.g., by selecting the candidate coding mode with the lowest boundary matching cost within the group. The selection of a group of candidate coding modes and the selection of a candidate coding mode from the group is described in Sections II-VI above.
The boundary prediction modules 1010 include modules for various different candidate coding modes. These candidate coding modes may use luma and chroma samples stored in the decoded picture buffer 950 and motion information from the MV buffer 965 to generate the boundary predictor samples 1015 (by, e.g., deriving merge candidates, deriving MMVD candidates, using motion information to fetch reference samples, generating linear models, etc.). In some embodiments, the candidate coding modes include those that correspond to various merge candidates of the current block. Merge candidates of various types (e.g., spatial, temporal, affine, CIIP, etc.) as candidate coding modes are described in Section II above. Different MMVD distances and directions as candidate coding modes are described in Section III above. Different BCW weights as candidate coding modes are described in Section IV above. Linear models of various types (e.g., LM-L, LM-T, LM-LT) as candidate coding modes are described in Section V above.
FIG. 11 conceptually illustrates a process 1100 for using boundary matching costs to select a candidate coding mode to decode the current block. In some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the decoder 900 performs the process 1100 by executing instructions stored in a computer readable medium. In some embodiments, an electronic apparatus implementing the decoder 900 performs the process 1100.
The decoder receives (at block 1110) data to be decoded as a current block of pixels in a current picture. The decoder identifies (at block 1120) a plurality of candidate coding modes applicable to the current block. In some embodiments, the plurality of candidate coding modes includes merge candidates of the current block. The merge candidates of the current block may include (i) merge candidates that use combined inter and intra prediction (CIIP) and/or (ii) merge candidates that use affine transform motion compensation prediction. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different MMVD combinations of distances and offsets for refining motion information. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different linear models (LMs) for deriving predictors of chroma samples of the current block based on luma samples of the current block. In some embodiments, the plurality of candidate coding modes includes candidate coding modes that correspond to different candidate BCW weights for combining inter predictions of different directions.
The decoder identifies (at block 1130) a first group of candidate coding modes that is a subset of the plurality of candidate coding modes, such that the number of candidate coding modes in the first group is less than the number of candidate coding modes in the plurality of candidate coding modes.
In some embodiments, the first group of candidate coding modes are highest priority candidate coding modes identified based on costs of the plurality of candidate coding modes. The decoder may index the candidate coding modes or assign codewords to the candidate coding modes in the identified group of candidate coding modes according to the priorities of the candidate coding modes. In some embodiments, the cost of a candidate coding mode is a boundary matching cost computed by comparing (i) reconstructed samples neighboring the current block and (ii) predicted samples of the current block along boundaries of the current block that are generated according to the candidate coding mode. Section I above describes calculating boundary matching costs based on neighboring samples and boundary predictor samples.
In some embodiments, each of the plurality of candidate coding modes is assigned to one of a plurality of groups of candidate coding modes. For example, in some embodiments, each candidate coding mode in the plurality of candidate coding modes is associated with an original candidate index, wherein each candidate coding mode is assigned to one of K groups of candidate coding modes based on a result of the original index modulo K or a result of the original index divided by K. In some embodiments, the candidate coding modes that correspond to spatial merge candidates are assigned to a same group of candidate coding modes. In some embodiments, the candidate coding modes that correspond to spatial merge candidates are assigned to different groups of candidate coding modes. In some embodiments, candidate coding modes that are merge candidates with motion differences smaller than a threshold are assigned to a same group. In some embodiments, candidate coding modes that are merge candidates with motion differences greater than a threshold are assigned to a same group. In some embodiments, the decoder identifies a group of candidate coding modes having a lowest representative cost among the plurality of groups of candidate coding modes and parses an index selecting one candidate coding mode from the identified group of candidate coding modes. The representative cost of the identified group may be a mean, a maximum, or a minimum of the costs (e.g., boundary matching costs) of the candidate coding modes of the identified group. In some embodiments, the decoder parses an index selecting a group of candidate coding modes and identifies a candidate coding mode from the selected group of candidate coding modes based on costs (e.g., boundary matching costs) of the candidate coding modes of the selected group. The selection of a group of candidate coding modes and the selection of a candidate coding mode from the group is described in Sections II-VI above.
The decoder selects (at block 1140) a candidate coding mode in the first group of candidate coding modes. The selected candidate coding mode in the first group of candidate coding modes may be selected based on cost.
The decoder decodes (at block 1150) the current block by using the selected candidate coding mode to reconstruct the current block. Specifically, the decoder constructs a predictor of the current block according to the selected candidate coding mode and uses the predictor to reconstruct the current block. The decoder may then provide the reconstructed current block for display as part of the reconstructed current picture.
IX. Example Electronic System
Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium) . When these instructions are executed by one or more computational or processing unit (s) (e.g., one or more processors, cores of processors, or other processing units) , they cause the processing unit (s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random-access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs) , electrically erasable programmable read-only memories (EEPROMs) , etc. The computer readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.
In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be  implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the present disclosure. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
FIG. 12 conceptually illustrates an electronic system 1200 with which some embodiments of the present disclosure are implemented. The electronic system 1200 may be a computer (e.g., a desktop computer, personal computer, tablet computer, etc. ) , phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 1200 includes a bus 1205, processing unit (s) 1210, a graphics-processing unit (GPU) 1215, a system memory 1220, a network 1225, a read-only memory 1230, a permanent storage device 1235, input devices 1240, and output devices 1245.
The bus 1205 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1200. For instance, the bus 1205 communicatively connects the processing unit (s) 1210 with the GPU 1215, the read-only memory 1230, the system memory 1220, and the permanent storage device 1235.
From these various memory units, the processing unit (s) 1210 retrieves instructions to execute and data to process in order to execute the processes of the present disclosure. The processing unit (s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 1215. The GPU 1215 can offload various computations or complement the image processing provided by the processing unit (s) 1210.
The read-only-memory (ROM) 1230 stores static data and instructions that are used by the processing unit (s) 1210 and other modules of the electronic system. The permanent storage device 1235, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1200 is off. Some embodiments of the present disclosure use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1235.
Other embodiments use a removable storage device (such as a floppy disk, flash memory device, etc., and its corresponding disk drive) as the permanent storage device. Like the permanent storage device 1235, the system memory 1220 is a read-and-write memory device. However, unlike the storage device 1235, the system memory 1220 is a volatile read-and-write memory, such as random-access memory. The system memory 1220 stores some of the instructions and data that the processor uses at runtime. In some embodiments, processes in accordance with the present disclosure are stored in the system memory 1220, the permanent storage device 1235, and/or the read-only memory 1230. For example, the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit (s) 1210 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.
The bus 1205 also connects to the input and  output devices  1240 and 1245. The input devices 1240 enable the user to communicate information and select commands to the electronic system. The input devices 1240 include alphanumeric keyboards and pointing devices (also called “cursor control devices” ) , cameras (e.g., webcams) , microphones or similar devices for receiving voice commands, etc. The output devices 1245 display images generated by the electronic system or otherwise output data. The output devices 1245 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD) , as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.
Finally, as shown in FIG. 12, bus 1205 also couples electronic system 1200 to a network 1225 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet) or a network of networks (such as the Internet). Any or all components of electronic system 1200 may be used in conjunction with the present disclosure.
Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-ray discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
While the above discussion primarily refers to microprocessor or multi-core processors that execute software, many of the above-described features and applications are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) . In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs) , ROM, or RAM devices.
As used in this specification and any claims of this application, the terms “computer” , “server” , “processor” , and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms display or displaying means displaying on an electronic device. As used in this specification and any claims of this application, the terms “computer readable medium, ” “computer readable media, ” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
While the present disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the present disclosure can be embodied in other specific forms without departing from the spirit of the present disclosure. In addition, a number of the figures (including FIG. 8 and FIG. 11) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the present disclosure is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.
Additional Notes
The herein-described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely examples, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermediate components. Likewise, any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable” to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
Further, with respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for the sake of clarity.
Moreover, it will be understood by those skilled in the art that, in general, terms used herein, and especially in the appended claims, e.g., bodies of the appended claims, are generally intended as “open” terms, e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc. It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to implementations containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an,” e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”; the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number, e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations. Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
From the foregoing, it will be appreciated that various implementations of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure. Accordingly, the various implementations disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

Claims (21)

  1. A video coding method comprising:
    receiving data for a block of pixels to be encoded or decoded as a current block of a current picture of a video;
    identifying a plurality of candidate coding modes applicable to the current block;
    identifying a first group of candidate coding modes that is a subset of the plurality of candidate coding modes, wherein a number of candidate coding modes in the first group is less than a number of candidate coding modes in the plurality of candidate coding modes;
    selecting a candidate coding mode in the first group of candidate coding modes; and
    encoding or decoding the current block by using the selected candidate coding mode.
  2. The video coding method of claim 1, wherein the selected candidate coding mode in the first group of candidate coding modes is selected based on cost.
  3. The video coding method of claim 1, wherein the first group of candidate coding modes comprises highest-priority candidate coding modes identified based on cost.
  4. The video coding method of claim 3, further comprising indexing the candidate coding modes in the first group of candidate coding modes according to the priorities of the candidate coding modes.
  5. The video coding method of claim 3, further comprising assigning codewords to the candidate coding modes in the first group of candidate coding modes according to the priorities of the candidate coding modes.
  6. The video coding method of claim 3, wherein the cost of a candidate coding mode is computed by comparing (i) reconstructed samples neighboring the current block and (ii) predicted samples of the current block along boundaries of the current block, wherein the predicted samples are generated according to the candidate coding mode.
  7. The video coding method of claim 1, wherein each of the plurality of candidate coding modes is assigned to one of a plurality of groups of candidate coding modes.
  8. The video coding method of claim 7, wherein each candidate coding mode in the plurality of candidate coding modes is associated with an original candidate index, wherein each candidate coding mode is assigned to one of K groups of candidate coding modes based on a result of the original candidate index modulo K or a result of the original candidate index divided by K.
  9. The video coding method of claim 7, wherein the candidate coding modes that correspond to spatial merge candidates are assigned to a same group of candidate coding modes.
  10. The video coding method of claim 7, wherein the candidate coding modes that correspond to spatial merge candidates are assigned to different groups of candidate coding modes.
  11. The video coding method of claim 7, wherein candidate coding modes that are merge candidates with motion differences smaller than a threshold are assigned to a same group.
  12. The video coding method of claim 7, wherein candidate coding modes that are merge candidates with motion differences greater than a threshold are assigned to a same group.
  13. The video coding method of claim 7, further comprising:
    identifying a group of candidate coding modes having a lowest representative cost among the plurality of groups of candidate coding modes; and
    signaling or receiving an index selecting one candidate coding mode from the identified group of candidate coding modes.
  14. The video coding method of claim 13, wherein the representative cost of the identified group is a mean, maximum, or minimum of the costs of the candidate coding modes of the identified group.
  15. The video coding method of claim 7, further comprising:
    signaling or receiving an index selecting a group of candidate coding modes; and
    identifying a candidate coding mode from the selected group of candidate coding modes based on costs of the candidate coding modes of the selected group.
  16. The video coding method of claim 1, wherein the plurality of candidate coding modes comprises merge candidates of the current block.
  17. The video coding method of claim 16, wherein the merge candidates of the current block comprise:
    (i) merge candidates that use combined inter and intra prediction (CIIP) ; or
    (ii) merge candidates that use affine transform motion compensation prediction.
  18. The video coding method of claim 1, wherein the plurality of candidate coding modes comprises candidate coding modes that correspond to different combinations of distances and offsets for refining motion information.
  19. The video coding method of claim 1, wherein the plurality of candidate coding modes comprises candidate coding modes that correspond to different candidate weights for combining inter predictions of different directions (BCW) .
  20. The video coding method of claim 1, wherein the plurality of candidate coding modes comprises candidate coding modes that correspond to different models for deriving predictors of chroma samples of the current block based on luma samples of the current block.
  21. An electronic apparatus comprising:
    a video coding circuit configured to perform operations comprising:
    receiving data for a block of pixels to be encoded or decoded as a current block of a current picture of a video;
    identifying a plurality of candidate coding modes applicable to the current block;
    identifying a first group of candidate coding modes that is a subset of the plurality of candidate coding modes, wherein a number of candidate coding modes in the first group is less than a number of candidate coding modes in the plurality of candidate coding modes;
    selecting a candidate coding mode in the first group of candidate coding modes; and
    encoding or decoding the current block by using the selected candidate coding mode to reconstruct the current block.
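By way of illustration, the boundary matching cost recited in claim 6 can be computed as a sum of absolute differences between the predicted samples along the top row and left column of the current block and the reconstructed samples of the neighboring blocks. The following is a minimal Python sketch under that assumption; the number of boundary lines compared, any weighting, the distortion metric, and all function and argument names are illustrative choices, not limitations of the claims.

```python
import numpy as np

def boundary_matching_cost(pred, top_neighbors, left_neighbors):
    """Boundary matching cost for one candidate coding mode: sum of absolute
    differences between the predicted samples on the top row / left column
    of the current block and the reconstructed samples of its neighbors.

    pred           : (H, W) array of predicted samples for the candidate mode
    top_neighbors  : (W,)   reconstructed row directly above the block
    left_neighbors : (H,)   reconstructed column directly left of the block
    """
    pred = pred.astype(np.int64)  # avoid unsigned wrap-around when subtracting
    top_cost = np.abs(pred[0, :] - np.asarray(top_neighbors, dtype=np.int64)).sum()
    left_cost = np.abs(pred[:, 0] - np.asarray(left_neighbors, dtype=np.int64)).sum()
    return int(top_cost + left_cost)
```

Because the neighboring samples are already reconstructed at both the encoder and the decoder, this cost can be evaluated identically on both sides without any additional signaling.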
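Claims 3 to 5 describe retaining the highest-priority (lowest-cost) candidates as the first group and indexing them, or assigning them codewords, according to priority. A sketch under those assumptions follows; the truncated-unary codeword assignment is only one possible scheme and is not mandated by the claims.

```python
def build_first_group(costs, group_size):
    """Rank all candidate modes by boundary matching cost (claim 3) and keep
    the group_size lowest-cost ones as the first group, indexed in
    ascending-cost order so index 0 is the highest-priority mode (claim 4)."""
    ranked = sorted(range(len(costs)), key=lambda mode: costs[mode])
    return ranked[:group_size]

def truncated_unary_codewords(group):
    """Assign shorter codewords to higher-priority modes (claim 5) using a
    truncated-unary code: '0', '10', '110', ..., last codeword all ones."""
    n = len(group)
    return {mode: "1" * i + ("0" if i < n - 1 else "")
            for i, mode in enumerate(group)}
```

For a first group of four modes this yields the codewords 0, 10, 110, and 111, so the mode that the boundary match ranks as most probable costs the fewest bits to signal.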
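Claim 8 assigns each original candidate index to one of K groups either by modulo (interleaved groups) or by division (consecutive runs of candidates). A minimal sketch, assuming K is fixed by the codec configuration and the names are hypothetical:

```python
def assign_group(candidate_index, K, scheme="modulo"):
    """Map an original candidate index to a group id (claim 8).

    "modulo"   : index %  K -> K interleaved groups
    "division" : index // K -> consecutive groups of K candidates each
    """
    if scheme == "modulo":
        return candidate_index % K
    return candidate_index // K
```

With eight candidates and K = 2, for example, the modulo scheme produces the groups {0, 2, 4, 6} and {1, 3, 5, 7}, whereas the division scheme produces the consecutive pairs {0, 1}, {2, 3}, {4, 5}, and {6, 7}.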
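Claims 13 to 15 split the selection between an implicit, cost-derived part and an explicitly signaled index: either the group with the lowest representative cost is derived identically at the encoder and decoder and only the within-group index is signaled (claims 13 and 14), or the group index is signaled and the candidate within the group is derived from costs (claim 15). The following sketch covers the first variant, with the representative statistic (mean, maximum, or minimum, per claim 14) as a parameter; the names are hypothetical.

```python
def derive_best_group(groups, costs, representative="min"):
    """Pick the group with the lowest representative cost (claims 13-14).

    groups : list of lists of candidate indices
    costs  : per-candidate boundary matching costs (list or dict)
    representative : "mean", "max", or "min" over each group's member costs
    """
    reducers = {"mean": lambda xs: sum(xs) / len(xs),
                "max": max,
                "min": min}
    reduce_fn = reducers[representative]
    rep_costs = [reduce_fn([costs[c] for c in g]) for g in groups]
    return min(range(len(groups)), key=lambda g: rep_costs[g])
```

Since the decoder can reproduce the same boundary matching costs from already-reconstructed neighbors, only the short within-group index needs to appear in the bitstream.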
PCT/CN2023/071007 2022-01-07 2023-01-06 Boundary matching for video coding WO2023131298A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112100602A TW202337207A (en) 2022-01-07 2023-01-06 Video coding method and apparatus thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263297252P 2022-01-07 2022-01-07
US63/297,252 2022-01-07

Publications (1)

Publication Number Publication Date
WO2023131298A1

Family

ID=87073303

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071007 WO2023131298A1 (en) 2022-01-07 2023-01-06 Boundary matching for video coding

Country Status (2)

Country Link
TW (1) TW202337207A (en)
WO (1) WO2023131298A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685474A (en) * 2011-03-10 2012-09-19 华为技术有限公司 Encoding and decoding method of prediction modes, encoding and decoding device and network system
CN103581682A (en) * 2013-08-14 2014-02-12 北京交通大学 Fast mode decision algorithm for HEVC intra-frame coding and application thereof
US20150172653A1 (en) * 2012-07-05 2015-06-18 Thomson Licensing Video coding and decoding method with adaptation of coding modes and corresponding encoder and decoder
US20160127725A1 (en) * 2014-10-31 2016-05-05 Ecole De Technologie Superieure Method and system for fast mode decision for high efficiency video coding
US20200186808A1 (en) * 2018-12-11 2020-06-11 Google Llc Rate/distortion/rdcost modeling with machine learning
CN111885382A (en) * 2020-06-23 2020-11-03 北京工业职业技术学院 Intra-frame chroma prediction mode fast selection


Also Published As

Publication number Publication date
TW202337207A (en) 2023-09-16

Similar Documents

Publication Publication Date Title
US11172203B2 (en) Intra merge prediction
US11310526B2 (en) Hardware friendly constrained motion vector refinement
US11343541B2 (en) Signaling for illumination compensation
US11166037B2 (en) Mutual excluding settings for multiple tools
US10715827B2 (en) Multi-hypotheses merge mode
US11553173B2 (en) Merge candidates with multiple hypothesis
WO2020169082A1 (en) Intra block copy merge list simplification
US11297348B2 (en) Implicit transform settings for coding a block of pixels
US11924413B2 (en) Intra prediction for multi-hypothesis
US11245922B2 (en) Shared candidate list
US11240524B2 (en) Selective switch for parallel processing
WO2019161798A1 (en) Intelligent mode assignment in video coding
WO2023131298A1 (en) Boundary matching for video coding
WO2024007789A1 (en) Prediction generation with out-of-boundary check in video coding
WO2024027700A1 (en) Joint indexing of geometric partitioning mode in video coding
WO2023236916A1 (en) Updating motion attributes of merge candidates
WO2024017004A1 (en) Reference list reordering in video coding
WO2023236914A1 (en) Multiple hypothesis prediction coding
WO2024037645A1 (en) Boundary sample derivation in video coding
WO2024022144A1 (en) Intra prediction based on multiple reference lines
TW202327360A (en) Method and apparatus for multiple hypothesis prediction in video coding system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23737160

Country of ref document: EP

Kind code of ref document: A1