CN118435605A - Candidate reordering and motion vector refinement for geometric partition modes

Publication number: CN118435605A
Authority: CN (China)
Prior art keywords: partition, candidate, template, list, current block
Legal status: Pending
Application number: CN202280059245.4A
Other languages: Chinese (zh)
Inventors: 邱志尧, 罗志轩, 陈俊嘉, 徐志玮, 陈庆晔, 庄子德
Current and original assignee: MediaTek Inc
Application filed by MediaTek Inc
Priority claimed from PCT/CN2022/112566 (WO2023020446A1)


Abstract

A method is provided for reordering partition candidates or motion vectors of a geometric prediction mode (GPM) based on template matching costs. A video codec receives data to be encoded or decoded as a current block of a current picture of a video. The current block is divided into first and second partitions by a bisector defined by an angle-distance pair. The video codec identifies a list of candidate prediction modes for coding the first and second partitions. The video codec calculates a template matching (TM) cost for each candidate prediction mode in the list. The video codec receives or transmits a selection of a candidate prediction mode, signalled by an index that is assigned to the selected candidate prediction mode according to the calculated TM costs. The video codec reconstructs the current block by predicting the first and second partitions using the selected candidate prediction mode.

Description

Candidate reordering and motion vector refinement for geometric partition modes
Cross Reference to Related Applications
The present disclosure is part of a non-provisional application claiming priority from U.S. provisional patent application No. 63/233,346, filed on August 16, 2021, and U.S. provisional patent application No. 63/318,806, filed on March 11, 2022. The contents of the above-listed applications are incorporated herein by reference in their entirety.
Technical Field
The present disclosure relates generally to video coding. In particular, the present disclosure relates to prediction candidate selection methods for the geometric prediction mode (GPM).
Background
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims listed below and are not admitted to be prior art by inclusion in this section.
High Efficiency Video Coding (HEVC) is an international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC). HEVC is based on a hybrid block-based, motion-compensated, DCT-like transform coding architecture. The basic unit for compression, called a coding unit (CU), is a 2Nx2N square block, and each CU can be recursively divided into four smaller CUs until a predetermined minimum size is reached. Each CU contains one or more prediction units (PUs).
To improve the coding efficiency of motion vector (MV) coding, HEVC provides a skip mode and a merge mode. The skip and merge modes obtain motion information from spatially neighboring blocks (spatial candidates) or a temporally co-located block (temporal candidate). When a PU is coded in skip or merge mode, its motion information is not coded; instead, only the index of the selected candidate is coded. In skip mode, the residual signal is forced to zero and is not coded. In HEVC, if a particular block is coded as skip or merge, a candidate index is signalled to indicate which candidate in the candidate set is used for merging. Each merged PU reuses the MV, the prediction direction, and the reference picture index of the selected candidate.
Disclosure of Invention
The following summary is illustrative only and is not intended to be limiting in any way. That is, the following summary is provided to introduce the concepts, benefits, and advantages of the novel and non-obvious techniques described herein. Selected, but not all, embodiments are further described in the detailed description below. Thus, the following summary is not intended to identify essential features of the claimed subject matter, nor is it intended to be used to determine the scope of the claimed subject matter.
Some embodiments of the present disclosure provide a method of reordering partition candidates or motion vectors of a geometric prediction mode (GPM) based on template matching costs. A video codec receives data to be encoded or decoded as a current block of a current picture of a video. The current block is divided into a first partition and a second partition by a bisector defined by an angle-distance pair. The video codec identifies a list of candidate prediction modes for coding the first and second partitions. The video codec calculates a template matching (TM) cost for each candidate prediction mode in the list. The video codec receives or transmits a selection of a candidate prediction mode, signalled by an index that is assigned to the selected candidate prediction mode according to the calculated TM costs. The video codec reconstructs the current block by predicting the first partition and the second partition using the selected candidate prediction mode.
The first partition may be encoded and decoded by inter prediction, which refers to samples in a reference picture, and the second partition may be encoded and decoded by intra prediction, which refers to neighboring samples of a current block in a current picture. Alternatively, both the first partition and the second partition may be encoded by inter prediction that uses the first motion vector and the second motion vector from the list to reference samples in the first reference picture and the second reference picture.
The different candidate prediction modes in the list may correspond to different bisectors defined by different angle-distance pairs. Different candidate prediction modes in the list may also correspond to different motion vectors that may be used to generate inter prediction to reconstruct the first partition or the second partition of the current block. In some embodiments, the candidate prediction mode list includes only unidirectional prediction candidates and does not include bidirectional prediction candidates when the current block is greater than a threshold size, and includes merge candidates when the current block is less than the threshold size.
In some embodiments, the video encoder reconstructs the current block by generating predictions of the first partition and the second partition using refined motion vectors. A refined motion vector is identified by searching, starting from an initial motion vector, for the motion vector with the lowest TM cost. In some embodiments, searching for the motion vector with the lowest TM cost includes iteratively applying a search pattern centered on the motion vector that the previous iteration identified as having the lowest TM cost, until no lower cost can be found. In some embodiments, the encoder applies different search patterns in different iterations or rounds of the search process, at different resolutions (e.g., 1 pixel, 1/2 pixel, 1/4 pixel, etc.), to refine the motion vector.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this disclosure. The accompanying drawings illustrate embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. It is noted that the drawings are not necessarily to scale, since specific elements may be shown out of scale in an actual implementation in order to clearly illustrate the concepts of the present disclosure.
Fig. 1 shows motion candidates of the merge mode.
Figure 2 conceptually illustrates a "prediction+merge" algorithm framework for merge candidates.
Figure 3 conceptually illustrates an example candidate reordering.
Fig. 4-5 conceptually illustrate an L-shape matching method for calculating a guess cost for a selected candidate.
Fig. 6 shows partitioning of CUs by geometric partitioning mode (geometric partitioning mode, GPM for short).
Fig. 7 shows an example uni-directional prediction candidate list for GPM partition and selecting uni-directional prediction MVs for GPM.
Fig. 8 shows an example partition edge blending process for a GPM of a CU.
Fig. 9 shows a CU coded by GPM-intra.
Fig. 10 conceptually illustrates a CU that is encoded by using MVs from a reordered GPM candidate list.
Figure 11 conceptually illustrates reordering different candidate GPM split modes according to TM cost when coding a CU.
Figure 12 conceptually illustrates MV refinement based on TM cost.
Fig. 13 illustrates an example video encoder in which prediction candidates may be selected based on TM cost.
Fig. 14 shows a video encoder portion that implements candidate prediction mode selection based on TM cost.
Fig. 15 conceptually illustrates a process of assigning indexes to prediction candidates based on TM costs for encoding pixel blocks.
Fig. 16 shows an example video decoder that selects prediction candidates based on TM cost.
Fig. 17 shows a portion of a video decoder that implements candidate prediction mode selection based on TM cost.
Fig. 18 conceptually illustrates a process of assigning indices to prediction candidates based on TM costs for decoding a block of pixels.
Figure 19 conceptually illustrates an electronic system implementing some embodiments of the present disclosure.
Detailed Description
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. Any variations, derivatives, and/or extensions based on the teachings described herein are within the scope of this disclosure. In some instances, well known methods, processes, components, and/or circuits associated with one or more example embodiments disclosed herein may be described at a relatively high level without detail in order to avoid unnecessarily obscuring aspects of the teachings of the present disclosure.
1. Candidate reordering of merge modes
Fig. 1 shows the motion candidates of the merge mode. The figure shows a current block 100 of a video picture or frame being encoded or decoded by a video codec. As shown, up to four spatial MV candidates are derived from the spatial neighbors A0, A1, B0, and B1, and one temporal MV candidate is derived from TBR or TCTR (TBR is used first; if TBR is not available, TCTR is used instead). If any of the four spatial MV candidates is not available, position B2 is used to derive a replacement MV candidate. After the derivation of the four spatial MV candidates and one temporal MV candidate, in some embodiments redundancy removal (pruning) is applied to remove redundant MV candidates. If the number of available MV candidates after pruning is smaller than five, three additional candidates are derived and added to the candidate set (candidate list). The video encoder selects one final candidate from the candidate set for skip or merge mode based on a rate-distortion optimization (RDO) decision and transmits the index to the video decoder. (Skip mode and merge mode are collectively referred to herein as "merge mode".)
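As an illustration only, the following minimal Python sketch traces the candidate-set construction just described; the candidate representation, argument names, and the zero-MV fill are assumptions for the sketch, not the HEVC reference implementation.

```python
def build_merge_candidate_set(spatial_a0_a1_b0_b1, b2, tbr, tctr, max_cands=5):
    # Up to four spatial candidates from A0, A1, B0, B1 (None = unavailable).
    cands = [c for c in spatial_a0_a1_b0_b1 if c is not None]
    if len(cands) < 4 and b2 is not None:
        cands.append(b2)                     # B2 replaces a missing spatial one
    # One temporal candidate: TBR first, TCTR if TBR is unavailable.
    temporal = tbr if tbr is not None else tctr
    if temporal is not None:
        cands.append(temporal)
    # Pruning: remove redundant (duplicate) candidates.
    pruned = []
    for c in cands:
        if c not in pruned:
            pruned.append(c)
    # Derive additional candidates (zero-MV placeholders here) up to five.
    while len(pruned) < max_cands:
        pruned.append(("zero_mv", (0, 0), len(pruned)))
    return pruned[:max_cands]
```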
In some embodiments, merge candidates are defined as the candidates of a generic "prediction + merge" algorithm framework. The "prediction + merge" algorithm framework has two parts. The first part generates a candidate list of predictors derived by inheriting, refining, or otherwise processing neighboring information. The second part transmits (i) a merge index to indicate which candidate in the candidate list is selected, and (ii) some side information related to the merge index. In other words, the encoder transmits the merge index and some side information for the selected candidate to the decoder.
Figure 2 conceptually illustrates a "prediction+merge" algorithm framework for merge candidates. The candidate list includes a number of candidates that inherit neighboring information. The inherited information is then processed or refined to form new candidates. In these processes, some candidate side information is generated and sent to the decoder.
The video codec (encoder or decoder) may process merge candidates in different ways. First, in some embodiments, the video codec may combine two or more candidates into one candidate. Second, in some embodiments, the video codec may use an original candidate as the starting MV predictor and perform a motion estimation search using the current block of pixels to find a final motion vector difference (MVD), where the side information is the MVD. Third, in some embodiments, the video codec may use the original candidate as the starting MV predictor and perform a motion estimation search using the current block of pixels to find the final MVD for L0, while the L1 predictor is the original candidate. Fourth, in some embodiments, the video codec may use the original candidate as the starting MV predictor and perform a motion estimation search using the current block of pixels to find the final MVD for L1, while the L0 predictor is the original candidate. Fifth, in some embodiments, the video codec may perform an MV refinement search that uses the original candidate as the starting MV predictor and uses the top or left neighboring pixels as a search template to find the final predictor. Sixth, the video codec may perform an MV refinement search that uses the original candidate as the starting MV predictor and uses bilateral templates (pixels on the L0 and L1 reference pictures pointed to by the candidate MV and its mirrored MV) as search templates to find the final predictor.
Template matching (TM) is a video coding method that refines the prediction of the current CU by matching a template of the current CU in the current picture (the current template) with a reference template in a reference picture. The template of a CU or block generally refers to a specific set of pixels neighboring the top and/or the left of the CU.
For the purposes of this document, the term "merge candidate" or "candidate" refers to a candidate in the framework of a generic "prediction+merge" algorithm. The "prediction+merge" algorithm framework is not limited to the foregoing embodiments. Any algorithm with "predict+merge index" behavior belongs to this framework.
In some embodiments, the video codec reorders the merge candidates, i.e., the video codec modifies the order of the candidates within the candidate list to achieve better coding efficiency. The reordering rules depend on some pre-calculation on the current candidates (the merge candidates before reordering), such as the conditions of the top neighbor (mode, MV, etc.) or the left neighbor (mode, MV, etc.) of the current CU, the current CU shape, or top/left L-template matching.
Figure 3 conceptually illustrates an example candidate reordering. As shown, the example merge candidate list 300 has six candidates labeled "0" through "5". The video codec initially selects some candidates (the candidates labeled "1" and "3") for reordering. The video codec then pre-calculates the costs of these candidates (the costs of the candidates labeled "1" and "3" are 100 and 50, respectively). This cost is referred to as the guess cost of the candidate (since it is not the real cost of using the candidate, but merely an estimate, or guess, of the real cost); a lower cost means a better candidate. Finally, the video codec reorders the selected candidates by moving the lower-cost candidate (the candidate labeled "3") toward the front of the list.
In general, consider a merge candidate Ci with order position Oi in the merge candidate list, where 0 ≤ Oi ≤ N-1 and N is the total number of candidates in the list; Oi = 0 means Ci is at the beginning of the list, and Oi = N-1 means Ci is at the end. Initially, Oi = i (C0 has order 0, C1 has order 1, C2 has order 2, and so on). The video codec reorders the merge candidates in the list by changing Oi of Ci for selected values of i (i.e., changing the order of some selected candidates).
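One plausible realization of this reordering, as a minimal Python sketch: the selected candidates are re-inserted into their positions in ascending order of their pre-calculated guess costs, while the unselected candidates keep their positions. The cost function here is a stand-in for the L-shape template matching described below.

```python
def reorder_by_guess_cost(cand_list, selected_indices, guess_cost):
    by_cost = sorted(selected_indices, key=lambda i: guess_cost(cand_list[i]))
    reordered = list(cand_list)
    for slot, i in zip(sorted(selected_indices), by_cost):
        reordered[slot] = cand_list[i]
    return reordered

# Fig. 3 example: candidates "1" (cost 100) and "3" (cost 50) are selected,
# so the lower-cost candidate "3" moves ahead of candidate "1".
costs = {"1": 100, "3": 50}
print(reorder_by_guess_cost(list("012345"), [1, 3], costs.get))
# -> ['0', '3', '2', '1', '4', '5']
```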
In some embodiments, merge candidate reordering may be turned off according to the size or shape of the current PU. The video codec may predefine several PU sizes or shapes for which merge candidate reordering is turned off. In some embodiments, other conditions, such as the picture size or the QP value being equal to specific predetermined values, are used to turn off merge candidate reordering. In some embodiments, the video codec may signal a flag to turn merge candidate reordering on or off. For example, the video codec may signal a flag (e.g., "merge_cand_rdr_en") to indicate whether "merge candidate reordering" is enabled (value 1: enabled; value 0: disabled). When the flag is not present, the value of merge_cand_rdr_en is inferred to be 1. The flag merge_cand_rdr_en, and the minimum size of the units for its signaling, may also be coded separately at the sequence level, picture level, slice level, or PU level.
In general, a video codec may reorder candidates by (1) identifying one or more candidates for reordering, (2) calculating a guess cost for each identified candidate, and (3) reordering the candidates according to the guess cost of the selected candidate. In some embodiments, the calculated guess cost of some candidates is adjusted (cost adjustment) before the candidates are reordered.
In some embodiments, the step of selecting one or more candidates may be performed by several different methods. In some embodiments, the video codec selects all candidates with merge_index less than or equal to a threshold. The threshold is a predetermined value, and merge_index is the original order within the merge list (merge_index = 0, 1, 2, ...). For example, if the original position of the current candidate is at the beginning of the merge list, then merge_index = 0 for the current candidate.
In some embodiments, the video codec selects candidates for reordering according to the candidate type. Candidate types are classes into which all candidates are grouped. The video codec first classifies all candidates into MG types (MG = 1, 2, 3, or another value), and then selects MG_S types (MG_S = 1, 2, 3, ...) from among all MG types for reordering. One example classification divides all candidates into four candidate types. Type 1 consists of candidates with spatially neighboring MVs. Type 2 consists of candidates with temporally neighboring MVs. Type 3 consists of all sub-PU candidates (e.g., sub-PU TMVP, STMVP, affine merge candidates). Type 4 consists of all other candidates. In some embodiments, the video codec selects candidates according to both merge_index and candidate type.
In some embodiments, an L-shaped matching method is used to calculate the guess cost of the selected candidates. For the currently selected merge candidate, the video codec obtains an L-shaped template for the current picture and an L-shaped template for the reference picture and compares the difference between the two templates. The L-shaped matching method has two parts or steps: (i) identifying an L-shaped template and (ii) matching the derived template.
Fig. 4-5 conceptually illustrate an L-shape matching method for calculating the guess cost of a selected candidate. Fig. 4 shows the L-shaped template of the current CU (the current template) in the current picture, which includes some pixels around the top and left edges of the current PU. The L-shaped template in the reference picture includes some pixels around the top and left edges of reference_block_for_guessing of the current merge candidate. reference_block_for_guessing (whose width BW and height BH are the same as those of the current PU) is the block pointed to by the integer part of the motion vector of the current merge candidate.
Different embodiments define the L-shaped template differently. In some embodiments, all pixels of the L-shaped template are outside reference_block_for_guessing (e.g., the "outer pixels" label in Fig. 4). In some embodiments, all pixels of the L-shaped template are inside reference_block_for_guessing (e.g., the "inner pixels" label in Fig. 4). In some embodiments, some pixels of the L-shaped template are outside reference_block_for_guessing while other pixels are inside it. Fig. 5 shows the L-shaped template of the current PU (the current template) in the current picture, similar to Fig. 4, except that the L-shaped template in the reference picture (outer-pixel embodiment) has no top-left corner pixels.
In some embodiments, the L-shape matching method and the corresponding L-shaped template (named template_std) are defined as follows. Assume the width of the current PU is BW and the height of the current PU is BH. The L-shaped template of the current picture has a top portion and a left portion. Let the top thickness be TTH and the left thickness be LTH. The top portion contains all current-picture pixels with coordinates (ltx + tj, lty - ti), where ltx is the top-left integer-pel horizontal coordinate of the current PU, lty is the top-left integer-pel vertical coordinate of the current PU, ti is the row index (ti = 0 to TTH - 1), and tj is the pixel index within a row (tj = 0 to BW - 1). The left portion contains all current-picture pixels with coordinates (ltx - tjl, lty + til), where til is the pixel index within a column (til = 0 to BH - 1) and tjl is the column index (tjl = 0 to LTH - 1).
In template_std, the L-shaped template of the reference picture likewise has a top portion and a left portion. Let the top thickness be TTHR and the left thickness be LTHR. The top portion contains all reference-picture pixels with coordinates (ltxr + tjr, ltyr - tir + shifty), where ltxr is the top-left integer-pel horizontal coordinate of reference_block_for_guessing, ltyr is the top-left integer-pel vertical coordinate of reference_block_for_guessing, tir is the row index (tir = 0 to TTHR - 1), tjr is the pixel index within a row (tjr = 0 to BW - 1), and shifty is a predetermined shift value. The left portion contains all reference-picture pixels with coordinates (ltxr - tjlr + shiftx, ltyr + tilr), where tilr is the pixel index within a column (tilr = 0 to BH - 1), tjlr is the column index (tjlr = 0 to LTHR - 1), and shiftx is a predetermined shift value.
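The coordinate enumeration above can be summarized with the following minimal Python sketch; (ltx, lty) and (ltxr, ltyr) denote the top-left integer-pel positions of the current PU and of reference_block_for_guessing, and the function names are illustrative.

```python
def current_template_coords(ltx, lty, bw, bh, tth, lth):
    top = [(ltx + tj, lty - ti) for ti in range(tth) for tj in range(bw)]
    left = [(ltx - tjl, lty + til) for til in range(bh) for tjl in range(lth)]
    return top + left

def reference_template_coords(ltxr, ltyr, bw, bh, tthr, lthr, shiftx=0, shifty=0):
    top = [(ltxr + tjr, ltyr - tir + shifty)
           for tir in range(tthr) for tjr in range(bw)]
    left = [(ltxr - tjlr + shiftx, ltyr + tilr)
            for tilr in range(bh) for tjlr in range(lthr)]
    return top + left
```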
If the current candidate has only an L0 MV or only an L1 MV, there is one L-shaped template in the reference picture. If the current candidate has both L0 and L1 MVs (a bi-prediction candidate), there are two L-shaped reference templates: one pointed to by the L0 MV in the L0 reference picture, and the other pointed to by the L1 MV in the L1 reference picture.
In some embodiments, the video codec has an adaptive thickness mode for the L-shaped template. The thickness is defined as the number of pixel rows in the top portion of the L-shaped template or the number of pixel columns in the left portion of the L-shaped template. For the aforementioned L-shaped template template_std, the top thickness of the current picture's L-shaped template is TTH and its left thickness is LTH, while the top thickness of the reference picture's L-shaped template is TTHR and its left thickness is LTHR. The adaptive thickness mode changes the top thickness or the left thickness according to some conditions, such as the current PU size, the current PU shape (width or height), or the QP of the current slice. For example, the adaptive thickness mode may set the top thickness to 2 when the current PU height is greater than or equal to 32, and set the top thickness to 1 when the current PU height is less than 32.
In performing L-template matching, the video codec acquires an L-template of the current picture and an L-template of the reference picture, and compares (matches) the difference between the two templates. The difference between the pixels in the two templates (e.g., the sum of absolute differences, or SAD) is used as the cost of the MVs. In some embodiments, the video codec may obtain the selected pixels from the L-shaped templates of the current picture and the selected pixels from the L-shaped templates of the reference picture before calculating the difference between the selected pixels of the two L-shaped templates.
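As a minimal sketch of this matching step, assuming a fetch() accessor that returns the sample of a picture at an integer-pel coordinate (an assumption for the sketch), the guess cost is the SAD over the two coordinate lists derived above:

```python
def l_template_sad(cur_pic, ref_pic, cur_coords, ref_coords, fetch):
    # Sum of absolute differences between co-located template samples.
    return sum(abs(fetch(cur_pic, c) - fetch(ref_pic, r))
               for c, r in zip(cur_coords, ref_coords))
```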
2. Geometric prediction mode (GPM) candidate list
In VVC, the geometric partition mode is supported for inter prediction. The geometric partition mode (GPM) is signalled using a CU-level flag as one kind of merge mode, with other merge modes including the regular merge mode, the MMVD mode, the CIIP mode, and the subblock merge mode. The geometric partition mode supports a total of 64 partitions for each possible CU size w×h = 2^m × 2^n (with m, n ∈ {3, ..., 6}, excluding 8x64 and 64x8).
Fig. 6 shows partitioning of CUs by Geometric Partition Mode (GPM). Each GPM partition or GPM split is characterized by a distance-angle pairing defining a bisector (bisecting line). The figure shows an example of GPM splitting grouped at the same angle. As shown, when GPM is used, the CU is divided into two parts by geometrically located straight lines. The location of the parting line is mathematically derived from the angle and offset parameters of the particular partition.
Each part of a geometric partition in the CU is inter-predicted using its own motion. Only uni-prediction is allowed for each partition, that is, each part has one motion vector and one reference index. The uni-prediction motion constraint is applied to ensure that, as in conventional bi-prediction, only two motion-compensated predictions are needed for each CU.
If the GPM is used for the current CU, a geometric partition index indicating the partition mode of the geometric partition (angle and offset), and two merge indices (one for each partition), are further signalled. The merge index of each geometric partition is used to select a candidate from a uni-prediction candidate list (also referred to as the GPM candidate list). The maximum number of candidates in the GPM candidate list is signalled explicitly in the SPS to specify the syntax binarization of the GPM merge index. After predicting each part of the geometric partition, the sample values along the geometric partition edge are adjusted using a blending process with adaptive weights. The result is the prediction signal for the whole CU, and the transform and quantization process is applied to the whole CU as in other prediction modes. The motion field of the GPM-predicted CU is then stored.
The uni-prediction candidate list (GPM candidate list) for a GPM partition may be derived directly from the merge candidate list of the current CU. Fig. 7 shows an example uni-prediction candidate list 700 for GPM partitions and the selection of a uni-prediction MV for the GPM. The GPM candidate list 700 is constructed in a parity manner, with only uni-prediction candidates, alternating between L0 MVs and L1 MVs. Let n be the index of the uni-prediction motion in the GPM uni-prediction candidate list. The LX motion vector of the n-th extended merge candidate, with X equal to the parity of n, is used as the n-th uni-prediction motion vector for the GPM. (These motion vectors are marked with "X" in the figure.) If the corresponding LX motion vector of the n-th extended merge candidate does not exist, the L(1-X) motion vector of the same candidate is used instead as the uni-prediction motion vector for the GPM.
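A minimal Python sketch of this parity-based construction, assuming each merge candidate is a dict that may carry an "L0" and/or an "L1" motion vector (an illustrative representation):

```python
def build_gpm_uni_list(merge_list):
    gpm_list = []
    for n, cand in enumerate(merge_list):
        x = n & 1                             # X equals the parity of n
        mv = cand.get("L%d" % x)              # prefer the LX motion vector
        if mv is None:
            mv = cand.get("L%d" % (1 - x))    # fall back to L(1-X)
        gpm_list.append(mv)
    return gpm_list
```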
As previously described, the sample values along the geometric partition edge are adjusted using a blending process with adaptive weights. Specifically, after each part of the geometric partition is predicted using its own motion, blending is applied to the two prediction signals to derive the samples around the geometric partition edge. The blending weight for each position of the CU is derived based on the distance between that position and the partition edge. The distance of a position (x, y) to the partition edge is derived as:

d(x,y) = (2x+1-w)·cos(φ_i) + (2y+1-h)·sin(φ_i) - ρ_j    (1)
ρ_j = ρ_{x,j}·cos(φ_i) + ρ_{y,j}·sin(φ_i)    (2)
ρ_{x,j} = 0 if i%16 = 8 or (i%16 ≠ 0 and h ≥ w); otherwise ρ_{x,j} = ±(j·w) >> 2    (3)
ρ_{y,j} = ±(j·h) >> 2 if i%16 = 8 or (i%16 ≠ 0 and h ≥ w); otherwise ρ_{y,j} = 0    (4)

where i and j are the indices of the angle and offset of the geometric partition, which depend on the signalled geometric partition index. The signs of ρ_{x,j} and ρ_{y,j} depend on the angle index i. The weights for each part of the geometric partition are derived as follows:

wIdxL(x,y) = partIdx ? 32 + d(x,y) : 32 - d(x,y)    (5)
w0(x,y) = Clip3(0, 8, (wIdxL(x,y) + 4) >> 3) / 8    (6)
w1(x,y) = 1 - w0(x,y)    (7)
The variable partIdx depends on the angle index i. Fig. 8 shows an example partition edge blending process for the GPM of a CU 800. In the figure, the blending weights are generated based on the initial blending weight w0.
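For illustration, the following minimal Python sketch evaluates equations (1), (2), and (5)-(7) in floating point, assuming the angle phi_i (in radians) and the offset rho_j have already been derived from the signalled partition index; an actual codec computes these with integer arithmetic and look-up tables.

```python
import math

def gpm_blend_weights(w, h, phi_i, rho_j, part_idx):
    """Per-sample weight w0 of the first prediction; w1(x, y) = 1 - w0(x, y)."""
    cos_i, sin_i = math.cos(phi_i), math.sin(phi_i)
    w0 = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = (2 * x + 1 - w) * cos_i + (2 * y + 1 - h) * sin_i - rho_j  # (1)
            widx = 32 + d if part_idx else 32 - d                          # (5)
            w0[y][x] = min(max(int(widx + 4) >> 3, 0), 8) / 8.0            # (6)
    return w0
```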
As described above, the motion field of a CU predicted using the GPM is stored. Specifically, Mv1 from the first part of the geometric partition, Mv2 from the second part of the geometric partition, and a combined Mv of Mv1 and Mv2 are stored in the motion field of the GPM-coded CU. The stored motion vector type for each individual position in the motion field is determined as:

sType = abs(motionIdx) < 32 ? 2 : (motionIdx ≤ 0 ? (1 - partIdx) : partIdx)    (8)

where motionIdx is equal to d(4x+2, 4y+2), recalculated from equation (1), and partIdx depends on the angle index i. If sType is equal to 0 or 1, Mv1 or Mv2 is stored in the corresponding motion field; otherwise, if sType is equal to 2, the combined Mv of Mv1 and Mv2 is stored. The combined Mv is generated using the following process: (i) if Mv1 and Mv2 are from different reference picture lists (one from L0 and the other from L1), Mv1 and Mv2 are simply combined to form a bi-prediction motion vector; (ii) otherwise, if Mv1 and Mv2 are from the same list, only the uni-prediction motion Mv2 is stored.
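A minimal Python sketch of this storage rule, where d() stands for the distance function of equation (1) and each motion is assumed to record the reference picture list it comes from (an illustrative representation):

```python
def stored_motion(x4, y4, d, part_idx, mv1, mv2):
    motion_idx = d(4 * x4 + 2, 4 * y4 + 2)     # evaluated per 4x4 subblock
    if abs(motion_idx) < 32:
        s_type = 2
    else:
        s_type = (1 - part_idx) if motion_idx <= 0 else part_idx
    if s_type == 0:
        return mv1
    if s_type == 1:
        return mv2
    # sType == 2: combine into bi-prediction when the motions come from
    # different reference picture lists; otherwise keep only Mv2.
    if mv1["list"] != mv2["list"]:
        return {"bi": (mv1, mv2)}
    return mv2
```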
A block coded by the GPM may have one partition coded in inter mode and the other partition coded in intra mode. Such a GPM mode may be referred to as GPM with inter and intra, or GPM-intra. Fig. 9 shows a CU 900 coded by GPM-intra, in which a first GPM partition 910 is coded by intra prediction and a second GPM partition 920 is coded by inter prediction.
In some embodiments, each GPM partition has a corresponding flag in the bitstream indicating whether the GPM partition is coded by intra prediction or inter prediction. For a GPM partition coded using inter prediction (e.g., the partition 920), the prediction signal is generated from an MV from the merge candidate list of the CU. For a GPM partition coded using intra prediction (e.g., the partition 910), the prediction signal is generated from neighboring pixels, using the intra prediction mode specified by an index signalled by the encoder. The set of possible intra prediction modes may be constrained by the geometric shape of the partition. The final prediction of a GPM-coded CU (e.g., the CU 900) is generated by combining the prediction of the inter-predicted partition and the prediction of the intra-predicted partition (blending at the partition edge), as in the regular GPM mode (i.e., with two inter-predicted partitions).
In some embodiments, bi-prediction candidates are allowed in the GPM candidate list by reusing the merge candidate list. In some embodiments, the merge candidate list (which includes uni-prediction and/or bi-prediction candidates) is used as the GPM candidate list. In some embodiments, a GPM candidate list that includes bi-prediction candidates (e.g., reusing the merge candidate list described with reference to Fig. 1 above) is allowed only in small CUs (with sizes smaller than a threshold) and/or when GPM-intra (e.g., the GPM mode combining inter and intra prediction described with reference to Fig. 9 above) is enabled, in order to constrain the motion compensation bandwidth. Otherwise (the CU is greater than or equal to the threshold), the GPM candidate list is constructed in the parity manner (e.g., the GPM candidate list 700 of Fig. 7), in which only uni-prediction is allowed.
3. GPM candidate reordering
As mentioned, the GPM candidate list may be derived from the merge candidate list, though the motion compensation bandwidth constraint may limit the GPM candidate list to include only uni-prediction candidates (e.g., based on the size of the CU, as mentioned in Section 2). The MV selection behavior during GPM candidate list construction may lead to inaccurate MVs for the GPM blending. To improve coding efficiency, some embodiments of the present disclosure provide methods of candidate reordering and MV refinement for the GPM.
In some embodiments, the video codec (encoder or decoder) reorders the GPM MV candidates (in the GPM candidate list) by sorting them in ascending order of template matching cost. The reordering may be applied to the merge candidate list (prior to GPM candidate list construction) and/or to the GPM candidate list itself. The TM cost of an MV in the GPM candidate list may be calculated by matching the reference template identified by the MV in the reference picture against the current template of the current CU.
Fig. 10 conceptually illustrates a CU coded by using MVs from a reordered GPM candidate list. As shown, the CU 1000 is to be coded in the GPM mode and is partitioned into a first GPM partition 1010 and a second GPM partition 1020 based on a GPM distance-angle pair. A GPM candidate list 1005 is generated for the CU 1000. The GPM candidate list may be limited to uni-prediction candidates constructed in the parity manner, or may reuse the merge candidates, including bi-prediction candidates. The TM cost of each candidate MV in the GPM candidate list 1005 is evaluated. Based on the calculated TM costs of the candidate MVs, each MV is assigned a reordered index that can be signalled in the bitstream. In the example, "MV0" has TM cost 30 and reordered index 1, "MV1" has TM cost 45 and reordered index 2, and so forth.
In this example, to select candidate MVs for two GPM partitions, the video codec may send a reordered index of "0" to select "MV2" for partition 1010 and a reordered index of "2" to select "MV1" for partition 1020.
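A minimal Python sketch of this TM-cost-based reordering; tm_cost(mv) is assumed to match the reference template identified by mv against the current template of the current CU, and the cost of "MV2" in the usage note is an assumed value consistent with the example above.

```python
def reorder_gpm_candidates(gpm_list, tm_cost):
    order = sorted(range(len(gpm_list)), key=lambda i: tm_cost(gpm_list[i]))
    return [gpm_list[i] for i in order]       # reordered index k -> element k

# E.g., with TM costs MV0 = 30, MV1 = 45, and (assumed) MV2 = 20, signalling
# reordered index "0" selects MV2 and index "2" selects MV1, as in Fig. 10.
```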
In some embodiments, the video codec reorders the partition (or split) modes of the GPM candidates. The video codec obtains the reference templates of all GPM split modes (i.e., all distance-angle GPM pairs of the CU, as described with reference to Fig. 6 above) and calculates a template matching cost for each GPM split mode. The GPM split modes are then reordered in ascending order of TM cost. The video codec may identify the N candidates with the best TM costs as the available split modes.
Figure 11 conceptually illustrates reordering different candidate GPM split modes according to TM cost when decoding CU 1100. The video codec calculates the TM cost for each GPM split mode (distance-angle pair) and assigns a reorder index to each GPM split mode based on the TM cost for the split mode. GPM predictors derived from different MV candidates and partition/split modes are reordered in ascending order by template matching cost. The video codec may designate the N best candidates with the least matching costs as the available partition modes.
In this example, split mode 1101 has TM cost=70 and is assigned a reordered index of "2", split mode 1102 has TM cost=45 and is assigned a reordered index of "1", split mode 1103 has TM cost=100 and is not assigned a reordered index (because it is not one of the N best candidates), split mode 1104 has TM cost=30 and is assigned a reordered index of "0", and so on. Thus, the video codec may send the selection of split mode 1104 by sending a reordered index of "0".
In some embodiments, the TM cost of the candidate GPM split mode is calculated based on MV predictors of the two GPM partitions of the candidate. In the example of fig. 11, to calculate the TM cost of a particular candidate GPM partition pattern (angle-distance pair) that partitions CU 1100 into GPM partitions 1110 and 1120, the MV predictors of the two GPM partitions are used to identify two corresponding reference templates (1115 and 1125). The two reference templates are combined (using edge blending) into one combined reference template. The template matching cost of the candidate GPM split is then calculated by matching the combined reference template to the current template 1105 of the CU 1100.
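As a minimal Python sketch of this split-mode cost, assuming the templates are flat sample lists and that weights holds the per-sample GPM edge-blending weights of the tested angle-distance pair (illustrative names, not the patent's implementation):

```python
def split_mode_tm_cost(cur_template, ref_tpl_a, ref_tpl_b, weights):
    # Blend the two reference templates along the partition edge, then match
    # the combined template against the current template (SAD).
    combined = [wa * a + (1 - wa) * b
                for wa, a, b in zip(weights, ref_tpl_a, ref_tpl_b)]
    return sum(abs(c - t) for c, t in zip(combined, cur_template))

def keep_best_split_modes(split_modes, cost_of, n):
    """Reorder angle-distance pairs by ascending TM cost and keep the best N."""
    return sorted(split_modes, key=cost_of)[:n]
```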
4. GPM motion vector refinement
In some embodiments, the video codec refines the MV of each geometric partition (GPM partition) by a search based on template matching (TM) cost. The video codec may refine the motion vector of each geometric partition of each candidate (merge candidate, or uni-prediction candidate only) in the GPM candidate list through a specific search process. The process includes several search steps. Each search step may be represented by a tuple (identifier, search pattern, search step size, iteration rounds). The search steps are performed sequentially in ascending order of the search step identifier. In some embodiments, the video codec refines the MVs in the GPM candidate list before the TM-cost-based reordering. In some embodiments, the video codec refines only the MVs that have been selected for the GPM partitions.
For some embodiments, the procedure of a search step (a single run of the iterative search) is as follows (a sketch in code follows the list). For an MV to be used for coding a GPM partition (e.g., a candidate MV in the GPM candidate list), the video codec refines the MV by:
inheriting the best MV and best cost of the previous round or previous search step; (if this is the first search step of the GPM partition, then use the initial MV of the GPM partition as the best MV);
Taking the optimal MV as the center of the search range;
constructing an MV candidate list (or MV search list) according to a search pattern (e.g., diamond, cross, brute force, etc.);
Calculating TM costs of all candidates in the constructed MV candidate list of the search pattern; and
The MV candidate (in the MV candidate list of search mode) with the smallest TM cost is identified as the refined MV of the GPM partition.
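A minimal Python sketch of one such search step, assuming a tm_cost(mv) function is available; a search pattern is a list of (dx, dy) offsets scaled by the step size of the current search step.

```python
DIAMOND = [(2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1), (0, -2), (1, -1)]
CROSS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def search_step(init_mv, tm_cost, pattern, step, max_rounds):
    best_mv, best_cost = init_mv, tm_cost(init_mv)
    for _ in range(max_rounds):
        # Build the MV candidate list from the pattern, centered on the best MV.
        cands = [(best_mv[0] + dx * step, best_mv[1] + dy * step)
                 for dx, dy in pattern]
        tmp_cost, tmp_mv = min((tm_cost(mv), mv) for mv in cands)
        if tmp_cost >= best_cost:          # no lower cost found: stop iterating
            break
        best_mv, best_cost = tmp_mv, tmp_cost
    return best_mv, best_cost
```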
Figure 12 conceptually illustrates MV refinement based on TM cost. The calculation of the TM cost of an MV is described above with reference to Fig. 4 and Fig. 5. In this example, for the n-th search step, the video codec performs a round of search centered on the initial MV 1210, calculating the TM costs of the MV candidates at the diamond positions around 1210. The MV candidate at position 1220 has the lowest TM cost (cost = 70). The video codec then performs another round of search (the (n+1)-th search step) centered on the MV position 1220, calculating the TM costs of the MV candidates at the diamond positions around 1220. In this round, the candidate at MV position 1230 has the best cost (cost = 50), still lower than the previous best cost (70), so the search continues.
Initially, MV candidate lists are constructed from the search pattern (diamond/cross/other) and the best MVs inherited from the previous round or previous search step. The template matching cost for each MV candidate in the list is calculated. If the cost of the candidate MV with the smallest template matching cost (denoted tmp_cost) is less than the best cost, the best MV and best cost are updated. If the best cost is unchanged or the difference between tmp_cost and best cost is less than a particular threshold, the iterative search is terminated. If n rounds of searches have been performed, the entire search process is terminated. Otherwise, the MV will be iteratively refined.
In some embodiments, the video decoder applies different search modes at different resolutions at different iterations or rounds of the search process. Specifically, the motion vector of each geometric partition of each candidate in the GPM candidate list is refined by the following search process:
n1 rounds of full-pixel diamond search are performed,
n2 rounds of full-pixel cross search are performed,
n3 rounds of half-pixel cross search are performed,
n4 rounds of quarter-pixel cross search are performed,
n5 rounds of 1/8-pixel cross search are performed,
n6 rounds of 1/16-pixel cross search are performed.
At least one of n1 to n6 is greater than zero (e.g., n1 = 128, n2 = n3 = n4 = n5 = 1, n6 = 0). If ni is equal to zero, the corresponding search step is skipped. The MV candidate offsets for the diamond search are (2,0), (1,1), (0,2), (-1,1), (-2,0), (-1,-1), (0,-2), (1,-1). The MV candidate offsets for the cross search are (1,0), (0,1), (-1,0), (0,-1).
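A minimal Python sketch of this schedule, reusing search_step, DIAMOND, and CROSS from the previous sketch and assuming MVs are kept in 1/16-pel units (so a full-pixel step is 16, a half-pixel step is 8, and so on):

```python
def refine_gpm_mv(init_mv, tm_cost, n=(128, 1, 1, 1, 1, 0)):
    schedule = [(DIAMOND, 16, n[0]),   # n1 rounds of full-pixel diamond search
                (CROSS, 16, n[1]),     # n2 rounds of full-pixel cross search
                (CROSS, 8, n[2]),      # n3 rounds of half-pixel cross search
                (CROSS, 4, n[3]),      # n4 rounds of quarter-pixel cross search
                (CROSS, 2, n[4]),      # n5 rounds of 1/8-pixel cross search
                (CROSS, 1, n[5])]      # n6 rounds of 1/16-pixel cross search
    mv = init_mv
    for pattern, step, rounds in schedule:
        if rounds == 0:                # ni == 0 means this step is skipped
            continue
        mv, _ = search_step(mv, tm_cost, pattern, step, rounds)
    return mv
```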
In some embodiments, the motion vector for each geometric partition of each candidate in the GPM merge candidate list is refined by the following search process:
The search accuracy (for each search step) is determined by selecting from among full, half, quarter, 1/8 and 1/16 pixels.
All MV candidates within the search range, at the determined search precision, are added to the candidate list.
The best MV candidate with the smallest template matching cost is found. The best MV candidate is the refined MV.
5. Example video encoder
Fig. 13 illustrates an example video encoder 1300 that can employ selection of prediction candidates based on TM cost. As shown, the video encoder 1300 receives an input video signal from a video source 1305 and encodes the signal into a bitstream 1395. The video encoder 1300 has several elements or modules for encoding the signal from the video source 1305, including at least some elements selected from: a transform module 1310, a quantization module 1311, an inverse quantization module 1314, an inverse transform module 1315, an intra-picture estimation module 1320, an intra prediction module 1325, a motion compensation module 1330, a motion estimation module 1335, a loop filter 1345, a reconstructed slice buffer 1350, an MV buffer 1365, an MV prediction module 1375, and an entropy encoder 1390. The motion compensation module 1330 and the motion estimation module 1335 are part of the inter prediction module 1340.
In some embodiments, the modules 1310-1390 are software instruction modules that are executed by one or more processing units (e.g., processors) of a computing device or electronic apparatus. In some embodiments, modules 1310-1390 are hardware circuit modules implemented by one or more integrated circuits (INTEGRATED CIRCUIT, simply ICs) of an electronic device. Although modules 1310-1390 are shown as separate modules, some modules may be combined into a single module.
The video source 1305 provides the raw video signal that presents the pixel data of each video frame without compression. The subtractor 1308 calculates the difference between the raw video pixel data of the video source 1305 and the predicted pixel data 1313 from the motion compensation module 1330 or the intra prediction module 1325. The transform module 1310 converts the difference (or the residual pixel data, or residual signal 1308) into transform coefficients (e.g., by performing a discrete cosine transform, or DCT). The quantization module 1311 quantizes the transform coefficients into quantized data (or quantized coefficients) 1312, which is encoded by the entropy encoder 1390 into the bitstream 1395.
The inverse quantization module 1314 dequantizes the quantized data (or quantized coefficients) 1312 to obtain transform coefficients, and the inverse transform module 1315 performs an inverse transform on the transform coefficients to produce a reconstructed residual 1319. The reconstructed residual 1319 is added to the predicted pixel data 1313 to produce reconstructed pixel data 1317. In some embodiments, the reconstructed pixel data 1317 is temporarily stored in a line buffer (not shown) for intra prediction and spatial MV prediction. The reconstructed pixels are filtered by loop filter 1345 and stored in reconstructed slice buffer 1350. In some embodiments, the reconstructed slice buffer 1350 is a memory external to the video encoder 1300. In some embodiments, the reconstructed slice buffer 1350 is internal memory to the video encoder 1300.
The intra-picture estimation module 1320 performs intra-prediction based on the reconstructed pixel data 1317 to generate intra-prediction data. The intra prediction data is provided to the entropy encoder 1390 to be encoded into a bitstream 1395. The intra-prediction data is also used by the intra-prediction module 1325 to generate predicted pixel data 1313.
The motion estimation module 1335 performs inter prediction by generating MVs to refer to pixel data of previously decoded frames stored in the reconstructed slice buffer 1350. These MVs are provided to motion compensation module 1330 to generate predicted pixel data.
The video encoder 1300 does not encode the complete actual MVs in the bitstream, but generates predicted MVs using MV prediction, and the difference between MVs used for motion compensation and predicted MVs is encoded as residual motion data and stored in the bitstream 1395.
The MV prediction module 1375 generates a predicted MV based on a reference MV generated for encoding a previous video frame, i.e., a motion compensated MV for performing motion compensation. The MV prediction module 1375 retrieves the reference MV from the previous video frame from the MV buffer 1365. The video encoder 1300 stores MVs generated for the current video frame in the MV buffer 1365 as reference MVs for generating predicted MVs.
The MV prediction module 1375 uses the reference MVs to create predicted MVs. The predicted MV may be calculated by spatial MV prediction or temporal MV prediction. The difference (residual motion data) between the predicted MV and the motion compensated MV (MC MV) of the current frame is encoded into the bitstream 1395 by the entropy encoder 1390.
The entropy encoder 1390 encodes various parameters and data into the bitstream 1395 by using entropy encoding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman coding. The entropy encoder 1390 encodes various header elements and flags, along with the quantized transform coefficients 1312 and the residual motion data, as syntax elements into the bitstream 1395. The bitstream 1395 is then stored in a storage device or transmitted to a decoder through a communication medium such as a network.
The loop filter 1345 performs a filtering or smoothing operation on the reconstructed pixel data 1317 to reduce coding artifacts, particularly at the boundaries of pixel blocks. In some embodiments, the filtering operation performed includes sample adaptive offset (SAO). In some embodiments, the filtering operations include an adaptive loop filter (ALF).
Fig. 14 shows a portion of a video encoder 1300 that implements candidate prediction mode selection based on TM cost. In particular, the figure shows elements of the inter prediction module 1340 of the video encoder 1300. The candidate partition module 1410 provides a candidate partition mode indicator to the inter prediction module 1340. These possible candidate partition modes may correspond to various angle-distance pairs that define lines that divide the current block into two (or more) partitions according to the GPM. The MV candidate-identification module 1415 identifies MV candidates (as a GPM candidate list) available for GPM partition. The MV candidate-identification module 1415 may identify only uni-directional prediction candidates or reuse merging prediction candidates from the MV buffer 1365.
For each motion vector in the GPM candidate list and/or for each candidate partition mode, the template identification module 1420 retrieves neighboring samples from the reconstructed slice buffer 1350 as L-shaped templates. For a candidate partition mode that divides a block into two partitions, the template identification module 1420 may obtain neighboring pixels of the current block as two current templates and two L-shaped sets of pixels as two reference templates for the two partitions of the current block using two motion vectors.
The template identification module 1420 provides the reference template and the current template for the currently indicated coding mode to the TM cost calculator 1430, which performs matching to generate the TM cost of the indicated candidate partition mode. The TM cost calculator 1430 may combine reference templates (with edge blending) according to the GPM mode. The TM cost calculator 1430 may also calculate the TM costs of the candidate MVs in the GPM candidate list, and may assign reordered indices to the candidate prediction modes (MVs or partition modes) based on the calculated TM costs. Index reordering based on TM cost is described in Section 3 above.
The calculated TM costs for the various candidates are provided to a candidate selection module 1440, which may use the TM costs to select the lowest cost candidate prediction mode for encoding the current block. The selected candidate prediction modes (which may be MV and/or partition modes) are indicated to the motion compensation module 1330 to complete the prediction for encoding the current block. The selected prediction mode is also provided to the entropy encoder 1390 for transmission in a bitstream. The selected prediction mode may be transmitted by using a corresponding reordered index of the prediction mode to reduce the number of bits transmitted. In some embodiments, the MV provided to the motion compensation 1330 is refined using the search process described in section four above (at MV refinement module 1445).
Fig. 15 conceptually illustrates a process 1500 of assigning indices to prediction candidates based on TM costs for encoding pixel blocks. In some embodiments, one or more processing units (e.g., processors) of a computing device are used to implement encoder 1300, encoder 1300 performing process 1500 by executing instructions stored in a computer readable medium. In some embodiments, the electronics implementing encoder 1300 perform process 1500.
The encoder receives (at block 1510) data to be encoded into a bitstream as a current block of pixels in a current picture. The encoder divides (at block 1520) the current block into a first partition and a second partition by a bisector defined by the angle-distance pair according to a Geometric Prediction Mode (GPM). The first partition may be encoded and decoded by inter prediction, which refers to samples in a reference picture, and the second partition may be encoded and decoded by intra prediction, which refers to neighboring samples of a current block in a current picture. Alternatively, both the first partition and the second partition may be encoded and decoded by inter prediction, which uses the first motion vector and the second motion vector from the list to reference samples in the first reference picture and the second reference picture.
The encoder identifies (at block 1530) a list of candidate prediction modes for encoding and decoding the first and second partitions. The different candidate prediction modes in the list may correspond to different bisectors defined by different angle-distance pairs. The different candidate prediction modes in the list may also correspond to different motion vectors that may be selected to generate inter prediction to reconstruct the first partition or the second partition of the current block. In some embodiments, the candidate motion vectors in the list are ordered (e.g., in ascending order) according to the calculated TM cost of the candidate motion vectors. In some embodiments, the list of candidate prediction modes includes only uni-directional prediction candidates and no bi-directional prediction candidates when the current block is greater than a threshold size, and the list of candidate prediction modes includes merge candidates when the current block is less than the threshold size.
The encoder calculates (at block 1540) a Template Matching (TM) cost for each candidate prediction mode in the list. The encoder may calculate the TM cost of the candidate prediction mode by matching the current template of the current block with a combined template, which is a combination of the first reference template of the first partition and the second reference template of the second partition.
The encoder assigns (at block 1550) indices to the candidate prediction modes based on the calculated TM costs (e.g., the index assigned to a lower-cost candidate requires fewer bits to transmit). The encoder signals (at block 1560) a selection of a candidate prediction mode based on the index assigned to the selected candidate prediction mode.
The encoder encodes (at block 1570) the current block by using the selected candidate prediction modes, e.g., by defining the first partition and the second partition using the selected GPM partition, and/or by predicting and reconstructing the first partition and the second partition using the selected motion vector.
In some embodiments, the video encoder reconstructs the current block by generating predictions of the first and second partitions using refined motion vectors. A refined motion vector is identified by searching, starting from an initial motion vector, for the motion vector with the lowest TM cost. In some embodiments, searching for the motion vector with the lowest TM cost includes iteratively applying a search pattern centered on the motion vector that the previous iteration identified as having the lowest TM cost, until no lower cost can be found. In some embodiments, the encoder applies different search patterns at different resolutions (e.g., 1 pixel, 1/2 pixel, 1/4 pixel, etc.) in different iterations or rounds of the search process to refine the motion vector.
6. Example video decoder
In some embodiments, the encoder may send (or generate) one or more syntax elements in the bitstream such that the decoder may parse the one or more syntax elements from the bitstream.
Fig. 16 illustrates an example video decoder 1600 that selects prediction candidates based on TM cost. As shown, the video decoder 1600 is an image-decoding or video-decoding circuit that receives a bitstream 1695 and decodes the content of the bitstream into pixel data of video frames for display. The video decoder 1600 has several elements or modules for decoding the bitstream 1695, including elements selected from: an inverse quantization module 1611, an inverse transform module 1610, an intra prediction module 1625, a motion compensation module 1630, a loop filter 1645, a decoded picture buffer 1650, an MV buffer 1665, an MV prediction module 1675, and a parser 1690. The motion compensation module 1630 is part of the inter prediction module 1640.
In some embodiments, modules 1610-1690 are software instruction modules that are executed by one or more processing units (e.g., processors) of a computing device. In some embodiments, modules 1610-1690 are hardware circuit modules implemented by one or more ICs of an electronic device. Although modules 1610-1690 are shown as separate modules, some modules may be combined into a single module.
The parser 1690 (or entropy decoder) receives the bitstream 1695 and performs initial parsing according to the syntax defined by a video codec or image codec standard. The parsed syntax elements include various header elements, flags, and quantized data (or quantized coefficients) 1612. The parser 1690 parses out these syntax elements using entropy coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman coding.
The inverse quantization module 1611 dequantizes the quantized data (or quantized coefficients) 1612 to obtain transform coefficients, and the inverse transform module 1610 performs an inverse transform on the transform coefficients 1616 to generate a reconstructed residual signal 1619. The reconstructed residual signal 1619 is added to the predicted pixel data 1613 from the intra prediction module 1625 or the motion compensation module 1630 to produce decoded pixel data 1617. The decoded pixel data is filtered by loop filter 1645 and stored in decoded picture buffer 1650. In some embodiments, decoded picture buffer 1650 is memory external to video decoder 1600. In some embodiments, decoded picture buffer 1650 is memory internal to video decoder 1600.
The intra prediction module 1625 receives intra prediction data from the bitstream 1695 and, accordingly, generates predicted pixel data 1613 from the decoded pixel data 1617 stored in the decoded picture buffer 1650. In some embodiments, the decoded pixel data 1617 is also stored in a line buffer (not shown) for intra prediction and spatial MV prediction.
In some embodiments, the contents of the decoded picture buffer 1650 are used for display. The display device 1655 either retrieves the contents of the decoded picture buffer 1650 for direct display or retrieves the contents of the decoded picture buffer into a display buffer. In some embodiments, the display device receives pixel values from the decoded picture buffer 1650 through a pixel transport.
The motion compensation module 1630 generates predicted pixel data 1613 from the decoded pixel data 1617 stored in the decoded picture buffer 1650 according to motion compensation MVs (MC MVs). These motion compensation MVs are decoded by adding the residual motion data received from the bitstream 1695 to the predicted MVs received from the MV prediction module 1675.
The MV prediction module 1675 generates a predicted MV based on a reference MV generated for decoding a previous video frame (e.g., a motion compensated MV used to perform motion compensation). The MV prediction module 1675 obtains the reference MV of the previous video frame from the MV buffer 1665. The video decoder 1600 stores motion compensated MVs generated for decoding the current video frame in the MV buffer 1665 as reference MVs for generating predicted MVs.
The loop filter 1645 performs a filtering or smoothing operation on the decoded pixel data 1617 to reduce coding artifacts, particularly at the boundaries of pixel blocks. In some embodiments, the filtering operation performed includes sample adaptive offset (SAO). In some embodiments, the filtering operation includes an adaptive loop filter (ALF).
Fig. 17 illustrates a portion of the video decoder 1600 that implements candidate prediction mode selection based on TM cost. In particular, the figure shows elements of the inter prediction module 1640 of the video decoder 1600. Candidate partition module 1710 provides candidate partition mode indicators to the inter prediction module 1640. These possible candidate partition modes may correspond to various angle-distance pairs that define the lines dividing the current block into two (or more) partitions according to the GPM. The MV candidate identification module 1715 identifies MV candidates (as a GPM candidate list) available for GPM partitions. The MV candidate identification module 1715 may identify only uni-directional prediction candidates, or it may reuse merge prediction candidates from the MV buffer 1665.
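One known way to derive a GPM uni-prediction list from regular merge candidates is by alternating prediction-list parity; the sketch below illustrates that derivation in simplified form. The dictionary-based candidate layout is an assumption made for readability.

```python
def gpm_uni_candidates(merge_list):
    # Each merge entry is assumed to be {'L0': mv_or_None, 'L1': mv_or_None}.
    uni = []
    for n, cand in enumerate(merge_list):
        preferred = 'L1' if n % 2 else 'L0'  # alternate list parity by index
        fallback = 'L0' if preferred == 'L1' else 'L1'
        mv = cand.get(preferred) if cand.get(preferred) is not None else cand.get(fallback)
        if mv is not None:
            uni.append(mv)
    return uni
```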
For each motion vector in the GPM candidate list and/or for each candidate partition mode, the template identification module 1720 obtains neighboring samples from the decoded picture buffer 1650 as L-shaped templates. For a candidate partition mode that divides the block into two partitions, the template identification module 1720 may obtain the neighboring pixels of the current block as two current templates, and use two motion vectors to obtain two L-shaped sets of pixels as two reference templates for the two partitions of the current block.
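Gathering an L-shaped template is simple array slicing over the reconstructed picture; a sketch follows, assuming the block does not touch the picture boundary and a template thickness of four samples (both assumptions for illustration).

```python
import numpy as np

def l_shaped_template(picture, x0, y0, width, height, thickness=4):
    """Return the rows above and the columns to the left of the block
    at (x0, y0); `picture` is a 2-D numpy array of reconstructed samples."""
    above = picture[y0 - thickness:y0, x0:x0 + width]
    left = picture[y0:y0 + height, x0 - thickness:x0]
    return above, left
```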
The template identification module 1720 provides the reference templates and the current template for the currently indicated prediction mode to TM cost calculator 1730, which performs matching to produce the TM cost of the indicated candidate partition mode. TM cost calculator 1730 may combine the reference templates (with edge blending) according to the GPM partition. TM cost calculator 1730 may also calculate TM costs for the candidate MVs in the GPM candidate list. Reordering of indices based on TM costs is described in the sections above.
The calculated TM costs are provided to a candidate selection module 1740, which may assign reordered indices to the candidate prediction modes (MVs or partition modes) based on the calculated TM costs. The candidate selection module 1740 may receive signaling of the selected prediction mode from the entropy decoder 1690, which may use the TM-cost-based reordered index (to reduce the number of bits transmitted). The selected prediction mode (MV or partition mode) is indicated to the motion compensation module 1630 to complete the prediction for decoding the current block. In some embodiments, the MV provided to the motion compensation module 1630 is refined (at MV refinement module 1745) using the search process described in section four above.
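Putting these decoder-side pieces together, the flow from parsed index to selected (and optionally refined) candidate might look as follows. This is a hedged sketch: it presumes the decoder recomputes exactly the same TM costs and ordering as the encoder, which is what makes the short signaled index sufficient.

```python
def decode_gpm_choice(candidates, tm_cost_fn, signaled_index, refine=None):
    costs = [tm_cost_fn(c) for c in candidates]
    # Same ascending-cost reordering the encoder applied.
    order = sorted(range(len(candidates)), key=lambda i: costs[i])
    chosen = candidates[order[signaled_index]]
    return refine(chosen) if refine is not None else chosen
```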
Fig. 18 conceptually illustrates a process 1800 that assigns indices to prediction candidates for decoding a block of pixels based on TM cost. In some embodiments, one or more processing units (e.g., processors) of a computing device implement decoder 1600 by executing instructions stored in a computer readable medium to perform process 1800. In some embodiments, the electronic device implementing decoder 1600 performs process 1800.
The decoder receives (at block 1810) data (from the bitstream) to be decoded as a current block of pixels in a current picture. The decoder divides (at block 1820) the current block into a first partition and a second partition according to a Geometric Prediction Mode (GPM) by a bisector defined by an angle-distance pair. The first partition may be encoded and decoded by inter prediction, which refers to samples in a reference picture, and the second partition may be encoded and decoded by intra prediction, which refers to neighboring samples of a current block in a current picture. Alternatively, both the first partition and the second partition may be encoded and decoded by inter prediction, which uses the first motion vector and the second motion vector from the list to reference samples in the first reference picture and the second reference picture.
The decoder identifies (at block 1830) a list of candidate prediction modes for encoding and decoding the first partition and the second partition. The different candidate prediction modes in the list may correspond to different bisectors defined by different angle-distance pairs. The different candidate prediction modes in the list may also correspond to different motion vectors that are selected to generate inter prediction to reconstruct the first partition or the second partition of the current block. In some embodiments, candidate motion vectors in the list are ordered (e.g., in ascending order) according to the calculated TM cost of the candidate motion vectors. In some embodiments, the list of candidate prediction modes includes only uni-directional prediction candidates and no bi-directional prediction candidates when the current block is greater than a threshold size, and the list of candidate prediction modes includes merge candidates when the current block is less than the threshold size.
The decoder calculates (at block 1840) a template matching (TM) cost for each candidate prediction mode in the list. The decoder may calculate the TM cost of a candidate prediction mode by matching the current template of the current block with a combined template, which is a combination of the first reference template of the first partition and the second reference template of the second partition.
The decoder assigns (at block 1850) an index to each candidate prediction mode based on the calculated TM cost (e.g., the index assigned to a lower-cost candidate requires fewer bits to transmit). The decoder receives (at block 1860) a selection of a candidate prediction mode based on the index assigned to the selected candidate prediction mode.
The decoder reconstructs (at block 1870) the current block by using the selected candidate prediction mode, e.g., by defining the first partition and the second partition using the selected GPM partitioning, and/or by predicting and reconstructing the first partition and the second partition using the selected motion vector. The decoder may then provide the reconstructed current block for display as part of the reconstructed current picture. In some embodiments, the video decoder reconstructs the current block by generating predictions for the first partition and the second partition using refined motion vectors. A refined motion vector is identified by searching, starting from an initial motion vector, for the motion vector with the lowest TM cost. In some embodiments, searching for the motion vector with the lowest TM cost includes iteratively applying a search pattern centered on the motion vector identified from the previous iteration as having the lowest TM cost (until a lower cost can no longer be found). In some embodiments, the decoder applies different search patterns at different resolutions (e.g., 1-pixel, 1/2-pixel, 1/4-pixel, etc.) in different iterations or rounds of the search process to refine the motion vector.
7. Example electronic System
Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer-readable storage medium (also referred to as a computer-readable medium). When these instructions are executed by one or more computing or processing units (e.g., one or more processors, processor cores, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer-readable media include, but are not limited to, compact disc read-only memory (CD-ROM) drives, flash drives, random-access memory (RAM) chips, hard disk drives, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. Computer-readable media do not include carrier waves and electronic signals transmitted over wireless or wired connections.
In this specification, the term "software" is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Furthermore, in some embodiments, multiple software inventions may be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions may also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described herein is within the scope of the present disclosure. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
Fig. 19 conceptually illustrates an electronic system 1900 implementing some embodiments of the disclosure. Electronic system 1900 may be a computer (e.g., desktop computer, personal computer, tablet computer, etc.), telephone, PDA, or any other type of electronic device. Such electronic systems include various types of computer-readable media and interfaces for various other types of computer-readable media. Electronic system 1900 includes a bus 1905, a processing unit 1910, a graphics-processing unit (GPU) 1915, a system memory 1920, a network 1925, a read-only memory 1930, a persistent storage device 1935, an input device 1940, and an output device 1945.
Bus 1905 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of electronic system 1900. For example, bus 1905 communicatively connects processing unit 1910 with GPU 1915, read-only memory 1930, system memory 1920, and persistent storage device 1935.
The processing unit 1910 obtains instructions to be executed and data to be processed from these various memory units in order to perform the processes of the present disclosure. In different embodiments, the processing unit may be a single processor or a multi-core processor. Some instructions are passed to and executed by GPU 1915. The GPU 1915 may offload various computations or supplement image processing provided by the processing unit 1910.
Read-only memory (ROM) 1930 stores static data and instructions used by processing unit 1910 and other modules of the electronic system. The persistent storage device 1935, on the other hand, is a read-and-write storage device. It is a non-volatile memory unit that stores instructions and data even when electronic system 1900 is turned off. Some embodiments of the present disclosure use a mass storage device (e.g., a magnetic or optical disk and its corresponding disk drive) as the persistent storage device 1935.
Other embodiments use removable storage devices (e.g., floppy disks, flash memory devices, etc., and their corresponding disk drives) as the persistent storage device. Like persistent storage device 1935, system memory 1920 is a read-and-write memory device. Unlike persistent storage device 1935, however, system memory 1920 is a volatile read-and-write memory, such as random access memory. System memory 1920 stores some of the instructions and data that the processor needs at runtime. In some embodiments, processes according to the present disclosure are stored in system memory 1920, persistent storage device 1935, and/or read-only memory 1930. For example, in accordance with some embodiments of the present disclosure, the various memory units include instructions for processing multimedia clips. From these various memory units, processing unit 1910 obtains instructions to be executed and data to be processed in order to perform the processes of some embodiments.
Bus 1905 is also connected to input device 1940 and output device 1945. Input device 1940 enables a user to communicate information and select commands to the electronic system. Input devices 1940 include alphanumeric keyboards and pointing devices (also referred to as "cursor control devices"), cameras (e.g., webcams), microphones or similar devices for receiving voice commands, and the like. Output device 1945 displays images generated by the electronic system or otherwise outputs data. Output devices 1945 include printers, display devices such as cathode ray tubes (CRT) or liquid crystal displays (LCD), and speakers or the like. Some embodiments include devices, such as touch screens, that function as both input and output devices.
Finally, as shown in FIG. 19, bus 1905 also couples electronic system 1900 to a network 1925 through a network adapter (not shown). In this manner, the computer may be part of a computer network (e.g., a local area network ("LAN"), a wide area network ("WAN"), or an intranet) or a network of networks, such as the Internet.
Some embodiments include electronic components, such as microprocessors, storage, and memory, that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as a computer-readable storage medium, machine-readable medium, or machine-readable storage medium). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid-state hard drives, read-only and recordable Blu-ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and that includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
While the above discussion primarily refers to microprocessors or multi-core processors that execute software, many of the above-described features and applications are performed by one or more integrated circuits, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs), ROM, or RAM devices.
As used in this specification and any claims of this application, the terms "computer," "server," "processor," and "memory" all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of this specification, the terms "display" or "displaying" mean displaying on an electronic device. As used in this specification and any claims of this application, the terms "computer-readable medium," "computer-readable media," and "machine-readable medium" are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
While the present disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the disclosure can be embodied in other specific forms without departing from the spirit of the disclosure. In addition, a number of the figures (including FIGS. 15 and 18) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, a process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the present disclosure is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.
Supplementary description
The subject matter described herein sometimes illustrates different components contained within, or connected with, other different components. It is to be understood that such depicted architectures are merely examples, and that many other architectures can in fact be implemented to achieve the same functionality. In a conceptual sense, any arrangement of components that achieves the same functionality is effectively "associated" such that the desired functionality is achieved. Hence, any two components combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermediate components. Likewise, any two components so associated can also be viewed as being "operably connected," or "operably coupled," to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being "operably couplable" to each other to achieve the desired functionality. Specific examples of operably couplable include, but are not limited to, physically mateable and/or physically interacting components, and/or wirelessly interactable and/or wirelessly interacting components, and/or logically interacting and/or logically interactable components.
Furthermore, with respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for the sake of clarity.
Furthermore, those skilled in the art will understand that, in general, terms used herein, and especially in the appended claims, are generally intended as "open" terms, e.g., "including" should be interpreted as "including but not limited to," "having" should be interpreted as "having at least," and "includes" should be interpreted as "includes but is not limited to," etc. It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim. Even when the same claim includes the introductory phrases "one or more" or "at least one," indefinite articles such as "a" or "an" should be interpreted to mean "at least one" or "one or more"; the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number, e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations. Furthermore, in those instances where a convention analogous to "at least one of A, B, and C" is used, such a construction is in general intended in the sense one having skill in the art would understand the convention, e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" will be understood to include the possibilities of "A," or "B," or "A and B."
From the foregoing, it will be appreciated that various embodiments of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure. Accordingly, the various embodiments disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

Claims (14)

1. A video encoding and decoding method, comprising:
receiving data to be encoded or decoded as a current block of a current picture of a video, wherein the current block is divided into a first partition and a second partition by a bisector, and the bisector is defined by an angle-distance pair;
Identifying a list of candidate prediction modes for encoding and decoding the first partition and the second partition;
calculating the template matching cost of each candidate prediction mode in the list;
Receiving or transmitting a selection of candidate prediction modes based on an index, the index being assigned to the selected candidate prediction mode based on the calculated template matching cost; and
The current block is reconstructed by predicting the first partition and the second partition using the selected candidate prediction mode.
2. The video coding method of claim 1, wherein the template matching cost of a candidate prediction mode is calculated by matching a current template of the current block with a combined template, the combined template being a combination of a first reference template of the first partition and a second reference template of the second partition.
3. The video coding method of claim 1, wherein the plurality of different candidate prediction modes in the list correspond to a plurality of different bisectors defined by a plurality of different angle-distance pairs.
4. The video coding method of claim 1, wherein a plurality of different candidate prediction modes in the list correspond to a plurality of different motion vectors, wherein the selected candidate prediction mode corresponds to a candidate motion vector selected from the list to generate inter prediction to reconstruct the first partition or the second partition of the current block.
5. The video coding method of claim 4, wherein the candidate motion vectors in the list are ordered according to the calculated template matching costs of the candidate motion vectors.
6. The video coding method of claim 1, wherein the list of candidate prediction modes (i) includes only uni-directional prediction candidates and does not include bi-directional prediction candidates when the current block is greater than a threshold size, and (ii) includes merge candidates when the current block is smaller than the threshold size.
7. The video coding method of claim 1, wherein the first partition is coded by inter-prediction referencing a plurality of samples in a reference picture, and the second partition is coded by intra-prediction referencing a plurality of neighboring samples of the current block in the current picture.
8. The video coding method of claim 1, wherein the first partition and the second partition are coded by inter prediction that uses a first motion vector and a second motion vector from the list to reference a plurality of samples in a first reference picture and a second reference picture.
9. The video coding method of claim 1, wherein reconstructing the current block comprises generating predictions for the first partition and the second partition using a plurality of refined motion vectors, wherein a refined motion vector is identified by searching, starting from an initial motion vector, for the motion vector having the lowest template matching cost.
10. The video coding method of claim 9, wherein searching for the motion vector with the lowest template matching cost comprises iteratively applying a search pattern centered on the motion vector identified from a previous iteration as having the lowest template matching cost.
11. The video coding method of claim 10, wherein searching for the motion vector with the lowest template matching cost comprises applying a plurality of different search modes at a plurality of different resolutions in a plurality of different iterations.
12. The video coding method of claim 1, wherein the list of candidate prediction modes includes one or more merge candidates, wherein the template matching cost of a merge candidate is calculated by matching a current template of the current block with a reference template of a pixel block referenced by the merge candidate.
13. The video coding method of claim 12, wherein the list of candidate prediction modes further includes one or more geometric prediction mode candidates, wherein the template matching cost of a geometric prediction mode candidate is calculated by matching a current template of the current block with a combined template, the combined template being a combination of a first reference template of the first partition and a second reference template of the second partition.
14. An electronic device, comprising:
A video decoder or encoder circuit configured to perform a plurality of operations comprising:
receiving data to be encoded or decoded as a current block of a current picture of a video, wherein the current block is divided into a first partition and a second partition by a bisector, and the bisector is defined by an angle-distance pair;
Identifying a list of candidate prediction modes for encoding and decoding the first partition and the second partition;
calculating the template matching cost of each candidate prediction mode in the list;
Receiving or transmitting a selection of candidate prediction modes based on an index, the index being assigned to the selected candidate prediction mode based on the calculated template matching cost; and
The current block is reconstructed by predicting the first partition and the second partition using the selected candidate prediction mode.
CN202280059245.4A 2021-08-16 2022-08-15 Candidate reordering and motion vector refinement for geometric partition modes Pending CN118435605A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/233,346 2021-08-16
US202263318806P 2022-03-11 2022-03-11
US63/318,806 2022-03-11
PCT/CN2022/112566 WO2023020446A1 (en) 2021-08-16 2022-08-15 Candidate reordering and motion vector refinement for geometric partitioning mode

Publications (1)

Publication Number Publication Date
CN118435605A true CN118435605A (en) 2024-08-02

Family

ID=92310916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280059245.4A Pending CN118435605A (en) 2021-08-16 2022-08-15 Candidate reordering and motion vector refinement for geometric partition modes

Country Status (1)

Country Link
CN (1) CN118435605A (en)

Similar Documents

Publication Publication Date Title
CN110169061B (en) Coding and decoding electronic device and method
CN111886866B (en) Method and electronic device for encoding or decoding video sequence
KR102613889B1 (en) Motion vector correction with adaptive motion vector resolution
CN113455003B (en) Video encoding and decoding method and electronic equipment
CN111034194B (en) Method for coding and decoding video image and electronic equipment
US11245922B2 (en) Shared candidate list
CN113141783A (en) Intra prediction for multiple hypotheses
US20240259588A1 (en) Method, apparatus, and medium for video processing
US20240244187A1 (en) Method, apparatus, and medium for video processing
TWI814540B (en) Video coding method and apparatus thereof
WO2023020392A1 (en) Latency reduction for reordering merge candidates
CN118435605A (en) Candidate reordering and motion vector refinement for geometric partition modes
TWI847224B (en) Video coding method and apparatus thereof
WO2023174426A1 (en) Geometric partitioning mode and merge candidate reordering
WO2024027700A1 (en) Joint indexing of geometric partitioning mode in video coding
WO2024017224A1 (en) Affine candidate refinement
US20240259608A1 (en) Method, apparatus, and medium for video processing
WO2024007789A1 (en) Prediction generation with out-of-boundary check in video coding
US20240291967A1 (en) Method, apparatus, and medium for video processing
WO2023186040A1 (en) Bilateral template with multipass decoder side motion vector refinement
WO2023131047A1 (en) Method, apparatus, and medium for video processing
WO2023202569A1 (en) Extended template matching for video coding
WO2023193769A1 (en) Implicit multi-pass decoder-side motion vector refinement
WO2024083203A1 (en) Method, apparatus, and medium for video processing
WO2024156268A1 (en) Method, apparatus, and medium for video processing

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination