CN110662077A - Symmetric bi-directional prediction modes for video coding - Google Patents

Symmetric bi-directional prediction modes for video coding

Info

Publication number
CN110662077A
Authority
CN
China
Prior art keywords
motion vector
video
motion
difference information
reference picture
Prior art date
Legal status
Granted
Application number
CN201910586486.4A
Other languages
Chinese (zh)
Other versions
CN110662077B (en)
Inventor
庄孝强
张莉
王悦
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd and ByteDance Inc
Priority to CN202210763999.XA (published as CN115396677A)
Publication of CN110662077A
Application granted
Publication of CN110662077B
Legal status: Active
Anticipated expiration

Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04N: Pictorial communication, e.g. television
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/463: Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
    • H04N 19/513: Processing of motion vectors
    • H04N 19/172: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field
    • H04N 19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/52: Processing of motion vectors by encoding by predictive encoding
    • H04N 19/577: Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N 19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N 19/139: Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video bitstream processing method includes: generating second motion vector difference information based on a symmetry rule and first motion vector difference information in response to a mirror mode flag in the video bitstream; and reconstructing a video block using the first motion vector difference information and the second motion vector difference information, wherein the reconstruction is performed using bi-prediction.

Description

Symmetric bi-directional prediction modes for video coding
Cross Reference to Related Applications
Under the applicable patent law and/or the rules of the Paris Convention, the present application claims the priority of and benefit from International Patent Application No. PCT/CN2018/093897, filed on June 30, 2018. The entire disclosure of International Patent Application No. PCT/CN2018/093897 is incorporated by reference as part of the disclosure of this application.
Technical Field
This document relates to image and video coding techniques.
Background
Digital video accounts for the largest share of bandwidth usage on the Internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video grows, bandwidth demand for digital video usage is expected to continue increasing.
Disclosure of Invention
The disclosed techniques may be used by visual media decoder or encoder embodiments, where the symmetry of motion vectors is used to reduce the bits used to signal motion information to improve coding efficiency.
In one example aspect, a video bitstream processing method is disclosed. The method includes generating second motion vector difference information based on a symmetry rule and first motion vector difference information in response to a mirror mode flag in the video bitstream. The method also includes reconstructing a video block in the current picture using the first motion vector difference information and the second motion vector difference information, wherein the reconstructing is performed using bi-prediction.
In another example aspect, another video bitstream processing method is disclosed. The method includes receiving motion vector difference information for a first set of motion vectors for a first reference picture list associated with a video block. The method further comprises deriving motion vector difference information associated with a second set of motion vectors of a second reference picture list associated with the video block from the motion vector difference information of the first set of motion vectors using a multi-hypothesis symmetry rule, wherein the multi-hypothesis symmetry rule specifies the second motion vector difference value as (0,0) and the corresponding motion vector predictor is set to a mirror motion vector value derived from the first motion vector difference information, and performing a conversion between the video block and a bitstream representation of the video block using the derived result.
In another example aspect, another video bitstream processing method is disclosed. The method includes receiving, for a video block, first motion vector difference information associated with a first reference picture list. The method further comprises for the video block, receiving second motion vector difference information associated with the second reference picture list, and deriving third motion vector difference information associated with the first reference picture list and fourth motion vector difference information associated with the second reference picture list from the first motion vector difference information and the second motion vector difference information using a multi-hypothesis symmetry rule, wherein the multi-hypothesis symmetry rule specifies that the second motion vector difference value is (0,0) and the corresponding motion vector predictor is set to the mirrored motion vector value derived from the first motion vector difference value information.
In another example aspect, another video bitstream processing method is disclosed. The method includes receiving a future frame of the video relative to a reference frame of the video, receiving motion vectors associated with the future frame of the video and a past frame of the video, applying a predetermined relationship between the future frame of the video and the past frame of the video, and reconstructing the past frame of the video based on the future frame of the video, the motion vectors, and the predetermined relationship between the past frame of the video and the future frame of the video.
In another example aspect, the above method may be implemented by a video decoder apparatus comprising a processor.
In another example aspect, the above-described method may be implemented by a video encoder apparatus that includes a processor for decoding encoded video during a video encoding process.
In yet another example aspect, the methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
These and other aspects are further described in this document.
Drawings
Figure 1 shows an example of the derivation process for the Merge candidate list construction.
Fig. 2 shows example positions of spatial Merge candidates.
Fig. 3 is a diagram of motion vector scaling for spatial motion vector candidates.
Fig. 4 shows an example derivation process for motion vector prediction candidates.
Fig. 5 shows an example of candidate pairs considered for redundancy checking of spatial Merge candidates.
Fig. 6 shows example locations of the second PU for N×2N and 2N×N partitions.
Fig. 7 is an example of motion vector scaling for the temporal Merge candidate.
FIG. 8 shows an example of candidate positions for the temporal Merge candidate, labeled C0 and C1.
Fig. 9 shows an example of combined bidirectional predictive Merge candidates.
FIG. 10 illustrates an example of a bilateral matching process.
Fig. 11 shows an example of a template matching process.
Fig. 12 shows an example of unilateral Motion Estimation (ME) in frame rate up-conversion (FRUC).
Fig. 13 shows an example of a double-sided template matching process.
Fig. 14 shows an example of an Alternative Temporal Motion Vector Prediction (ATMVP) method.
Fig. 15 shows an example of identifying a source block and a source picture.
Fig. 16 is an example of one Coding Unit (CU) having four sub-blocks (A-D) and its neighboring sub-blocks (a-d).
Fig. 17 shows a block diagram example of a video encoding apparatus.
Fig. 18 is a block diagram of an example of a video processing apparatus.
Fig. 19 is a flowchart of an example of a video bitstream processing method.
Fig. 20 is a flowchart of another example of a video bitstream processing method.
Detailed Description
Section headings are used in this document to facilitate understanding, and do not limit the embodiments disclosed in a section to only that section. As such, embodiments from one section may be combined with embodiments from other sections. Furthermore, although certain embodiments are described with reference to a particular video codec, the disclosed techniques are also applicable to other video coding technologies. Furthermore, while some embodiments describe video encoding steps in detail, it should be understood that the corresponding decoding steps that undo the encoding will be performed by a decoder. Furthermore, the term video processing covers video encoding or compression, video decoding or decompression, and video transcoding, in which video pixels are converted from one compression format to another compression format or to a different compression bit rate.
This document provides various techniques that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video. In addition, the video encoder may also implement these techniques during the encoding process in order to reconstruct decoded frames for further encoding.
Signaling of bi-prediction in HEVC
In HEVC, inter PU-level signaling can be divided into three different modes. Table 1 and Table 2 show the relevant syntax elements for inter PU signaling in HEVC. The first mode is the skip mode, in which only a Merge index (merge_idx) needs to be signaled. The second mode is the Merge mode, in which only a Merge flag (merge_flag) and a Merge index (merge_idx) are signaled. The third mode is the AMVP mode, in which a direction index (inter_pred_idc), reference indices (ref_idx_l0/ref_idx_l1), MVP indices (mvp_l0_flag/mvp_l1_flag), and MVDs (mvd_coding) are signaled.
Of the three, the bi-predictive AMVP mode consumes the most rate, but it provides the degrees of freedom needed to capture various motion models, including acceleration and other non-linear motion models. The motion vectors of the two lists are signaled separately to provide this degree of freedom.
AMVP derivation in HEVC
Motion vector prediction in AMVP mode
Motion vector prediction exploits the spatio-temporal correlation of a motion vector with the motion vectors of neighboring PUs, and is used for explicit transmission of motion parameters. A motion vector candidate list is constructed by first checking the availability of the left and above temporally neighboring PU positions, removing redundant candidates, and adding zero vectors to make the candidate list a constant length. The encoder can then select the best predictor from the candidate list and transmit a corresponding index indicating the selected candidate. Similar to Merge index signaling, the index of the best motion vector candidate is encoded using a truncated unary code. The maximum value to be encoded in this case is 2. In the following sections, details of the derivation process of motion vector prediction candidates are provided.
TABLE 1 inter PU syntax elements in HEVC
(Table 1 is reproduced as an image in the original document; its content is not shown here.)
TABLE 2 syntax elements for MVD coding in HEVC
(Table 2 is reproduced as an image in the original document; its content is not shown here.)
Motion vector prediction candidates
Fig. 1 summarizes the derivation of motion vector prediction candidates.
In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For spatial motion vector candidate derivation, two motion vector candidates are finally derived based on the motion vectors of each PU located at five different positions, as shown in fig. 2.
For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates derived based on two different co-located (co-located) positions. After making the first list of spatio-temporal candidates, the duplicate motion vector candidates in the list are removed. If the number of potential candidates is greater than two, the motion vector candidates whose reference picture index within their associated reference picture list is greater than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is less than two, additional zero motion vector candidates are added to the list.
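A minimal sketch of the list assembly described above (the helper name and candidate values are illustrative, and the reference-index pruning rule mentioned above is omitted for brevity; this is not the normative HEVC process):

# Illustrative AMVP candidate list assembly (not the normative HEVC process).
# 'spatial' and 'temporal' are assumed to be lists of already-derived (mvx, mvy) candidates.
def build_amvp_list(spatial, temporal, max_candidates=2):
    candidates = []
    for mv in spatial + temporal:
        if mv not in candidates:              # drop duplicate motion vector candidates
            candidates.append(mv)
    while len(candidates) < max_candidates:   # pad with zero MVs to a constant length
        candidates.append((0, 0))
    return candidates[:max_candidates]

# Example: two identical spatial candidates collapse to one, and a zero MV fills the list.
print(build_amvp_list([(4, -2), (4, -2)], []))   # [(4, -2), (0, 0)]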
Spatial motion vector candidates
In the derivation of spatial motion vector candidates, at most two candidates are considered among the five potential candidates derived from PUs located at the positions shown in fig. 2, which are the same as the positions used for motion Merge. The derivation order on the left side of the current PU is defined as A0, A1, scaled A0, and scaled A1. The derivation order on the upper side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, and scaled B2. Thus, for each side there are four cases that can be used as motion vector candidates, two of which do not require spatial scaling and two of which use spatial scaling. The four different cases are summarized below.
No spatial scaling
(1) The same reference picture list, and the same reference picture index (the same Picture Order Count (POC))
(2) Different reference picture lists, but the same reference picture index (same POC)
Spatial scaling
(3) Same reference picture list, but different reference picture indices (different POCs)
(4) Different reference picture lists, and different reference picture indices (different POCs)
The no-spatial-scaling cases are checked first, followed by the spatial scaling cases. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and the reference picture of the current PU, regardless of the reference picture list. For the above motion vector, spatial scaling is allowed only if all PUs of the left candidates are unavailable or intra-coded, which helps parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
In the spatial scaling process, the motion vectors of neighboring PUs are scaled in a similar manner as in the temporal scaling, as shown in fig. 3. The main difference is that the reference picture list and index of the current PU are given as input; the actual scaling procedure is the same as the time scaling procedure.
Temporal motion vector candidates
All processes for deriving temporal Merge candidates are the same as those for deriving spatial motion vector candidates, except for reference picture index derivation. The reference picture index is signaled to the decoder.
Merge mode in HEVC
Candidates for Merge mode
When predicting a PU using the Merge mode, the index pointing to an entry in the Merge candidate list is parsed from the bitstream and used to retrieve motion information. The construction of this list is specified in the HEVC standard and can be generalized according to the following sequence of steps:
step 1: initial candidate derivation
-step 1.1: spatial candidate derivation
-step 1.2: redundancy check for spatial candidates
-step 1.3: temporal candidate derivation
Step 2: additional candidate insertions
-step 2.1: bi-directional prediction candidate creation
-step 2.2: zero motion candidate insertion
These steps are also schematically depicted in fig. 4. For spatial Merge candidate derivation, up to four Merge candidates are selected among candidates located at five different positions. For temporal Merge candidate derivation, at most one Merge candidate is selected between two candidates. Since the number of candidates per PU is assumed to be constant at the decoder, additional candidates are generated when the number of candidates does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best Merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all PUs of the current CU share a single Merge candidate list, which is identical to the Merge candidate list of the 2N×2N prediction unit.
In the following subsections, detailed operations of each of the above steps are described.
Spatial candidates
In the derivation of spatial Merge candidates, up to four Merge candidates are selected from the candidates located at the positions shown in FIG. 2. The order of derivation is A1, B1, B0, A0, and B2. Position B2 is considered only when any PU at positions A1, B1, B0, A0 is unavailable (e.g., because it belongs to another slice or tile) or is intra-coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check that ensures that candidates with the same motion information are excluded from the list, thereby improving coding efficiency. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with arrows in fig. 5 are considered, and a candidate is added to the list only when the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, fig. 6 depicts the second PU for the N×2N and 2N×N cases, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. In fact, adding this candidate would result in two prediction units having the same motion information, which is redundant given that there is only one prediction unit in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
Temporal candidates
In this step, only one candidate is added to the list. In particular, in the derivation of this temporal Merge candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture that has the smallest POC difference from the current picture within the given reference picture list. The reference picture list to be used for deriving the co-located PU is explicitly signaled in the slice header. As indicated by the dashed line in fig. 7, the scaled motion vector of the temporal Merge candidate is obtained by scaling the motion vector of the co-located PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal Merge candidate is set equal to zero. A practical implementation of the scaling process is described in the HEVC specification. For B slices, two motion vectors are obtained and combined, one for reference picture list 0 (List0) and the other for reference picture list 1 (List1), to form a bi-predictive Merge candidate.
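The scaling just described can be illustrated with a minimal sketch (the normative HEVC process uses clipped fixed-point arithmetic; the helper name and example values here are assumptions for illustration only):

# Simplified POC-distance scaling of the co-located MV for the temporal Merge candidate.
# tb: POC(current picture) - POC(reference picture of the current picture)
# td: POC(co-located picture) - POC(reference picture of the co-located picture)
def scale_mv(mv, tb, td):
    # The normative process uses fixed-point arithmetic with clipping; plain rational
    # scaling is used here only to show the idea.
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))

# Example: the current picture is twice as far from its reference as the co-located
# picture is from its own reference, so the co-located MV is doubled.
print(scale_mv((6, -4), tb=2, td=1))   # (12, -8)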
In the co-located PU (Y) belonging to the reference frame, the position of the temporal candidate is selected between candidates C0 and C1, as shown in fig. 8. If the PU at position C0 is unavailable, is intra-coded, or is outside of the current Coding Tree Unit (CTU), position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
Additional candidate insertions
In addition to spatiotemporal Merge candidates, there are two other types of Merge candidates: a combined bi-directional predicted Merge candidate and zero Merge candidate. A combined bi-directional predicted Merge candidate is generated by using the spatio-temporal Merge candidates. The combined bi-directionally predicted Merge candidates are for B slices only. A combined bi-directional prediction candidate is generated by combining the first reference picture list motion parameters of the initial candidate with the second reference picture list motion parameters of the other. If these two tuples provide different motion hypotheses they will form new bi-directional prediction candidates. As an example, fig. 9 depicts the case when two candidates in the original list (on the left) have mvL0 and refIdxL0 or mvL1 and refIdxL1 for creating a combined bi-predictive Merge candidate (on the right) that is added to the final list. There are many rules regarding the combinations that are considered to generate these additional Merge candidates.
Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list and so reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index that starts from zero and increases each time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one for uni-directional prediction and two for bi-directional prediction, respectively. Finally, no redundancy check is performed on these candidates.
Pattern-matched motion vectors
The Pattern Matched Motion Vector Derivation (PMMVD) mode is a special Merge mode based on Frame-Rate Up Conversion (FRUC) techniques. With this mode, the motion information of the block is not signaled, but derived at the decoder side.
When the Merge flag of a CU is true, a FRUC flag is signaled for that CU. When the FRUC flag is false, the Merge index is signaled and the normal Merge mode is used. When the FRUC flag is true, an additional FRUC mode flag is signaled to indicate which method (bilateral matching or template matching) will be used to derive the motion information for the block.
At the encoder side, the decision on whether to use FRUC Merge mode for a CU is based on rate-distortion (RD) cost selection, as is done for normal Merge candidates. That is, the two matching modes (bilateral matching and template matching) are both checked for a CU using RD cost selection. The one leading to the minimal cost is further compared to other CU modes. If a FRUC matching mode is the most efficient one, the FRUC flag is set to true for the CU and the related matching mode is used.
The motion derivation process in FRUC Merge mode has two steps. A CU-level motion search is performed first, followed by sub-CU-level motion refinement. At the CU level, an initial motion vector is derived for the whole CU based on bilateral matching or template matching. First, a list of MV candidates is generated, and the candidate leading to the minimum matching cost is selected as the starting point for further CU-level refinement. Then, a local search based on bilateral matching or template matching around the starting point is performed, and the MV that results in the minimum matching cost is taken as the MV for the whole CU. Subsequently, the motion information is further refined at the sub-CU level with the derived CU motion vectors as the starting points.
For example, the following derivation process is performed for W×H CU motion information derivation. In the first stage, the MV for the whole W×H CU is derived. In the second stage, the CU is further split into M×M sub-CUs. The value of M is calculated as in (1), where D is a predefined splitting depth, set to 3 by default in JEM. The MV of each sub-CU is then derived.
M = max{ 4, min{ W/2^D, H/2^D } }    (1)
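A small sketch of this computation, assuming the formula in (1) (the function name and example CU size are illustrative):

# Sub-CU size for FRUC sub-CU level refinement, per equation (1) above.
def sub_cu_size(width, height, depth=3):
    return max(4, min(width >> depth, height >> depth))

# Example: a 64x32 CU with the default depth D = 3 is split into 4x4 sub-CUs,
# since min(64/8, 32/8) = 4.
print(sub_cu_size(64, 32))   # 4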
As shown in fig. 10, bilateral matching is used to derive motion information of a current CU by finding the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures. Under the assumption of a continuous motion trajectory, the motion vectors MV0 and MV1 pointing to the two reference blocks should be proportional to the temporal distance between the current picture and the two reference pictures (i.e., TD0 and TD 1). As a special case, the bilateral matching becomes a mirror-based bidirectional MV when the current picture is temporally between two reference pictures and the temporal distance from the current picture to the two reference pictures is the same.
As shown in fig. 11, template matching is used to derive motion information for a current CU by finding the closest match between a template in the current picture (the top and/or left neighboring blocks of the current CU) and a block in a reference picture (the same size as the template). In addition to the FRUC Merge mode, template matching is also applicable to the AMVP mode. In JEM, there are two AMVP candidates. New candidates are derived using a template matching method. If the newly derived candidate by template matching is different from the first existing AMVP candidate, it is inserted into the very beginning of the AMVP candidate list and then the list size is set to 2 (meaning the second existing AMVP candidate is removed). When applied to AMVP mode, only CU level search is applied.
CU-LEVEL MV candidate set
The MV candidate set at the CU level consists of:
- original AMVP candidates if the current CU is in AMVP mode,
- all Merge candidates,
- several MVs in the interpolated MV field,
- top and left neighboring motion vectors.
When using bilateral matching, each valid MV of a Merge candidate is used as an input to generate an MV pair under the assumption of bilateral matching. For example, one valid MV of a Merge candidate is (MVa, refa) in reference list A. Then, the reference picture refb of its paired bilateral MV is found in the other reference list B, such that refa and refb are temporally on different sides of the current picture. If such a refb is not available in reference list B, refb is determined to be a reference that is different from refa and whose temporal distance to the current picture is the minimal one in list B. After refb is determined, MVb is derived by scaling MVa based on the temporal distances between the current picture and refa and refb.
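A sketch of this MV-pair generation (reference pictures are reduced to their POC values, and the helper names are assumptions for illustration; the real process works on full reference picture lists):

# Generate the paired bilateral MV (MVb) for a Merge candidate MVa taken from list A.
# pocs_b: POCs of the reference pictures available in the other reference list B.
def make_bilateral_pair(mva, poc_cur, poc_refa, pocs_b):
    # Prefer a refb on the other temporal side of the current picture; otherwise take the
    # reference in list B (different from refa) that is temporally closest to the current picture.
    other_side = [p for p in pocs_b if (p - poc_cur) * (poc_refa - poc_cur) < 0]
    pool = other_side if other_side else [p for p in pocs_b if p != poc_refa]
    poc_refb = min(pool, key=lambda p: abs(p - poc_cur))
    # Scale MVa by the ratio of temporal distances to obtain MVb.
    scale = (poc_refb - poc_cur) / (poc_refa - poc_cur)
    return poc_refb, (round(mva[0] * scale), round(mva[1] * scale))

# Example: MVa points to the past reference POC 8 of the current picture POC 10;
# the paired MVb points to the future reference POC 12 with opposite sign.
print(make_bilateral_pair((4, 2), 10, 8, [12, 6]))   # (12, (-4, -2))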
Four MVs from the interpolated MV field are also added to the CU level candidate list. More specifically, interpolation MVs at positions (0,0), (W/2,0), (0, H/2) and (W/2, H/2) of the current CU are added.
When FRUC is applied to AMVP mode, the original AMVP candidate is also added to the CU-level MV candidate set.
At the CU level, up to 15 MVs for AMVP CUs and up to 13 MVs for Merge CUs are added to the candidate list.
sub-CU level MV candidate set
The MV candidate set at the sub-CU level consists of:
- MVs determined from the CU-level search,
- top, left, top-left, and top-right neighboring MVs,
- scaled versions of co-located MVs from reference pictures,
- up to 4 ATMVP candidates,
- up to 4 STMVP candidates.
The scaled MV from the reference picture is derived as follows. All reference pictures in both lists are traversed. The MVs at the co-located positions of the sub-CUs in the reference picture are scaled to the reference of the starting CU-level MV.
ATMVP and STMVP candidates are limited to the first four.
At the sub-CU level, at most 17 MVs are added to the candidate list.
Generation of interpolated MV fields
Before encoding a frame, an interpolated motion field is generated for the whole picture based on unilateral motion estimation (ME). The motion field can then be used later as CU-level or sub-CU-level MV candidates.
First, the motion field of each reference picture in the two reference lists is traversed at 4 × 4 block level. For each 4 x 4 block, if the motion associated with the block passes through a 4 x 4 block in the current picture (as shown in fig. 12) and the block is not assigned any interpolated motion, the motion of the reference block is scaled to the current picture according to temporal distances TD0 and TD1 (in the same way as MV scaling of TMVP in HEVC) and the scaled motion is assigned to the block in the current frame. If no scaled MV is assigned to the 4 x 4 block, the motion of the block is marked as unavailable in the interpolated motion field.
Interpolation and matching costs
When the motion vector points to a fractional sample position, a motion compensated interpolation is required. To reduce complexity, bilinear interpolation is used for bilateral matching and template matching instead of conventional 8-tap HEVC interpolation.
The matching cost is calculated slightly differently at different steps. When selecting candidates from the candidate set at the CU level, the matching cost is Sum of Absolute Differences (SAD) of bilateral matching or template matching. After determining the starting MV, the matching cost for the bilateral matching of the sub-CU level search is calculated as follows:
C = SAD + w · (|MVx - MVx_s| + |MVy - MVy_s|)
where w is a weighting factor empirically set to 4, and MV = (MVx, MVy) and MV_s = (MVx_s, MVy_s) denote the current MV and the starting MV, respectively. SAD is still used as the matching cost of template matching for the sub-CU level search.
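A sketch of this cost (SAD is computed on flattened patches; the weighting w = 4 follows the text above, and the patch values are illustrative):

# Matching cost of the sub-CU level bilateral matching search (sketch).
def sad(patch_a, patch_b):
    return sum(abs(a - b) for a, b in zip(patch_a, patch_b))

def bilateral_cost(p0, p1, mv, mv_start, w=4):
    # SAD between the two predicted patches plus a weighted L1 penalty on the
    # distance between the tested MV and the starting MV.
    return sad(p0, p1) + w * (abs(mv[0] - mv_start[0]) + abs(mv[1] - mv_start[1]))

# Example with flattened 2x2 patches and an MV one unit away from the starting MV.
print(bilateral_cost([10, 12, 8, 9], [11, 12, 7, 9], mv=(5, -3), mv_start=(4, -3)))   # 2 + 4*1 = 6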
In FRUC mode, the MV is derived by using only the luma samples. The derived motion will be used for both luma and chroma for MC inter prediction. After the MV is decided, the final MC is performed using an 8-tap interpolation filter for luminance and a 4-tap interpolation filter for chrominance.
MV refinement
MV refinement is a pattern-based MV search with the criterion of bilateral matching cost or template matching cost. In JEM, two search patterns are supported: an unrestricted center-biased diamond search (UCBDS) and an adaptive cross search for MV refinement at the CU level and sub-CU level, respectively. For both CU- and sub-CU-level MV refinement, the MV is first searched at quarter-luma-sample MV accuracy, followed by one-eighth-luma-sample MV refinement. The search range of MV refinement for the CU step and the sub-CU step is set equal to 8 luma samples.
Selecting a prediction direction in template matching FRUC Merge mode
In the bilateral matching Merge mode, bi-prediction is always applied, since the motion information of a CU is derived based on the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures. There is no such restriction for the template matching Merge mode. In the template matching Merge mode, the encoder can choose among uni-prediction from List0, uni-prediction from List1, or bi-prediction for a CU. The selection is based on the template matching cost, as follows:
If costBi <= factor × min(cost0, cost1)
bi-prediction is used;
Otherwise, if cost0 <= cost1
uni-prediction from List0 is used;
Otherwise,
uni-prediction from List1 is used;
where cost0 is the SAD of the List0 template matching, cost1 is the SAD of the List1 template matching, and costBi is the SAD of the bi-prediction template matching. The value of factor is equal to 1.25, which means that the selection process is biased toward bi-prediction.
Inter prediction direction selection is applied only to CU-level template matching processing.
Decoder-side motion vector refinement
In the bi-directional prediction operation, for prediction of one block region, two prediction blocks respectively formed using Motion Vectors (MVs) of list0 and MVs of list1 are combined to form a single prediction signal. In the decoder-side motion vector refinement (DMVR) method, two motion vectors of bi-prediction are further refined by a two-sided template matching process. The bilateral template matching is applied in the decoder to perform a distortion-based search between the bilateral template and reconstructed samples in the reference picture to obtain refined MVs without transmitting additional motion information.
In DMVR, a bilateral template is generated as the weighted combination (i.e., average) of the two prediction blocks, from the initial MV0 of List0 and MV1 of List1, respectively, as shown in fig. 10. The template matching operation consists of calculating cost measures between the generated template and the sample region (around the initial prediction block) in the reference picture. For each of the two reference pictures, the MV that yields the minimum template cost is considered as the updated MV of that list to replace the original one. In JEM, nine MV candidates are searched for each list. The nine MV candidates include the original MV and 8 surrounding MVs that are offset from the original MV by one luma sample in the horizontal direction, the vertical direction, or both. Finally, the two new MVs, i.e., MV0' and MV1' as shown in fig. 10, are used for generating the final bi-prediction results. The Sum of Absolute Differences (SAD) is used as the cost measure.
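A sketch of the search loop just described (prediction-block generation and interpolation are abstracted into callbacks pred0/pred1, which are assumed to return flattened sample lists; this is a simplified illustration, not the normative DMVR process):

# DMVR refinement sketch: the bilateral template is the average of the two initial
# predictions, and each list's MV is refined to the candidate best matching the template.
def dmvr_refine(pred0, pred1, mv0, mv1):
    template = [(a + b) / 2 for a, b in zip(pred0(mv0), pred1(mv1))]

    def refine(pred, mv):
        # The original MV plus the eight MVs offset by one luma sample (9 candidates per list).
        candidates = [(mv[0] + dx, mv[1] + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        # Pick the candidate with the smallest SAD against the bilateral template.
        return min(candidates,
                   key=lambda c: sum(abs(t - p) for t, p in zip(template, pred(c))))

    return refine(pred0, mv0), refine(pred1, mv1)

The refined pair corresponds to MV0' and MV1' in the description above and would then be used to form the final bi-prediction.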
DMVR is applied to the Merge mode for bi-prediction, where one MV is from a past reference picture and another MV is from a future reference picture, without transmitting additional syntax elements. In JEM, DMVR is not applied when LIC, affine motion, FRUC, or sub-CU Merge candidates are enabled for a CU.
Adaptive motion vector differential resolution
In HEVC, when use_integer_mv_flag in the slice header is equal to 0, the Motion Vector Difference (MVD) (between the motion vector and the predicted motion vector of a PU) is signaled in units of quarter luma samples. In JEM, a Locally Adaptive Motion Vector Resolution (LAMVR) is introduced. In JEM, MVDs can be coded in units of quarter luma samples, integer luma samples, or four luma samples. The MVD resolution is controlled at the Coding Unit (CU) level, and an MVD resolution flag is conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter luma sample MV precision is not used, another flag is signaled to indicate whether integer or four luma sample MV precision is used.
One-quarter luma sample MV resolution is used for a CU when the first MVD resolution flag of the CU is zero or not coded for the CU (meaning all MVDs in the CU are zero). When a CU uses integer luma sample MV precision or four luma sample MV precision, the MVPs in the CU's AMVP candidate list are rounded to the corresponding precision.
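A sketch of the MVP rounding implied by the last sentence, assuming MVs are stored in quarter-luma-sample units (the exact rounding rule used in the codec may differ):

# Round an MVP stored in quarter-luma-sample units to the MVD precision chosen by LAMVR.
def round_mvp(mv, precision):
    # 'quarter': no change; 'integer': multiples of 4 quarter-samples;
    # 'four': multiples of 16 quarter-samples (4 luma samples).
    step = {'quarter': 1, 'integer': 4, 'four': 16}[precision]
    return tuple(int(round(c / step)) * step for c in mv)

# Example: an MVP of (5, -3) quarter-samples rounded to integer-luma-sample precision.
print(round_mvp((5, -3), 'integer'))   # (4, -4)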
In the encoder, the CU-level RD check is used to determine which MVD resolution to use for the CU. That is, the CU-level RD check is performed three times for each MVD resolution. To speed up the encoder speed, the following encoding scheme is applied in JEM.
During RD-checking of a CU with a normal quarter luma sample MVD resolution, the motion information (integer luma sample accuracy) of the current CU is stored. The stored motion information (after rounding) is used as a starting point for further small-range motion vector refinement during RD-checking for the same CU with integer luma sample and 4 luma sample MVD resolution, so that the time-consuming motion estimation process is not repeated three times.
The RD check of a CU with a 4 luma sample MVD resolution is conditionally invoked. For a CU, when the RD cost integer luma sample MVD resolution is much greater than the quarter-luma sample MVD resolution, the RD check for the CU's 4 luma sample MVD resolution is skipped.
sub-CU-based motion vector prediction
In JEM, each CU may have at most one set of motion parameters for each prediction direction. By dividing a large CU into sub-CUs and deriving motion information of all sub-CUs of the large CU, two sub-CU-level motion vector prediction methods are considered in the encoder. The Alternative Temporal Motion Vector Prediction (ATMVP) method allows each CU to obtain multiple sets of motion information from multiple blocks smaller than the current CU in the co-located reference picture. In a spatial-temporal motion vector prediction (STMVP) method, a motion vector of a sub-CU is recursively derived by using a temporal motion vector predictor and a spatial neighboring motion vector.
In order to preserve more accurate motion fields for sub-CU motion prediction, motion compression of reference frames is currently disabled.
Alternative temporal motion vector prediction
In an Alternative Temporal Motion Vector Prediction (ATMVP) method, the Temporal Motion Vector Prediction (TMVP) is modified by extracting multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. As shown in fig. 11, a sub-CU is a square N × N block (N is set to 4 by default).
Fig. 13 shows an example of a double-sided template matching process. In a first step, a bilateral template is generated from the prediction block. In a second step, a two-sided template matching is used to find the best matching block.
ATMVP predicts the motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture using a so-called temporal vector. The reference picture is called a motion source picture. The second step is to divide the current CU into sub-CUs and obtain a motion vector and a reference index of each sub-CU from a block corresponding to each sub-CU, as shown in fig. 14.
In the first step, the reference picture and the corresponding block are determined by the motion information of the spatially neighboring blocks of the current CU. To avoid a repetitive scanning process of neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU is used. The first available motion vector and its associated reference index are set to be the temporal vector and the index of the motion source picture. In this way, in ATMVP, the corresponding block can be identified more accurately than in TMVP, wherein the corresponding block (sometimes called a co-located block) is always located at the bottom-right or center position relative to the current CU. In one example, if the first Merge candidate is from a left neighboring block (i.e., A1 in FIG. 15), the associated MV and reference picture are utilized to identify the source block and the source picture.
In the second step, the corresponding block of a sub-CU is identified by the temporal vector in the motion source picture by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid covering the center sample) is used to derive the motion information of the sub-CU. After the motion information of a corresponding N×N block is identified, it is converted into the motion vector and reference index of the current sub-CU in the same way as the TMVP of HEVC, in which motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy (with X equal to 0 or 1 and Y equal to 1-X) for each sub-CU.
Spatio-temporal motion vector prediction
In this method, the motion vectors of the sub-CUs are derived recursively in raster scan order. Fig. 16 illustrates this concept. Let us consider an 8 × 8 CU, which contains four 4 × 4 sub-CUs a, B, C, and D. The neighboring 4 x 4 blocks in the current frame are labeled a, b, c, and d.
The motion derivation of sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A (block c). If this block c is unavailable or intra-coded, the other N×N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU A (block b). If block b is unavailable or intra-coded, the other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the Temporal Motion Vector Predictor (TMVP) of sub-block A is derived by following the same procedure as the TMVP derivation specified in HEVC. The motion information of the co-located block at position D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
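A sketch of the final averaging step for one sub-CU (candidate gathering and reference-frame scaling are assumed to have been done already; the helper name and MV values are illustrative):

# STMVP sketch: average the available, already-scaled motion vectors of one sub-CU.
def stmvp_average(above_mv, left_mv, tmvp_mv):
    available = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
    if not available:
        return None
    n = len(available)
    return (sum(mv[0] for mv in available) / n, sum(mv[1] for mv in available) / n)

# Example: the left neighbor is unavailable, so only the above MV and the TMVP are averaged.
print(stmvp_average((4, 0), None, (2, 2)))   # (3.0, 1.0)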
sub-CU motion prediction mode signaling
The sub-CU modes are enabled as additional Merge candidates and no additional syntax element is needed to signal the modes. Two additional Merge candidates are added to the Merge candidate list of each CU to represent the ATMVP mode and the STMVP mode. If the sequence parameter set indicates that ATMVP and STMVP are enabled, up to seven Merge candidates are used. The encoding logic of the additional Merge candidates is the same as that of the Merge candidates in the HM, which means that the two additional Merge candidates require two more RD checks for each CU in a P or B slice.
In JEM, all bins of the Merge index are context coded by CABAC, whereas in HEVC only the first bin is context coded and the remaining bins are bypass coded.
Examples of problems addressed by embodiments
While MVDs provide great flexibility to accommodate various motions in a video signal, they form a large part of the bitstream. Especially during bi-directional prediction, the MVDs of L0 and L1 need to be signaled and they introduce large overhead, especially for low rate visual communication. Some properties about motion symmetry can be exploited to save the rate spent on coding of motion information.
The current AMVP mode signals motion information (including both the MVP index and the reference index) separately for L0 and L1; when the motion follows a symmetric model, this information can be represented more efficiently.
Examples of the embodiments
1. During bi-prediction, the base MV set for the AMVP mode can be generated using the symmetry property of motion vectors. Specifically, the MVDs are signaled only for a single direction (list), and the MV for the other direction is set using a mirroring condition. Alternatively, or in addition, the MV may be further refined. This mode is called the symmetric bi-prediction mode (sym-bi-mode). Here, bi-prediction refers to prediction using one reference frame from the past and another reference frame from the future in display order. In some example embodiments, Versatile Video Coding (VVC) (e.g., JVET-N1001-v5 and other versions and standards) includes a Symmetric Motion Vector Difference (SMVD) mode, which may skip signaling of the L1 MVD. The skipped L1 MVD may be set as a mirror of the L0 MVD without scaling.
a. In one example, when the L(1-N) (N = 0 or 1) MVD is transmitted, the MVD value of LN is not transmitted (i.e., inferred to be (0, 0)), and the LN MVP is set to the mirrored MV derived from the L(1-N) MV. Motion refinement may then be applied to the LN motion vector.
(i) In one example, a DMVR refinement procedure may be applied. Alternatively, a FRUC refinement procedure may be applied to refine the LN motion vectors.
(ii) In one example, the refined search range may be predefined or signaled by SPS (Sequence Parameter Set), PPS (Picture Parameter Set), VPS (Video Parameter Set), or slice header.
(iii) In one example, motion refinement may be applied to a particular mesh. For example, a uniform sampling grid with a grid distance d may be used to define the search points. The mesh distance d may be predefined or signaled via SPS, PPS, VPS, or slice header. The use of a sampling grid can be considered a sub-sampled search area and therefore has the benefit of reducing the memory bandwidth required for the search.
(iv) In one example, signaling of the mirror mode may be done at the CU level, the CTU level, a region level (covering multiple CUs/CTUs), or the slice level. When it is done at the CU level, a one-bit flag needs to be signaled when sym-bi-mode is used. That is, when the flag is signaled as 1, the associated LN MVD and its MVP index may be skipped. When it is done at the CTU level, the region level, or the slice level, all sym-bi-mode blocks skip signaling of the LN MVD value and its MVP index. In some example embodiments, the signaling of the SMVD flag occurs at the CU level.
b. In one example, there is a one bit flag in the slice header/picture parameter set/sequence parameter set to signal whether the refinement procedure should be invoked. Alternatively, the signaling may also be done at the CU/CTU/area level.
c. In one example, during bi-prediction, which list's MVD to skip may be signaled. The signaling may occur at the CU level, the region level, the CTU level, or the slice level. When signaled at the CU level, a one-bit flag needs to be signaled in sym-bi-mode. When signaled at the region level, the CTU level, or the slice level, all bi-predicted CUs belonging to that region skip signaling of the MVD for the designated list and use the mirrored MVP as their starting point to find the final motion vector.
d. In one example, the mirror MVP need only be stored in the MV buffer for motion prediction (AMVP, Merge) of subsequent blocks. The refined motion vectors need not be stored in the MV buffer.
e. In one example, the mirrored MVP may be placed together with the regular MVPs, and one additional bit (two bits in total) is needed to signal among the three MVP indices. In some embodiments, in SMVD mode, both MVP indices are signaled as in normal AMVP mode.
f. In one example, a mirrored MVP candidate is added in place of the second AMVP candidate. However, only one bit is required to signal the MVP index.
g. In one example, the mirror MVP mode may be applied when the POC distances between the current frame and the two reference frames are equal. In some embodiments, in the SMVD mode, the two references are derived as the reference frames in L0 and L1 that are closest to the current frame.
h. In one example, the scaling introduced by mirroring may use the relative temporal distance between the source frame and the target frame. For example, if an L(1-N) reference frame and an LN reference frame are used, and it is decided to skip the MVD signaling of LN, the initial motion vector of LN (N = 0 or 1) can be calculated as MVP_N = (τ_N / τ_(1-N)) · MV_(1-N), where τ_0 and τ_1 denote the POC distance between the current frame and the L0 reference frame and the POC distance between the current frame and the L1 reference frame, respectively, as sketched below.
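A sketch of the mirroring in item 1.h (POC bookkeeping is simplified and the helper name is an assumption; with equal POC distances on both sides this reduces to the unscaled sign flip used by SMVD):

# Mirrored MVP for list N derived from the signaled list (1-N) motion vector (item 1.h).
def mirrored_mvp(mv_other, poc_cur, poc_ref_other, poc_ref_n):
    tau_other = poc_cur - poc_ref_other   # tau_(1-N)
    tau_n = poc_cur - poc_ref_n           # tau_N
    scale = tau_n / tau_other
    return (round(mv_other[0] * scale), round(mv_other[1] * scale))

# Equal POC distances on both sides give a pure sign flip (the SMVD-style mirror).
print(mirrored_mvp((6, -2), poc_cur=10, poc_ref_other=8, poc_ref_n=12))   # (-6, 2)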
2. Various matching schemes may be used to accomplish the refinement process. Let patches (patch) from L0 and L1 pictures be P0 and P1, respectively. A patch is defined as a predicted sample generated by the interpolation process of MVs.
a. The similarity between P0 and P1 is used as the criterion for selecting the refined MV. In one example, the refinement finds the MVN (N = 0 or 1) that minimizes the Sum of Absolute Differences (SAD) between P0 and P1.
b. Temporary patches are generated from P0 and P1, and the criterion may be defined as finding the MV with the highest correlation between the predicted patch and the temporary patch. For example, a separate patch P' = (P0 + P1)/2 may be created and used to find the MVN (N = 0 or 1) that minimizes the SAD between P' and PN. More generally, P' may be generated by the following formula: P' = ω · P0 + (1 - ω) · P1, where ω is a weighting factor between 0 and 1.
c. In one example, a template-based matching scheme may be used to define the refinement process. The top template, the left template, or both the top and left templates may be used to find MVN (N = 0 or 1). The process of finding MVN (N = 0 or 1) is similar to the processes described in the two examples above.
d. In one example, depending on the distance of the search points from the initial mirrored MVP location, the interpolation process may be skipped for some of the search points. For search points whose distance to MVPN (N = 0 or 1) exceeds a threshold T, no interpolation process is involved; only integer-pixel reference samples are used as the patch to derive the motion vector. T may be predefined or may be signaled via SPS, PPS, VPS, or the slice header.
e. In one example, the cost metric for finding MVN includes an estimated rate introduced by the search point relative to the mirrored MVP: C = SAD + λ · R, where λ is a weighting factor used to weight the importance of the estimated rate in the refinement process. The value of λ may be predefined, or signaled by SPS, PPS, VPS, or the slice header. Note that MVDN, MVN, and MVPN defined below are two-dimensional vectors. A sketch of this cost appears after this list item.
i. In one example, R = ||MVDN||, where MVDN = MVN - MVPN. Here, the function ||·|| denotes the L1 norm.
ii. In one example, R = round(log2(||MVDN||)), where the function round rounds the input argument to the nearest integer.
iii. In one example, R = mvd_coding(MVDN), where the function mvd_coding indicates the standard-compliant binarization process of the input MVD value.
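A sketch of the cost in item 2.e combining the three rate measures above (the value of lambda and the bit-count stand-in for mvd_coding are illustrative assumptions, not values from the text):

import math

# Refinement cost C = SAD + lambda * R for a search point (sketch of item 2.e).
def l1_norm(v):
    return abs(v[0]) + abs(v[1])

def rate(mvd, mode='l1'):
    if mode == 'l1':       # R = ||MVD||
        return l1_norm(mvd)
    if mode == 'log2':     # R = round(log2(||MVD||)), guarded against a zero MVD
        return round(math.log2(max(1, l1_norm(mvd))))
    # 'bits': rough stand-in for the mvd_coding binarization length (not normative).
    return sum(abs(c).bit_length() + 1 for c in mvd)

def refine_cost(sad, mv, mvp, lam=4.0, mode='l1'):
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
    return sad + lam * rate(mvd, mode)

# Example: a search point one quarter-sample away from the mirrored MVP.
print(refine_cost(100, (5, -3), (4, -3)))   # 100 + 4.0 * 1 = 104.0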
MVD _ L1_ ZERO _ FLAG is a stripe level FLAG that imposes a strong constraint on L1MVD signaling by removing all L1MVD values. Mirrored MVs and refinements can be used in conjunction with this design in the following manner.
a. In one example, when MVD_L1_ZERO_FLAG is enabled, the MVP index is not signaled and the mirrored MVP constraint and refinement procedure can still be applied.
b. In one example, when MVD_L1_ZERO_FLAG is enabled, the MVP index is still signaled (e.g., as in 1.e or 1.f above) and no mirrored MVP constraint is imposed. However, the MV refinement procedure can still be applied.
c. In one example, when MVD_L1_ZERO_FLAG is enabled, the mirror MVP is added to the MVP candidate list, followed by the MV refinement process.
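As a rough illustration of variant a above, the decoder-side behavior could be sketched as follows; all names are illustrative and this is not the normative parsing process.

def derive_l1_mv(mvd_l1_zero_flag, mirrored_mvp, mvp_list_l1, mvp_idx_l1, refine):
    """Variant a: when MVD_L1_ZERO_FLAG is enabled, no L1 MVP index is parsed;
    the mirrored MVP is used directly and then refined. refine() stands in for
    one of the matching schemes of item 2."""
    if mvd_l1_zero_flag:
        mv_l1 = mirrored_mvp                 # mirrored MVP constraint applied
    else:
        mv_l1 = mvp_list_l1[mvp_idx_l1]      # conventional path: index signaled
    return refine(mv_l1)

Variants b and c differ only in whether the MVP index is still parsed and whether the mirrored MVP is inserted as an additional candidate in the MVP list.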
4. When signaling the reference indices and MVP indices for LN (N = 0 or 1), a joint MVP list may be created to support the mirrored MVD mode. That is, the MVP list is derived jointly for L0 and L1 (given a pair of specific reference indices), and only a single index needs to be signaled. (A sketch of the joint list construction appears after this list.)
a. In one example, the signaling of refIdxN may be skipped, and the reference frame closest to the mirrored position of the L(1-N) reference frame is selected as the reference frame for MVP scaling. In some embodiments, in the SMVD mode, both reference indices are skipped because the references are selected as the frames closest to the current frame in the two lists.
b. In one example, MVP candidates that cannot form a bi-predictor during the derivation process should be considered invalid.
c. In one example, the derivation may follow the existing MVP derivation process for L(1-N), except that when scaling occurs, only a candidate pair whose scaled motion vectors point to reference frames of L0 and L1 that are present in the Decoded Picture Buffer (DPB) is considered a valid candidate.
d. The mirrored MVD mode may be expressed as:
if( sym_mvd_flag[ x0 ][ y0 ] ) {
    MvdL1[ x0 ][ y0 ][ 0 ] = -MvdL0[ x0 ][ y0 ][ 0 ]
    MvdL1[ x0 ][ y0 ][ 1 ] = -MvdL0[ x0 ][ y0 ][ 1 ]
} else
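A minimal sketch, under stated assumptions, of the joint MVP list of items 4.a-4.c: candidate pairs are formed per position, and a pair is kept only if it can form a bi-predictor whose reference frames are present in the DPB. The helpers and data layout are illustrative only.

def build_joint_mvp_list(cands_l0, cands_l1, dpb_pocs):
    """Build a joint (L0, L1) MVP candidate list for the mirrored MVD mode.

    cands_l0, cands_l1: per-list candidates as (mv, reference_poc) tuples,
                        paired by position in the two lists
    dpb_pocs:           POCs of the reference frames currently in the DPB
    Returns the valid pairs; only a single index into this list is signaled.
    """
    joint = []
    for (mv0, poc0), (mv1, poc1) in zip(cands_l0, cands_l1):
        # Items 4.b/4.c: a pair that cannot form a bi-predictor, e.g. because a
        # scaled MV would point to a frame outside the DPB, is considered invalid.
        if poc0 in dpb_pocs and poc1 in dpb_pocs:
            joint.append(((mv0, poc0), (mv1, poc1)))
    return joint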
5. The proposed method can also be applied to multi-hypothesis modes.
a. In this case, when there are two sets of MV information for each reference picture list, the MV information may be signaled for one reference picture list, while the MVDs of the MV information sets of the other reference picture list are derived. Each set of MV information of one reference picture list can be processed in the same manner as in sym-bi-mode.
b. Alternatively, when there are two sets of MV information for each reference picture list, one set of MV information may be signaled for both reference picture lists, while the other two sets of MV information for the two reference picture lists are derived on the fly using sym-bi-mode.
Many video coding standards are based on hybrid video coding architectures, in which temporal prediction plus transform coding is utilized. An example of a typical HEVC encoder framework is depicted in fig. 17.
Fig. 18 is a block diagram of a video processing apparatus 1800. The apparatus 1800 may be used to implement one or more of the methods described herein. The apparatus 1800 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, or the like. The apparatus 1800 may include one or more processors 1802, one or more memories 1804, and video processing hardware 1806. The processor(s) 1802 may be configured to implement one or more of the methods described in this document. The memory (or memories) 1804 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 1806 may be used to implement, in hardware circuitry, some of the techniques described in this document.
Fig. 19 is a flow diagram of an example method 1900 of video bitstream processing. The method 1900 includes: generating (1902), in response to a mirror mode flag in the video bitstream, second motion vector difference information based on a symmetry rule and first motion vector difference information; and reconstructing (1904) a video block using the first motion vector difference information and the second motion vector difference information, wherein the reconstruction is performed using bi-prediction.
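A minimal decoder-side sketch of method 1900 follows, assuming simple sign inversion as the symmetry rule; all names are illustrative and the prediction and combination helpers are hypothetical stand-ins for motion compensation and bi-prediction averaging.

def decode_bi_pred_block(mvd_l0, mirror_mode_flag, predict, combine, explicit_mvd_l1=None):
    """Derive the L1 MVD from the L0 MVD by the symmetry rule, then bi-predict.

    mvd_l0:           (dx, dy) MVD parsed from the bitstream for list 0
    mirror_mode_flag: flag that enables the symmetric/mirror mode
    explicit_mvd_l1:  L1 MVD as parsed from the bitstream when the mode is off
    predict, combine: hypothetical motion-compensation and averaging helpers
    """
    if mirror_mode_flag:
        mvd_l1 = (-mvd_l0[0], -mvd_l0[1])   # symmetry rule: L1 MVD mirrors L0 MVD
    else:
        mvd_l1 = explicit_mvd_l1            # signaled explicitly when the mode is off
    pred0 = predict(0, mvd_l0)
    pred1 = predict(1, mvd_l1)
    return combine(pred0, pred1)            # bi-prediction, e.g. sample averaging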
Fig. 20 is a flow diagram of an example method 2000 of video bitstream processing. The method 2000 includes: receiving (2002) motion vector difference information for a first set of motion vectors of a first reference picture list associated with a video block; and deriving (2004), using a multi-hypothesis symmetry rule, motion vector difference information associated with a second set of motion vectors of a second reference picture list associated with the video block. The derived information is generated from the received motion vector difference information of the first set of motion vectors.
In some embodiments, a method of video bitstream processing may include a variation of method 2000, in which, in the multi-hypothesis case, part of the motion vector difference information is signaled in an interleaved manner. Such a method comprises: receiving, for a video block, first motion vector difference information associated with a first reference picture list; receiving, for the video block, second motion vector difference information associated with a second reference picture list; and deriving, using a multi-hypothesis symmetry rule, third motion vector difference information associated with the first reference picture list and fourth motion vector difference information associated with the second reference picture list from the first motion vector difference information and the second motion vector difference information.
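A rough sketch of this interleaved variant, assuming the same sign-inversion mirroring is applied per hypothesis; the names are illustrative and this is not a normative derivation.

def derive_multi_hypothesis_mvds(mvd_l0_hyp0, mvd_l1_hyp1):
    """Interleaved signaling: one MVD per list is received, the other is mirrored.

    mvd_l0_hyp0: signaled MVD of hypothesis 0 in reference picture list 0
    mvd_l1_hyp1: signaled MVD of hypothesis 1 in reference picture list 1
    Returns the two derived MVDs (hypothesis 0 / list 1, hypothesis 1 / list 0).
    """
    mirror = lambda mvd: (-mvd[0], -mvd[1])
    mvd_l1_hyp0 = mirror(mvd_l0_hyp0)   # third MVD: derived for list 1
    mvd_l0_hyp1 = mirror(mvd_l1_hyp1)   # fourth MVD: derived for list 0
    return mvd_l1_hyp0, mvd_l0_hyp1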
With respect to methods 1900 and 2000, bitstream processing may include generating a bitstream representing video in compressed form. Alternatively, bitstream processing may include using the bitstream to represent reconstructed video from its compressed form.
With respect to methods 1900 and 2000, in some embodiments, the symmetry rules and the multi-hypothesis symmetry rules may be the same or different. In particular, the multi-hypothesis symmetry rule may be used only when a video block (or picture) is coded using multi-hypothesis motion prediction.
With respect to methods 1900 and 2000, the symmetry rule may specify that the second motion vector difference is (0,0) and that the corresponding motion vector predictor is set to the mirrored motion vector whose value is derived from the first motion vector difference information. In addition, motion vector refinement may be further performed on the mirrored motion vector values. As described in the examples above, the mirroring mode may be selectively used based on an indication in the bitstream at the CU/CTU/region level. Similarly, a refinement flag may be signaled to control whether the motion vector refinement is used. The refinement flag may be carried in a slice header, a picture parameter set, a sequence parameter set, or at the region, coding unit, or coding tree unit level.
With respect to methods 1900 and 2000, using symmetry rule based techniques to generate mirror motion vectors may enable skipping the sending of motion vector difference information in the bitstream (as this information may be generated by the decoder). The skip operation may be selectively controlled via a flag in the bitstream. In an advantageous aspect, the mirrored MVP calculation using the above described techniques can be used at the decoder side to improve the decoding of subsequent blocks without suffering from the adverse effects of computational dependencies that may occur if refined motion vectors are used for the prediction of subsequent blocks.
With respect to methods 1900 and 2000, in some embodiments, the symmetry rule may only be used to generate mirror motion vectors if two reference frames have the same distance. Otherwise, scaling of the motion vector may be performed based on the relative temporal distance of the reference frame.
With respect to methods 1900 and 2000, in some embodiments, the mirrored motion vector may be refined using a patch-based technique, which may include generating a first patch of prediction samples using a motion vector from reference picture list 0, generating a second patch of prediction samples using the mirrored motion vector from reference picture list 1, and determining the motion vector refinement as the value that minimizes an error function between the first patch and the second patch. Various optimization criteria (e.g., rate-distortion, SAD, etc.) may be used to determine the refined motion vector.
It should be appreciated that techniques are disclosed for reducing the amount of bits used to represent motion in a compressed video bitstream. Using the disclosed techniques, bi-prediction can be signaled using only half of the motion information of conventional techniques, and the other half of the motion information can be generated at the decoder using the mirror symmetry of the motion of objects in the video. The symmetry flag and refinement flag may be used to signal the use (or non-use) of this mode and further refinement of the motion vectors. The mirror motion vector may be calculated using symmetry rules. One assumption made in the symmetry rule is that the object maintains its translational motion between the time of the current block and the time of the reference block for bi-directional prediction. For example, using one symmetry rule, a motion vector pointing to a reference region displaced from the current block by delx and dely in one temporal direction may be assumed to change in the other direction to a scaled version of delx and dely (scaling may also include negative scaling, which may be due to a change in motion vector direction). Scaling may depend on time distance and other considerations and is described in this document.
The disclosed and other solutions, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not require such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few embodiments and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (35)

1. A method of processing a video bitstream, comprising:
generating second motion vector difference information based on the symmetry rule and the first motion vector difference information in response to a mirror mode flag in the video bitstream; and
reconstructing a video block in a current picture using the first motion vector difference information and the second motion vector difference information, wherein the reconstructing is performed using bi-prediction.
2. The method of claim 1, wherein the symmetry rule specifies that the second motion vector difference information is not to be transmitted.
3. The method of claim 2, further comprising:
performing motion vector refinement of mirrored motion vector values to generate motion vector refinement values.
4. The method of claim 1, wherein the mirror mode flag exists at a Coding Unit (CU) level, a Coding Tree Unit (CTU) level, a region level or a slice level covering a plurality of CUs/CTUs.
5. The method of claim 3, wherein the motion vector refinement is selectively performed based on a refinement flag in the video bitstream.
6. The method of claim 5, wherein the refinement flag is included at least in a slice header, a picture parameter set, a sequence parameter set, a region level, a coding unit, or a coding tree unit level.
7. The method of claim 1, wherein the video bitstream comprises skip information indicating a list of motion vector differences signaled in the video bitstream by skip signaling.
8. The method of claim 7, wherein the skip information is at a coding unit level, a region level, a coding tree unit level, or a slice level.
9. The method of claim 8, wherein the coding unit generates the second motion vector information using the symmetry rule in a case where the skip information is at a region level, a coding tree unit level, or a slice level.
10. The method of claim 1, further comprising:
storing a motion vector predictor generated using the symmetry rule, for use in processing prediction information of a subsequent video block.
11. The method of claim 10, wherein the motion vector predictor is used with a conventional motion vector predictor, and wherein a two-bit field signals the motion vector predictor in the video bitstream.
12. The method of claim 10, wherein the motion vector predictor is used in place of one of the conventional motion vector predictors and signaling is performed with a single bit in the video bitstream.
13. The method of any of claims 1 to 12, wherein the symmetry rule is used only if picture order count distances between the current picture and two reference frames used for the bi-prediction are equal.
14. The method of claim 1, wherein the video bitstream omits signaling of motion vector difference values for reference picture list 1, and wherein the bi-prediction is performed using:
(first picture order count (POC) of a first reference picture in reference picture list 0) - (second POC of the current picture) = (second POC of the current picture) - (third POC of another reference picture in reference picture list 1).
15. The method of claim 1, wherein the two reference pictures used for the bi-prediction are the reference pictures closest to the current picture derived from a past frame and a future frame.
16. The method of claim 1, wherein the video bitstream jointly signals the reference index and the motion vector prediction index of reference list 0 and reference list 1 using a single reference index and a single motion vector prediction index for each video block.
17. The method of claim 1, wherein mirror motion vector values are determined using scaling proportional to relative temporal distances between source and target reference frames of the video block.
18. The method of claim 3, wherein performing the motion vector refinement comprises:
generating a prediction sample first patch using a third motion vector from a reference frame associated with the first reference picture list;
generating a prediction sample second patch using the mirrored motion vector values from a reference frame associated with a second reference picture list; and
determining the motion vector refinement value as a value that minimizes an error function between the first patch and the second patch.
19. The method of claim 18, wherein the error function comprises a sum of absolute differences measurement.
20. The method of claim 18, wherein the error function comprises a correlation between the motion vector refinement values and weighted linear averages of the first patch and the second patch.
21. The method of claim 18, wherein the error function is a rate-distortion function using the motion vector refinement values.
22. The method of claim 3, wherein performing the motion vector refinement comprises:
the motion vector refinement value is determined as a value that minimizes an error function using top and left reference or interpolated samples between reference frames associated with the two reference picture lists.
23. The method of claim 3, wherein performing the motion vector refinement comprises:
determining the motion vector refinement value as a value that minimizes an error function using integer reference samples between two reference frames associated with two reference picture lists when the motion vector refinement value is greater than a threshold.
24. The method of claim 1, wherein the symmetry rule is responsive to a flag comprising MVD_L1_ZERO_FLAG in slice-level signaling for the video block.
25. A video bitstream processing method, comprising:
receiving motion vector difference information for a first set of motion vectors for a first reference picture list associated with a video block; and
deriving motion vector difference information associated with a second set of motion vectors of a second reference picture list associated with the video block from motion vector difference information of the first set of motion vectors using a multi-hypothesis symmetry rule, wherein the multi-hypothesis symmetry rule specifies the second motion vector difference value as (0,0) and a corresponding motion vector predictor is set to a mirror motion vector value derived from the first motion vector difference information; and
performing a conversion between the video block and a bit stream representation of the video block using the derived result.
26. The method of claim 25, comprising:
deriving further motion vector difference information associated with a first reference picture list associated with the video block using the multi-hypothesis symmetry rule; and
deriving another motion vector difference information associated with a second reference picture list associated with the video block using the multi-hypothesis symmetry rule.
27. A method of processing a video bitstream, comprising:
receiving, for a video block, first motion vector difference information associated with a first reference picture list;
receiving, for the video block, second motion vector difference information associated with a second reference picture list;
deriving, using a multi-hypothesis symmetry rule, third motion vector difference information associated with the first reference picture list and fourth motion vector difference information associated with the second reference picture list from the first motion vector difference information and the second motion vector difference information, wherein the multi-hypothesis symmetry rule specifies that the second motion vector difference value is (0,0) and a corresponding motion vector predictor is set to a mirror motion vector value derived from the first motion vector difference information.
28. The method of any of claims 25 to 27, further comprising:
performing motion vector refinement of the mirrored motion vector values to generate motion vector refinement values.
29. A video processing method, comprising:
receiving a future frame of the video relative to a reference frame of the video;
receiving motion vectors associated with future frames of the video and past frames of the video;
applying a predetermined relationship between future frames of the video and past frames of the video;
reconstructing the past frame of the video based on the future frame of the video, the motion vector, and a predetermined relationship between the past frame of the video and the future frame of the video, wherein the predetermined relationship is that the future frame of the video and the past frame of the video are associated by a mirroring condition.
30. The method of claim 29, wherein the mirroring condition implies that an object having coordinates (x, y) in a future frame of the video has coordinates (-x, -y) in a past frame of the video.
31. A video processing method, comprising:
receiving a past frame of the video relative to a reference frame of the video;
receiving motion vectors associated with a past frame of the video and a future frame of the video;
applying a predetermined relationship between future frames of the video and past frames of the video;
reconstructing a future frame of the video data based on the past frame of the video, the motion vector, and a predetermined relationship between the past frame of the video and the future frame of the video, wherein the predetermined relationship is that the future frame of the video and the past frame of the video are associated by a mirroring condition.
32. The method of claim 31, wherein the mirroring condition implies that an object having coordinates (x, y) in a past frame of the video has coordinates (-x, -y) in a future frame of the video.
33. A video decoding apparatus, comprising:
a processor configured to implement the method of one or more of claims 1 to 32.
34. A video encoding device, comprising:
a processor configured to implement the method of one or more of claims 1 to 32.
35. A computer program product having computer code stored thereon, wherein the code, when executed by a processor, causes the processor to implement the method of one or more of claims 1 to 32.
CN201910586486.4A 2018-06-30 2019-07-01 Symmetric bi-directional prediction modes for video coding and decoding Active CN110662077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210763999.XA CN115396677A (en) 2018-06-30 2019-07-01 Symmetric bi-directional prediction modes for video coding and decoding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2018/093897 2018-06-30
CN2018093897 2018-06-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210763999.XA Division CN115396677A (en) 2018-06-30 2019-07-01 Symmetric bi-directional prediction modes for video coding and decoding

Publications (2)

Publication Number Publication Date
CN110662077A true CN110662077A (en) 2020-01-07
CN110662077B CN110662077B (en) 2022-07-05

Family

ID=67185530

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210763999.XA Pending CN115396677A (en) 2018-06-30 2019-07-01 Symmetric bi-directional prediction modes for video coding and decoding
CN201910586486.4A Active CN110662077B (en) 2018-06-30 2019-07-01 Symmetric bi-directional prediction modes for video coding and decoding

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202210763999.XA Pending CN115396677A (en) 2018-06-30 2019-07-01 Symmetric bi-directional prediction modes for video coding and decoding

Country Status (3)

Country Link
CN (2) CN115396677A (en)
TW (1) TWI719522B (en)
WO (1) WO2020003262A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102602827B1 (en) * 2018-09-04 2023-11-15 후아웨이 테크놀러지 컴퍼니 리미티드 Reference frame acquisition method and device applied to two-way inter prediction
US11025936B2 (en) 2019-01-25 2021-06-01 Tencent America LLC Method and apparatus for video coding
WO2020184920A1 (en) * 2019-03-08 2020-09-17 한국전자통신연구원 Image encoding/decoding method and apparatus, and recording medium for storing bitstream
WO2020197243A1 (en) * 2019-03-24 2020-10-01 엘지전자 주식회사 Image encoding/decoding method and device using symmetric motion vector difference (smvd), and method for transmitting bitstream
WO2023040972A1 (en) * 2021-09-15 2023-03-23 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing
US20230328227A1 (en) * 2022-04-07 2023-10-12 Tencent America LLC Systems and methods for joint coding of motion vector difference using template matching based scaling factor derivation


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140327819A1 (en) * 2013-04-03 2014-11-06 Huawei Technologies Co., Ltd. Multi-level bidirectional motion estimation method and device
CN107431820A (en) * 2015-03-27 2017-12-01 高通股份有限公司 Motion vector derives in video coding
CN107222742A (en) * 2017-07-05 2017-09-29 中南大学 Video coding Merge mode quick selecting methods and device based on time-space domain correlation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HUANBANG CHEN: "Symmetrical mode for bi-prediction", 《JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 10TH MEETING: SAN DIEGO, US》 *
RICKARD SJÖBERG: "Description of SDR and HDR video coding technology proposal by Ericsson and Nokia", 《JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 10TH MEETING: SAN DIEGO, CA, USA》 *
Y. CHEN ET AL.: "Description of SDR, HDR and 360° video coding technology proposal by Qualcomm and Technicolor - low and high complexity versions", 《JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 10TH MEETING: SAN DIEGO, US》 *

Also Published As

Publication number Publication date
WO2020003262A1 (en) 2020-01-02
CN110662077B (en) 2022-07-05
CN115396677A (en) 2022-11-25
TW202017375A (en) 2020-05-01
TWI719522B (en) 2021-02-21

Similar Documents

Publication Publication Date Title
US11082712B2 (en) Restrictions on decoder side motion vector derivation
US11470341B2 (en) Interaction between different DMVD models
US20220150508A1 (en) Restrictions on decoder side motion vector derivation based on coding information
US11825113B2 (en) Interaction between intra block copy mode and inter prediction tools
KR102613889B1 (en) Motion vector correction with adaptive motion vector resolution
US11778170B2 (en) Temporal gradient calculations in bio
US20210235083A1 (en) Sub-block based prediction
CN110662077B (en) Symmetric bi-directional prediction modes for video coding and decoding
KR20230158645A (en) Interpolation for inter prediction with refinement
CN110662055B (en) Extended Merge mode
CN112868239A (en) Collocated local illumination compensation and intra block copy codec
US20210360256A1 (en) Interaction between mv precisions and mv difference coding
WO2020070729A1 (en) Size restriction based on motion information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant