CN114503596B - Interaction between motion vector refinement and other codec tools - Google Patents


Info

Publication number
CN114503596B
Authority
CN
China
Prior art keywords
block
video
codec
sub
prediction
Prior art date
Legal status
Active
Application number
CN202080041806.9A
Other languages
Chinese (zh)
Other versions
CN114503596A (en)
Inventor
张凯
张莉
刘鸿彬
王悦
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd and ByteDance Inc
Publication of CN114503596A
Application granted
Publication of CN114503596B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/583 Motion compensation with overlapping blocks
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness

Abstract

Apparatus, systems, and methods for digital video processing are described. An exemplary method for video processing includes making a first determination regarding a codec mode used to represent a current video block of a video in a codec representation of the video; making, based on the first determination, a second determination as to whether to apply a deblocking filter; and performing a conversion between the current video block and the codec representation according to the first determination and the second determination, wherein the codec mode uses an affine codec tool and a specific motion prediction/compensation tool for the conversion.

Description

Interaction between motion vector refinement and other codec tools
Cross Reference to Related Applications
This application timely claims the priority to and benefits of International Patent Application No. PCT/CN2019/090201, filed on 5 June 2019, International Patent Application No. PCT/CN2019/094767, filed on 4 … 2019, and International Patent Application No. PCT/CN2019/096180, filed on 16 July 2019, under the applicable patent law and/or rules pursuant to the Paris Convention. The entire disclosures of the aforementioned applications are incorporated by reference as part of the disclosure of this application for all purposes under the law.
Technical Field
This patent document relates to video processing techniques, devices, and systems.
Background
Despite advances in video compression, digital video still accounts for the largest share of bandwidth on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth demand for digital video is expected to continue to grow.
Disclosure of Invention
Devices, systems, and methods related to digital video processing are described, for example, prediction refinement with optical flow (PROF) for video coding. The described methods may be applied to existing video coding standards (e.g., High Efficiency Video Coding (HEVC)) and to future video coding standards or video codecs.
In one representative aspect, the disclosed techniques can be used to provide a method for video processing. The method includes making a first determination regarding a codec mode used to represent a current video block of a video in a codec representation of the video; making, based on the first determination, a second determination as to whether to apply a deblocking filter; and performing a conversion between the current video block and the codec representation according to the first determination and the second determination, wherein the codec mode uses an affine codec tool and a specific motion prediction/compensation tool for the conversion.
In another representative aspect, the disclosed techniques can be used to provide a method for video processing. The method includes determining to enable use of a switchable interpolation filter tool due to use of a particular motion vector precision in an affine codec tool for representing a current video block of a video in a codec representation of the video; and performing a conversion based on the determination, wherein the switchable interpolation filter tool allows switching to another interpolation filter for the current video block that is different from the interpolation filter used to process the previous video block.
In yet another representative aspect, the disclosed techniques can be used to provide a method for video processing. The method includes, for a current video block of a video comprising one or more video blocks, making a decision regarding the applicability of bi-directional optical flow (BDOF) and/or prediction refinement with optical flow (PROF), which refine the prediction of the current video block using optical flow and/or motion information, based on the use of a switchable interpolation filter tool that allows the current video block and another video block to use different interpolation filters for determining a prediction block; and performing, based on the decision, a conversion between the video and a codec representation of the video.
In yet another representative aspect, the disclosed techniques can be used to provide a method for video processing. The method comprises performing a conversion between video blocks of a video region of the video and a codec representation of the video according to a rule, wherein the rule specifies that a first syntax element is included in the codec representation at a level corresponding to an applicability of a codec tool or a decoder-side motion vector refinement tool based on an optical flow model, and wherein the conversion is performed according to a value of the first syntax element.
In yet another representative aspect, the above-described method is embodied in the form of processor-executable code and stored in a computer-readable program medium.
In yet another representative aspect, an apparatus configured or operable to perform the above-described method is disclosed. The apparatus may include a processor programmed to implement the method.
In yet another representative aspect, a video decoder device may implement the method as described herein.
The above and other aspects and features of the disclosed technology are described in more detail in the accompanying drawings, description and claims.
Drawings
Fig. 1 shows an example of constructing a Merge candidate list.
Fig. 2 shows an example of the location of spatial candidates.
Fig. 3 shows an example of the candidate pairs considered in the redundancy check of spatial Merge candidates.
Fig. 4A and 4B illustrate examples of the location of a second Prediction Unit (PU) based on the size and shape of a current block.
Fig. 5 shows an example of motion vector scaling for a temporal Merge candidate.
Fig. 6 shows an example of candidate positions for the temporal Merge candidate.
Fig. 7 shows an example of generating combined bi-predictive Merge candidates.
Fig. 8 shows an example of constructing a motion vector prediction candidate.
Fig. 9 shows an example of motion vector scaling for spatial motion vector candidates.
Fig. 10 shows an example of motion prediction using the alternative temporal motion vector prediction (ATMVP) algorithm for a coding unit (CU).
Fig. 11 shows an example of a coding unit (CU) with sub-blocks and neighboring blocks used by the spatial-temporal motion vector prediction (STMVP) algorithm.
Fig. 12A and 12B show example snapshots of sub-blocks when using the overlapped block motion compensation (OBMC) algorithm.
Fig. 13 shows an example of neighboring samples for deriving parameters for a Local Illumination Compensation (LIC) algorithm.
Fig. 14 shows an example of a simplified affine motion model.
Fig. 15 shows an example of affine Motion Vector Field (MVF) of each sub-block.
Fig. 16 shows an example of motion vector prediction (MVP) for the AF_INTER affine motion mode.
Fig. 17A and 17B show example candidates for the AF_MERGE affine motion mode.
Fig. 18 shows an example of bilateral matching in the pattern matched motion vector derivation (PMMVD) mode, which is a special Merge mode based on the frame-rate up-conversion (FRUC) algorithm.
Fig. 19 shows an example of template matching in a FRUC algorithm.
Fig. 20 shows an example of unilateral motion estimation in the FRUC algorithm.
FIG. 21 shows an example of optical flow trajectories used by a bi-directional optical flow (BIO) algorithm.
FIGS. 22A and 22B illustrate example snapshots using a bi-directional optical flow (BIO) algorithm without block expansion.
Fig. 23 shows an example of interpolation samples used in BIO.
Fig. 24 shows an example of a decoder-side motion vector refinement (DMVR) algorithm based on bilateral template matching.
Fig. 25 shows an example of the sub-block MV VSB and the per-pixel Δv(i, j).
Fig. 26 shows an example of phase-varying horizontal filtering.
Fig. 27 shows an example of applying 8-tap horizontal filtering.
Fig. 28 shows an example of non-uniform phase vertical filtering.
Fig. 29A-29D illustrate a flowchart of an example method for video processing.
Fig. 30A and 30B are block diagrams of examples of hardware platforms for implementing visual media decoding or visual media encoding techniques described in this document.
Fig. 31 shows an example of 16 4×4 sub-blocks in a 16×16 region.
Detailed Description
Video processing methods and techniques are ubiquitous in modern technology due to the increasing demand for higher-resolution video. Video codecs typically include electronic circuitry or software that compresses or decompresses digital video, and they are continually being improved to provide higher coding efficiency. A video codec converts uncompressed video into a compressed format and vice versa. There are complex relationships among the video quality, the amount of data used to represent the video (determined by the bit rate), the complexity of the encoding and decoding algorithms, sensitivity to data loss and errors, ease of editing, random access, and end-to-end delay (latency). The compressed format usually conforms to a standard video compression specification, such as the High Efficiency Video Coding (HEVC) standard (also known as H.265 or MPEG-H Part 2), the Versatile Video Coding (VVC) standard to be finalized, or other current and/or future video coding standards.
Embodiments of the disclosed technology may be applied to existing video coding standards (e.g., HEVC/H.265) and future standards to improve compression performance. Section headings are used in this document to improve the readability of the description and do not in any way limit the discussion or the embodiments (and/or implementations) to the corresponding sections only.
1. Examples of inter prediction in HEVC/H.265
Video coding standards have improved significantly over the years and now provide, in part, high coding efficiency and support for higher resolutions. Recent standards such as HEVC/H.265 are based on a hybrid video coding structure in which temporal prediction plus transform coding is used.
1.1 examples of prediction modes
Each inter-predicted PU (prediction unit) has motion parameters for one or two reference picture lists. In some embodiments, the motion parameters include a motion vector and a reference picture index. In other embodiments, the usage of one of the two reference picture lists may also be signaled using inter_pred_idc. In yet other embodiments, motion vectors may be coded as deltas relative to predictors.
When a CU is coded in skip mode, one PU is associated with the CU, and there are no significant residual coefficients and no coded motion vector delta or reference picture index. A Merge mode is specified whereby the motion parameters for the current PU are obtained from neighboring PUs, including spatial and temporal candidates. The Merge mode can be applied to any inter-predicted PU, not only to skip mode. The alternative to the Merge mode is the explicit transmission of motion parameters, where the motion vector, the corresponding reference picture index for each reference picture list, and the reference picture list usage are signaled explicitly for each PU.
When signaling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as "uni-prediction". Uni-prediction is available for both P slices and B slices.
When signaling indicates that both reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as "bi-prediction". Bi-prediction is available for B slices only.
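As a minimal sketch of the difference between uni-prediction and bi-prediction, the hypothetical helper below forms a bi-predicted block by averaging two equally sized sample blocks with rounding. This is a deliberate simplification: HEVC's default weighted prediction averages higher-precision intermediate samples with a bit-depth-dependent shift and offset, which is not modeled here.

```python
from typing import List

Block = List[List[int]]


def bi_predict(block0: Block, block1: Block) -> Block:
    """Average two prediction blocks sample by sample: (p0 + p1 + 1) >> 1.

    Hypothetical helper; real codecs average higher-precision intermediates.
    """
    if len(block0) != len(block1) or any(
        len(r0) != len(r1) for r0, r1 in zip(block0, block1)
    ):
        raise ValueError("prediction blocks must have the same size")
    return [
        [(p0 + p1 + 1) >> 1 for p0, p1 in zip(r0, r1)]
        for r0, r1 in zip(block0, block1)
    ]


# Uni-prediction would use one block as-is; bi-prediction combines two.
print(bi_predict([[100, 101]], [[103, 102]]))  # [[102, 102]]
```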
1.1.1 embodiment of constructing candidates for Merge mode
When predicting a PU using the Merge mode, an index to an entry in the Merge candidate list is parsed from the bitstream and used to retrieve motion information. The construction of this list can be summarized according to the following sequence of steps:
Step 1: Initial candidate derivation
Step 1.1: Spatial candidate derivation
Step 1.2: Redundancy check for spatial candidates
Step 1.3: Temporal candidate derivation
Step 2: Additional candidate insertion
Step 2.1: Creation of bi-predictive candidates
Step 2.2: Insertion of zero motion candidates
Fig. 1 shows an example of constructing a Merge candidate list based on the sequence of steps summarized above. For spatial Merge candidate derivation, a maximum of four Merge candidates are selected among candidates located at five different positions. For temporal Merge candidate derivation, a maximum of one Merge candidate is selected among two candidates. Since the decoder assumes a constant number of candidates for each PU, additional candidates are generated when the number of candidates does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Because the number of candidates is constant, the index of the best Merge candidate is encoded using truncated unary (TU) binarization. If the size of the CU is equal to 8, all the PUs of the current CU share a single Merge candidate list, which is identical to the Merge candidate list of the 2N×2N prediction unit.
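The truncated unary binarization of the Merge index can be sketched as follows. The function name is hypothetical, and the sketch assumes cMax = MaxNumMergeCand - 1 (the largest possible index); it only produces the bin string and does not model how those bins are entropy coded.

```python
def truncated_unary_bins(value: int, c_max: int) -> str:
    """Truncated unary (TU) bin string: `value` ones followed by a terminating
    zero, where the zero is omitted when value equals c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    return "1" * value + ("0" if value < c_max else "")


# Hypothetical example with MaxNumMergeCand = 5, so Merge indices 0..4 use c_max = 4.
max_num_merge_cand = 5
for merge_idx in range(max_num_merge_cand):
    print(merge_idx, truncated_unary_bins(merge_idx, max_num_merge_cand - 1))
```

Because the decoder knows cMax from the constant list size, the largest index needs no terminating zero, which is one reason the list length is kept fixed.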
1.1.2 construction of spatial Merge candidates
In the derivation of spatial Merge candidates, a maximum of four Merge candidates are selected among candidates located at the positions depicted in fig. 2. The order of derivation is A1, B1, B0, A0, and B2. Position B2 is considered only when any PU at positions A1, B1, B0, or A0 is unavailable (e.g., because it belongs to another slice or tile) or is intra-coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list, thereby improving coding efficiency.
To reduce computational complexity, not all possible candidate pairs are considered in the redundancy check mentioned above. Instead, only the pairs linked by arrows in fig. 3 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, fig. 4A and 4B depict the second PU for the N×2N and 2N×N cases, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. In some embodiments, adding this candidate would lead to two prediction units having the same motion information, which is redundant with having just one PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
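The following minimal sketch models spatial Merge candidate selection with the limited redundancy check. The comparison pairs (B1 against A1, B0 against B1, A0 against A1, and B2 against A1 and B1) are the ones used in HEVC and are assumed here to correspond to the arrows in fig. 3. Motion information is represented by any hashable value, the helper names are hypothetical, and the partition-dependent exclusion of A1 and B1 for the second PU is not modeled.

```python
from typing import Dict, Hashable, List, Optional

# Earlier positions each candidate is compared with in the limited redundancy check.
PRUNE_AGAINST = {"A1": [], "B1": ["A1"], "B0": ["B1"], "A0": ["A1"], "B2": ["A1", "B1"]}
ORDER = ("A1", "B1", "B0", "A0", "B2")


def spatial_merge_candidates(neighbors: Dict[str, Optional[Hashable]]) -> List[Hashable]:
    """neighbors maps a position name to its motion information, or to None
    when the PU there is unavailable or intra-coded."""
    selected: List[Hashable] = []
    first_four_all_usable = all(neighbors.get(p) is not None for p in ORDER[:4])
    for pos in ORDER:
        motion = neighbors.get(pos)
        if motion is None:
            continue
        # B2 is only considered when one of A1, B1, B0, A0 is unusable,
        # which also caps the number of spatial candidates at four.
        if pos == "B2" and first_four_all_usable:
            continue
        # Only the listed pairs are compared, not every pair of candidates.
        if any(neighbors.get(other) == motion for other in PRUNE_AGAINST[pos]):
            continue
        selected.append(motion)
    return selected
```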
1.1.3 construction of temporal Merge candidates
In this step, only one candidate is added to the list. In particular, in the derivation of this temporal Merge candidate, a scaled motion vector is derived based on the collocated PU belonging to the picture that has the smallest POC difference from the current picture within the given reference picture list. The reference picture list to be used for the derivation of the collocated PU is explicitly signaled in the slice header.
Fig. 5 shows an example of the derivation of the scaled motion vector for a temporal Merge candidate (shown as a dashed line), which is scaled from the motion vector of the collocated PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal Merge candidate is set equal to zero. For a B slice, two motion vectors are obtained, one for reference picture list 0 and the other for reference picture list 1, and they are combined to form the bi-predictive Merge candidate.
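The POC-distance scaling illustrated in fig. 5 is normally carried out in fixed-point arithmetic. The sketch below follows the integer scaling formula of the HEVC specification as we understand it (the formula is not spelled out in this document); the function names are hypothetical, and each motion vector component is scaled separately.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    return max(lo, min(hi, x))


def div_trunc(a: int, b: int) -> int:
    """Integer division that rounds toward zero, like C's '/' operator."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def scale_mv_component(mv: int, tb: int, td: int) -> int:
    """Scale one MV component by roughly tb/td (POC distances, td != 0)."""
    tb = clip3(-128, 127, tb)
    td = clip3(-128, 127, td)
    tx = div_trunc(16384 + (abs(td) >> 1), td)
    # Python's >> on negative ints is an arithmetic (flooring) shift.
    dist_scale_factor = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    product = dist_scale_factor * mv
    sign = -1 if product < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(product) + 127) >> 8))


# Halving the POC distance roughly halves the motion vector.
print(scale_mv_component(8, tb=1, td=2))  # 4
```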
As depicted in fig. 6, the position of the temporal candidate is selected between candidates C0 and C1 in the collocated PU (Y) belonging to the reference frame. If the PU at position C0 is unavailable, is intra-coded, or is outside the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
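The position selection can be sketched as below. Beyond the text above, the sketch assumes HEVC conventions: C0 is the sample just below and to the right of the PU, C1 is the PU's center sample, both positions are rounded down to a 16×16 grid (the granularity of stored collocated motion), and the CTU-row test compares the vertical CTB index of the PU with that of C0. The usability callback and all names are hypothetical.

```python
from typing import Callable, Tuple


def temporal_candidate_position(
    x: int, y: int, w: int, h: int,
    ctb_log2_size: int, pic_w: int, pic_h: int,
    is_usable: Callable[[int, int], bool],
) -> Tuple[int, int]:
    """Return the (x, y) position in the collocated picture used for the
    temporal Merge candidate of a w×h PU at (x, y).

    is_usable(px, py) should report whether the collocated PU covering
    (px, py) is available and inter-coded. The caller still has to check
    whether the returned C1 position itself yields usable motion.
    """
    # C0: bottom-right position, used only inside the picture and current CTU row.
    bx, by = x + w, y + h
    same_ctu_row = (y >> ctb_log2_size) == (by >> ctb_log2_size)
    if same_ctu_row and bx < pic_w and by < pic_h:
        c0 = ((bx >> 4) << 4, (by >> 4) << 4)
        if is_usable(*c0):
            return c0
    # C1: center position, the fallback.
    cx, cy = x + (w >> 1), y + (h >> 1)
    return ((cx >> 4) << 4, (cy >> 4) << 4)
```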
1.1.4 construction of additional types of Merge candidates
Besides spatio-temporal Merge candidates, there are two additional types of Merge candidates: combined bi-predictive Merge candidates and zero Merge candidates. Combined bi-predictive Merge candidates are generated by utilizing spatio-temporal Merge candidates and are used for B slices only. A combined bi-predictive candidate is generated by combining the first-reference-picture-list motion parameters of an initial candidate with the second-reference-picture-list motion parameters of another. If these two tuples provide different motion hypotheses, they form a new bi-predictive candidate.
Fig. 7 shows an example of this process, where two candidates with mvL0 and refIdxL0 or mvL1 and refIdxL1 in the original list (710, on the left) are used to create a combined bi-prediction Merge candidate that is added to the final list (720, on the right).
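A sketch of combined bi-predictive candidate generation follows. The order in which candidate pairs are tried is the fixed table used in HEVC (an assumption, since this document does not list it). Candidates are modeled with reference picture POCs in place of reference indices so that "different motion hypotheses" can be checked directly, and all names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

# (motion vector, POC of the reference picture) for one reference picture list.
ListMotion = Optional[Tuple[Tuple[int, int], int]]


class MergeCand(NamedTuple):
    l0: ListMotion  # None when list 0 is not used
    l1: ListMotion  # None when list 1 is not used


# (index of candidate supplying L0 motion, index of candidate supplying L1 motion)
COMBINATION_ORDER = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1),
                     (0, 3), (3, 0), (1, 3), (3, 1), (2, 3), (3, 2)]


def add_combined_bipred(cands: List[MergeCand], max_num: int) -> List[MergeCand]:
    """Append combined bi-predictive candidates (B slices only) until the list is full."""
    out = list(cands)
    num_orig = len(cands)
    for i, j in COMBINATION_ORDER:
        if len(out) >= max_num:
            break
        if i >= num_orig or j >= num_orig:
            continue
        l0, l1 = cands[i].l0, cands[j].l1
        if l0 is None or l1 is None:
            continue
        # Keep the pair only if it is a new motion hypothesis:
        # a different reference picture or a different motion vector.
        if l0[1] != l1[1] or l0[0] != l1[0]:
            out.append(MergeCand(l0, l1))
    return out
```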
Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list and thereby reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index that starts from zero and increases every time a new zero motion candidate is added to the list. These candidates use one reference frame for uni-directional prediction and two for bi-directional prediction. In some embodiments, no redundancy check is performed on these candidates.
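Zero-candidate filling can be sketched as below. The sketch assumes that the caller passes the number of usable reference indices (in HEVC, the number of active list-0 references for a P slice, or the smaller of the list-0 and list-1 counts for a B slice) and that the index falls back to 0 once it reaches that number; both details come from HEVC rather than from this document. Candidates are plain tuples (mv_l0, ref_l0, mv_l1, ref_l1) with None for an unused list, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Cand = Tuple[Tuple[int, int], int, Optional[Tuple[int, int]], Optional[int]]


def fill_zero_candidates(cands: List[Cand], max_num: int,
                         num_ref_idx: int, is_b_slice: bool) -> List[Cand]:
    """Pad the Merge list with zero-MV candidates up to max_num entries."""
    out = list(cands)
    zero_idx = 0
    while len(out) < max_num:
        # The reference index grows with each new zero candidate, then falls back to 0.
        ref_idx = zero_idx if zero_idx < num_ref_idx else 0
        if is_b_slice:
            out.append(((0, 0), ref_idx, (0, 0), ref_idx))  # bi-prediction: two references
        else:
            out.append(((0, 0), ref_idx, None, None))  # uni-prediction: one reference
        zero_idx += 1
    return out
```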
1.1.5 examples of motion estimation regions for parallel processing
To speed up the encoding process, motion estimation can be performed in parallel, whereby the motion vectors for all prediction units inside a given region are derived simultaneously. Deriving Merge candidates from the spatial neighborhood may interfere with parallel processing, because one prediction unit cannot derive motion parameters from an adjacent PU until the associated motion estimation of that PU is complete. To balance the trade-off between coding efficiency and processing latency, a motion estimation region (MER) may be defined. The size of the MER may be signaled in the picture parameter set (PPS) using the "log2_parallel_merge_level_minus2" syntax element. When an MER is defined, Merge candidates falling in the same region are marked as unavailable and are therefore not considered in list construction.
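The MER test reduces to comparing coordinates after a right shift. In this sketch (hypothetical function name), the MER side length is assumed to be 2^(log2_parallel_merge_level_minus2 + 2) luma samples, as in HEVC; a spatial neighbor is treated as unavailable when it lies in the same MER as the current PU.

```python
def in_same_mer(xp: int, yp: int, xn: int, yn: int,
                log2_parallel_merge_level_minus2: int) -> bool:
    """True if luma positions (xp, yp) of the current PU and (xn, yn) of a
    spatial neighbor fall inside the same motion estimation region."""
    level = log2_parallel_merge_level_minus2 + 2  # log2 of the MER size
    return (xp >> level) == (xn >> level) and (yp >> level) == (yn >> level)


# With log2_parallel_merge_level_minus2 = 2, MERs are 16x16.
print(in_same_mer(20, 20, 17, 31, 2))  # True: the neighbor is marked unavailable
print(in_same_mer(16, 16, 15, 16, 2))  # False: the neighbor is in the MER to the left
```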
1.2 embodiment of Advanced Motion Vector Prediction (AMVP)
AMVP exploits the spatio-temporal correlation of a motion vector with those of neighboring PUs and is used for the explicit transmission of motion parameters. A motion vector candidate list is constructed by first checking the availability of the left, above, and temporally neighboring PU positions, removing redundant candidates, and adding zero vectors to make the candidate list a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to Merge index signaling, the index of the best motion vector candidate is encoded using truncated unary coding. The maximum value to be encoded in this case is 2 (see fig. 8). The following sections provide details about the derivation process of motion vector prediction candidates.
1.2.1 examples of constructing motion vector prediction candidates
Fig. 8 summarizes the derivation process for motion vector prediction candidates, which may be performed for each reference picture list with refIdx as an input.
In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on the motion vectors of PUs located at the five different positions previously shown in fig. 2.
For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different collocated positions. After the first list of spatio-temporal candidates is made, duplicate motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
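A simplified sketch of how the AMVP list is brought to its constant length of two is shown below. It removes duplicates, keeps the first two remaining candidates in priority order (a simplification standing in for the reference-index-based removal described above), and pads with zero motion vectors; the function name is hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def finalize_amvp_list(candidates: List[MV], list_size: int = 2) -> List[MV]:
    """Deduplicate spatio-temporal candidates, trim, and pad with zero MVs."""
    out: List[MV] = []
    for mv in candidates:  # candidates are given in priority order
        if mv not in out:
            out.append(mv)
    out = out[:list_size]
    while len(out) < list_size:
        out.append((0, 0))  # zero MV candidates keep the list length constant
    return out


print(finalize_amvp_list([(3, 1), (3, 1), (5, -2)]))  # [(3, 1), (5, -2)]
```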
1.2.2 construction of spatial motion vector candidates
In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located at the positions previously shown in fig. 2; these positions are the same as those used for motion Merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, and scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, and scaled B2. For each side there are therefore four cases that can be used as motion vector candidates: two cases that do not require spatial scaling and two cases that use spatial scaling. The four different cases are summarized as follows:
- No spatial scaling
(1) Same reference picture list and same reference picture index (same POC)
(2) Different reference picture list but same reference picture (same POC)
- Spatial scaling
(3) Same reference picture list but different reference picture (different POC)
(4) Different reference picture list and different reference picture (different POC)
The no-spatial-scaling cases are checked first, followed by the cases that allow spatial scaling. Spatial scaling is considered when the POC of the reference picture of the neighboring PU differs from that of the reference picture of the current PU, regardless of the reference picture list. If all PUs of the left candidates are unavailable or intra-coded, scaling of the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
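The two-pass check order for one side (for example, A0 then A1 on the left) can be sketched as follows. Each neighbor is modeled as a mapping from reference picture list (0 or 1) to a (motion vector, reference POC) pair; the names are hypothetical, and the restriction on scaling the above motion vectors is left to the caller. A motion vector returned with needs_scaling set to True would then be scaled by POC distance in the same way as in temporal scaling.

```python
from typing import Dict, List, Optional, Tuple

MV = Tuple[int, int]
Neighbor = Optional[Dict[int, Tuple[MV, int]]]  # ref list -> (mv, ref POC); None if unusable


def pick_side_candidate(neighbors: List[Neighbor], cur_list: int,
                        cur_ref_poc: int) -> Optional[Tuple[MV, bool]]:
    """Return (mv, needs_scaling) for one side, or None if no candidate exists."""
    lists = (cur_list, 1 - cur_list)  # same reference list first, then the other one
    # Pass 1: cases (1) and (2), the neighbor uses the same reference picture.
    for n in neighbors:
        for lst in lists:
            if n is not None and lst in n and n[lst][1] == cur_ref_poc:
                return n[lst][0], False
    # Pass 2: cases (3) and (4), the neighbor's MV must be spatially scaled.
    for n in neighbors:
        for lst in lists:
            if n is not None and lst in n:
                return n[lst][0], True
    return None
```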
As shown in the example in fig. 9, for the spatial scaling case, the motion vectors of neighboring PUs are scaled in a similar manner as the temporal scaling. One difference is that the reference picture list and the index of the current PU are given as inputs; the actual scaling procedure is the same as the scaling procedure of time domain scaling.
1.2.3 construction of temporal motion vector candidates
Apart from the reference picture index derivation, all processes for deriving temporal motion vector candidates are the same as those for deriving temporal Merge candidates (as shown in the example in fig. 6). In some embodiments, the reference picture index is signaled to the decoder.
2. Examples of inter prediction methods in Joint Exploration Model (JEM)
In some embodiments, future video coding technologies are explored using reference software known as the Joint Exploration Model (JEM). In JEM, sub-block-based prediction is adopted in several coding tools, such as affine prediction, alternative temporal motion vector prediction (ATMVP), spatial-temporal motion vector prediction (STMVP), bi-directional optical flow (BIO), frame-rate up-conversion (FRUC), locally adaptive motion vector resolution (LAMVR), overlapped block motion compensation (OBMC), local illumination compensation (LIC), and decoder-side motion vector refinement (DMVR).
2.1 example of sub-CU based motion vector prediction
In JEM with quadtree plus binary tree (QTBT) partitioning, each CU can have at most one set of motion parameters for each prediction direction. In some embodiments, two sub-CU-level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and spatial neighboring motion vectors. In some embodiments, motion compression for the reference frames may be disabled to preserve a more accurate motion field for sub-CU motion prediction.
2.1.1 example of Alternative Temporal Motion Vector Prediction (ATMVP)
In the ATMVP method, the temporal motion vector prediction (TMVP) method is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU.
Fig. 10 shows an example of the ATMVP motion prediction process for CU 1000. The ATMVP method predicts the motion vectors of the sub-CUs 1001 within the CU 1000 in two steps. The first step is to identify the corresponding block 1051 in the reference picture 1050 with a temporal vector. The reference picture 1050 is also referred to as the motion source picture. The second step is to split the current CU 1000 into sub-CUs 1001 and obtain the motion vector and reference index of each sub-CU from the block corresponding to that sub-CU.
In the first step, the reference picture 1050 and the corresponding block are determined by the motion information of the spatial neighboring blocks of the current CU 1000. To avoid a repetitive scanning process of neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU 1000 is used. The first available motion vector and its associated reference index are set to be the temporal vector and the index of the motion source picture. In this way, the corresponding block (sometimes called the collocated block) can be identified more accurately than in TMVP, where the corresponding block is always in a bottom-right or center position relative to the current CU.
In the second step, the corresponding block of the sub-CU 1051 is identified by the temporal vector in the motion source picture 1050, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (e.g., the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of the corresponding N×N block is identified, it is converted to the motion vector and reference index of the current sub-CU in the same way as the TMVP of HEVC, where motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (e.g., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and may use the motion vector $MV_x$ (e.g., the motion vector corresponding to reference picture list X) to predict the motion vector $MV_y$ (e.g., with X being equal to 0 or 1 and Y being equal to 1-X) for each sub-CU.
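The second step, locating the corresponding motion grid for every sub-CU, can be sketched in Python as below. This is a minimal illustration rather than the JEM implementation: it assumes the temporal vector is given in integer luma samples, a 4×4 sub-CU size, and a hypothetical 8×8 motion storage grid (`grid`); the function name is illustrative only.

```python
from typing import Dict, Tuple

Pos = Tuple[int, int]


def atmvp_corresponding_blocks(cu_x: int, cu_y: int, cu_w: int, cu_h: int,
                               temporal_vector: Pos, sub_size: int = 4,
                               grid: int = 8) -> Dict[Pos, Pos]:
    """Map each sub-CU to the motion-grid cell of the motion source picture
    that covers its center sample after displacement by the temporal vector.

    Assumptions (hypothetical, for illustration): temporal_vector is in integer
    luma samples and motion is stored on a grid x grid lattice.
    """
    dx, dy = temporal_vector
    mapping: Dict[Pos, Pos] = {}
    for sy in range(cu_y, cu_y + cu_h, sub_size):
        for sx in range(cu_x, cu_x + cu_w, sub_size):
            # Center sample of the sub-CU, shifted into the motion source picture.
            cx = sx + sub_size // 2 + dx
            cy = sy + sub_size // 2 + dy
            # Top-left corner of the smallest motion grid covering that sample.
            mapping[(sx, sy)] = ((cx // grid) * grid, (cy // grid) * grid)
    return mapping
```

The motion stored at each returned grid position would then be converted and scaled as in TMVP, which this sketch does not model.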
2.1.2 example of spatial-temporal motion vector prediction (STMVP)
In the STMVP method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Fig. 11 shows an example of one CU with four sub-blocks and its neighboring blocks. Consider an 8×8 CU 1100 that includes four 4×4 sub-CUs A (1101), B (1102), C (1103), and D (1104). The neighboring 4×4 blocks in the current frame are labeled a (1111), b (1112), c (1113), and d (1114).
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A (1101) (block c, 1113). If block c (1113) is not available or is intra coded, the other N×N blocks above sub-CU A (1101) are checked (from left to right, starting at block c, 1113). The second neighbor is the block to the left of sub-CU A (1101) (block b, 1112). If block b (1112) is not available or is intra coded, the other blocks to the left of sub-CU A (1101) are checked (from top to bottom, starting at block b, 1112). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the temporal motion vector predictor (TMVP) of sub-block A (1101) is derived by following the same procedure of TMVP derivation as specified in HEVC. The motion information of the collocated block at location D (1104) is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
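The final averaging step can be illustrated with a short sketch. It assumes the MVs are integer pairs that have already been scaled to the first reference frame of the list; the rounding rule in `_avg` is an assumption, not necessarily the JEM rule, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

MV = Tuple[int, int]


def _avg(total: int, n: int) -> int:
    """Integer division rounding half away from zero (an assumed rounding rule)."""
    q = (abs(total) + n // 2) // n
    return q if total >= 0 else -q


def stmvp_average(candidates: Sequence[Optional[MV]]) -> Optional[MV]:
    """Average the available, already-scaled MVs of one reference list.

    `candidates` holds the above neighbor, left neighbor, and TMVP entries;
    None marks one that is unavailable (e.g., intra coded).
    """
    available = [mv for mv in candidates if mv is not None]
    if not available:
        return None
    n = len(available)
    return (_avg(sum(mv[0] for mv in available), n),
            _avg(sum(mv[1] for mv in available), n))
```

The function would be called once per reference list, so a sub-CU may end up with an average for list 0, list 1, or both.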
2.1.3 examples of sub-CU motion prediction mode signaling
In some embodiments, the sub-CU modes are enabled as additional Merge candidates, and no additional syntax element is required to signal the modes. Two additional Merge candidates are added to the Merge candidate list of each CU to represent the ATMVP mode and the STMVP mode. In other embodiments, up to seven Merge candidates may be used if the sequence parameter set indicates that ATMVP and STMVP are enabled. The encoding logic of the additional Merge candidates is the same as for the Merge candidates in the HM, which means that, for each CU in a P slice or B slice, two more RD checks may be needed for the two additional Merge candidates. In some embodiments, e.g., JEM, all bins of the Merge index are context coded by CABAC (Context-based Adaptive Binary Arithmetic Coding). In other embodiments, e.g., HEVC, only the first bin is context coded and the remaining bins are bypass coded.
2.2 examples of adaptive motion vector difference resolution
In some embodiments, when use_integer_mv_flag is equal to 0 in the slice header, motion vector differences (MVDs) between the motion vector and the predicted motion vector of a PU are signaled in units of quarter luma samples. In JEM, locally adaptive motion vector resolution (LAMVR) is introduced. In JEM, an MVD can be coded in units of quarter luma samples, integer luma samples, or four luma samples. The MVD resolution is controlled at the coding unit (CU) level, and MVD resolution flags are conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU that has at least one non-zero MVD component, a first flag is signaled to indicate whether quarter luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter luma sample MV precision is not used, another flag is signaled to indicate whether integer luma sample MV precision or four luma sample MV precision is used.
When the first MVD resolution flag of a CU is zero, or is not coded for the CU (meaning that all MVDs in the CU are zero), quarter luma sample MV resolution is used for the CU. When a CU uses integer luma sample MV precision or four luma sample MV precision, the MVPs in the AMVP candidate list for the CU are rounded to the corresponding precision.
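The flag semantics and the MVP rounding described above can be sketched as follows. This is a hypothetical illustration: the mapping of the second flag's value to integer versus four-sample precision, and the half-away-from-zero rounding rule, are assumptions rather than values taken from this document.

```python
def mvd_shift(first_flag: int, second_flag: int = 0) -> int:
    """Return the MV precision as a right shift relative to quarter-sample units.

    Hypothetical mapping for illustration: first_flag == 0 -> quarter sample;
    first_flag == 1 and second_flag == 0 -> integer sample;
    first_flag == 1 and second_flag == 1 -> four samples.
    """
    if first_flag == 0:
        return 0
    return 4 if second_flag else 2


def round_component(v: int, shift: int) -> int:
    """Round a quarter-sample MV component to a multiple of (1 << shift),
    half away from zero (assumed rounding rule)."""
    if shift == 0:
        return v
    offset = 1 << (shift - 1)
    magnitude = ((abs(v) + offset) >> shift) << shift
    return magnitude if v >= 0 else -magnitude


def round_mvp(mvp, first_flag: int, second_flag: int = 0):
    """Round an AMVP candidate (in quarter luma samples) to the CU's MVD precision."""
    shift = mvd_shift(first_flag, second_flag)
    return (round_component(mvp[0], shift), round_component(mvp[1], shift))
```

Keeping the MVP and the MVD on the same precision grid means the reconstructed MV (MVP plus MVD) also lands on that grid.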
In the encoder, CU-level RD checks are used to determine which MVD resolution is to be used for a CU. That is, the CU-level RD check is performed three times, once for each MVD resolution. To accelerate the encoder, the following encoding schemes are applied in JEM:
- During the RD check of a CU with the normal quarter luma sample MVD resolution, the motion information of the current CU (at integer luma sample accuracy) is stored. The stored motion information (after rounding) is used as the starting point for further small-range motion vector refinement during the RD checks of the same CU with integer luma sample and 4 luma sample MVD resolution, so that the time-consuming motion estimation process is not repeated three times.
- The RD check of a CU with 4 luma sample MVD resolution is invoked conditionally. For a CU, when the RD cost of the integer luma sample MVD resolution is much larger than that of the quarter luma sample MVD resolution, the RD check of the 4 luma sample MVD resolution for the CU is skipped.
2.3 example of higher motion vector storage precision
In HEVC, the motion vector precision is one-quarter pel (one-quarter luma sample and one-eighth chroma sample for 4:2:0 video). In JEM, the precision of the internal motion vector storage and of the Merge candidates is increased to 1/16 pel. The higher motion vector precision (1/16 pel) is used in motion compensated inter prediction for CUs coded in skip/Merge mode. For CUs coded in normal AMVP mode, either integer-pel or quarter-pel motion is used.
SHVC upsampling interpolation filters, which have the same filter length and normalization factor as the HEVC motion compensation interpolation filters, are used as the motion compensation interpolation filters for the additional fractional pel positions. The chroma motion vector precision is 1/32 sample in JEM, and the additional interpolation filters for the 1/32 pel fractional positions are derived by averaging the filters of the two neighboring 1/16 pel fractional positions.
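The averaging rule for the extra chroma phases can be sketched as follows. JEM's actual 1/16-position chroma filter tables are not reproduced in this document, so the example uses two adjacent HEVC chroma filters (the 1/8 and 2/8 positions, {-2, 58, 10, -2} and {-4, 54, 16, -2}) purely as stand-in inputs; the function name is hypothetical.

```python
from typing import List, Sequence


def average_filters(f_a: Sequence[int], f_b: Sequence[int]) -> List[int]:
    """Derive the taps for an intermediate fractional phase by averaging the
    taps of the two neighboring phases.

    Floor division is used per tap, so if a pairwise sum is odd the taps may
    no longer sum to the original normalization (64 for HEVC filters);
    callers should verify this.
    """
    if len(f_a) != len(f_b):
        raise ValueError("filters must have the same number of taps")
    return [(a + b) // 2 for a, b in zip(f_a, f_b)]


# Stand-in inputs: HEVC 4-tap chroma filters for the 1/8 and 2/8 phases.
mid = average_filters([-2, 58, 10, -2], [-4, 54, 16, -2])
```

With these inputs every pairwise sum is even, so the derived filter keeps the 64 normalization.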
2.4 example of Overlapped Block Motion Compensation (OBMC)
In JEM, OBMC can be switched on and off using syntax at the CU level. When OBMC is used in JEM, it is performed for all motion compensation (MC) block boundaries except the right and bottom boundaries of a CU. Moreover, it is applied to both the luma and chroma components. In JEM, an MC block corresponds to a coding block. When a CU is coded with a sub-CU mode (including sub-CU Merge, affine, and FRUC (frame rate up-conversion) modes), each sub-block of the CU is an MC block. To process CU boundaries in a uniform fashion, OBMC is performed at the sub-block level for all MC block boundaries, where the sub-block size is set equal to 4×4, as shown in fig. 12A and 12B.
Fig. 12A shows the sub-blocks at the CU/PU boundary, and the hatched sub-blocks are where OBMC is applied. Similarly, fig. 12B shows the sub-blocks in ATMVP mode.
When OBMC is applied to the current sub-block, besides the current motion vector, the motion vectors of the four connected neighboring sub-blocks, if available and not identical to the current motion vector, are also used to derive prediction blocks for the current sub-block. These multiple prediction blocks based on multiple motion vectors are combined to generate the final prediction signal of the current sub-block.
A prediction block based on the motion vector of a neighboring sub-block is denoted as $P_N$, with N indicating an index for the above, below, left, and right neighboring sub-blocks, and the prediction block based on the motion vector of the current sub-block is denoted as $P_C$. When $P_N$ is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed from $P_N$. Otherwise, every sample of $P_N$ is added to the same sample in $P_C$, i.e., four rows/columns of $P_N$ are added to $P_C$. The weighting factors {1/4, 1/8, 1/16, 1/32} are used for $P_N$, and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for $P_C$. The exceptions are small MC blocks (i.e., when the height or width of the coding block is equal to 4 or the CU is coded with a sub-CU mode), for which only two rows/columns of $P_N$ are added to $P_C$. In this case, the weighting factors {1/4, 1/8} are used for $P_N$ and the weighting factors {3/4, 7/8} are used for $P_C$. For $P_N$ generated based on the motion vector of a vertically (horizontally) neighboring sub-block, samples in the same row (column) of $P_N$ are added to $P_C$ with the same weighting factor.
In JEM, for a CU with size less than or equal to 256 luma samples, a CU-level flag is signaled to indicate whether OBMC is applied for the current CU. For CUs with size larger than 256 luma samples or not coded with AMVP mode, OBMC is applied by default. At the encoder, when OBMC is applied to a CU, its impact is taken into account during the motion estimation stage. The prediction signal formed by OBMC using the motion information of the top and left neighboring blocks is used to compensate the top and left boundaries of the original signal of the current CU, and then the normal motion estimation process is applied.
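For an above neighbor, the row-wise weighting can be sketched as follows. This is a minimal integer illustration, assuming weights expressed in units of 1/32 with a rounding offset of 16 before a right shift by 5, and a $P_N$ array with at least as many rows as are blended; it is not taken from the JEM source, and the function name is hypothetical. A left neighbor would apply the same weights column-wise.

```python
from typing import List

Block = List[List[int]]


def obmc_blend_above(p_c: Block, p_n: Block, small_block: bool = False) -> Block:
    """Blend P_N (predicted with the above neighbor's MV) into the top rows of P_C.

    Weights are in units of 1/32: P_N gets 8, 4, 2, 1 (1/4, 1/8, 1/16, 1/32)
    on rows 0..3 and P_C gets the complement (3/4, 7/8, 15/16, 31/32).
    Small MC blocks blend only the first two rows.
    """
    weights_n = [8, 4] if small_block else [8, 4, 2, 1]
    out = [row[:] for row in p_c]
    for r, wn in enumerate(weights_n):
        wc = 32 - wn
        # Every sample in row r uses the same weight pair; +16 rounds before >> 5.
        out[r] = [(wc * c + wn * n + 16) >> 5 for c, n in zip(p_c[r], p_n[r])]
    return out
```

Because the weight on $P_N$ halves with each row away from the boundary, the neighbor's influence fades quickly into the interior of the sub-block.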
2.5 example of Local Illumination Compensation (LIC)
LIC is based on a linear model for illumination changes, using a scaling factor a and an offset b. It is enabled or disabled adaptively for each inter-mode coded coding unit (CU).
When LIC applies to a CU, a least square error method is employed to derive the parameters a and b by using the neighboring samples of the current CU and their corresponding reference samples. Fig. 13 shows an example of the neighboring samples used to derive the parameters of the IC algorithm. Specifically, as shown in fig. 13, the subsampled (2:1 subsampling) neighboring samples of the CU and the corresponding samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used. The IC parameters are derived and applied for each prediction direction separately.
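The least-squares derivation of a and b can be sketched as below. This uses floating-point arithmetic for readability, whereas a codec would use integer arithmetic; the offset-only fallback for a flat reference neighborhood is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def lic_params(cur: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of cur ~= a * ref + b over paired neighboring samples."""
    n = len(cur)
    if n == 0 or n != len(ref):
        raise ValueError("need equally sized, non-empty sample sets")
    sum_x, sum_y = sum(ref), sum(cur)
    sum_xx = sum(x * x for x in ref)
    sum_xy = sum(x * y for x, y in zip(ref, cur))
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        # Flat reference neighborhood: fall back to an offset-only model (assumption).
        return 1.0, (sum_y - sum_x) / n
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    return a, b


def subsample_2to1(samples: Sequence[float]) -> list:
    """Keep every other neighboring sample (2:1 subsampling)."""
    return list(samples[::2])
```

In use, both the current CU's neighbors and the reference neighbors would be passed through `subsample_2to1` before the fit, and the fit is repeated per prediction direction.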
When a CU is encoded in the Merge mode, the LIC flag is copied from the neighboring block in a similar manner to the motion information copy in the Merge mode; otherwise, an LIC flag is signaled for the CU, indicating whether LIC is applicable.
When LIC is enabled for a picture, an additional CU-level RD check is needed to determine whether LIC is applied to a CU. When LIC is enabled for a CU, the mean-removed sum of absolute differences (MR-SAD) and the mean-removed sum of absolute Hadamard-transformed differences (MR-SATD) are used, instead of SAD and SATD, for the integer-pel motion search and the fractional-pel motion search, respectively.
In order to reduce coding complexity, the following coding scheme is applied in JEM:
- LIC is disabled for the entire picture when there is no obvious illumination change between the current picture and its reference pictures. To identify this situation, histograms of the current picture and of every reference picture of the current picture are calculated at the encoder. If the histogram difference between the current picture and every reference picture of the current picture is smaller than a given threshold, LIC is disabled for the current picture; otherwise, LIC is enabled for the current picture.
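The picture-level decision can be sketched as follows. The number of histogram bins, the difference measure (sum of absolute bin differences), and the threshold are all assumptions chosen for illustration; the document does not specify them, and the function names are hypothetical.

```python
from typing import List, Sequence


def histogram(samples: Sequence[int], bit_depth: int = 8, bins: int = 32) -> List[int]:
    """Count samples per bin; each bin spans 2**bit_depth / bins sample values."""
    counts = [0] * bins
    for s in samples:
        counts[(s * bins) >> bit_depth] += 1
    return counts


def lic_enabled_for_picture(cur: Sequence[int], refs: Sequence[Sequence[int]],
                            threshold: int) -> bool:
    """Enable LIC unless every reference histogram is within `threshold`
    of the current picture's histogram (sum of absolute bin differences)."""
    h_cur = histogram(cur)
    for ref in refs:
        diff = sum(abs(a - b) for a, b in zip(h_cur, histogram(ref)))
        if diff >= threshold:
            return True
    return False
```

A single reference picture with a large histogram difference is enough to keep LIC enabled, matching the "every reference picture" condition for disabling it.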
2.6 example of affine motion compensated prediction
In HEVC, only a translational motion model is applied for motion compensation prediction (MCP). However, the camera and objects may have many kinds of motion, such as zoom in/out, rotation, perspective motion, and/or other irregular motion. JEM, on the other hand, applies a simplified affine transform motion compensation prediction. Fig. 14 shows an example of the affine motion field of block 1400 described by two control point motion vectors $V_0$ and $V_1$. The motion vector field (MVF) of block 1400 can be described by the following equation:
As shown in fig. 14, $(v_{0x}, v_{0y})$ is the motion vector of the top-left corner control point, and $(v_{1x}, v_{1y})$ is the motion vector of the top-right corner control point. To simplify the motion compensation prediction, sub-block based affine transform prediction can be applied. The sub-block size M×N is derived as follows:
Here, MvPre is the motion vector fraction accuracy (e.g., 1/16 in JEM), and $(v_{2x}, v_{2y})$ is the motion vector of the bottom-left control point, calculated according to equation (1). If needed, M and N can be adjusted downward to be divisors of w and h, respectively.
Fig. 15 shows an example of the affine MVF per sub-block for block 1500. To derive the motion vector of each M×N sub-block, the motion vector of the center sample of each sub-block can be calculated according to equation (1) and rounded to the motion vector fraction accuracy (e.g., 1/16 in JEM). Then the motion compensation interpolation filters can be applied to generate the prediction of each sub-block with the derived motion vector. After MCP, the high-accuracy motion vector of each sub-block is rounded and saved with the same accuracy as a normal motion vector.
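For reference, the 4-parameter affine motion field used in JEM is commonly written as follows, with $w$ the block width and $(x, y)$ a sample position relative to the block's top-left corner. This is a reconstruction consistent with the control-point definitions that follow, offered as equation (1), not a verbatim copy of the patent's figure.

$$
\begin{aligned}
v_x &= \frac{v_{1x} - v_{0x}}{w}\,x - \frac{v_{1y} - v_{0y}}{w}\,y + v_{0x} \\
v_y &= \frac{v_{1y} - v_{0y}}{w}\,x + \frac{v_{1x} - v_{0x}}{w}\,y + v_{0y}
\end{aligned}
\qquad (1)
$$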
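In the JEM description, equation (2) reads $M = \mathrm{clip3}\big(4, w, \frac{w \times MvPre}{\max(|v_{1x}-v_{0x}|, |v_{1y}-v_{0y}|)}\big)$ and $N = \mathrm{clip3}\big(4, h, \frac{h \times MvPre}{\max(|v_{2x}-v_{0x}|, |v_{2y}-v_{0y}|)}\big)$. A minimal Python sketch follows; it assumes MVs expressed in luma samples as floats, truncation of the quotient, and, as our own assumption, no split when the control-point difference is zero. The function names are hypothetical.

```python
from typing import Tuple

MVf = Tuple[float, float]


def clip3(lo: int, hi: int, v: int) -> int:
    return max(lo, min(hi, v))


def affine_subblock_size(w: int, h: int, v0: MVf, v1: MVf, v2: MVf,
                         mv_pre: float = 1 / 16) -> Tuple[int, int]:
    """Sub-block size M x N for affine MC; MVs are in luma samples."""
    dh = max(abs(v1[0] - v0[0]), abs(v1[1] - v0[1]))  # change across the width
    dv = max(abs(v2[0] - v0[0]), abs(v2[1] - v0[1]))  # change down the height
    m = w if dh == 0 else clip3(4, w, int(w * mv_pre / dh))
    n = h if dv == 0 else clip3(4, h, int(h * mv_pre / dv))
    # Adjust downward so that M and N divide w and h.
    while w % m:
        m -= 1
    while h % n:
        n -= 1
    return m, n
```

The intuition is that M is chosen so the MV changes by roughly one MvPre step from one sub-block to the next; larger control-point differences therefore yield smaller sub-blocks.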
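The per-sub-block derivation can be sketched by evaluating the 4-parameter model of equation (1), $v_x = a x - b y + v_{0x}$ and $v_y = b x + a y + v_{0y}$ with $a = (v_{1x}-v_{0x})/w$ and $b = (v_{1y}-v_{0y})/w$, at each sub-block center. The center convention $(x_0 + M/2, y_0 + N/2)$, the float MVs in luma samples, and the half-up rounding are assumptions of this sketch; the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

MVf = Tuple[float, float]


def round_to_frac(v: float, frac: int = 16) -> float:
    """Round to the nearest 1/frac sample, half up (assumed rounding rule)."""
    return math.floor(v * frac + 0.5) / frac


def affine_subblock_mvs(w: int, h: int, v0: MVf, v1: MVf, m: int, n: int,
                        frac: int = 16) -> Dict[Tuple[int, int], MVf]:
    """Evaluate the 4-parameter affine model at each sub-block center."""
    a = (v1[0] - v0[0]) / w
    b = (v1[1] - v0[1]) / w
    mvs = {}
    for y0 in range(0, h, n):
        for x0 in range(0, w, m):
            x, y = x0 + m / 2, y0 + n / 2  # sub-block center (assumed convention)
            vx = a * x - b * y + v0[0]
            vy = b * x + a * y + v0[1]
            mvs[(x0, y0)] = (round_to_frac(vx, frac), round_to_frac(vy, frac))
    return mvs
```

For example, with $v_0 = (0, 0)$ and $v_1 = (8, 0)$ on an 8×8 block, the model is a pure zoom and each sub-block MV equals its center position.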
2.6.1 AF_INTER mode example
In JEM, there are two affine motion modes: AF_INTER mode and AF_MERGE mode. For CUs with both width and height larger than 8, AF_INTER mode can be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF_INTER mode is used. In AF_INTER mode, a candidate list with motion vector pairs $\{(v_0, v_1) \mid v_0 = \{v_A, v_B, v_C\}, v_1 = \{v_D, v_E\}\}$ is constructed using the neighboring blocks.
Fig. 16 shows an example of motion vector prediction (MVP) for block 1600 in AF_INTER mode. As shown in fig. 16, $v_0$ is selected from the motion vectors of sub-block A, B, or C. The motion vectors from the neighboring blocks can be scaled according to the reference list. The motion vectors can also be scaled according to the relationship among the picture order count (POC) of the reference of the neighboring block, the POC of the reference of the current CU, and the POC of the current CU. The approach for selecting $v_1$ from the neighboring sub-blocks D and E is similar. If the number of candidates in the list is smaller than 2, the list is padded with motion vector pairs composed by duplicating each of the AMVP candidates. When the number of candidates in the list is larger than 2, the candidates can first be sorted according to the neighboring motion vectors (e.g., based on the similarity of the two motion vectors in a pair candidate). In some embodiments, the first two candidates are kept. In some embodiments, a rate distortion (RD) cost check is used to determine which motion vector pair candidate is selected as the control point motion vector prediction (CPMVP) of the current CU. An index indicating the position of the CPMVP in the candidate list can be signaled in the bitstream. After the CPMVP of the current affine CU is determined, affine motion estimation is applied and the control point motion vector (CPMV) is found. Then the difference between the CPMV and the CPMVP is signaled in the bitstream.
2.6.2 Examples of AF_MERGE mode
When a CU is coded in AF_MERGE mode, it obtains the first block coded in affine mode from the valid neighboring reconstructed blocks. Fig. 17A shows an example of the selection order of candidate blocks for the current CU 1700. As shown in fig. 17A, the selection order can be from left (1701), above (1702), above right (1703), and left bottom (1704) to above left (1705) of the current CU 1700. Fig. 17B shows another example of candidate blocks for the current CU 1700 in AF_MERGE mode. If the neighboring left-bottom block 1801 is coded in affine mode, as shown in fig. 17B, the motion vectors $v_2$, $v_3$, and $v_4$ of the top-left corner, top-right corner, and left-bottom corner of the CU containing the sub-block 1701 are derived. The motion vector $v_0$ of the top-left corner of the current CU 1700 is calculated based on $v_2$, $v_3$, and $v_4$. The motion vector $v_1$ of the top-right corner of the current CU can be calculated accordingly.
After the CPMVs $v_0$ and $v_1$ of the current CU are calculated according to the affine motion model in equation (1), the MVF of the current CU can be generated. To identify whether the current CU is coded in AF_MERGE mode, an affine flag is signaled in the bitstream when at least one neighboring block is coded in affine mode.
2.7 example of pattern matched motion vector derivation (PMMVD)
The PMMVD mode is a special Merge mode based on the frame rate up-conversion (FRUC) method. With this mode, the motion information of a block is not signaled but derived at the decoder side.
When the Merge flag of a CU is true, the FRUC flag may be signaled for the CU. When the FRUC flag is false, the Merge index may be signaled and the regular Merge mode is used. When the FRUC flag is true, an additional FRUC mode flag may be signaled to indicate which method (e.g., bilateral matching or template matching) to use to derive motion information for the block.
At the encoder side, the decision on whether to use FRUC Merge mode for a CU is based on RD cost selection, as is done for normal Merge candidates. For example, multiple matching modes (e.g., bilateral matching and template matching) are checked for a CU by using RD cost selection. The matching mode leading to the minimal cost is further compared with other CU modes. If a FRUC matching mode is the most efficient one, the FRUC flag is set to true for the CU and the related matching mode is used.
In general, the motion derivation process in FRUC Merge mode has two steps: CU-level motion search is performed first, and then sub-CU-level motion refinement is performed. At the CU level, an initial motion vector for the entire CU is derived based on bilateral matching or template matching. First, a MV candidate list is generated and the candidate that gives the smallest matching cost is selected as the starting point for further CU level refinement. Then, local search based on bilateral matching or template matching is performed near the start point. The MV result of the minimum matching cost is taken as the MV of the entire CU. Subsequently, the motion information is further refined at the sub-CU level, starting from the deduced CU motion vector.
For example, the following derivation process is performed for the motion information derivation of a W×H CU. In the first stage, the MV for the whole W×H CU is derived. In the second stage, the CU is further split into M×M sub-CUs. The value of M is calculated as in equation (3), where D is a predefined splitting depth that is set to 3 by default in JEM. Then the MV for each sub-CU is derived.
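In the JEM description, equation (3) is $M = \max\{4, \min\{W/2^D, H/2^D\}\}$. A one-line sketch, with a hypothetical function name:

```python
def fruc_sub_cu_size(w: int, h: int, depth: int = 3) -> int:
    """Sub-CU size M for FRUC sub-CU refinement: max(4, min(W >> D, H >> D))."""
    return max(4, min(w >> depth, h >> depth))
```

The lower bound of 4 keeps sub-CUs from shrinking below the 4×4 motion granularity for small or elongated CUs.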
Fig. 18 shows an example of bilateral matching used in the Frame Rate Up Conversion (FRUC) method. The bilateral matching is used to derive the motion information of the current CU by finding the closest match between the two blocks along the motion trajectory of the current CU (1800) in two different reference pictures (1810, 1811). Under the assumption of a continuous motion trajectory, motion vectors MV0 (1801) and MV1 (1802) pointing to two reference blocks are proportional to temporal distances (e.g., TD0 (1803) and TD1 (1804)) between the current picture and the two reference pictures. In some embodiments, when the current picture 1800 is temporally between two reference pictures (1810, 1811) and the temporal distance from the current picture to the two reference pictures is the same, the bilateral matching becomes a mirror-based bi-directional MV.
Fig. 19 shows an example of template matching used in the frame rate up-conversion (FRUC) method. Template matching can be used to derive the motion information of the current CU 1900 by finding the closest match between a template in the current picture (e.g., the top and/or left neighboring blocks of the current CU) and a block (of the same size as the template) in the reference picture 1910. Besides the FRUC Merge mode described above, template matching can also be applied to AMVP mode. In both JEM and HEVC, AMVP has two candidates. With the template matching method, a new candidate can be derived. If the newly derived candidate from template matching is different from the first existing AMVP candidate, it is inserted at the very beginning of the AMVP candidate list, and the list size is then set to 2 (e.g., by removing the second existing AMVP candidate). When applied to AMVP mode, only the CU-level search is applied. The MV candidate set at the CU level can include: (1) the original AMVP candidates if the current CU is in AMVP mode, (2) all Merge candidates, (3) several MVs in the interpolated MV field (described later), and (4) the top and left neighboring motion vectors.
When bilateral matching is used, each valid MV of a Merge candidate is used as an input to generate an MV pair under the assumption of bilateral matching. For example, one valid MV of a Merge candidate is (MVa, refa) in reference list A. Then the reference picture refb of its paired bilateral MV is found in the other reference list B such that refa and refb are temporally on different sides of the current picture. If such a refb is not available in reference list B, refb is determined as a reference that is different from refa and whose temporal distance to the current picture is the minimal one in list B. After refb is determined, MVb is derived by scaling MVa based on the temporal distances between the current picture and refa and refb.
In some implementations, four MVs from the interpolated MV field can also be added to the CU-level candidate list. More specifically, the interpolated MVs at the positions (0, 0), (W/2, 0), (0, H/2), and (W/2, H/2) of the current CU are added. When FRUC is applied in AMVP mode, the original AMVP candidates are also added to the CU-level MV candidate set. In some implementations, at the CU level, up to 15 MVs for AMVP CUs and up to 13 MVs for Merge CUs can be added to the candidate list.
The MV candidate set at the sub-CU level includes: (1) the MV determined from the CU-level search, (2) the top, left, top-left, and top-right neighboring MVs, (3) scaled versions of collocated MVs from reference pictures, (4) one or more ATMVP candidates (e.g., up to four), and (5) one or more STMVP candidates (e.g., up to four). The scaled MVs from reference pictures are derived as follows: the reference pictures in both lists are traversed, and the MV at the collocated position of the sub-CU in a reference picture is scaled to the reference of the starting CU-level MV. The ATMVP and STMVP candidates may be limited to the first four. At the sub-CU level, one or more MVs (e.g., up to 17) are added to the candidate list.
Generation of the interpolated MV field. Before coding a frame, an interpolated motion field is generated for the whole picture based on unilateral ME. The motion field can then be used later as CU-level or sub-CU-level MV candidates.
In some implementations, the motion field of each reference picture in both reference lists is traversed at the 4×4 block level. Fig. 20 shows an example of unilateral motion estimation (ME) 2000 in the FRUC method. For each 4×4 block, if the motion associated with the block passes through a 4×4 block in the current picture and that block has not been assigned any interpolated motion, the motion of the reference block is scaled to the current picture according to the temporal distances TD0 and TD1 (in the same way as the MV scaling of TMVP in HEVC), and the scaled motion is assigned to the block in the current frame. If no scaled MV is assigned to a 4×4 block, the block's motion is marked as unavailable in the interpolated motion field.
Interpolation and matching cost. When a motion vector points to a fractional sample position, motion compensated interpolation is needed. To reduce complexity, bilinear interpolation instead of the regular 8-tap HEVC interpolation can be used for both bilateral matching and template matching.
The calculation of the matching cost is a bit different at different steps. When selecting the candidate from the candidate set at the CU level, the matching cost can be the sum of absolute differences (SAD) of bilateral matching or template matching. After the starting MV is determined, the matching cost C of bilateral matching for the sub-CU-level search is calculated as follows:
Here, w is a weighting factor. In some embodiments, w can be empirically set to 4. $MV$ and $MV^s$ indicate the current MV and the starting MV, respectively. SAD can still be used as the matching cost of template matching for the sub-CU-level search.
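The pairing and scaling steps can be sketched with POC values as below. This is a simplification: HEVC-style MV scaling uses fixed-point arithmetic with clipping, whereas this sketch uses floats; choosing the closest picture when several candidates lie on the opposite side is an assumption; and the function names are hypothetical. POC_cur is assumed never to equal POC_refa.

```python
from typing import Optional, Sequence, Tuple

MVf = Tuple[float, float]


def pick_refb(poc_cur: int, poc_refa: int, list_b: Sequence[int]) -> Optional[int]:
    """Choose refb in list B for a bilateral MV pair (POCs as ints)."""
    side_a = poc_refa - poc_cur
    opposite = [p for p in list_b if (p - poc_cur) * side_a < 0]
    if opposite:
        # Several candidates on the other side: take the closest (assumption).
        return min(opposite, key=lambda p: abs(p - poc_cur))
    others = [p for p in list_b if p != poc_refa]
    if not others:
        return None
    return min(others, key=lambda p: abs(p - poc_cur))


def scale_mv(mva: MVf, poc_cur: int, poc_refa: int, poc_refb: int) -> MVf:
    """MVb = MVa * (POC_cur - POC_refb) / (POC_cur - POC_refa)."""
    ratio = (poc_cur - poc_refb) / (poc_cur - poc_refa)
    return (mva[0] * ratio, mva[1] * ratio)
```

When refa and refb lie on opposite sides at equal distance, the ratio is -1 and MVb is the mirror of MVa, matching the mirror-based case described for bilateral matching.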
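In the JEM description, equation (4) is $C = SAD + w \cdot (|MV_x - MV_x^s| + |MV_y - MV_y^s|)$. A minimal sketch, with hypothetical helper names and integer MVs:

```python
from typing import Sequence, Tuple


def sad(block_a: Sequence[Sequence[int]], block_b: Sequence[Sequence[int]]) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def bilateral_cost(sad_value: int, mv: Tuple[int, int], mv_start: Tuple[int, int],
                   w: int = 4) -> int:
    """Sub-CU-level bilateral matching cost: SAD plus an MV-drift penalty."""
    return sad_value + w * (abs(mv[0] - mv_start[0]) + abs(mv[1] - mv_start[1]))
```

The penalty term discourages the sub-CU search from drifting far from the starting MV unless the SAD gain justifies it.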
In FRUC mode, MV is derived only by using luminance samples. The derived motion will be used for both luminance and chrominance of the MC inter prediction. After the MV is determined, the final MC is performed using an 8-tap interpolation filter for luminance and a 4-tap interpolation filter for chrominance.
MV refinement is a pattern-based MV search with the bilateral matching cost or template matching cost as the criterion. In JEM, two search patterns are supported, an unrestricted center-biased diamond search (UCBDS) and an adaptive cross search, for MV refinement at the CU level and the sub-CU level, respectively. For both CU-level and sub-CU-level MV refinement, the MV is searched directly at quarter luma sample MV accuracy, followed by one-eighth luma sample MV refinement. The search range of MV refinement for both the CU and sub-CU steps is set equal to 8 luma samples.
In bilateral matching Merge mode, bi-prediction is applied, since the motion information of a CU is derived based on the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures. In template matching Merge mode, the encoder can choose among uni-prediction from list 0, uni-prediction from list 1, and bi-prediction for a CU. The selection can be based on the template matching cost as follows:
If costBi <= factor * min(cost0, cost1)
    bi-prediction is used;
Otherwise, if cost0 <= cost1
    uni-prediction from list 0 is used;
Otherwise,
    uni-prediction from list 1 is used;
Here, cost0 is the SAD of the list 0 template matching, cost1 is the SAD of the list 1 template matching, and costBi is the SAD of the bi-prediction template matching. For example, a factor equal to 1.25 means that the selection process is biased toward bi-prediction. The inter prediction direction selection may be applied to the CU-level template matching process.
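The selection rule above translates directly into a small function. The string labels returned here are hypothetical names chosen for illustration.

```python
def select_direction(cost0: float, cost1: float, cost_bi: float,
                     factor: float = 1.25) -> str:
    """Template-matching based choice between bi-prediction and uni-prediction."""
    if cost_bi <= factor * min(cost0, cost1):
        return "BI"
    if cost0 <= cost1:
        return "L0"
    return "L1"
```

With factor 1.25, bi-prediction is still chosen when its cost is up to 25% higher than the better uni-prediction cost, which is what biases the process toward bi-prediction.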
2.8 examples of generalized bi-prediction improvement (GBi)
The generalized bi-prediction improvement (GBi) proposed in JVET-L0646 is adopted in VTM-3.0. GBi applies unequal weights to the predictors from L0 and L1 in bi-prediction mode. In inter prediction mode, multiple weight pairs, including the equal weight pair (1/2, 1/2), are evaluated based on rate-distortion optimization (RDO), and the GBi index of the selected weight pair is signaled to the decoder. In Merge mode, the GBi index is inherited from a neighboring CU. The predictor generation formula is shown in equation (5).
$P_{GBi} = (w_0 \times P_{L0} + w_1 \times P_{L1} + RoundingOffset_{GBi}) \gg shiftNum_{GBi}$     Equation (5)
Here, $P_{GBi}$ is the final predictor of GBi, and $w_0$ and $w_1$ are the selected GBi weights applied to the predictors ($P_{L0}$ and $P_{L1}$) of list 0 (L0) and list 1 (L1), respectively. $RoundingOffset_{GBi}$ and $shiftNum_{GBi}$ are used to normalize the final predictor in GBi. The supported $w_1$ weight set is {-1/4, 3/8, 1/2, 5/8, 5/4}, in which the five weights correspond to one equal weight pair and four unequal weight pairs. The blending gain, i.e., the sum of $w_1$ and $w_0$, is fixed to 1.0. Therefore, the corresponding $w_0$ weight set is {5/4, 5/8, 1/2, 3/8, -1/4}. The weight pair selection is made at the CU level.
For non-low-delay pictures, the weight set size is reduced from five to three, with the $w_1$ weight set being {3/8, 1/2, 5/8} and the $w_0$ weight set being {5/8, 1/2, 3/8}. The weight set size reduction for non-low-delay pictures is applied to the BMS2.1 GBi and all the GBi tests in the draft.
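Equation (5) can be sketched with weights expressed in units of 1/8, so that $w_1 \in \{-2, 3, 4, 5, 10\}$, $w_0 = 8 - w_1$, $shiftNum_{GBi} = 3$, and $RoundingOffset_{GBi} = 4$. Applying the formula directly to sample-domain integers is a simplification of this sketch: an actual codec also folds its intermediate prediction bit depth into the shift. The names are hypothetical.

```python
from typing import List

# w1 weights in units of 1/8: {-1/4, 3/8, 1/2, 5/8, 5/4}; w0 = 8 - w1.
W1_LOW_DELAY = [-2, 3, 4, 5, 10]
W1_NON_LOW_DELAY = [3, 4, 5]


def gbi_weights(low_delay: bool) -> List[int]:
    """Candidate w1 weights (in 1/8 units) for a picture."""
    return list(W1_LOW_DELAY if low_delay else W1_NON_LOW_DELAY)


def gbi_predict(p_l0: int, p_l1: int, w1: int, shift: int = 3) -> int:
    """P_GBi = (w0 * P_L0 + w1 * P_L1 + offset) >> shift, weights in 1/8 units."""
    w0 = (1 << shift) - w1  # blending gain w0 + w1 is fixed to 1.0
    offset = 1 << (shift - 1)
    return (w0 * p_l0 + w1 * p_l1 + offset) >> shift
```

With $w_1 = 4$ this reduces to ordinary averaging; the negative weights (-1/4) let the predictor extrapolate beyond either reference.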
2.8.1GBi encoder error repair
To reduce GBi encoding time, in current encoder designs, the encoder will store the uni-directional predicted motion vector estimated from GBi weights equal to 4/8 and reuse it for uni-directional prediction searches of other GBi weights. The fast encoding method is applied to translational and affine motion models. In VTM2.0, a 6-parameter affine model and a 4-parameter affine model are employed. When the GBi weight is equal to 4/8, the BMS2.1 encoder does not distinguish between the 4-parameter affine model and the 6-parameter affine model when storing unidirectional predicted affine MVs. Thus, after encoding with GBi weights 4/8, a 4-parameter affine MV may be covered by a 6-parameter affine MV. The stored 6-parameter affine MVs may be used for the 4-parameter affine MVs of other GBi weights, or the stored 4-parameter affine MVs may be used for the 6-parameter affine MVs. GBi encoder error repair is proposed to separate 4-parameter and 6-parameter affine MV storage. When the GBi weights are equal to 4/8, the encoder stores those affine MVs based on the affine model type and reuses the corresponding affine MVs based on the affine model type for other GBi weights.
2.8.2 GBi encoder speed-up
In this prior implementation, five encoder speed-up methods are proposed to reduce the encoding time when GBi is enabled.
(1) Affine motion estimation with conditional skipping of some GBi weights
In BMS2.1, affine ME, including both 4-parameter and 6-parameter affine ME, is performed for all GBi weights. It is proposed to conditionally skip affine ME for the unequal GBi weights (weights not equal to 4/8). Specifically, after the GBi weight of 4/8 has been evaluated, affine ME is performed for the other GBi weights if and only if the affine mode is selected as the current best mode and it is not the affine Merge mode. If the current picture is a non-low-delay picture, bi-prediction ME of the translational model is skipped for the unequal GBi weights when affine ME is performed. If the affine mode is not selected as the current best mode, or if affine Merge is selected as the current best mode, affine ME is skipped for all other GBi weights.
(2) Reducing the number of weights for RD cost checking for low delay pictures in encoding with 1-pixel and 4-pixel MVD precision
For low-delay pictures, five weights are checked in the RD cost evaluation for every MVD precision, including 1/4-pel, 1-pel and 4-pel. The encoder first checks the RD cost for 1/4-pel MVD precision. It is proposed to skip part of the GBi weights in the RD cost checks for 1-pel and 4-pel MVD precision. The unequal weights are ordered according to their RD costs at 1/4-pel MVD precision. When encoding with 1-pel and 4-pel MVD precision, only the two weights with the smallest RD costs and the GBi weight of 4/8 are evaluated. Therefore, at most three weights are evaluated for 1-pel and 4-pel MVD precision in low-delay pictures.
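The pruning rule can be illustrated with a small helper. It is only a sketch: weights are written in units of 1/8 (4 denotes the equal weight 4/8), and the function name and dictionary input are hypothetical.

```python
EQUAL_WEIGHT = 4  # GBi weight 4/8, with weights in units of 1/8


def weights_for_coarse_mvd(quarter_pel_costs: dict[int, float]) -> list[int]:
    """GBi weights to evaluate at 1-pel and 4-pel MVD precision (low-delay pictures).

    quarter_pel_costs maps each unequal weight to its RD cost at 1/4-pel precision.
    """
    unequal = sorted((w for w in quarter_pel_costs if w != EQUAL_WEIGHT),
                     key=lambda w: quarter_pel_costs[w])
    # Keep the equal weight plus the two unequal weights with the smallest RD cost.
    return [EQUAL_WEIGHT] + unequal[:2]
```

For example, 1/4-pel costs {-2: 50, 3: 10, 5: 30, 10: 20} leave only the weights 4/8, 3/8 and 10/8 to be checked at the coarser precisions.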
(3) Conditionally skipping bi-predictive search when L0 and L1 reference pictures are the same
For some pictures in the random access (RA) configuration, the same picture may appear in both reference picture lists (list 0 and list 1). For example, for the random access coding configuration in the CTC, the reference picture structure of the first group of pictures (GOP) is listed below.
POC:16,TL:0,[L0:0] [L1:0]
POC:8,TL:1,[L0:0 16] [L1:16 0]
POC:4,TL:2,[L0:0 8] [L1:8 16]
POC:2,TL:3,[L0:0 4] [L1:4 8]
POC:1,TL:4,[L0:0 2] [L1:2 4]
POC:3,TL:4,[L0:2 0] [L1:4 8]
POC:6,TL:3,[L0:4 0] [L1:8 16]
POC:5,TL:4,[L0:4 0] [L1:6 8]
POC:7,TL:4,[L0:6 4] [L1:8 16]
POC:12,TL:2,[L0:8 0] [L1:16 8]
POC:10,TL:3,[L0:8 0] [L1:12 16]
POC:9,TL:4,[L0:8 0] [L1:10 12]
POC:11,TL:4,[L0:10 8] [L1:12 16]
POC:14,TL:3,[L0:12 8] [L1:12 16]
POC:13,TL:4,[L0:12 8] [L1:14 16]
POC:15,TL:4,[L0:14 12] [L1:16 14]
Note that pictures 16, 8, 4, 2, 1, 12, 14, and 15 have the same reference picture(s) in both lists. For bi-prediction in these pictures, the L0 and L1 reference pictures may be identical. It is proposed that the encoder skip bi-prediction ME for unequal GBi weights when 1) the two reference pictures in bi-prediction are the same, 2) the temporal layer is greater than 1, and 3) the MVD precision is 1/4-pel. For affine bi-prediction ME, this fast skipping method is applied only to 4-parameter affine ME.
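The claim about shared references can be checked directly against the GOP listing. The sketch below copies that listing into a dictionary and also encodes the three skip conditions; the function names are illustrative only.

```python
# Reference structure of the first RA GOP, copied from the listing above:
# POC -> (temporal layer, L0 reference POCs, L1 reference POCs)
GOP = {
    16: (0, [0], [0]),
    8: (1, [0, 16], [16, 0]),
    4: (2, [0, 8], [8, 16]),
    2: (3, [0, 4], [4, 8]),
    1: (4, [0, 2], [2, 4]),
    3: (4, [2, 0], [4, 8]),
    6: (3, [4, 0], [8, 16]),
    5: (4, [4, 0], [6, 8]),
    7: (4, [6, 4], [8, 16]),
    12: (2, [8, 0], [16, 8]),
    10: (3, [8, 0], [12, 16]),
    9: (4, [8, 0], [10, 12]),
    11: (4, [10, 8], [12, 16]),
    14: (3, [12, 8], [12, 16]),
    13: (4, [12, 8], [14, 16]),
    15: (4, [14, 12], [16, 14]),
}


def has_shared_reference(poc: int) -> bool:
    """True if some picture appears in both reference lists of `poc`."""
    _, l0, l1 = GOP[poc]
    return bool(set(l0) & set(l1))


def skip_unequal_gbi_bi_me(ref_poc_l0: int, ref_poc_l1: int,
                           temporal_layer: int, quarter_pel_mvd: bool) -> bool:
    """Skip bi-prediction ME for unequal GBi weights under conditions 1) to 3)."""
    return ref_poc_l0 == ref_poc_l1 and temporal_layer > 1 and quarter_pel_mvd


if __name__ == "__main__":
    print([poc for poc in GOP if has_shared_reference(poc)])
```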
(4) Skipping RD cost checking for unequal GBi weights based on the temporal layer and the POC distance between the reference picture and the current picture
It is proposed to skip the RD cost evaluations for the unequal GBi weights when the temporal layer is equal to 4 (the highest temporal layer in RA) or the POC distance between the reference picture (list 0 or list 1) and the current picture is equal to 1 and the coding QP is greater than 32.
(5) Changing floating-point calculation to fixed-point calculation for unequal GBi weights during ME
In the existing bi-prediction search, the encoder fixes the MV of one list and refines the MV of the other list. The target is modified before ME to reduce computational complexity. For example, if the MV of list 1 is fixed and the encoder is to refine the MV of list 0, the target for the list 0 MV refinement is modified with equation (6), where O is the original signal, $P_1$ is the prediction signal of list 1, and w is the GBi weight of list 1 (expressed in units of 1/8).
$T = \left((O \ll 3) - w \cdot P_1\right) \cdot \frac{1}{8-w}$ (6)
Here, the term $1/(8-w)$ is stored with floating-point precision, which increases computational complexity. It is proposed to change equation (6) into the fixed-point form of equation (7):
$T = \left(O \cdot a_1 - P_1 \cdot a_2 + round\right) \gg N$ (7)
where $a_1$ and $a_2$ are scaling factors, calculated as follows:
$\gamma = (1 \ll N)/(8-w);\quad a_1 = \gamma \ll 3;\quad a_2 = \gamma \cdot w;\quad round = 1 \ll (N-1)$
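A quick way to see that equation (7) tracks equation (6) is to evaluate both side by side. In this sketch, N = 16 is an assumed precision (the text does not fix N), and Python's `//` floors where C would truncate; that difference does not arise for the listed weights, since the only negative divisor (8 - w = -2) divides 2^N exactly.

```python
def target_float(o: int, p1: int, w: int) -> float:
    """Equation (6): T = ((O << 3) - w * P1) * (1 / (8 - w)), with 1/(8-w) in floating point."""
    return ((o << 3) - w * p1) * (1.0 / (8 - w))


def target_fixed(o: int, p1: int, w: int, n: int = 16) -> int:
    """Equation (7): T = (O * a1 - P1 * a2 + round) >> N, integer arithmetic only."""
    gamma = (1 << n) // (8 - w)  # integer approximation of 2^N / (8 - w)
    a1 = gamma << 3
    a2 = gamma * w
    rnd = 1 << (n - 1)
    return (o * a1 - p1 * a2 + rnd) >> n


if __name__ == "__main__":
    for w in (-2, 3, 4, 5, 10):  # list 1 GBi weights in units of 1/8
        print(w, target_float(100, 90, w), target_fixed(100, 90, w))
```

With 10-bit inputs the fixed-point target stays within one unit of the floating-point value, because the truncation error of gamma is far below one unit after the final shift.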
2.8.3 CU size constraints for GBi
In this approach, GBi is disabled for small CUs. In inter prediction mode, if bi-prediction is used and the CU area is smaller than 128 luma samples, GBi is disabled without any signaling.
2.9 examples of bidirectional optical flow (BDOF or BIO)
2.9.1 Overview of BDOF
In BIO, motion compensation is first performed to generate the first predictions (in each prediction direction) of the current block. The first predictions are used to derive the spatial gradient, the temporal gradient and the optical flow of each sub-block or pixel within the block, which are then used to generate the second prediction, i.e., the final prediction of the sub-block or pixel. The details are described below.
The bi-directional optical flow (BIO) method is a sample-by-sample motion refinement performed on the basis of bi-predictive block-by-block motion compensation. In some implementations, the sample level motion refinement does not use signaling.
Let $I^{(k)}$ be the luma value from reference k (k = 0, 1) after block motion compensation, and let $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ be the horizontal and vertical components of the gradient of $I^{(k)}$, respectively. Assuming the optical flow is valid, the motion vector field $(v_x, v_y)$ is given by:
$\partial I^{(k)}/\partial t + v_x \, \partial I^{(k)}/\partial x + v_y \, \partial I^{(k)}/\partial y = 0$
Combining this optical flow equation with Hermite interpolation of the motion trajectory of each sample yields a unique third-order polynomial that matches both the function values $I^{(k)}$ and the derivatives $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ at its ends. The value of this polynomial at t = 0 is the BIO prediction:
FIG. 24 illustrates an example optical flow trajectory in the bi-directional optical flow (BIO) method. Here, $\tau_0$ and $\tau_1$ denote the distances to the reference frames, calculated from the POCs of $Ref_0$ and $Ref_1$: $\tau_0 = POC(current) - POC(Ref_0)$, $\tau_1 = POC(Ref_1) - POC(current)$. If both predictions come from the same temporal direction (both from the past or both from the future), the signs differ (i.e., $\tau_0 \cdot \tau_1 < 0$). In this case, BIO is applied only if the predictions are not from the same time instant (i.e., $POC(Ref_0) \neq POC(Ref_1)$, equivalently $\tau_0 \neq -\tau_1$), both reference regions have non-zero motion (i.e., $MVx_0, MVy_0, MVx_1, MVy_1 \neq 0$), and the block motion vectors are proportional to the temporal distances (i.e., $MVx_0/MVx_1 = MVy_0/MVy_1 = -\tau_0/\tau_1$).
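The temporal conditions above can be written as a small check. This is a sketch under the sign convention of the text; the proportionality test uses cross-multiplication to avoid division, which is an implementation choice rather than something the text prescribes, and the function names are hypothetical.

```python
def bio_tau(poc_cur: int, poc_ref0: int, poc_ref1: int) -> tuple[int, int]:
    """tau0 = POC(current) - POC(Ref0), tau1 = POC(Ref1) - POC(current)."""
    return poc_cur - poc_ref0, poc_ref1 - poc_cur


def bio_trajectory_ok(poc_cur, poc_ref0, poc_ref1, mv0, mv1) -> bool:
    """Check the temporal conditions listed above; mv0 and mv1 are (MVx, MVy) pairs."""
    tau0, tau1 = bio_tau(poc_cur, poc_ref0, poc_ref1)
    if tau0 * tau1 > 0:
        return True  # one past and one future reference: no extra condition here
    if poc_ref0 == poc_ref1:
        return False  # both predictions from the same time instant
    if 0 in (*mv0, *mv1):
        return False  # both reference regions must have non-zero motion
    # MVx0 / MVx1 == MVy0 / MVy1 == -tau0 / tau1, checked without division
    return mv0[0] * tau1 == -tau0 * mv1[0] and mv0[1] * tau1 == -tau0 * mv1[1]
```

For a current picture at POC 8 with both references in the past (POC 4 and POC 0), motion vectors (-4, -4) and (-8, -8) lie on one linear trajectory and pass the check, while (-4, -4) and (-4, -4) do not.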
The motion vector field $(v_x, v_y)$ is determined by minimizing the difference Δ between the values at points A and B. FIG. 24 shows an example of the intersection of the motion trajectory with the reference frame planes. The model uses only the first linear term of the local Taylor expansion of Δ:
All values in the above equation depend on the sample location, denoted (i', j'). Assuming that the motion is consistent in the local surrounding area, Δ is minimized inside a (2M+1)×(2M+1) square window Ω centered on the currently predicted point (i, j), where M is equal to 2:
$(v_x, v_y) = \arg\min_{v_x, v_y} \sum_{[i',j'] \in \Omega} \Delta^2[i',j']$
For this optimization problem, JEM uses a simplified approach that first minimizes in the vertical direction and then in the horizontal direction. This results in the following:
where,
To avoid division by zero or by a very small value, regularization parameters r and m may be introduced in equations (9) and (10), where:
$r = 500 \cdot 4^{d-8}$ (15)
$m = 700 \cdot 4^{d-8}$ (16)
Here, d is the bit depth of the video samples.
To keep the memory access of BIO the same as for regular bi-predictive motion compensation, all prediction and gradient values, $I^{(k)}$, $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$, are calculated only for positions inside the current block. FIG. 22A shows an example of access positions outside of block 2200. As shown in FIG. 22A, in equation (9), the (2M+1)×(2M+1) square window Ω centered on a currently predicted point on the boundary of the prediction block needs to access positions outside the block. In JEM, the values of $I^{(k)}$, $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ outside the block are set equal to the nearest available value inside the block. This can be implemented, for example, as a padding region 2201, as shown in FIG. 22B.
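The "nearest available value inside the block" rule amounts to clamping coordinates. The sketch below applies it inside a (2M+1)×(2M+1) window sum; the row-major `values[y][x]` indexing and the generic window sum (standing in for any of the per-window accumulations) are assumptions made for illustration.

```python
def padded(values, x, y):
    """Value at (x, y); positions outside the block take the nearest in-block value."""
    h, w = len(values), len(values[0])
    return values[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def window_sum(values, x, y, m=2):
    """Sum of `values` over the (2M+1) x (2M+1) window centred on (x, y), with padding."""
    return sum(padded(values, x + dx, y + dy)
               for dy in range(-m, m + 1)
               for dx in range(-m, m + 1))
```

Because only in-block values are ever read, a window centred on a boundary sample touches no memory beyond what ordinary motion compensation already fetched.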
With BIO, the motion field could be refined for each sample. To reduce computational complexity, a block-based design of BIO is used in JEM, in which the motion refinement is calculated per 4×4 block. In block-based BIO, the values of $s_n$ in equation (9) are aggregated over all samples in a 4×4 block, and the aggregated values are then used to derive the BIO motion vector offset for that 4×4 block. More specifically, the following formula may be used for block-based BIO derivation:
$s_{n,b_k} = \sum_{(x,y) \in b_k} s_n(x,y)$
Here, $b_k$ denotes the set of samples belonging to the k-th 4×4 block of the prediction block. $s_n$ in equations (12) and (13) is replaced by $(s_{n,b_k} \gg 4)$ to derive the associated motion vector offsets.
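A minimal sketch of the block-level aggregation, assuming the per-sample values $s_n(x, y)$ are already available in a row-major array and that the block $b_k$ is addressed by its block column and row (a hypothetical addressing scheme):

```python
def aggregate_block(s_n, k_x, k_y):
    """Sum per-sample s_n values over the 4x4 block b_k at block column k_x, row k_y,
    then apply the >> 4 scaling used in place of s_n."""
    total = sum(s_n[y][x]
                for y in range(4 * k_y, 4 * k_y + 4)
                for x in range(4 * k_x, 4 * k_x + 4))
    return total >> 4  # 16 samples per block, so >> 4 acts as an average
```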
In some scenarios, the MV refinement of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a threshold. The threshold is determined based on whether the reference pictures of the current picture all come from one direction. For example, if all reference pictures of the current picture come from one direction, the threshold is set to $12 \times 2^{14-d}$; otherwise, it is set to $12 \times 2^{13-d}$.
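The bit-depth-dependent constants from equations (15) and (16) and the clipping threshold can be tabulated with two small helpers. This sketch restricts d to 8..13 so that every exponent is non-negative and the results stay integers; the function names are illustrative.

```python
def bio_regularization(d: int) -> tuple[int, int]:
    """Equations (15) and (16): r = 500 * 4^(d-8), m = 700 * 4^(d-8), for d >= 8."""
    scale = 4 ** (d - 8)
    return 500 * scale, 700 * scale


def bio_refinement_threshold(d: int, all_refs_one_direction: bool) -> int:
    """Clipping threshold for the BIO MV refinement magnitude, for d <= 13."""
    return 12 * 2 ** ((14 if all_refs_one_direction else 13) - d)


if __name__ == "__main__":
    for d in (8, 10):
        print(d, bio_regularization(d),
              bio_refinement_threshold(d, True), bio_refinement_threshold(d, False))
```

For 10-bit video this gives r = 8000 and m = 11200, and a threshold of 192 (all references in one direction) or 96 (otherwise).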
The gradients for BIO can be calculated at the same time as the motion-compensated interpolation, using operations consistent with the HEVC motion compensation process, e.g., a 2D separable finite impulse response (FIR) filter. In some embodiments, the input to this 2D separable FIR is the same reference frame samples as for the motion compensation process, together with the fractional position (fracX, fracY) given by the fractional part of the block motion vector. For the horizontal gradient $\partial I/\partial x$, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with a de-scaling shift of d-8, and the gradient filter BIOfilterG is then applied in the horizontal direction corresponding to the fractional position fracX with a de-scaling shift of 18-d. For the vertical gradient $\partial I/\partial y$, the gradient filter BIOfilterG is first applied vertically corresponding to the fractional position fracY with a de-scaling shift of d-8, and the signal displacement is then performed using BIOfilterS in the horizontal direction corresponding to the fractional position fracX with a de-scaling shift of 18-d. The interpolation filters for gradient calculation (BIOfilterG) and for signal displacement (BIOfilterS) may be shorter (e.g., 6 taps) in order to keep the complexity reasonable. Table 1 shows example filters that may be used for gradient calculation at different fractional positions of the block motion vector in BIO. Table 2 shows example interpolation filters that may be used for prediction signal generation in BIO.
Table 1: Example filters for gradient calculation in BIO
Fractional position    Gradient interpolation filter (BIOfilterG)
0 {8,-39,-3,46,-17,5}
1/16 {8,-32,-13,50,-18,5}
1/8 {7,-27,-20,54,-19,5}
3/16 {6,-21,-29,57,-18,5}
1/4 {4,-17,-36,60,-15,4}
5/16 {3,-9,-44,61,-15,4}
3/8 {1,-4,-48,61,-13,3}
7/16 {0,1,-54,60,-9,2}
1/2 {-1,4,-57,57,-4,1}
Table 2: Example interpolation filters for prediction signal generation in BIO
Fractional position    Interpolation filter for the prediction signal (BIOfilterS)
0 {0,0,64,0,0,0}
1/16 {1,-3,64,4,-2,0}
1/8 {1,-6,62,9,-3,1}
3/16 {2,-8,60,14,-5,1}
1/4 {2,-9,57,19,-7,2}
5/16 {3,-10,53,24,-8,2}
3/8 {3,-11,50,29,-9,2}
7/16 {3,-11,44,35,-10,3}
1/2 {3,-10,35,44,-11,3}
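The two tables can be applied as a separable filter. The sketch below computes the horizontal gradient exactly in the order described above (vertical BIOfilterS with shift d-8, then horizontal BIOfilterG with shift 18-d). Two assumptions: each 6-tap filter covers samples pos-2..pos+3, a tap alignment the tables do not state, and d >= 8 so the first shift is non-negative. Only the phases 0 to 1/2 listed in the tables are included.

```python
# Six-tap BIO filters from Tables 1 and 2, keyed by the fractional position in
# 1/16-sample units (0 .. 8, i.e. 0 .. 1/2; other phases are not listed above).
BIO_FILTER_G = {
    0: (8, -39, -3, 46, -17, 5), 1: (8, -32, -13, 50, -18, 5),
    2: (7, -27, -20, 54, -19, 5), 3: (6, -21, -29, 57, -18, 5),
    4: (4, -17, -36, 60, -15, 4), 5: (3, -9, -44, 61, -15, 4),
    6: (1, -4, -48, 61, -13, 3), 7: (0, 1, -54, 60, -9, 2),
    8: (-1, 4, -57, 57, -4, 1),
}
BIO_FILTER_S = {
    0: (0, 0, 64, 0, 0, 0), 1: (1, -3, 64, 4, -2, 0),
    2: (1, -6, 62, 9, -3, 1), 3: (2, -8, 60, 14, -5, 1),
    4: (2, -9, 57, 19, -7, 2), 5: (3, -10, 53, 24, -8, 2),
    6: (3, -11, 50, 29, -9, 2), 7: (3, -11, 44, 35, -10, 3),
    8: (3, -10, 35, 44, -11, 3),
}


def fir6(samples, pos, taps):
    """Apply a 6-tap filter to samples[pos - 2 .. pos + 3] (assumed tap alignment)."""
    return sum(c * samples[pos - 2 + k] for k, c in enumerate(taps))


def horizontal_gradient(block, x, y, frac_x, frac_y, d):
    """dI/dx at (x, y) of block[y][x]: vertical BIOfilterS with shift d-8,
    then horizontal BIOfilterG with shift 18-d (requires d >= 8)."""
    # Stage 1: vertical interpolation of the six columns x-2 .. x+3.
    cols = [fir6([row[xx] for row in block], y, BIO_FILTER_S[frac_y]) >> (d - 8)
            for xx in range(x - 2, x + 4)]
    # Stage 2: horizontal gradient filter over the six intermediate values.
    return fir6(cols, 2, BIO_FILTER_G[frac_x]) >> (18 - d)
```

Every BIOfilterS row sums to 64 and every BIOfilterG row sums to 0, so a flat block yields a zero gradient at any phase. The vertical gradient would swap the roles of the two filters.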
In JEM, BIO may be applied to all bi-predicted blocks when the two predictions come from different reference pictures. BIO may be disabled when local illumination compensation (LIC) is enabled for the CU.
In some embodiments, OBMC is applied to the block after a normal MC process. To reduce computational complexity, BIO may not be applied during the OBMC process. This means that the BIO is applied in the MC process of the block when using the MV of the block itself, and is not applied in the MC process when using the MV of the neighboring block during the OBMC process.
2.9.2 examples of BIO in VTM-3.0 as proposed in JVET-L0256
Step 1: determining whether BIO is applicable (W/H is width/height of current block)
BIO is not applicable if any of the following holds:
- The current video block is affine coded or ATMVP coded
- (iPOC - iPOC0) × (iPOC - iPOC1) ≥ 0
- H == 4, or (W == 4 and H == 8)
- Weighted prediction is used
- The GBi weights are not (1, 1)
BIO is also not used if the total SAD between the two reference blocks (denoted $R_0$ and $R_1$) is smaller than a threshold, where
$SAD = \sum_{(x,y)} \lvert R_0(x,y) - R_1(x,y) \rvert$
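Step 1 can be expressed as a single predicate. This is a sketch: the SAD threshold is left as a parameter because the text gives no value, the reference blocks are row-major lists, and the keyword flags are hypothetical names for the listed conditions.

```python
def total_sad(r0, r1):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row0, row1 in zip(r0, r1) for a, b in zip(row0, row1))


def bio_applicable(w, h, poc, poc0, poc1, r0, r1, sad_threshold, *,
                   affine_or_atmvp=False, weighted_pred=False, gbi_equal=True):
    """Step 1 of the JVET-L0256 description: return True if BIO may be applied."""
    if affine_or_atmvp:
        return False
    if (poc - poc0) * (poc - poc1) >= 0:  # both references on the same temporal side
        return False
    if h == 4 or (w == 4 and h == 8):
        return False
    if weighted_pred or not gbi_equal:
        return False
    return total_sad(r0, r1) >= sad_threshold  # small SAD: BIO not used
```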
Step 2: data preparation
For a W×H block, (W+2)×(H+2) samples are interpolated.
As in normal motion compensation, the inner W×H samples are interpolated with the 8-tap interpolation filter.
The four outer lines of samples (black circles in FIG. 23) are interpolated with a bilinear filter.
For each position, gradients are calculated for each of the two reference blocks ($R_0$ and $R_1$):
Gx0(x,y)=(R0(x+1,y)-R0(x-1,y))>>4
Gy0(x,y)=(R0(x,y+1)-R0(x,y-1))>>4
Gx1(x,y)=(R1(x+1,y)-R1(x-1,y))>>4
Gy1(x,y)=(R1(x,y+1)-R1(x,y-1))>>4
For each position, the intermediate values are calculated as:
T1 = (R0(x,y) >> 6) - (R1(x,y) >> 6), T2 = (Gx0(x,y) + Gx1(x,y)) >> 3, T3 = (Gy0(x,y) + Gy1(x,y)) >> 3; and
B1(x,y)=T2*T2,B2(x,y)=T2*T3,B3(x,y)=-T1*T2,B5(x,y)=T3*T3,B6(x,y)=-T1*T3
Step 3: Compute the prediction for each 4×4 block
If the SAD between two 4 x 4 reference blocks is less than the threshold, the BIO is skipped for the 4 x 4 blocks.
Vx and Vy are calculated.
Calculate the final prediction for each position in a 4 x 4 block:
b(x,y) = (Vx(Gx0(x,y) - Gx1(x,y)) + Vy(Gy0(x,y) - Gy1(x,y)) + 1) >> 1
P(x,y) = (R0(x,y) + R1(x,y) + b(x,y) + offset) >> shift
Here, b(x,y) is referred to as the correction term.
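The gradient formulas of step 2 and the correction term of step 3 can be combined for one inner sample. Vx and Vy are taken as inputs because their derivation is not spelled out above, and offset and shift are parameters for the same reason; the test values (offset 16, shift 5) are only an assumed example for 10-bit output from 14-bit intermediates. Blocks are indexed `r[y][x]`.

```python
def gradients(r, x, y):
    """Step 2 gradients of reference block r (indexed r[y][x]) at an inner position."""
    gx = (r[y][x + 1] - r[y][x - 1]) >> 4
    gy = (r[y + 1][x] - r[y - 1][x]) >> 4
    return gx, gy


def bio_sample(r0, r1, x, y, vx, vy, offset, shift):
    """Step 3: correction term b(x, y) and final prediction P(x, y)."""
    gx0, gy0 = gradients(r0, x, y)
    gx1, gy1 = gradients(r1, x, y)
    b = (vx * (gx0 - gx1) + vy * (gy0 - gy1) + 1) >> 1  # correction term
    return (r0[y][x] + r1[y][x] + b + offset) >> shift
```

When both references are flat, all gradients vanish and P(x, y) reduces to the ordinary rounded bi-prediction average.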
2.9.3 BIO in VTM-4.0
In VTM-4.0, JVET-M0063, which proposes rounding the computation results in BDOF according to the bit depth, was adopted.
In VTM-4.0, JVET-M0487 was adopted, which removes the bilinear filtering and instead fetches the nearest integer samples of the reference block to pad the four outer lines of samples (black circles in FIG. 23).
The BIO-related working draft text in VTM-4.0 is shown below (from JVET-M1001):
2.9.4 Fractional sample interpolation process
General
The inputs to this process are:
a luma location (xSb, ySb) specifying the top-left sample of the current coding sub-block relative to the top-left luma sample of the current picture,
a variable sbWidth specifying the width of the current coding sub-block,
a variable sbHeight specifying the height of the current coding sub-block,
motion vector offset mvOffset,
a refined motion vector refMvLX,
the selected reference picture sample array refPicLX,
the bidirectional optical flow flag bdofFlag,
the variable cIdx specifies the color component index of the current block.
The output of this process is:
- an (sbWidth + bdofOffset)x(sbHeight + bdofOffset) array predSamplesLX of prediction sample values.
The bi-directional optical flow boundary offset bdofOffset is derived as follows:
bdofOffset = bdofFlag ? 2 : 0 (8-811)
-if cIdx is equal to 0, the following applies:
- Let (xIntL, yIntL) be a luma location given in full-sample units and (xFracL, yFracL) be an offset given in 1/16-sample units. These variables are used only in this clause for specifying fractional-sample locations inside the reference sample array refPicLX.
For each luma sample location (xL = 0..sbWidth - 1 + bdofOffset, yL = 0..sbHeight - 1 + bdofOffset), the corresponding prediction luma sample value predSamplesLX[xL][yL] is derived as follows:
- The variables xIntL, yIntL, xFracL and yFracL are derived as follows:
xIntL = xSb + (refMvLX[0] >> 4) + xL (8-812)
yIntL = ySb + (refMvLX[1] >> 4) + yL (8-813)
xFracL = refMvLX[0] & 15 (8-814)
yFracL = refMvLX[1] & 15 (8-815)
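Equations (8-812) to (8-815) split a 1/16-sample MV into a full-sample position and a phase. In Python, `>>` on negative integers is an arithmetic shift and `&` acts on the two's-complement representation, matching how these operators are defined for the integer arithmetic above. The helper name is illustrative.

```python
def split_luma_mv(ref_mv_lx, x_sb, y_sb, x_l, y_l):
    """(8-812)-(8-815): full-sample position and 1/16-sample phase of one luma sample."""
    x_int = x_sb + (ref_mv_lx[0] >> 4) + x_l
    y_int = y_sb + (ref_mv_lx[1] >> 4) + y_l
    x_frac = ref_mv_lx[0] & 15
    y_frac = ref_mv_lx[1] & 15
    return (x_int, y_int), (x_frac, y_frac)
```

For example, refMvLX[0] = -37 gives an integer part of -3 and a phase of 11, since -3 × 16 + 11 = -37.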
- If bdofFlag is equal to TRUE and one or more of the following conditions are true, the prediction luma sample value predSamplesLX[xL][yL] is derived by invoking the luma integer sample fetching process as specified in clause 8.5.7.3.3 with (xIntL, yIntL), (xFracL, yFracL) and refPicLX as inputs:
- bdofFlag is equal to TRUE.
- xL is equal to 0.
- xL is equal to sbWidth + 1.
- yL is equal to 0.
- yL is equal to sbHeight + 1.
Otherwise, the following applies:
the motion vector mvLX is set equal to (refMvLX - mvOffset). The prediction luma sample value predSamplesLX[xL][yL] is derived by invoking the luma sample 8-tap interpolation filtering process as specified in clause 8.5.7.3.2 with (xIntL, yIntL), (xFracL, yFracL), refPicLX and padVal as inputs.
……
Luma integer sample fetching process
The inputs to this process are:
- a luma location in full-sample units (xIntL, yIntL),
- the luma reference sample array refPicLXL.
The output of this process is a predicted luma sample value predSampleLXL.
The variable shift is set equal to Max(2, 14 - BitDepthY).
The variable picW is set equal to pic_width_in_luma_samples, and the variable picH is set equal to pic_height_in_luma_samples.
The luma location in full-sample units (xInt, yInt) is derived as follows:
xInt = Clip3(0, picW - 1, sps_ref_wraparound_enabled_flag ? ClipH((sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY, picW, xIntL) : xIntL) (8-838)
yInt = Clip3(0, picH - 1, yIntL) (8-839)
The predicted luma sample value predSampleLXL is derived as follows:
predSampleLXL = refPicLXL[xInt][yInt] << shift3 (8-840)
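A sketch of the fetching process follows. ClipH is not defined in this excerpt; the helper below assumes the usual VVC-style definition (add the wrap-around offset for x < 0, subtract it for x > W - 1). The reference picture is indexed `ref_pic[y][x]` here rather than `[x][y]`, `wrap_offset` stands for (sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY (or None when wrap-around is disabled), and the final left shift uses the variable named shift in the text.

```python
def clip3(lo, hi, v):
    return lo if v < lo else hi if v > hi else v


def clip_h(o, w, x):
    """Horizontal wrap-around clip (assumed VVC-style definition)."""
    if x < 0:
        return x + o
    if x > w - 1:
        return x - o
    return x


def fetch_luma_integer_sample(ref_pic, x_int_l, y_int_l, bit_depth_y, wrap_offset=None):
    """(8-838)-(8-840) with ref_pic indexed [y][x]; wrap_offset=None disables wrap-around."""
    pic_h, pic_w = len(ref_pic), len(ref_pic[0])
    shift = max(2, 14 - bit_depth_y)
    x = x_int_l if wrap_offset is None else clip_h(wrap_offset, pic_w, x_int_l)
    x_int = clip3(0, pic_w - 1, x)
    y_int = clip3(0, pic_h - 1, y_int_l)
    return ref_pic[y_int][x_int] << shift
```

Without wrap-around, a position left of the picture is clamped to column 0; with a wrap-around offset equal to the picture width (as in a 360-degree ERP picture), it is taken from the opposite edge instead.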
Bi-directional optical flow prediction process
The inputs to this process are:
two variables nCbW and nCbH specifying the width and the height of the current coding block,
two (nCbW+2) x (nCbH+2) luminance prediction sample arrays predSamplesL0 and predSamplesL1,
the prediction list utilization flags predFlagL0 and predFlagL1,
reference indices refIdxL0 and refIdxL1,
the bi-directional optical flow utilization flags bdofUtilizationFlag[xIdx][yIdx] with xIdx = 0..(nCbW >> 2) - 1, yIdx = 0..(nCbH >> 2) - 1.
The output of this process is the (nCbW) x (nCbH) array pbSamples of luminance prediction samples values.
The variables bitDepth, shift1, shift2, shift3, shift4, offset4 and mvRefineThres are derived as follows:
The variable bitDepth is set equal to BitDepthY.
The variable shift1 is set equal to Max(2, 14 - bitDepth).
The variable shift2 is set equal to Max(8, bitDepth - 4).
The variable shift3 is set equal to Max(5, bitDepth - 7).
The variable shift4 is set equal to Max(3, 15 - bitDepth) and the variable offset4 is set equal to 1 << (shift4 - 1).
The variable mvRefineThres is set equal to Max(2, 1 << (13 - bitDepth)).
For xIdx = 0..(nCbW >> 2) - 1 and yIdx = 0..(nCbH >> 2) - 1, the following applies:
The variable xSb is set equal to (xIdx << 2) + 1 and ySb is set equal to (yIdx << 2) + 1.
- If bdofUtilizationFlag[xIdx][yIdx] is equal to FALSE, for x = xSb - 1..xSb + 2, y = ySb - 1..ySb + 2, the prediction sample values of the current sub-block are derived as follows:
pbSamples[x][y] = Clip3(0, (2^bitDepth) - 1, (predSamplesL0[x + 1][y + 1] + offset2 + predSamplesL1[x + 1][y + 1]) >> shift2) (8-852)
Otherwise (bdofUtilizationFlag[xIdx][yIdx] is equal to TRUE), the prediction sample values of the current sub-block are derived as follows:
For x = xSb - 1..xSb + 4, y = ySb - 1..ySb + 4, the following ordered steps apply:
1. The location (hx, vy) for each of the corresponding sample locations (x, y) inside the prediction sample arrays is derived as follows:
hx = Clip3(1, nCbW, x) (8-853)
vy = Clip3(1, nCbH, y) (8-854)
2. The variables gradientHL0[x][y], gradientVL0[x][y], gradientHL1[x][y] and gradientVL1[x][y] are derived as follows:
gradientHL0[x][y] = (predSamplesL0[hx + 1][vy] - predSamplesL0[hx - 1][vy]) >> shift1 (8-855)
gradientVL0[x][y] = (predSamplesL0[hx][vy + 1] - predSamplesL0[hx][vy - 1]) >> shift1 (8-856)
gradientHL1[x][y] = (predSamplesL1[hx + 1][vy] - predSamplesL1[hx - 1][vy]) >> shift1 (8-857)
gradientVL1[x][y] = (predSamplesL1[hx][vy + 1] - predSamplesL1[hx][vy - 1]) >> shift1 (8-858)
3. The variables diff[x][y], tempH[x][y] and tempV[x][y] are derived as follows:
diff[x][y] = (predSamplesL0[hx][vy] >> shift2) - (predSamplesL1[hx][vy] >> shift2) (8-859)
tempH[x][y] = (gradientHL0[x][y] + gradientHL1[x][y]) >> shift3 (8-860)
tempV[x][y] = (gradientVL0[x][y] + gradientVL1[x][y]) >> shift3 (8-861)
The variables sGx2, sGy2, sGxGy, sGxdI and sGydI are derived as follows:
sGx2 = Σi Σj (tempH[xSb + i][ySb + j] * tempH[xSb + i][ySb + j]) with i, j = -1..4 (8-862)
sGy2 = Σi Σj (tempV[xSb + i][ySb + j] * tempV[xSb + i][ySb + j]) with i, j = -1..4 (8-863)
sGxGy = Σi Σj (tempH[xSb + i][ySb + j] * tempV[xSb + i][ySb + j]) with i, j = -1..4 (8-864)
sGxdI = Σi Σj (-tempH[xSb + i][ySb + j] * diff[xSb + i][ySb + j]) with i, j = -1..4 (8-865)
sGydI = Σi Σj (-tempV[xSb + i][ySb + j] * diff[xSb + i][ySb + j]) with i, j = -1..4 (8-866)
The horizontal and vertical motion offsets of the current sub-block are derived as follows:
vx = sGx2 > 0 ? Clip3(-mvRefineThres, mvRefineThres, -(sGxdI << 3) >> Floor(Log2(sGx2))) : 0 (8-867)
vy = sGy2 > 0 ? Clip3(-mvRefineThres, mvRefineThres, ((sGydI << 3) - ((vx * sGxGym) << 12 + vx * sGxGys) >> 1) >> Floor(Log2(sGy2))) : 0 (8-868)
- For x = xSb - 1..xSb + 2, y = ySb - 1..ySb + 2, the prediction sample values of the current sub-block are derived as follows:
bdofOffset = Round((vx * (gradientHL1[x + 1][y + 1] - gradientHL0[x + 1][y + 1])) >> 1) + Round((vy * (gradientVL1[x + 1][y + 1] - gradientVL0[x + 1][y + 1])) >> 1) (8-869)
[Ed. (JC): The Round() operation is defined for floating-point inputs. The Round() operation appears redundant here because the input is an integer value. To be confirmed by the proponent.]
pbSamples[x][y] = Clip3(0, (2^bitDepth) - 1, (predSamplesL0[x + 1][y + 1] + offset4 + predSamplesL1[x + 1][y + 1] + bdofOffset) >> shift4) (8-870)
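The TRUE branch of the sub-block loop can be transcribed for a single 4×4 sub-block, as a reading aid rather than a validated decoder. Several choices here are interpretations: Floor(Log2(sGy2)) is used in the vy term (the sGy2 > 0 guard indicates the vertical energy is the intended divisor); sGxGym and sGxGys are assumed to be the high and low 12-bit parts of sGxGy, so the cross term reduces to half of vx·sGxGy; Round() is dropped as the editor's note suggests; and bit_depth is assumed to be at most 13. Arrays are indexed `[x][y]` as in the text.

```python
def clip3(lo, hi, v):
    return lo if v < lo else hi if v > hi else v


def bdof_subblock(p0, p1, x_idx, y_idx, n_cb_w, n_cb_h, bit_depth):
    """BDOF for the 4x4 sub-block (x_idx, y_idx) when bdofUtilizationFlag is TRUE.

    p0, p1: (nCbW + 2) x (nCbH + 2) prediction arrays indexed [x][y].
    Returns {(x, y): pbSamples[x][y]} for the 16 samples of the sub-block.
    """
    shift1, shift2 = max(2, 14 - bit_depth), max(8, bit_depth - 4)
    shift3, shift4 = max(5, bit_depth - 7), max(3, 15 - bit_depth)
    offset4 = 1 << (shift4 - 1)
    thres = max(2, 1 << (13 - bit_depth))  # mvRefineThres
    x_sb, y_sb = (x_idx << 2) + 1, (y_idx << 2) + 1

    # Ordered steps 1-3 for x = xSb-1..xSb+4, y = ySb-1..ySb+4.
    g = {}
    for x in range(x_sb - 1, x_sb + 5):
        for y in range(y_sb - 1, y_sb + 5):
            hx, vy = clip3(1, n_cb_w, x), clip3(1, n_cb_h, y)
            gh0 = (p0[hx + 1][vy] - p0[hx - 1][vy]) >> shift1
            gv0 = (p0[hx][vy + 1] - p0[hx][vy - 1]) >> shift1
            gh1 = (p1[hx + 1][vy] - p1[hx - 1][vy]) >> shift1
            gv1 = (p1[hx][vy + 1] - p1[hx][vy - 1]) >> shift1
            diff = (p0[hx][vy] >> shift2) - (p1[hx][vy] >> shift2)
            g[x, y] = (gh0, gv0, gh1, gv1, diff,
                       (gh0 + gh1) >> shift3, (gv0 + gv1) >> shift3)

    # (8-862)-(8-866): sums over the 6x6 window, i, j = -1..4.
    t = list(g.values())
    s_gx2 = sum(v[5] * v[5] for v in t)
    s_gy2 = sum(v[6] * v[6] for v in t)
    s_gxgy = sum(v[5] * v[6] for v in t)
    s_gxdi = sum(-v[5] * v[4] for v in t)
    s_gydi = sum(-v[6] * v[4] for v in t)

    # (8-867)-(8-868); bit_length() - 1 equals Floor(Log2(.)) for positive integers.
    vx = clip3(-thres, thres, -(s_gxdi << 3) >> (s_gx2.bit_length() - 1)) if s_gx2 > 0 else 0
    cross = (vx * s_gxgy) >> 1  # half of vx * sGxGy (interpretation, see lead-in)
    vy = clip3(-thres, thres, ((s_gydi << 3) - cross) >> (s_gy2.bit_length() - 1)) if s_gy2 > 0 else 0

    # (8-869)-(8-870) for x = xSb-1..xSb+2, y = ySb-1..ySb+2.
    out = {}
    for x in range(x_sb - 1, x_sb + 3):
        for y in range(y_sb - 1, y_sb + 3):
            gh0, gv0, gh1, gv1 = g[x + 1, y + 1][:4]
            bdof_off = ((vx * (gh1 - gh0)) >> 1) + ((vy * (gv1 - gv0)) >> 1)
            val = (p0[x + 1][y + 1] + offset4 + p1[x + 1][y + 1] + bdof_off) >> shift4
            out[x, y] = clip3(0, (1 << bit_depth) - 1, val)
    return out
```

With flat prediction arrays all gradients vanish, vx = vy = 0, and the output equals the plain rounded average of the two predictions (for example 512 for two 10-bit predictions of 512 stored at 14-bit precision).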
2.10 example of decoder-side motion vector refinement (DMVR)
In the bi-prediction operation, for the prediction of one block region, two prediction blocks, formed using a motion vector (MV) of list 0 and an MV of list 1, respectively, are combined to form a single prediction signal. In the decoder-side motion vector refinement (DMVR) method, the two motion vectors of the bi-prediction are further refined by a bilateral template matching process. Bilateral template matching is applied in the decoder to perform a distortion-based search between a bilateral template and the reconstructed samples in the reference pictures, in order to obtain refined MVs without transmission of additional motion information.
In DMVR, a bilateral template is generated as the weighted combination (i.e., average) of the two prediction blocks obtained from the initial MV0 of list 0 and MV1 of list 1, respectively, as shown in FIG. 24. The template matching operation consists of calculating cost measures between the generated template and the sample region (around the initial prediction block) in the reference picture. For each of the two reference pictures, the MV that yields the minimum template cost is considered the updated MV of that list and replaces the original one. In JEM, nine MV candidates are searched for each list. The nine candidate MVs include the original MV and 8 surrounding MVs with a one-luma-sample offset from the original MV in the horizontal direction, the vertical direction, or both. Finally, the two new MVs, MV0' and MV1', as shown in FIG. 24, are used to generate the final bi-prediction result. The sum of absolute differences (SAD) is used as the cost measure. Note that when calculating the cost of a prediction block generated by a surrounding MV, the rounded MV (to integer pel) is actually used to obtain the prediction block instead of the real MV.
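The nine-candidate search for one list can be sketched as an integer-sample SAD minimization around the initial position. This is a sketch under stated assumptions: reference pictures are row-major lists, every candidate window lies inside the array (no padding is handled), and the helper names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences, the DMVR cost measure."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def bilateral_template(pred0, pred1):
    """Rounded average of the two initial prediction blocks."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(pred0, pred1)]


def block_at(ref, x, y, w, h):
    """w x h block of ref (indexed ref[y][x]) with top-left corner (x, y)."""
    return [row[x:x + w] for row in ref[y:y + h]]


def dmvr_search(ref, template, x0, y0):
    """Return the integer offset (dx, dy) among the 9 candidates with minimum SAD."""
    h, w = len(template), len(template[0])
    candidates = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return min(candidates,
               key=lambda d: sad(template, block_at(ref, x0 + d[0], y0 + d[1], w, h)))
```

The same search runs once per reference list with the shared template, and the two winning offsets update MV0 and MV1.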
DMVR is applied to the Merge mode of bi-prediction, where one MV comes from a past reference picture and the other from a future reference picture, without transmitting additional syntax elements. In JEM, DMVR is not applied when LIC, affine motion, FRUC, or sub-CU Merge candidates are enabled for the CU.
2.11 JVET-N0236
This contribution proposes a method to refine the sub-block based affine motion compensated prediction with optical flow. After the sub-block based affine motion compensation is performed, the prediction samples are refined by adding a difference derived from the optical flow equation, which is referred to as prediction refinement with optical flow (PROF). The proposed method can achieve inter prediction at pixel-level granularity without increasing the memory access bandwidth.
To achieve a finer granularity of motion compensation, this contribution proposes a method to refine the sub-block based affine motion compensated prediction with optical flow. After the sub-block based affine motion compensation is performed, the luma prediction samples are refined by adding a difference derived from the optical flow equation. The proposed PROF (prediction refinement with optical flow) is described in the following four steps.
Step 1) Sub-block based affine motion compensation is performed to generate the sub-block prediction I(i, j).
Step 2) The spatial gradients $g_x(i,j)$ and $g_y(i,j)$ of the sub-block prediction are calculated at each sample location using a 3-tap filter [-1, 0, 1].
$g_x(i,j) = I(i+1,j) - I(i-1,j)$
$g_y(i,j) = I(i,j+1) - I(i,j-1)$
For gradient computation, the sub-block prediction is extended by one pixel on each side. To reduce memory bandwidth and complexity, pixels on the extended boundary are copied from the nearest integer pixel location in the reference picture. Thus, additional interpolation of the filled region is avoided.
Step 3) The luma prediction refinement (denoted ΔI) is calculated by the optical flow equation:
$\Delta I(i,j) = g_x(i,j) \cdot \Delta v_x(i,j) + g_y(i,j) \cdot \Delta v_y(i,j)$
where the delta MV (denoted Δv(i, j)) is the difference between the pixel MV computed for sample location (i, j), denoted v(i, j), and the sub-block MV of the sub-block to which pixel (i, j) belongs, as shown in FIG. 25.
Since the affine model parameters and the pixel locations relative to the sub-block center do not change from sub-block to sub-block, Δv(i, j) can be calculated for the first sub-block and reused for the other sub-blocks in the same CU. Let x and y be the horizontal and vertical offsets from the pixel location to the center of the sub-block; then Δv(x, y) can be derived by the following equations.
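For a 4-parameter affine model:
$\Delta v_x(x,y) = \frac{v_{1x}-v_{0x}}{w}\,x - \frac{v_{1y}-v_{0y}}{w}\,y,\qquad \Delta v_y(x,y) = \frac{v_{1y}-v_{0y}}{w}\,x + \frac{v_{1x}-v_{0x}}{w}\,y$
For a 6-parameter affine model:
$\Delta v_x(x,y) = \frac{v_{1x}-v_{0x}}{w}\,x + \frac{v_{2x}-v_{0x}}{h}\,y,\qquad \Delta v_y(x,y) = \frac{v_{1y}-v_{0y}}{w}\,x + \frac{v_{2y}-v_{0y}}{h}\,y$
where $(v_{0x}, v_{0y})$, $(v_{1x}, v_{1y})$ and $(v_{2x}, v_{2y})$ are the top-left, top-right and bottom-left control point motion vectors, and w and h are the width and height of the CU.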
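The per-sample delta MV of the affine model can be evaluated directly in floating point. This sketch takes the offsets (x, y) from the sub-block centre as inputs (for a 4×4 sub-block these are ±0.5 and ±1.5); the function name and argument layout are illustrative assumptions.

```python
def delta_v(x, y, cpmv, w, h):
    """Delta MV at offset (x, y) from the sub-block centre.

    cpmv: [(v0x, v0y), (v1x, v1y)] for the 4-parameter model, plus (v2x, v2y)
    for the 6-parameter model; w and h are the CU width and height.
    """
    (v0x, v0y), (v1x, v1y) = cpmv[0], cpmv[1]
    c = (v1x - v0x) / w
    e = (v1y - v0y) / w
    if len(cpmv) == 2:  # 4-parameter model: the y-coefficients follow from c and e
        d, f = -e, c
    else:               # 6-parameter model
        v2x, v2y = cpmv[2]
        d = (v2x - v0x) / h
        f = (v2y - v0y) / h
    return c * x + d * y, e * x + f * y
```

Because (c, d, e, f) depend only on the control point MVs, the 16 values computed for one sub-block can be reused for every other sub-block of the CU.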
Step 4) Finally, the luma prediction refinement is added to the sub-block prediction I(i, j). The final prediction I' is generated as follows:
I'(i,j)=I(i,j)+ΔI(i,j)
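Steps 2 to 4 can be put together for one sub-block in floating point. The sketch assumes the sub-block prediction has already been extended by one sample on each side (as described in step 2) and that Δv is supplied per sample; it follows the unnormalized [-1, 0, 1] gradient exactly as written above, without the fixed-point shifts of the detailed description.

```python
def prof_refine(ext, dvx, dvy):
    """Steps 2-4 for one W x H sub-block, in floating point.

    ext: sub-block prediction extended by one sample on each side, indexed ext[row][col];
         the prediction I(i, j) is ext[j + 1][i + 1].
    dvx, dvy: per-sample delta MV components, indexed [j][i].
    """
    h, w = len(ext) - 2, len(ext[0]) - 2
    out = []
    for j in range(h):
        row = []
        for i in range(w):
            gx = ext[j + 1][i + 2] - ext[j + 1][i]   # g_x(i, j) = I(i+1, j) - I(i-1, j)
            gy = ext[j + 2][i + 1] - ext[j][i + 1]   # g_y(i, j) = I(i, j+1) - I(i, j-1)
            delta = gx * dvx[j][i] + gy * dvy[j][i]  # Step 3: delta I(i, j)
            row.append(ext[j + 1][i + 1] + delta)     # Step 4: I'(i, j) = I(i, j) + delta I
        out.append(row)
    return out
```

For a planar prediction I = 2i + 3j with Δv = (0.5, 0.25) everywhere, every sample is raised by 4 × 0.5 + 6 × 0.25 = 3.5.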
Some details in JVET-N0236
a) Gradient derivation in PROF
In JVET-N0236, the gradients are calculated for each sub-block (4×4 sub-blocks in VTM-4.0) for each reference list. For each sub-block, the nearest integer samples of the reference block are fetched to pad the four outer lines of samples (black circles in FIG. 23).
Let the MV of the current sub-block be (MVx, MVy). The fractional part is then calculated as (FracX, FracY) = (MVx & 15, MVy & 15), and the integer part as (IntX, IntY) = (MVx >> 4, MVy >> 4). The offsets (OffsetX, OffsetY) are derived as:
OffsetX = FracX > 7 ? 1 : 0;
OffsetY = FracY > 7 ? 1 : 0;
Let the top-left coordinate of the current sub-block be (xCur, yCur) and the dimensions of the current sub-block be W×H.
Then (xCor0, yCor0), (xCor1, yCor1), (xCor2, yCor2) and (xCor3, yCor3) are calculated as:
(xCor0,yCor0)=(xCur+IntX+OffsetX-1,yCur+IntY+OffsetY-1);
(xCor1,yCor1)=(xCur+IntX+OffsetX-1,yCur+IntY+OffsetY+H);
(xCor2,yCor2)=(xCur+IntX+OffsetX-1,yCur+IntY+OffsetY);
(xCor3,yCor3)=(xCur+IntX+OffsetX+W,yCur+IntY+OffsetY);
Let PredSample[x][y] (with x = 0..W-1, y = 0..H-1) store the prediction samples of the sub-block. The padding samples are then derived as
PredSample[x][-1] = (Ref(xCor0 + x, yCor0) << Shift0) - Rounding, for x = -1..W;
PredSample[x][H] = (Ref(xCor1 + x, yCor1) << Shift0) - Rounding, for x = -1..W;
PredSample[-1][y] = (Ref(xCor2, yCor2 + y) << Shift0) - Rounding, for y = 0..H-1;
PredSample[W][y] = (Ref(xCor3, yCor3 + y) << Shift0) - Rounding, for y = 0..H-1;
where Ref represents the reference picture, and Rounding is an integer, equal to $2^{13}$ in the example PROF implementation. Shift0 = Max(2, (14 - BitDepth));
PROF aims to increase the precision of the gradients, unlike BIO in VTM-4.0, where the gradients are output with the same precision as the input luma samples.
The gradient in the PROF is calculated as follows:
Shift1 = Shift0 - 4.
gradientH[x][y] = (predSamples[x+1][y] - predSamples[x-1][y]) >> Shift1
gradientV[x][y] = (predSamples[x][y+1] - predSamples[x][y-1]) >> Shift1
It should be noted that predSamples[x][y] keeps the precision obtained after interpolation.
b) Derivation of Δv in PROF
The derivation of Δv (denoted dMvH[posX][posY] and dMvV[posX][posY], with posX = 0..W-1, posY = 0..H-1) can be described as follows:
Let the dimensions of the current block be cbWidth×cbHeight, the number of control point motion vectors be numCpMv, and the control point motion vectors be cpMvLX[cpIdx], with cpIdx = 0..numCpMv-1 and X being 0 or 1 for the two reference lists.
The variables log2CbW and log2CbH are derived as follows:
log2CbW=Log2(cbWidth)
log2CbH=Log2(cbHeight)
The variables mvScaleHor, mvScaleVer, dHorX and dVerX are derived as follows:
mvScaleHor=cpMvLX[0][0]<<7
mvScaleVer=cpMvLX[0][1]<<7
dHorX=(cpMvLX[1][0]-cpMvLX[0][0])<<(7-log2CbW)
dVerX=(cpMvLX[1][1]-cpMvLX[0][1])<<(7-log2CbW)
the variables dHorY and dVerY are derived as follows:
-if numCpMv is equal to 3, the following applies:
dHorY=(cpMvLX[2][0]-cpMvLX[0][0])<<(7-log2CbH)
dVerY=(cpMvLX[2][1]-cpMvLX[0][1])<<(7-log2CbH)
Otherwise (numCpMv equal to 2), the following applies:
dHorY=-dVerX
dVerY=dHorX
The variables qHorX, qVerX, qHorY and qVerY are derived as follows:
qHorX=dHorX<<2;
qVerX=dVerX<<2;
qHorY=dHorY<<2;
qVerY=dVerY<<2;
dMvH[0][0] and dMvV[0][0] are calculated as follows:
dMvH[0][0]=((dHorX+dHorY)<<1)-((qHorX+qHorY)<<1);
dMvV[0][0]=((dVerX+dVerY)<<1)-((qVerX+qVerY)<<1);
dMvH[xPos][0] and dMvV[xPos][0] for xPos from 1 to W-1 are derived as follows:
dMvH[xPos][0]=dMvH[xPos-1][0]+qHorX;
dMvV[xPos][0]=dMvV[xPos-1][0]+qVerX;
for yPos from 1 to H-1, the following applies:
dMvH[xPos][yPos] = dMvH[xPos][yPos-1] + qHorY, with xPos = 0..W-1
dMvV[xPos][yPos] = dMvV[xPos][yPos-1] + qVerY, with xPos = 0..W-1
Finally, dMvH[xPos][yPos] and dMvV[xPos][yPos] (with xPos = 0..W-1, yPos = 0..H-1) are right-shifted as
dMvH[xPos][yPos] = SatShift(dMvH[xPos][yPos], 7+2-1);
dMvV[xPos][yPos] = SatShift(dMvV[xPos][yPos], 7+2-1);
where SatShift(x, n) and Shift(x, n) are defined as
SatShift(x, n) = x >= 0 ? (x + offset0) >> n : -((-x + offset1) >> n)
Shift(x, n) = (x + offset0) >> n
In one example, offset0 and/or offset1 are set to (1 << n) >> 1.
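The incremental derivation of dMvH and dMvV can be traced with a short transcription. It follows the steps above with offset0 = offset1 = (1 << n) >> 1 (the example setting), assumes power-of-two block sizes no larger than 128, and omits mvScaleHor and mvScaleVer because they do not enter Δv. Function names are illustrative.

```python
def sat_shift(x, n):
    """SatShift(x, n) with offset0 = offset1 = (1 << n) >> 1."""
    off = (1 << n) >> 1
    return (x + off) >> n if x >= 0 else -((-x + off) >> n)


def prof_dmv(cp_mv, cb_width, cb_height, w=4, h=4):
    """dMvH, dMvV (indexed [xPos][yPos]) for a W x H sub-block.

    cp_mv: 2 or 3 control point MVs (x, y) of one reference list.
    """
    log2_cbw = cb_width.bit_length() - 1   # Log2 of a power-of-two size
    log2_cbh = cb_height.bit_length() - 1
    d_hor_x = (cp_mv[1][0] - cp_mv[0][0]) << (7 - log2_cbw)
    d_ver_x = (cp_mv[1][1] - cp_mv[0][1]) << (7 - log2_cbw)
    if len(cp_mv) == 3:
        d_hor_y = (cp_mv[2][0] - cp_mv[0][0]) << (7 - log2_cbh)
        d_ver_y = (cp_mv[2][1] - cp_mv[0][1]) << (7 - log2_cbh)
    else:  # numCpMv == 2
        d_hor_y, d_ver_y = -d_ver_x, d_hor_x
    q_hor_x, q_ver_x, q_hor_y, q_ver_y = (v << 2 for v in (d_hor_x, d_ver_x, d_hor_y, d_ver_y))

    dmv_h = [[0] * h for _ in range(w)]
    dmv_v = [[0] * h for _ in range(w)]
    dmv_h[0][0] = ((d_hor_x + d_hor_y) << 1) - ((q_hor_x + q_hor_y) << 1)
    dmv_v[0][0] = ((d_ver_x + d_ver_y) << 1) - ((q_ver_x + q_ver_y) << 1)
    for x in range(1, w):
        dmv_h[x][0] = dmv_h[x - 1][0] + q_hor_x
        dmv_v[x][0] = dmv_v[x - 1][0] + q_ver_x
    for y in range(1, h):
        for x in range(w):
            dmv_h[x][y] = dmv_h[x][y - 1] + q_hor_y
            dmv_v[x][y] = dmv_v[x][y - 1] + q_ver_y
    n = 7 + 2 - 1
    return ([[sat_shift(v, n) for v in col] for col in dmv_h],
            [[sat_shift(v, n) for v in col] for col in dmv_v])
```

For a 16×16 CU whose top-right control point MV is 16 units to the right of the top-left one (a pure zoom), the result is antisymmetric about the sub-block centre: dMvH is -3, -1, 1, 3 along xPos and dMvV is -3, -1, 1, 3 along yPos.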
c) Derivation of ΔI in PROF
For a position (posX, posY) inside a sub-block, its corresponding Δv(i, j) is denoted (dMvH[posX][posY], dMvV[posX][posY]), and its corresponding gradients are denoted (gradientH[posX][posY], gradientV[posX][posY]).
Then ΔI(posX, posY) is derived as follows.
(dMvH[posX][posY], dMvV[posX][posY]) are clipped as
dMvH[posX][posY]=Clip3(-32768,32767,dMvH[posX][posY]);
dMvV[posX][posY]=Clip3(-32768,32767,dMvV[posX][posY]);
ΔI(posX,posY)=dMvH[posX][posY]×gradientH[posX][posY]+dMvV[posX][posY]×gradientV[posX][posY];
ΔI(posX,posY)=Shift(ΔI(posX,posY),1+1+4);
ΔI(posX,posY) = Clip3(-(2^13 - 1), 2^13 - 1, ΔI(posX,posY));
d) How to derive I 'of PROF'
If the current block is not coded as bi-directional prediction or weighted prediction, then
I’(posX,posY)=Shift((I(posX,posY)+ΔI(posX,posY)),Shift0),
I’(posX,posY)=ClipSample(I’(posX,posY)),
where ClipSample clips the sample value to a valid output sample value.
I'(posX, posY) is then output as the inter-prediction value.
Otherwise (the current block is coded with bi-directional prediction or weighted prediction),
I'(posX, posY) is stored and used, together with other prediction values and/or weighting values, to generate the inter-prediction value.
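Steps c) and d) for one sample of a uni-predicted, non-weighted block can be sketched as below. It is a minimal sketch under assumptions: Shift uses offset0=(1<<n)>>1, ClipSample is taken to clip to [0, (1<<BitDepth)-1], and Shift0 (not defined in this excerpt) is passed in by the caller; 14-BitDepth for a 14-bit intermediate prediction is used in the example only as an assumption. The function names are hypothetical.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clip x to [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def shift(x: int, n: int) -> int:
    """Rounding right shift with offset0 = (1 << n) >> 1 (assumed)."""
    return (x + ((1 << n) >> 1)) >> n


def prof_refine_sample(i_pred, dmv_h, dmv_v, grad_h, grad_v, shift0, bit_depth):
    """Return I'(posX, posY) for one sample of a uni-predicted, non-weighted block."""
    # c) Clip the delta MV to 16 bits and form the optical-flow refinement.
    dmv_h = clip3(-32768, 32767, dmv_h)
    dmv_v = clip3(-32768, 32767, dmv_v)
    delta_i = dmv_h * grad_h + dmv_v * grad_v
    delta_i = shift(delta_i, 1 + 1 + 4)
    delta_i = clip3(-(2**13 - 1), 2**13 - 1, delta_i)
    # d) Add the refinement to the intermediate prediction, scale down and clip.
    out = shift(i_pred + delta_i, shift0)
    return clip3(0, (1 << bit_depth) - 1, out)  # ClipSample (assumed range)
```

For example, for a 10-bit sample with intermediate value 512<<4, dMvH=12 and gradientH=64 give ΔI=12, and the output with shift0=4 is 513.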
2.12 JVET-N0510
In JVET-N0510, a phase change affine sub-block motion compensation (MC) method is proposed. The conventional two-stage horizontal-vertical interpolation is applied. However, unlike phase-invariant block-based MC, which uses the same horizontal filter for all sample rows and the same vertical filter for all sample columns, different filter phases may be applied to different sample rows and sample columns within an affine sub-block.
To better approximate the affine motion model within an affine sub-block, phase change MC is applied to the sub-block. In the proposed method, the affine coded block is also divided into 4×4 sub-blocks, and a sub-block MV is derived for each sub-block as in VTM 4.0. The MC of each sub-block is divided into two stages. The first stage filters the (4+L-1)×(4+L-1) reference block window with (4+L-1) rows of horizontal filtering, where L is the filter tap length of the interpolation filter. However, unlike translational MC, in the proposed phase change affine sub-block MC the filter phase differs for each sample row. For each sample row, MVx is derived as follows.
MVx=((subblockMVx<<7)+dMvVerX×(rowIdx-L/2-2))>>7
The filter phase for each sample row is derived from MVx. subblockMVx is the x component of the derived sub-block MV, as in VTM 4.0. rowIdx is the sample row index. dMvVerX is (cuBottomLeftCPMVx-cuTopLeftCPMVx)<<(7-log2LumaCbHeight), where cuBottomLeftCPMVx is the x component of the control point MV at the bottom-left of the CU, cuTopLeftCPMVx is the x component of the control point MV at the top-left of the CU, and log2LumaCbHeight is log2 of the height of the luma coding block (CB).
After horizontal filtering, 4×(4+L-1) horizontally filtered samples are generated. Fig. 26 shows the proposed concept of horizontal filtering. In Figs. 26 and 27, the light gray dots (e.g., 2602 or 2702, each denoting multiple dots) are samples of the reference block window, and the dark gray dots (e.g., 2604 or 2704) represent horizontally filtered samples. Each blue tube of 8×1 samples represents one application of the 8-tap horizontal filter, as shown in Figs. 26 and 27. Four horizontal filtering operations are required per sample row. The filter phase is the same within a sample row but differs between rows. A skewed 4×11 block of samples is generated.
In the second stage, the 4×(4+L-1) horizontally filtered samples (e.g., the dark gray samples (2604) in Fig. 26) are further filtered vertically. For each sample column, MVy is derived as follows.
MVy=((subblockMVy<<7)+dMvHorY×(columnIdx-2))>>7 (Equation 2)
The filter phase for each sample column is derived from MVy. subblockMVy is the y component of the derived sub-block MV, as in VTM 4.0. columnIdx is the sample column index. dMvHorY is (cuTopRightCPMVy-cuTopLeftCPMVy)<<(7-log2LumaCbWidth), where cuTopRightCPMVy is the y component of the control point MV at the top-right of the CU, cuTopLeftCPMVy is the y component of the control point MV at the top-left of the CU, and log2LumaCbWidth is log2 of the width of the luma CB.
After vertical filtering, 4×4 affine sub-block prediction samples are generated. Fig. 28 shows the proposed concept of vertical filtering. The light gray dots (2802) are the horizontally filtered samples from the first stage. The dark gray dots (2804) are the vertically filtered samples, which are the final prediction samples.
In this proposal, the interpolation filter set is the same as in VTM 4.0. The only difference is that the horizontal filter phase differs between sample rows and the vertical filter phase differs between sample columns. The number of filtering operations per affine sub-block is the same as in VTM 4.0.
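The per-row and per-column MV derivation of the phase change MC can be sketched as below, grouping the shifts as ((subblockMVx<<7)+...)>>7. It assumes an 8-tap filter (L=8), 4×4 sub-blocks, and MVs in 1/16-pel units so that the filter phase is the low 4 bits of the MV; Python's >> floors negative values, matching a C arithmetic shift. Only the MV and phase derivation is shown, not the interpolation itself, and the function names are hypothetical.

```python
def row_mvx(subblock_mvx: int, d_mv_ver_x: int, taps: int = 8, size: int = 4):
    """MVx for each of the (size + taps - 1) reference rows (rowIdx = 0, 1, ...)."""
    return [((subblock_mvx << 7) + d_mv_ver_x * (row - taps // 2 - 2)) >> 7
            for row in range(size + taps - 1)]


def col_mvy(subblock_mvy: int, d_mv_hor_y: int, size: int = 4):
    """MVy for each of the size output columns (columnIdx = 0, 1, ...)."""
    return [((subblock_mvy << 7) + d_mv_hor_y * (col - 2)) >> 7 for col in range(size)]


def filter_phase(mv: int, frac_bits: int = 4) -> int:
    """Fractional MV part used to select the filter phase (1/16-pel assumed)."""
    return mv & ((1 << frac_bits) - 1)
```

With dMvVerX=0, every row gets the sub-block MVx, which is exactly the phase-invariant case; a non-zero dMvVerX makes the phase drift from row to row.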
JVET-O0057: switchable interpolation filter
JVET-O0057 proposes a switchable interpolation filter for half-pel positions, based on the proposal in JVET-N0309. Switching of the half-pel luma interpolation filter depends on the motion vector accuracy. In addition to the existing quarter-pel, full-pel and 4-pel AMVR modes, a new half-pel AMVR mode is introduced. An alternative half-pel luma interpolation filter can be signaled only when the motion vector accuracy is half-pel. In skip/merge mode with a spatial merge candidate, the information on which interpolation filter is applied at half-pel positions is inherited from the neighboring block.
2.2.14 JVET-O1140
When the SPS-level control flag of DMVR or BDOF is true, an SPS flag sps_bdof_dmvr_slice_present_flag is signaled in the SPS to indicate the presence of slice_disable_bdof_dmvr_flag. If sps_bdof_dmvr_slice_present_flag is equal to 1, slice_disable_bdof_dmvr_flag is signaled in the slice header after the fractional MMVD flag.
slice_disable_bdof_dmvr_flag equal to 1 specifies that both bi-directional optical flow inter prediction and decoder motion vector refinement based inter bi-prediction are disabled in the current slice. slice_disable_bdof_dmvr_flag equal to 0 specifies that bi-directional optical flow inter prediction or decoder motion vector refinement based inter bi-prediction may or may not be disabled in the current slice. When slice_disable_bdof_dmvr_flag is not present, its value is inferred to be 0.
3. Disadvantages of the prior embodiments
Some existing implementations suffer from the following disadvantages:
(1) The gradient calculation methods differ between BDOF and PROF.
(a) In BDOF, gradients are calculated for the entire block and padding is performed once. In PROF, gradients are calculated for each sub-block and padding is performed N times (assuming there are N sub-blocks).
(b) PROF requires higher gradient precision than BDOF.
(2) The interaction between PROF and other tools is unclear.
(3) It is unclear how PROF is applied to the chroma components.
(4) The derivation of Δv may be incorrect.
(5) PROF could be performed conditionally to achieve higher coding performance.
(6) It is unclear how to combine the methods in JVET-N0236 and JVET-N0510.
(7) The bit widths of dMvH and dMvV may be too large.
4. Example methods for prediction refinement with optical flow (PROF)
Embodiments of the presently disclosed technology overcome the shortcomings of existing implementations, thereby providing video coding with higher coding efficiency. Based on the disclosed technology, methods for prediction refinement with optical flow, which may enhance both existing and future video coding standards, are set forth in the following examples described for various implementations. The examples of the disclosed technology provided below explain general concepts and are not meant to be construed as limiting. Unless explicitly indicated to the contrary, the various features described in these examples may be combined.
The reference pictures of the current picture from list 0 and list 1 are denoted Ref0 and Ref1, respectively. Denote τ0=POC(current)-POC(Ref0) and τ1=POC(Ref1)-POC(current), and denote the reference blocks of the current block from Ref0 and Ref1 as refblk0 and refblk1, respectively. For a sub-block in the current block, the MV of its corresponding sub-block in refblk0 pointing to refblk1 is denoted (vx, vy). The MVs of the sub-block referring to Ref0 and Ref1 are denoted (mvL0x, mvL0y) and (mvL1x, mvL1y), respectively.
Shift(x, s) is defined as Shift(x, s)=(x+off)>>s.
SignShift(x, s) is defined as
SignShift(x,s)=(x+offset0)>>s, if x>=0
SignShift(x,s)=-((-x+offset1)>>s), if x<0
In an example, offset0 and/or offset1 is set to (1<<s)>>1 or (1<<(s-1)). In another example, offset0 and/or offset1 is set to 0. In yet another example, offset0=offset1=((1<<s)>>1)-1 or (1<<(s-1))-1.
Clip3(min, max, x) is defined as
Clip3(min,max,x)=min, if x<min
Clip3(min,max,x)=max, if x>max
Clip3(min,max,x)=x, otherwise
Here, Max(a, b)=a>=b?a:b, and Min(a, b)=a<=b?a:b.
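The helper operations above can be sketched in Python to show how they differ on negative inputs. The sketch assumes the default offsets (1<<s)>>1 for SignShift and uses the Clip3(min, max, x) argument order that the formulas in this document use; the function names are illustrative.

```python
def shift(x: int, s: int, off: int = 0) -> int:
    """Shift(x, s) = (x + off) >> s (floors toward minus infinity)."""
    return (x + off) >> s


def sign_shift(x: int, s: int, offset0=None, offset1=None) -> int:
    """SignShift: shift the magnitude, then restore the sign."""
    default = (1 << s) >> 1
    offset0 = default if offset0 is None else offset0
    offset1 = default if offset1 is None else offset1
    return (x + offset0) >> s if x >= 0 else -((-x + offset1) >> s)


def clip3(lo: int, hi: int, x: int) -> int:
    """Clip3(min, max, x) = Max(min, Min(max, x))."""
    return max(lo, min(hi, x))
```

With off=(1<<s)>>1, Shift rounds -5/2 to -2 while SignShift rounds it to -3, so SignShift treats positive and negative inputs symmetrically (5/2 rounds to 3).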
In the following discussion, an operation between two motion vectors means that the operation is applied to both components of the motion vectors. For example, MV3=MV1+MV2 corresponds to MV3x=MV1x+MV2x and MV3y=MV1y+MV2y. Alternatively, the operation may be applied only to the horizontal or only to the vertical components of the two motion vectors. The term "absolute value" of an MV (MVx, MVy) may refer to abs(MVx), abs(MVy), max(abs(MVx), abs(MVy)) or abs(MVx)+abs(MVy), where the function abs(x) returns the absolute value of x and the function max(x, y) returns the greater of x and y.
In the following discussion, the left, below-left, above, above-right and above-left neighboring blocks are denoted blocks A1, A0, B1, B0 and B2, respectively, as shown in Fig. 2.
1. It is proposed that the gradient calculation in PROF may be performed at an M×N region level that differs from the sub-block size used for motion compensation in affine mode.
a. In one example, the gradient calculation in PROF may be performed for M×N regions that are larger than the sub-blocks.
b. In one example, M and N may be predefined numbers, e.g., M=N=8 or M=N=16.
c. In one example, M and N may be defined according to the width/height of the sub-block, e.g., M=N=2×Wmc, where Wmc is the width/height of the sub-block used in motion compensation.
d. The padding process for deriving gradients in PROF is performed at the M×N region level.
e. For all the above examples, M and N are defined as follows:
i. In one example, M=min(K0, block width), where K0 is an integer value.
ii. In one example, N=min(K1, block height), where K1 is an integer value.
iii. For the above examples, K0=K1=16.
iv. In one example, K0 and K1 are aligned with those used for BDOF.
f. The gradient of a first sample in a first sub-block may be derived using a second sample in a second sub-block.
i. In one example, the second sub-block is adjacent to the first sub-block.
ii. In one example, the second sample is used to derive the gradient of the first sample in the same way regardless of whether it lies in the first sub-block or the second sub-block.
iii. The above methods may be applied when M×N is larger than the sub-block.
g. One or more MVs may be derived for the padding process of each M×N region.
i. In one example, one specific MV is derived for the padding process of an M×N region. Integer reference samples may be located with the specific MV and then used to pad the samples outside the M×N region.
(i) In one example, the specific MV may be the MV of one sub-block in the M×N region, such as the top-left sub-block or a center sub-block of the M×N region. Fig. 31 shows an example: the MV of sub-block A, B, C, D or E may be selected as the specific MV.
(ii) In one example, the specific MV may be derived from the affine model at a particular position (such as the center) of the M×N region.
(iii) In one example, the specific MV may be derived from the MVs of the sub-blocks in the M×N region.
a. For example, the specific MV may be derived as the average of the MVs of all sub-blocks in the M×N region.
b. For example, the specific MV may be derived as the average of the MVs of several center sub-blocks.
i. For example, the specific MV may be derived as the average of the MVs of B, C, D and E in Fig. 31.
ii. For example, the specific MV may be derived as the average of the MVs of B and E in Fig. 31.
iii. For example, the specific MV may be derived as the average of the MVs of C and D in Fig. 31.
c. For example, the specific MV may be derived as a function of multiple MVs (e.g., CPMVs or sub-block MVs).
ii. In one example, multiple MVs are derived for the padding process of an M×N region. Integer reference samples may be located with one of the MVs and then used to pad the samples outside the M×N region.
(i) In one example, when padding a first sample adjacent to a first sub-block of an M×N region, the first MV of the first sub-block may be used to locate the integer reference sample(s) used to pad the first sample.
iii. The above methods are applied when M×N is larger than the sub-block and the padding process for deriving gradients in PROF is performed for each M×N region.
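To illustrate bullet 1.e together with the averaging option in bullet 1.g.i.(iii).a, the sketch below groups 4×4 sub-block MVs into M×N regions with M=min(16, W) and N=min(16, H) and derives one padding MV per region as the average of its sub-block MVs. The floor rounding of the average, the dictionary layout of sub_mvs, and the function name are assumptions; W and H are assumed to be multiples of M and N.

```python
def region_padding_mvs(sub_mvs, block_w, block_h, sub=4, k0=16, k1=16):
    """Return {(rx, ry): (mvx, mvy)}: one padding MV per M x N region.

    sub_mvs maps the top-left position (sx, sy) of each sub x sub sub-block to its MV.
    """
    m, n = min(k0, block_w), min(k1, block_h)  # M = min(K0, W), N = min(K1, H)
    out = {}
    for ry in range(0, block_h, n):
        for rx in range(0, block_w, m):
            mvs = [sub_mvs[(x, y)]
                   for y in range(ry, ry + n, sub)
                   for x in range(rx, rx + m, sub)]
            # Average of all sub-block MVs in the region (floor rounding assumed).
            out[(rx, ry)] = (sum(v[0] for v in mvs) // len(mvs),
                             sum(v[1] for v in mvs) // len(mvs))
    return out
```

A 32×16 block thus yields two 16×16 regions, each padded once with its own MV, instead of one padding pass per 4×4 sub-block.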
2. The gradient calculation in PROF/BIO may be performed at an M×N region level, and M and N may be adaptively changed.
a. In one example, M and N may depend on the dimensions W×H of the current block.
i. For example, the region may be the entire current block, i.e., M=W and N=H.
ii. For example, M=W/T1 and N=H/T2, where T1 and T2 are integers, e.g., T1=T2=2.
iii. In one example, M and/or N may be signaled from the encoder to the decoder, such as in the VPS/DPS/SPS/PPS/APS/slice header/slice group header/CTU/CU.
(i) Alternatively, M and/or N may be specified in a profile/level/tier of the video coding standard.
iv. In one example, M=min(W, T1) and N=min(H, T2), e.g., T1=T2=16.
(i) In one example, T1 and/or T2 may be signaled from the encoder to the decoder, such as in the VPS/DPS/SPS/PPS/APS/slice header/slice/CTU/CU.
(ii) Alternatively, T1 and/or T2 may be specified in a profile/level/tier of the video coding standard.
3. For the above methods, the following may further apply:
a. In one example, M is at least equal to Mmin and N is at least equal to Nmin, e.g., Mmin=Nmin=8.
b. In one example, the padding process is performed once for each M×N region to obtain a padded (M+dM)×(N+dN) region, e.g., dM=dN=2.
i. In one example, the samples inside the region (such as the white circles in Fig. 23) may be derived from motion compensation with interpolation filtering.
(i) In one example, the samples inside the region may be derived from motion compensation of several sub-blocks in the region.
ii. In one example, the samples on the four outer lines (such as the black circles in Fig. 23) may be padded.
(i) In one example, a sample to be padded may copy the intensity of the nearest integer sample in the reference block.
(ii) In one example, a sample to be padded may copy the intensity of the nearest sample in the unpadded region.
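The option in bullet 3 that copies the nearest sample of the unpadded region can be sketched as follows: each of the four outer lines copies its nearest inner sample, so corner samples copy the nearest corner sample and dM=dN=2. The list-of-rows layout and the function name are assumptions; the option that copies integer reference samples instead needs a reference picture and an MV and is not shown.

```python
def pad_region(pred):
    """Pad an N x M prediction (list of rows) by one sample on each side.

    Each padded sample copies the nearest sample of the unpadded region,
    giving an (N + 2) x (M + 2) result.
    """
    rows = [[r[0]] + list(r) + [r[-1]] for r in pred]   # left/right columns
    return [list(rows[0])] + rows + [list(rows[-1])]    # top/bottom rows
```

For example, pad_region([[1, 2], [3, 4]]) returns [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]].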
4. For each region to which gradient calculation in PROF/BIO is applied, it is proposed to calculate gradients for only part of the samples instead of for every sample.
a. In one example, only the gradients associated with samples at given coordinates may be used in PROF/BIO, such as (2x, y), (x, 2y), (2x+1, 2y+1) or (2x, 2y), where (x, y) is the coordinate relative to the top-left sample of the current block.
b. In one example, the samples may first be modified (e.g., downsampled), and the gradients may then be derived from the modified samples.
5. It is proposed that the gradient values calculated in BDOF and PROF may have the same precision.
a. In one example, the sample differences may be shifted by the same value.
i. In one example, the horizontal and/or vertical gradients (denoted gradientH and gradientV, respectively) may be calculated as:
gradientH[x][y]=(predSamples[x+1][y]-predSamples[x-1][y])>>Shift0
gradientV[x][y]=(predSamples[x][y+1]-predSamples[x][y-1])>>Shift1
Alternatively,
gradientH[x][y]=Shift((predSamples[x+1][y]-predSamples[x-1][y]),Shift0)
gradientV[x][y]=Shift((predSamples[x][y+1]-predSamples[x][y-1]),Shift1)
Alternatively,
gradientH[x][y]=SatShift((predSamples[x+1][y]-predSamples[x-1][y]),Shift0)
gradientV[x][y]=SatShift((predSamples[x][y+1]-predSamples[x][y-1]),Shift1)
ii. In one example, the horizontal and/or vertical gradients (denoted gradientH and gradientV, respectively) may be calculated as:
gradientH[x][y]=(predSamples[x][y]*2-predSamples[x+1][y]-predSamples[x-1][y])>>Shift0
gradientV[x][y]=(predSamples[x][y]*2-predSamples[x][y+1]-predSamples[x][y-1])>>Shift1
Alternatively,
gradientH[x][y]=Shift((predSamples[x][y]*2-predSamples[x+1][y]-predSamples[x-1][y]),Shift0)
gradientV[x][y]=Shift((predSamples[x][y]*2-predSamples[x][y+1]-predSamples[x][y-1]),Shift1)
Alternatively,
gradientH[x][y]=SatShift((predSamples[x][y]*2-predSamples[x+1][y]-predSamples[x-1][y]),Shift0)
gradientV[x][y]=SatShift((predSamples[x][y]*2-predSamples[x][y+1]-predSamples[x][y-1]),Shift1)
iii. In one example, Shift0 and/or Shift1 may be set to Max(2, (14-BitDepth)), where BitDepth is the bit depth of the reconstructed/input samples.
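A sketch of the first gradient option in bullet 5.a.i, using the same shift Max(2, 14-BitDepth) for both directions so that BDOF and PROF would produce gradients at the same precision. It indexes samples as [y][x] (the formulas above index [x][y]), expects one padding line on each side, and relies on Python's floor-based >> as a stand-in for a C arithmetic shift; the function name and input layout are assumptions.

```python
def gradients(padded, bit_depth):
    """Central-difference gradients of the inner H x W samples of a padded block.

    padded is indexed padded[y][x] with one padding line on each side;
    both directions use the same shift, Max(2, 14 - BitDepth).
    """
    s = max(2, 14 - bit_depth)
    h, w = len(padded) - 2, len(padded[0]) - 2
    grad_h = [[(padded[y + 1][x + 2] - padded[y + 1][x]) >> s for x in range(w)]
              for y in range(h)]
    grad_v = [[(padded[y + 2][x + 1] - padded[y][x + 1]) >> s for x in range(w)]
              for y in range(h)]
    return grad_h, grad_v
```

On a horizontal ramp that rises by 16 per sample, the 10-bit shift of 4 gives gradientH=2 everywhere and gradientV=0; at 12 bits the shift is clamped to 2 and gradientH becomes 8.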
6. The following methods of padding the samples on the outer lines (denoted padding samples, such as the black circles in Fig. 23) may be applied to PROF, to BIO, or to both PROF and BIO.
a. The padding samples may be derived in the same way for PROF and BIO. The "same way" may be any of the padding methods disclosed below.
b. In one example, the padding samples may be derived (e.g., copied) from integer samples in the reference pictures of PROF and/or BIO.
i. In one example, the integer sample used to derive a padding sample may be located by adding an MV to the position of the padding sample, where the MV may be rounded to an integer MV in the addition operation.
(i) In one example, MV (MvX, MvY) may be rounded down to an integer MV (IntX, IntY). For example, IntX=MvX>>P and IntY=MvY>>P, where P is the MV precision.
(ii) In one example, MV (MvX, MvY) may be rounded to the nearest integer MV (IntX, IntY). For example, set FracX=MvX&((1<<P)-1), FracY=MvY&((1<<P)-1), OffX=(FracX>=HalfFrac)?1:0 and OffY=(FracY>=HalfFrac)?1:0, where P is the MV precision; then IntX=(MvX>>P)+OffX and IntY=(MvY>>P)+OffY. HalfFrac may be equal to 1<<(P-1); in other examples, it may be equal to (1<<(P-1))-1 or (1<<(P-1))+1.
(iii) In one example, MV (MvX, MvY) may be rounded to an integer MV (IntX, IntY) with IntX=SatShift(MvX, P) and IntY=SatShift(MvY, P), where P is the MV precision.
(iv) In the bullets above, the MV precision P may depend on the color format and/or the color component.
a. For example, for the 4:2:0 color format, the MV precision of the Cb/Cr components may be equal to the MV precision of the luma component plus K, e.g., K=1.
(v) How the padding is performed may be signaled from the encoder to the decoder, such as in the VPS/DPS/SPS/PPS/APS/slice header/slice group header/slice/CTU/CU.
a. Alternatively, how the padding is performed may be specified in a profile/level/tier of the video coding standard.
(vi) How the padding is performed may depend on the block dimensions.
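The three MV rounding options in bullet 6.b.i can be compared with the sketch below. It assumes 1/16-pel MVs (P=4) by default and SatShift offsets of (1<<P)>>1; Python's >> and & on negative integers behave like two's-complement C. The function names are hypothetical.

```python
def round_floor(mv: int, p: int) -> int:
    """Option (i): IntX = MvX >> P (rounds toward minus infinity)."""
    return mv >> p


def round_nearest(mv: int, p: int, half_frac=None) -> int:
    """Option (ii): add 1 when the fractional part is at least HalfFrac."""
    if half_frac is None:
        half_frac = 1 << (p - 1)
    frac = mv & ((1 << p) - 1)
    return (mv >> p) + (1 if frac >= half_frac else 0)


def round_sat(mv: int, p: int) -> int:
    """Option (iii): SatShift(MvX, P) with offset0 = offset1 = (1 << P) >> 1 (assumed)."""
    off = (1 << p) >> 1
    return (mv + off) >> p if mv >= 0 else -((-mv + off) >> p)


def integer_ref_pos(x: int, y: int, mv, p: int = 4, rounding=round_floor):
    """Integer reference position from which the padding sample at (x, y) is copied."""
    return x + rounding(mv[0], p), y + rounding(mv[1], p)
```

The options differ only at half-sample ties and for negative MVs: -8/16 becomes 0 with round_nearest but -1 with round_sat. For 4:2:0 chroma, bullet (iv) a. would correspond to calling these with p=5 when the luma precision is 4 bits.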
7. It is proposed that coding tool X cannot be applied when PROF is applied.
a. Alternatively, PROF cannot be applied when coding tool X is applied.
b. In one example, if coding tool X cannot be applied, the syntax element(s) indicating coding tool X may not be signaled.
c. In one example, coding tool X may be generalized bi-prediction (GBi).
i. For example, when gbiIdx is not equal to 0, PROF is not applied.
ii. Alternatively, gbiIdx must be 0 when PROF is applied.
iii. Alternatively, when PROF is applied, gbiIdx is not signaled and is inferred to be 0.
iv. Alternatively, when PROF is applied, GBi is not applied regardless of whether gbiIdx is equal to 0.
d. In one example, coding tool X may be local illumination compensation.
e. In one example, coding tool X may be multiple transform set (MTS).
i. For example, when PROF is applied, only the default transform can be applied.
(i) For example, when PROF is applied, the MTS-related syntax elements are not applied.
f. In one example, coding tool X may be weighted prediction.
i. For example, when unequal weights and/or unequal offsets due to weighted prediction are applied to a block, PROF is not applied.
8. It is proposed that how PROF is applied may depend on the color format and/or the use of separate plane coding.
a. In one example, if the color format is 4:0:0, PROF cannot be applied to the chroma components.
b. In one example, if the color format is 4:4:4, PROF may be applied to the chroma components.
c. In one example, if the color format is not equal to 4:0:0, PROF may be applied to the chroma components.
d. In one example, how the delta MV (e.g., Δv in section 2.11) is derived may depend on the color format.
9. It is proposed that how PROF is applied may depend on the color component.
a. In one example, the gradients may be calculated independently for each color component.
i. Alternatively, the gradients calculated for a first color component may be used by a second color component.
ii. Alternatively, the gradients may be calculated twice, once for the luma/primary color component and once for the two chroma/dependent color components.
b. In one example, the delta MV (e.g., Δv in section 2.11) may be calculated independently for each color component.
i. Alternatively, the delta MV calculated for a first color component may be used by a second color component.
c. In one example, the prediction refinement (e.g., ΔI in section 2.11) may be calculated independently for each color component.
i. Alternatively, the prediction refinement (e.g., ΔI in section 2.11) calculated for a first color component may be used by a second color component.
d. In one example, the precision of the gradients in PROF may depend on the color component.
e. In one example, the precision of the delta MV (e.g., Δv in section 2.11) in PROF may depend on the color component.
f. In one example, whether and how the clipping operations are performed in PROF may depend on the color component.
g. In one example, whether and how the shift operations are performed in PROF may depend on the color component.
h. In one example, PROF may be applied to the luma component only.
i. In one example, PROF may be applied to different color components with different sub-block sizes.
i. Alternatively, PROF may be applied to different color components with the same sub-block size.
j. In one example, PROF may be applied to the chroma components with an M×N sub-block size.
i. For example, M and N are set equal to 4.
k. The above methods (bullets h-j) may further depend on the color format (e.g., 4:2:0 or 4:4:4).
10. It is proposed that the derivation of the delta MV (e.g., Δv in section 2.11) may depend on the width and/or height of the sub-block.
a. In one example, dMvH[0][0] and dMvV[0][0] are calculated as
qHorX=dHorX*P0;
qVerX=dVerX*P0;
qHorY=dHorY*P0;
qVerY=dVerY*P0;
dMvH[0][0]=((dHorX+dHorY)*P1)-(qHorX*(blockWidth>>1)+qHorY*(blockHeight>>1));
dMvV[0][0]=((dVerX+dVerY)*P1)-(qVerX*(blockWidth>>1)+qVerY*(blockHeight>>1));
where blockWidth and blockHeight denote the width and height of the sub-block, respectively, and P0 and P1 are two numbers that control the precision.
i. For example, with P0=4 and P1=2, dMvH[0][0] and dMvV[0][0] are calculated as:
qHorX=dHorX<<2;
qVerX=dVerX<<2;
qHorY=dHorY<<2;
qVerY=dVerY<<2;
dMvH[0][0]=((dHorX+dHorY)<<1)-(qHorX*(blockWidth>>1)+qHorY*(blockHeight>>1));
dMvV[0][0]=((dVerX+dVerY)<<1)-(qVerX*(blockWidth>>1)+qVerY*(blockHeight>>1));
11. It is proposed that, for affine coded blocks, PROF may be performed conditionally instead of always being applied.
a. In one example, whether and how PROF is performed may depend on the dimensions W×H of the current block.
i. For example, if W<=T1 and/or H<=T2, PROF may not be applied, e.g., T1=T2=16;
ii. For example, if W<T1 and/or H<T2, PROF may not be applied, e.g., T1=T2=16;
iii. For example, if W>=T1 and/or H>=T2, PROF may not be applied, e.g., T1=T2=64;
iv. For example, if W>T1 and/or H>T2, PROF may not be applied, e.g., T1=T2=64;
v. For example, if W×H>T1, PROF may not be applied, e.g., T1=64×64;
vi. For example, if W×H>=T1, PROF may not be applied, e.g., T1=64×64;
vii. For example, if W×H<T1, PROF may not be applied, e.g., T1=16×16;
viii. For example, if W×H<=T1, PROF may not be applied, e.g., T1=16×16;
ix. For example, if min(W, H)>=T1, PROF may not be applied, e.g., T1=64;
x. For example, if min(W, H)>T1, PROF may not be applied, e.g., T1=64;
xi. For example, if max(W, H)<=T1, PROF may not be applied, e.g., T1=16;
xii. For example, if max(W, H)<T1, PROF may not be applied, e.g., T1=16;
b. In one example, whether and/or how PROF is performed may depend on the control point motion vectors.
c. In one example, whether and/or how PROF is performed may depend on the affine parameters and/or the number of affine parameters.
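i. For a 4-parameter affine model, where
mvh(x,y)=a×x-b×y+mv0h and mvv(x,y)=b×x+a×y+mv0v, with (mv0h, mv0v) the top-left control point MV,
whether and how PROF is performed may depend on the parameters a and b.
ii. For a 6-parameter affine model, where
mvh(x,y)=a×x+c×y+mv0h and mvv(x,y)=b×x+d×y+mv0v,
whether and how PROF is performed may depend on the parameters a, b, c and d.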
iii. In one example, if the maximum affine parameter is less than (or not greater than) a threshold, PROF may not be applied.
(i) Alternatively, if all (such as four or six) affine parameters are less than (or not greater than) the threshold, PROF may not be applied.
(ii) Alternatively, if at least one affine parameter is less than (or not greater than) the threshold, PROF may not be applied.
iv. In one example, if the maximum absolute value of the affine parameters is less than (or not greater than) a threshold, PROF may not be applied.
(i) Alternatively, if the absolute values of all affine parameters are less than (or not greater than) the threshold, PROF may not be applied.
(ii) Alternatively, PROF may be applied only when at least one of the absolute values of the affine parameters is greater than (or not less than) the threshold.
v. In one example, if the minimum affine parameter is greater than (or not less than) a threshold, PROF may not be applied.
(i) Alternatively, if all (such as four or six) affine parameters are greater than (or not less than) the threshold, PROF may not be applied.
(ii) Alternatively, if at least one affine parameter is greater than (or not less than) the threshold, PROF may not be applied.
vi. In one example, if the minimum absolute value of the affine parameters is greater than (or not less than) a threshold, PROF may not be applied.
(i) Alternatively, if the absolute values of all affine parameters are greater than (or not less than) the threshold, PROF may not be applied.
(ii) Alternatively, PROF may be applied only when at least one of the absolute values of the affine parameters is less than (or not greater than) the threshold.
vii. In one example, if the maximum "absolute value" of the delta MVs as disclosed in JVET-N0236 is less than (or not greater than) a threshold, PROF may not be applied.
(i) Alternatively, if the "absolute values" of all delta MVs are less than (or not greater than) the threshold, PROF may not be applied.
(ii) Alternatively, PROF may be applied only when at least one of the "absolute values" of the delta MVs is greater than (or not less than) the threshold.
viii. In one example, if the minimum "absolute value" of the delta MVs is greater than (or not less than) a threshold, PROF may not be applied.
(i) Alternatively, if the "absolute values" of all delta MVs are greater than (or not less than) the threshold, PROF may not be applied.
(ii) Alternatively, PROF may be applied only when at least one of the "absolute values" of the delta MVs is less than (or not greater than) the threshold.
ix. In one example, PROF may be applied to certain positions.
(i) For example, PROF may be applied to a position if the "absolute value" of the corresponding delta MV of that position is less than (or not greater than) a threshold.
(ii) For example, PROF may be applied to a position if the "absolute value" of the corresponding delta MV of that position is greater than (or not less than) a threshold.
x. In one example, the affine parameters may be expressed as integers dHorX, dVerX, dHorY and dVerY with a particular precision, as described in JVET-M1001.
xi. In one example, the threshold may depend on the bit depth.
(i) In one example, the threshold may be derived as 1<<BitDepth.
(ii) Further, alternatively, the threshold may depend on whether bi-prediction or uni-prediction is applied.
a. For example, the threshold may be derived as (1<<BitDepth)+(Bi-prediction?1:0).
xii. In one example, whether and/or how the methods disclosed in bullet 11 are applied may depend on the reference picture structure.
(i) For example, if all reference pictures of the current picture precede the current picture in display order, i.e., the POCs of all reference pictures are less than the POC of the current picture, one or more of the disclosed methods may not be applied.
(ii) Alternatively, whether and/or how the methods disclosed in bullet 11 are applied may depend on the slice/picture type (such as I-slice or B-slice).
(iii) Alternatively, whether and/or how the methods disclosed in bullet 11 are applied may depend on the temporal layer.
In bullet 11, the coding method "PROF" may be replaced by other coding methods that enhance affine prediction coding, such as interleaved prediction or phase change affine sub-block motion compensation as disclosed in JVET-N0216.
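Bullet 11 lists many alternative conditions. The sketch below combines just two of them as an example: PROF is skipped when W×H<=16×16, and when the maximum absolute value of the affine parameters dHorX, dVerX, dHorY and dVerY is below a threshold of (1<<BitDepth)+(1 for bi-prediction, else 0). Which rules are combined, and the function name, are assumptions for illustration only.

```python
def prof_allowed(d_hor_x, d_ver_x, d_hor_y, d_ver_y, w, h, bit_depth, bi_pred):
    """Return True if PROF would be applied under one example combination of rules."""
    # Size rule: skip PROF for small blocks (W x H <= 16 x 16).
    if w * h <= 16 * 16:
        return False
    # Parameter rule: skip PROF when the affine motion is nearly translational,
    # i.e. max |affine parameter| is below a bit-depth-dependent threshold.
    threshold = (1 << bit_depth) + (1 if bi_pred else 0)
    max_abs = max(abs(d_hor_x), abs(d_ver_x), abs(d_hor_y), abs(d_ver_y))
    return max_abs >= threshold
```

The bi-prediction term makes the threshold one unit stricter, so a block with max |parameter| exactly 1<<BitDepth keeps PROF under uni-prediction but skips it under bi-prediction.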
12. It is proposed that phase change affine sub-block motion compensation, such as that proposed in JVET-N0510, may be applied first to obtain the prediction values, and PROF may then be applied.
13. It is proposed that the bit width of any variable used to derive dMvH[x][y] and/or dMvV[x][y] for any valid x and y cannot exceed a certain number, such as 32.
a. In one example, dMvH[x][y] and/or dMvV[x][y] are clipped before being used to derive other dMvH[t][z] and/or dMvV[t][z], where (t, z) is not equal to (x, y).
b. In one example, dMvH[x][y] and/or dMvV[x][y] are right-shifted before being used to derive other dMvH[t][z] and/or dMvV[t][z], where (t, z) is not equal to (x, y).
14. It is proposed that dMvH and/or dMvV may have the same precision as the stored motion vectors.
a. For example,
dMvH[xPos][yPos]=SatShift(dMvH[xPos][yPos],7+M);
dMvV[xPos][yPos]=SatShift(dMvV[xPos][yPos],7+M);
where M is the additional precision used to derive dMvH and/or dMvV, e.g., M=2.
15. It is proposed that the clipping of dMvH and/or dMvV before they are used to derive the prediction refinement ΔI may depend on the precision of dMvH and/or dMvV.
a. For example,
dMvH[posX][posY]=Clip3(-2^(K-1),2^(K-1)-1,dMvH[posX][posY]);
dMvV[posX][posY]=Clip3(-2^(K-1),2^(K-1)-1,dMvV[posX][posY]);
where K depends on the precision of dMvH and/or dMvV.
b. Alternatively, dMvH[x][y] and/or dMvV[x][y] are not clipped before being used to derive the prediction refinement.
16. It is proposed that the right shift of the prediction refinement ΔI(posX, posY) may depend on the sign of ΔI(posX, posY).
a. For example, ΔI(posX,posY)=SatShift(ΔI(posX,posY),N), where N is an integer.
17. It is proposed that the clipping of the prediction refinement ΔI(posX, posY) may depend on the sample bit depth.
a. For example, ΔI(posX,posY)=Clip3(-(2^(3+BitDepth)-1),2^(3+BitDepth)-1,ΔI(posX,posY));
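Bullets 14 to 17 can be combined for one sample as in the sketch below. M=2 follows the example in bullet 14; K=16 is an assumed default (matching the 16-bit clip in step c) of the PROF description); N is left to the caller because the text only says it is an integer. The SatShift offsets, the order of the steps, and the function name are assumptions.

```python
def sat_shift(x: int, n: int) -> int:
    """Sign-symmetric rounding shift (offset0 = offset1 = (1 << n) >> 1 assumed)."""
    off = (1 << n) >> 1
    return (x + off) >> n if x >= 0 else -((-x + off) >> n)


def clip3(lo: int, hi: int, x: int) -> int:
    """Clip x to [lo, hi]."""
    return max(lo, min(hi, x))


def refine_delta(dmv_h, dmv_v, grad_h, grad_v, bit_depth, n, m=2, k=16):
    """Prediction refinement for one sample following bullets 14 to 17."""
    # Bullet 14: bring dMvH/dMvV to the stored-MV precision.
    dmv_h, dmv_v = sat_shift(dmv_h, 7 + m), sat_shift(dmv_v, 7 + m)
    # Bullet 15: clip to a K-bit signed range.
    lo, hi = -(1 << (k - 1)), (1 << (k - 1)) - 1
    dmv_h, dmv_v = clip3(lo, hi, dmv_h), clip3(lo, hi, dmv_v)
    # Bullet 16: sign-dependent right shift of the refinement.
    delta = sat_shift(dmv_h * grad_h + dmv_v * grad_v, n)
    # Bullet 17: bit-depth-dependent clipping of the refinement.
    lim = (1 << (3 + bit_depth)) - 1
    return clip3(-lim, lim, delta)
```

For example, with BitDepth=10 and N=4, dMvH values of -3072 and 3072 against a gradient of 64 give refinements of -24 and 24: SatShift rounds the magnitude, so the result is symmetric in sign.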
18. Whether and/or how deblocking is performed on sub-block boundaries (e.g., inner sub-block boundaries) within an affine mode block may depend on whether interleaved prediction and/or PROF and/or phase change affine sub-block motion compensation is applied to the block, as disclosed in JVET-N0216. Interleaved prediction includes partitioning a video block into a first set of sub-blocks according to a first pattern; partitioning the video block into a second set of sub-blocks according to a second pattern, where at least one sub-block in the second set has a different dimension than a sub-block in the first set; and determining a prediction block that is a combination of a first intermediate prediction block generated from the first set of sub-blocks and a second intermediate prediction block generated from the second set of sub-blocks. Thus, with interleaved prediction, a block is divided into sub-blocks using one or more partition patterns. A partition pattern indicates the way a block is divided into sub-blocks, including the sizes and positions of the sub-blocks. For each partition pattern, a corresponding prediction block may be generated by deriving the motion information of each sub-block based on that partition pattern. Thus, in some embodiments, multiple prediction blocks may be generated by multiple partition patterns, even for one prediction direction. In some embodiments, only one partition pattern may be applied for each prediction direction. Thus, interleaved prediction uses different ways of dividing a block so that motion information can be obtained more robustly without increasing bandwidth consumption.
a. In one example, deblocking may be disabled when interleaved prediction and/or PROF and/or phase change affine sub-block motion compensation is applied to a block.
i. Alternatively, the deblocking filter may be weaker on sub-block boundaries of a block to which interleaved prediction and/or PROF and/or phase change affine sub-block motion compensation is applied. For example, a smaller boundary strength may be set on such boundaries.
b. In one example, deblocking may be enabled when interleaved prediction and/or PROF and/or phase change affine sub-block motion compensation is not applied to a block.
19. A switchable interpolation filter may be applied to affine-coded blocks.
a. A switchable interpolation filter may be applied to an affine-coded block when a specific MV precision (such as 1/2-pel or 1/4-pel) is used.
20. In one example, BDOF is not applied when a switchable interpolation filter is used for a block.
a. In one example, BDOF is not applied to blocks with a particular MV resolution (such as 1/2-pel).
b. In one example, BDOF is not applied to blocks for which a particular interpolation filter is selected.
21. In one example, PROF is not applied when a switchable interpolation filter is used for a block.
a. In one example, PROF is not applied to blocks with a particular MV resolution (such as 1/2-pel).
b. In one example, PROF is not applied to blocks for which a particular interpolation filter is selected.
22. It is proposed that Δv(i, j) of a sub-block in PROF be computed only when PROF is applied to that sub-block.
23. It is proposed to signal a first syntax element (such as a flag named slice_disable_bdof_prof_dmvr_flag) at the slice or picture level (such as in the slice header) to control the use of BDOF, PROF and DMVR in the slice.
a. When the first syntax element is not signaled, it is inferred to indicate that BDOF, PROF and DMVR are not all disabled in the slice.
b. In one example, the first syntax element is signaled only when at least one of BDOF, PROF and DMVR is enabled at the sequence level (e.g., the corresponding SPS control flag is equal to 1).
c. Alternatively, a second syntax element (such as sps_bdof_dmvr_prof_slice_present_flag in the SPS) is signaled at the sequence level to indicate whether the first syntax element is signaled.
i. In one example, the second syntax element is signaled only when at least one of BDOF, PROF and DMVR is enabled at the sequence level.
ii. In one example, when the second syntax element is not signaled, it is inferred that the first syntax element is not signaled.
d. In one example, when the first syntax element indicates that BDOF, PROF and DMVR are all disabled (e.g., slice_disable_bdof_prof_dmvr_flag is equal to 1), none of BDOF, PROF and DMVR can be applied in the current slice (picture).
24. It is proposed to signal a first syntax element (such as a flag named slice_disable_bdof_prof_flag) at the slice or picture level (such as in the slice header) to control the use of BDOF and PROF in the slice.
a. When the first syntax element is not signaled, it is inferred to indicate that BDOF and PROF are not both disabled in the slice.
b. In one example, the first syntax element is signaled only when at least one of BDOF and PROF is enabled at the sequence level (e.g., the corresponding SPS control flag is equal to 1).
c. Alternatively, a second syntax element (such as sps_bdof_prof_slice_present_flag in the SPS) is signaled at the sequence level to indicate whether the first syntax element is signaled.
i. In one example, the second syntax element is signaled only when at least one of BDOF and PROF is enabled at the sequence level.
ii. In one example, when the second syntax element is not signaled, it is inferred that the first syntax element is not signaled.
d. In one example, when the first syntax element indicates that both BDOF and PROF are disabled (e.g., slice_disable_bdof_prof_flag is equal to 1), neither BDOF nor PROF can be applied in the current slice (picture).
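The sign-dependent shift of item 16, the bit-depth-dependent clipping of item 17 and the lazy Δv computation of item 22 can be combined into one small Python sketch of a PROF-style refinement of a 4×4 sub-block. It rests on several assumptions not fixed by the text above: SatShift uses the rounding offset 1 << (N - 1) for both signs; the clipping range is the symmetric ±(2^(3+BitDepth) - 1); Δv is measured from the centre (1.5, 1.5) of the 4×4 sub-block with positions scaled by 4 to stay in integers, in the style of the VVC drafts; and the gradients, affine parameters and shift amount are supplied by the caller. Gradient derivation and final sample clipping are omitted, and all function names are hypothetical.

```python
def sat_shift(x: int, n: int) -> int:
    """Sign-dependent right shift (item 16): magnitudes are rounded symmetrically."""
    if n == 0:
        return x
    offset = 1 << (n - 1)  # assumed: offset0 == offset1 == 1 << (n - 1)
    if x >= 0:
        return (x + offset) >> n
    return -((-x + offset) >> n)


def clip3(lo: int, hi: int, v: int) -> int:
    return lo if v < lo else hi if v > hi else v


def clip_refinement(delta_i: int, bit_depth: int) -> int:
    """Bit-depth-dependent clipping of the refinement (item 17)."""
    bound = (1 << (3 + bit_depth)) - 1
    return clip3(-bound, bound, delta_i)


def prof_refine_4x4(pred, grad_x, grad_y, d_hor_x, d_ver_x, d_hor_y, d_ver_y,
                    prof_applied: bool, shift: int, bit_depth: int):
    """Refine one 4x4 sub-block; all 2-D inputs are indexed [y][x]."""
    if not prof_applied:
        # Item 22: Delta-v is not computed at all when PROF is off for this sub-block.
        return [row[:] for row in pred]
    out = []
    for y in range(4):
        row = []
        for x in range(4):
            # Offsets from the sub-block centre (1.5, 1.5), scaled by 4.
            dx, dy = 4 * x - 6, 4 * y - 6
            dvx = dx * d_hor_x + dy * d_hor_y
            dvy = dx * d_ver_x + dy * d_ver_y
            delta = grad_x[y][x] * dvx + grad_y[y][x] * dvy
            delta = clip_refinement(sat_shift(delta, shift), bit_depth)
            row.append(pred[y][x] + delta)
        out.append(row)
    return out
```

Unlike a plain arithmetic shift, which floors negative values (-3 >> 1 == -2 but 3 >> 1 == 1), `sat_shift` treats positive and negative refinements symmetrically.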
Example 5
5.1 Working draft for 16x16 padding, based on the working draft provided in JVET-O0070.
The working draft is based on JVET-N1001.
The changes made in JVET-O0070 are shown in bold italics. Deleted text is marked with double brackets (e.g., [[a]] denotes deletion of the character "a").
The proposed changes are underlined.
8.5.1 General decoding process for coding units coded in inter prediction mode
The inputs to this process are:
- a luma location (xCb, yCb) specifying the top-left sample of the current coding block relative to the top-left luma sample of the current picture,
- a variable cbWidth specifying the width of the current coding block in luma samples,
- a variable cbHeight specifying the height of the current coding block in luma samples,
- a variable treeType specifying whether a single tree or a dual tree is used and, if a dual tree is used, whether the current tree corresponds to the luma or the chroma components.
The output of this process is a modified reconstructed picture before in-loop filtering.
The derivation process for quantization parameters as specified in clause 8.7.1 is invoked with the luma location (xCb, yCb), the width of the current coding block in luma samples cbWidth, the height of the current coding block in luma samples cbHeight, and the variable treeType as inputs.
The decoding process for coding units coded in inter prediction mode consists of the following ordered steps:
1. The variable dmvrFlag is set equal to 0.
2. The motion vector components and reference indices of the current coding unit are derived as follows:
- If MergeTriangleFlag[xCb][yCb], inter_affine_flag[xCb][yCb] and merge_subblock_flag[xCb][yCb] are all equal to 0, the following applies:
- The derivation process for motion vector components and reference indices as specified in clause 8.5.2.1 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth and the luma coding block height cbHeight as inputs, and the luma motion vectors mvL0[0][0] and mvL1[0][0], the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[0][0] and predFlagL1[0][0], and the bi-prediction weight index bcwIdx as outputs.
- dmvrFlag is set equal to 1 when all of the following conditions are true:
- sps_dmvr_enabled_flag is equal to 1
- general_merge_flag[xCb][yCb] is equal to 1
- predFlagL0[0][0] and predFlagL1[0][0] are both equal to 1
- mmvd_merge_flag[xCb][yCb] is equal to 0
- DiffPicOrderCnt(currPic, RefPicList[0][refIdxL0]) is equal to DiffPicOrderCnt(RefPicList[1][refIdxL1], currPic)
- BcwIdx[xCb][yCb] is equal to 0
- luma_weight_l0_flag[refIdxL0] and luma_weight_l1_flag[refIdxL1] are both equal to 0
- cbWidth is greater than or equal to 8
- cbHeight is greater than or equal to 8
- cbHeight * cbWidth is greater than or equal to 128
- If dmvrFlag is equal to 1, the following applies:
- For X being 0 and 1, the reference picture consisting of an ordered two-dimensional array refPicLX_L of luma samples and two ordered two-dimensional arrays refPicLX_Cb and refPicLX_Cr of chroma samples is derived by invoking the process specified in clause 8.5.6.2 with X and refIdxLX as inputs.
- The number of luma coding sub-blocks in the horizontal direction numSbX, the number of luma coding sub-blocks in the vertical direction numSbY, the sub-block width sbWidth and the sub-block height sbHeight are derived as follows:
numSbX=(cbWidth>16)?(cbWidth>>4):1 (8-240)
numSbY=(cbHeight>16)?(cbHeight>>4):1 (8-241)
sbWidth=(cbWidth>16)?16:cbWidth (8-242)
sbHeight=(cbHeight>16)?16:cbHeight (8-243)
- For xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1, the following applies:
- The luma motion vectors mvLX[xSbIdx][ySbIdx], the prediction list utilization flags predFlagLX[xSbIdx][ySbIdx] (with X equal to 0 and 1), and the luma location (xSb[xSbIdx][ySbIdx], ySb[xSbIdx][ySbIdx]) specifying the top-left sample of the coding sub-block relative to the top-left luma sample of the current picture are derived as follows:
mvLX[xSbIdx][ySbIdx]=mvLX[0][0] (8-244)
predFlagLX[xSbIdx][ySbIdx]=predFlagLX[0][0] (8-245)
xSb[xSbIdx][ySbIdx]=xCb+xSbIdx*sbWidth (8-246)
ySb[xSbIdx][ySbIdx]=yCb+ySbIdx*sbHeight (8-247)
- The decoder-side motion vector refinement process specified in clause 8.5.3.1 is invoked with xSb[xSbIdx][ySbIdx], ySb[xSbIdx][ySbIdx], sbWidth, sbHeight, the motion vectors mvLX[xSbIdx][ySbIdx] and the reference picture arrays refPicLX_L as inputs, and the delta motion vectors dMvLX[xSbIdx][ySbIdx] as outputs (with X equal to 0 and 1).
- The derivation process for chroma motion vectors in clause 8.5.2.13 is invoked with mvLX[xSbIdx][ySbIdx] and refIdxLX as inputs, and mvCLX[xSbIdx][ySbIdx] as output (with X equal to 0 and 1).
- Otherwise (dmvrFlag is equal to 0), the following applies:
- When treeType is equal to SINGLE_TREE and predFlagLX[0][0] (with X being 0 or 1) is equal to 1, the derivation process for chroma motion vectors in clause 8.5.2.13 is invoked with mvLX[0][0] and refIdxLX as inputs, and mvCLX[0][0] as output.
- The number of luma coding sub-blocks in the horizontal direction numSbX and the number of luma coding sub-blocks in the vertical direction numSbY are both set equal to 1.
- Otherwise, if MergeTriangleFlag[xCb][yCb] is equal to 1 and inter_affine_flag[xCb][yCb] and merge_subblock_flag[xCb][yCb] are both equal to 0, the derivation process for triangle motion vector components and reference indices as specified in clause 8.5.4.1 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth and the luma coding block height cbHeight as inputs, and the luma motion vectors mvA and mvB, the chroma motion vectors mvCA and mvCB, the reference indices refIdxA and refIdxB, and the prediction list flags predListFlagA and predListFlagB as outputs.
- Otherwise (inter_affine_flag[xCb][yCb] or merge_subblock_flag[xCb][yCb] is equal to 1), the derivation process for sub-block motion vector components and reference indices as specified in clause 8.5.5.1 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth and the luma coding block height cbHeight as inputs, and the reference indices refIdxL0 and refIdxL1, the number of luma coding sub-blocks in the horizontal direction numSbX and in the vertical direction numSbY, the prediction list utilization flags predFlagLX[xSbIdx][ySbIdx], the luma motion vector array mvLX[xSbIdx][ySbIdx] and the chroma motion vector array mvCLX[xSbIdx][ySbIdx] (with xSbIdx = 0..(cbWidth >> 2) - 1, ySbIdx = 0..(cbHeight >> 2) - 1, and X being 0 or 1), [[and]] the bi-prediction weight index bcwIdx, as outputs.
3. The arrays of luma and chroma motion vectors after decoder-side motion vector refinement, refMvLX[xSbIdx][ySbIdx] and refMvCLX[xSbIdx][ySbIdx] (with X being 0 and 1), are derived as follows for xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1:
- If dmvrFlag is equal to 1, the derivation process for chroma motion vectors in clause 8.5.2.13 is invoked with refMvLX[xSbIdx][ySbIdx] and refIdxLX as inputs and refMvCLX[xSbIdx][ySbIdx] as output, where the input refMvLX[xSbIdx][ySbIdx] is derived as follows:
refMvLX[xSbIdx][ySbIdx]=mvLX[xSbIdx][ySbIdx]+dMvLX[xSbIdx][ySbIdx] (8-248)
refMvLX[xSbIdx][ySbIdx][0]=Clip3(-2^17, 2^17-1, refMvLX[xSbIdx][ySbIdx][0]) (8-249)
refMvLX[xSbIdx][ySbIdx][1]=Clip3(-2^17, 2^17-1, refMvLX[xSbIdx][ySbIdx][1]) (8-250)
- Otherwise (dmvrFlag is equal to 0), the following applies:
refMvLX[xSbIdx][ySbIdx]=mvLX[xSbIdx][ySbIdx] (8-251)
refMvCLX[xSbIdx][ySbIdx]=mvCLX[xSbIdx][ySbIdx] (8-252)
NOTE - The array refMvLX is stored in MvDmvrLX and used in the derivation process for collocated motion vectors in clause 8.5.2.12. The array of non-refined luma motion vectors MvLX is used in the spatial motion vector prediction and deblocking boundary strength derivation processes.
4. The prediction samples of the current coding unit are derived as follows:
- If MergeTriangleFlag[xCb][yCb] is equal to 0, the prediction samples of the current coding unit are derived as follows:
- The decoding process for inter blocks as specified in clause 8.5.6.1 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth and the luma coding block height cbHeight, the number of luma coding sub-blocks in the horizontal direction numSbX and in the vertical direction numSbY, the luma motion vectors mvL0[xSbIdx][ySbIdx] and mvL1[xSbIdx][ySbIdx], the refined luma motion vectors refMvL0[xSbIdx][ySbIdx] and refMvL1[xSbIdx][ySbIdx] (with xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1), the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[xSbIdx][ySbIdx] and predFlagL1[xSbIdx][ySbIdx], the bi-prediction weight index bcwIdx, [[and]] the variable cIdx set equal to 0, as inputs, and the inter prediction samples (predSamples), which are a (cbWidth)x(cbHeight) array predSamples_L of predicted luma samples, as output.
- The decoding process for inter blocks as specified in clause 8.5.6.1 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth and the luma coding block height cbHeight, the number of luma coding sub-blocks in the horizontal direction numSbX and in the vertical direction numSbY, the chroma motion vectors mvCL0[xSbIdx][ySbIdx] and mvCL1[xSbIdx][ySbIdx], the refined chroma motion vectors refMvCL0[xSbIdx][ySbIdx] and refMvCL1[xSbIdx][ySbIdx] (with xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1), the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[xSbIdx][ySbIdx] and predFlagL1[xSbIdx][ySbIdx], the bi-prediction weight index bcwIdx, [[and]] the variable cIdx set equal to 1, and the motion vector difference array diffMv as inputs, and the inter prediction samples (predSamples), which are a (cbWidth/2)x(cbHeight/2) array predSamples_Cb of predicted chroma samples for the chroma component Cb, as output.
- The decoding process for inter blocks as specified in clause 8.5.6.1 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth and the luma coding block height cbHeight, the number of luma coding sub-blocks in the horizontal direction numSbX and in the vertical direction numSbY, the chroma motion vectors mvCL0[xSbIdx][ySbIdx] and mvCL1[xSbIdx][ySbIdx], the refined chroma motion vectors refMvCL0[xSbIdx][ySbIdx] and refMvCL1[xSbIdx][ySbIdx] (with xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1), the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[xSbIdx][ySbIdx] and predFlagL1[xSbIdx][ySbIdx], the bi-prediction weight index bcwIdx, [[and]] the variable cIdx set equal to 2, as inputs, and the inter prediction samples (predSamples), which are a (cbWidth/2)x(cbHeight/2) array predSamples_Cr of predicted chroma samples for the chroma component Cr, as output.
- Otherwise (MergeTriangleFlag[xCb][yCb] is equal to 1), the decoding process for triangle inter blocks as specified in clause 8.5.7.1 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth and the luma coding block height cbHeight, the luma motion vectors mvA and mvB, the chroma motion vectors mvCA and mvCB, the reference indices refIdxA and refIdxB, and the prediction list flags predListFlagA and predListFlagB as inputs, and the inter prediction samples (predSamples), which are a (cbWidth)x(cbHeight) array predSamples_L of predicted luma samples and two (cbWidth/2)x(cbHeight/2) arrays predSamples_Cb and predSamples_Cr of predicted chroma samples, one for each chroma component Cb and Cr, as outputs.
5. The variables NumSbX[xCb][yCb] and NumSbY[xCb][yCb] are set equal to numSbX and numSbY, respectively.
6. The residual samples of the current coding unit are derived as follows:
- The decoding process for the residual signal of coding blocks coded in inter prediction mode as specified in clause 8.5.8 is invoked with the location (xTb0, yTb0) set equal to the luma location (xCb, yCb), the width nTbW set equal to the luma coding block width cbWidth, the height nTbH set equal to the luma coding block height cbHeight and the variable cIdx set equal to 0 as inputs, and the array resSamples_L as output.
- The decoding process for the residual signal of coding blocks coded in inter prediction mode as specified in clause 8.5.8 is invoked with the location (xTb0, yTb0) set equal to the chroma location (xCb/2, yCb/2), the width nTbW set equal to the chroma coding block width cbWidth/2, the height nTbH set equal to the chroma coding block height cbHeight/2 and the variable cIdx set equal to 1 as inputs, and the array resSamples_Cb as output.
- The decoding process for the residual signal of coding blocks coded in inter prediction mode as specified in clause 8.5.8 is invoked with the location (xTb0, yTb0) set equal to the chroma location (xCb/2, yCb/2), the width nTbW set equal to the chroma coding block width cbWidth/2, the height nTbH set equal to the chroma coding block height cbHeight/2 and the variable cIdx set equal to 2 as inputs, and the array resSamples_Cr as output.
7. The reconstructed samples of the current coding unit are derived as follows:
- The picture reconstruction process for a color component as specified in clause 8.7.5 is invoked with the block location (xB, yB) set equal to (xCb, yCb), the block width bWidth set equal to cbWidth, the block height bHeight set equal to cbHeight, the variable cIdx set equal to 0, the (cbWidth)x(cbHeight) array predSamples set equal to predSamples_L and the (cbWidth)x(cbHeight) array resSamples set equal to resSamples_L as inputs, and the output is a modified reconstructed picture before in-loop filtering.
- The picture reconstruction process for a color component as specified in clause 8.7.5 is invoked with the block location (xB, yB) set equal to (xCb/2, yCb/2), the block width bWidth set equal to cbWidth/2, the block height bHeight set equal to cbHeight/2, the variable cIdx set equal to 1, the (cbWidth/2)x(cbHeight/2) array predSamples set equal to predSamples_Cb and the (cbWidth/2)x(cbHeight/2) array resSamples set equal to resSamples_Cb as inputs, and the output is a modified reconstructed picture before in-loop filtering.
- The picture reconstruction process for a color component as specified in clause 8.7.5 is invoked with the block location (xB, yB) set equal to (xCb/2, yCb/2), the block width bWidth set equal to cbWidth/2, the block height bHeight set equal to cbHeight/2, the variable cIdx set equal to 2, the (cbWidth/2)x(cbHeight/2) array predSamples set equal to predSamples_Cr and the (cbWidth/2)x(cbHeight/2) array resSamples set equal to resSamples_Cr as inputs, and the output is a modified reconstructed picture before in-loop filtering.
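The dmvrFlag conditions of step 2 and the split into at most 16x16 sub-blocks in equations (8-240) to (8-247) can be sketched in Python as follows. This is an illustrative sketch, not reference decoder code: the class and function names are hypothetical, and the picture order count condition is expressed through raw POC values, assuming DiffPicOrderCnt(picA, picB) equals the POC of picA minus the POC of picB.

```python
from dataclasses import dataclass


@dataclass
class DmvrInputs:  # hypothetical container for the values tested in step 2
    sps_dmvr_enabled_flag: int
    general_merge_flag: int
    pred_flag_l0: int
    pred_flag_l1: int
    mmvd_merge_flag: int
    poc_cur: int
    poc_ref0: int  # POC of RefPicList[0][refIdxL0]
    poc_ref1: int  # POC of RefPicList[1][refIdxL1]
    bcw_idx: int
    luma_weight_l0_flag: int
    luma_weight_l1_flag: int
    cb_width: int
    cb_height: int


def dmvr_flag(p: DmvrInputs) -> int:
    """Return 1 only when every condition listed for dmvrFlag holds."""
    # DiffPicOrderCnt(currPic, ref0) == DiffPicOrderCnt(ref1, currPic):
    # the two references are mirrored around the current picture.
    mirrored = (p.poc_cur - p.poc_ref0) == (p.poc_ref1 - p.poc_cur)
    ok = (p.sps_dmvr_enabled_flag == 1
          and p.general_merge_flag == 1
          and p.pred_flag_l0 == 1 and p.pred_flag_l1 == 1
          and p.mmvd_merge_flag == 0
          and mirrored
          and p.bcw_idx == 0
          and p.luma_weight_l0_flag == 0 and p.luma_weight_l1_flag == 0
          and p.cb_width >= 8 and p.cb_height >= 8
          and p.cb_height * p.cb_width >= 128)
    return 1 if ok else 0


def dmvr_subblocks(x_cb: int, y_cb: int, cb_width: int, cb_height: int):
    """Return (xSb, ySb, sbWidth, sbHeight) for each DMVR sub-block, raster order."""
    num_sb_x = (cb_width >> 4) if cb_width > 16 else 1
    num_sb_y = (cb_height >> 4) if cb_height > 16 else 1
    sb_width = 16 if cb_width > 16 else cb_width
    sb_height = 16 if cb_height > 16 else cb_height
    return [(x_cb + i * sb_width, y_cb + j * sb_height, sb_width, sb_height)
            for j in range(num_sb_y) for i in range(num_sb_x)]
```

For example, a 32x8 merge CU whose references lie four pictures before and after the current one yields dmvrFlag = 1 and two 16x8 refinement sub-blocks.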
8.5.5 Derivation process for sub-block motion vector components and reference indices
8.5.5.1 General
The inputs to this process are:
- a luma location (xCb, yCb) of the top-left sample of the current luma coding block relative to the top-left luma sample of the current picture,
- a variable cbWidth specifying the width of the current coding block in luma samples,
- a variable cbHeight specifying the height of the current coding block in luma samples.
The output of this process is:
- the reference indices refIdxL0 and refIdxL1,
- the number of luma coding sub-blocks in the horizontal direction numSbX and the number of luma coding sub-blocks in the vertical direction numSbY,
- the prediction list utilization flag arrays predFlagL0[xSbIdx][ySbIdx] and predFlagL1[xSbIdx][ySbIdx], with xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1,
- the luma sub-block motion vector arrays mvL0[xSbIdx][ySbIdx] and mvL1[xSbIdx][ySbIdx], with xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1,
- the chroma sub-block motion vector arrays in 1/32 fractional-sample precision mvCL0[xSbIdx][ySbIdx] and mvCL1[xSbIdx][ySbIdx], with xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1,
- the bi-prediction weight index bcwIdx.
For the derivation of the variables mvL0[xSbIdx][ySbIdx], mvL1[xSbIdx][ySbIdx], mvCL0[xSbIdx][ySbIdx] and mvCL1[xSbIdx][ySbIdx], refIdxL0, refIdxL1, numSbX, numSbY, predFlagL0[xSbIdx][ySbIdx] and predFlagL1[xSbIdx][ySbIdx], the following applies:
- If merge_subblock_flag[xCb][yCb] is equal to 1, the derivation process for motion vectors and reference indices in sub-block merge mode as specified in clause 8.5.5.2 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth and the luma coding block height cbHeight as inputs, and the number of luma coding sub-blocks in the horizontal direction numSbX and in the vertical direction numSbY, the reference indices refIdxL0 and refIdxL1, the prediction list utilization flag arrays predFlagL0[xSbIdx][ySbIdx] and predFlagL1[xSbIdx][ySbIdx], the luma sub-block motion vector arrays mvL0[xSbIdx][ySbIdx] and mvL1[xSbIdx][ySbIdx], the chroma sub-block motion vector arrays mvCL0[xSbIdx][ySbIdx] and mvCL1[xSbIdx][ySbIdx] (with xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1), and the bi-prediction weight index bcwIdx as outputs.
- Otherwise (merge_subblock_flag[xCb][yCb] is equal to 0), for X being replaced by either 0 or 1 in the variables predFlagLX, cpMvLX, mvdCpLX and refIdxLX, in PRED_LX and in the syntax element ref_idx_lX, the following ordered steps apply:
- For the derivation of the number of control point motion vectors numCpMv, the control point motion vectors cpMvLX[cpIdx] (with cpIdx ranging from 0 to numCpMv-1), refIdxLX and predFlagLX[0][0], the following applies:
1. The number of control point motion vectors numCpMv is set equal to MotionModelIdc[xCb][yCb] + 1.
2. The variables refIdxLX and predFlagLX are derived as follows:
- If inter_pred_idc[xCb][yCb] is equal to PRED_LX or PRED_BI,
refIdxLX=ref_idx_lX[xCb][yCb] (8-457)
predFlagLX[0][0]=1 (8-458)
- Otherwise, the variables refIdxLX and predFlagLX are specified by:
refIdxLX=-1 (8-459)
predFlagLX[0][0]=0 (8-460)
3. The variables mvdCpLX[cpIdx] (with cpIdx ranging from 0 to numCpMv-1) are derived as follows:
mvdCpLX[cpIdx][0]=MvdCpLX[xCb][yCb][cpIdx][0] (8-461)
mvdCpLX[cpIdx][1]=MvdCpLX[xCb][yCb][cpIdx][1] (8-462)
4. When predFlagLX[0][0] is equal to 1, the derivation process for luma affine control point motion vector predictors as specified in clause 8.5.5.7 is invoked with the luma coding block location (xCb, yCb) and the variables cbWidth, cbHeight, refIdxLX and numCpMv as inputs, and the output is mvpCpLX[cpIdx] with cpIdx ranging from 0 to numCpMv-1.
5. When predFlagLX[0][0] is equal to 1, the luma motion vectors cpMvLX[cpIdx] (with cpIdx ranging from 0 to numCpMv-1) are derived as follows:
uLX[cpIdx][0]=(mvpCpLX[cpIdx][0]+mvdCpLX[cpIdx][0]+2^18)%2^18 (8-463)
cpMvLX[cpIdx][0]=(uLX[cpIdx][0]>=2^17)?(uLX[cpIdx][0]-2^18):uLX[cpIdx][0] (8-464)
uLX[cpIdx][1]=(mvpCpLX[cpIdx][1]+mvdCpLX[cpIdx][1]+2^18)%2^18 (8-465)
cpMvLX[cpIdx][1]=(uLX[cpIdx][1]>=2^17)?(uLX[cpIdx][1]-2^18):uLX[cpIdx][1] (8-466)
- The variables numSbX and numSbY are derived as follows:
numSbX=(cbWidth>>2) (8-467)
numSbY=(cbHeight>>2) (8-468)
- For xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1, the following applies:
predFlagLX[xSbIdx][ySbIdx]=predFlagLX[0][0] (8-469)
- When predFlagLX[0][0] is equal to 1, the derivation process for motion vector arrays from affine control point motion vectors as specified in subclause 8.5.5.9 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth, the luma prediction block height cbHeight, the number of control point motion vectors numCpMv, the control point motion vectors cpMvLX[cpIdx] (with cpIdx being 0..2), the reference index refIdxLX, and the number of luma coding sub-blocks in the horizontal direction numSbX and in the vertical direction numSbY as inputs, and the luma motion vector array mvLX[xSbIdx][ySbIdx], [[and]] the chroma motion vector array mvCLX[xSbIdx][ySbIdx] (with xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1), as outputs.
- The bi-prediction weight index bcwIdx is set equal to bcw_idx[xCb][yCb].
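Equations (8-463) to (8-466) add the control point predictor and the decoded difference modulo 2^18 and then reinterpret the result as a signed 18-bit value, so overflow wraps around instead of saturating. A small Python sketch of that arithmetic follows; the function names are hypothetical. Note that Python's `%` already returns a non-negative result for a positive modulus, so the added 2^18 only mirrors the text and is not needed for correctness here.

```python
MV_RANGE = 1 << 18    # 2^18, the modulus in (8-463) and (8-465)
HALF_RANGE = 1 << 17  # 2^17, the sign threshold in (8-464) and (8-466)


def wrap_mv_component(mvp: int, mvd: int) -> int:
    """Add predictor and difference modulo 2^18, then reinterpret as signed 18-bit."""
    u = (mvp + mvd + MV_RANGE) % MV_RANGE
    return u - MV_RANGE if u >= HALF_RANGE else u


def control_point_mvs(mvp_cp, mvd_cp):
    """Reconstruct cpMvLX[cpIdx] from mvpCpLX[cpIdx] and mvdCpLX[cpIdx] as (x, y) pairs."""
    return [(wrap_mv_component(p[0], d[0]), wrap_mv_component(p[1], d[1]))
            for p, d in zip(mvp_cp, mvd_cp)]
```

For example, a predictor of 2^17 - 1 plus a difference of 1 wraps to -2^17 rather than leaving the 18-bit range.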
8.5.5.9 Derivation process for motion vector arrays from affine control point motion vectors
The inputs to this process are:
- a luma location (xCb, yCb) of the top-left sample of the current luma coding block relative to the top-left luma sample of the current picture,
- two variables cbWidth and cbHeight specifying the width and the height of the luma coding block,
- the number of control point motion vectors numCpMv,
- the control point motion vectors cpMvLX[cpIdx], with cpIdx = 0..numCpMv-1 and X being 0 or 1,
- the reference index refIdxLX, with X being 0 or 1,
- the number of luma coding sub-blocks in the horizontal direction numSbX and the number of luma coding sub-blocks in the vertical direction numSbY.
The outputs of this process are:
- the luma sub-block motion vector array mvLX[xSbIdx][ySbIdx], with xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1, and X being 0 or 1,
- the chroma sub-block motion vector array mvCLX[xSbIdx][ySbIdx], with xSbIdx = 0..numSbX-1, ySbIdx = 0..numSbY-1, and X being 0 or 1.
For x = xCb..xCb+cbWidth-1 and y = yCb..yCb+cbHeight-1, the following assignments are made:
CpMvLX[x][y][0]=cpMvLX[0] (8-666)
CpMvLX[x][y][1]=cpMvLX[1] (8-667)
CpMvLX[x][y][2]=cpMvLX[2] (8-668)
The variables log2CbW and log2CbH are derived as follows:
log2CbW=Log2(cbWidth) (8-669)
log2CbH=Log2(cbHeight) (8-670)
The variables mvScaleHor, mvScaleVer, dHorX and dVerX are derived as follows:
mvScaleHor=cpMvLX[0][0]<<7 (8-671)
mvScaleVer=cpMvLX[0][1]<<7 (8-672)
dHorX=(cpMvLX[1][0]-cpMvLX[0][0])<<(7-log2CbW) (8-673)
dVerX=(cpMvLX[1][1]-cpMvLX[0][1])<<(7-log2CbW) (8-674)
The variables dHorY and dVerY are derived as follows:
- If numCpMv is equal to 3, the following applies:
dHorY=(cpMvLX[2][0]-cpMvLX[0][0])<<(7-log2CbH) (8-675)
dVerY=(cpMvLX[2][1]-cpMvLX[0][1])<<(7-log2CbH) (8-676)
- Otherwise (numCpMv is equal to 2), the following applies:
dHorY=-dVerX (8-677)
dVerY=dHorX (8-678)
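Equations (8-669) to (8-678) turn two or three control point motion vectors into six affine parameters; for the 4-parameter model the vertical gradients are derived from the horizontal ones (dHorY = -dVerX, dVerY = dHorX). The Python sketch below transcribes those equations. The function name and the tuple return order are assumptions made for illustration, and power-of-two block dimensions are assumed, as the Log2 in the text implies.

```python
def affine_params(cp_mv, cb_width: int, cb_height: int):
    """Transcribe (8-669)..(8-678). cp_mv holds 2 or 3 (x, y) control point MVs.

    Returns (mvScaleHor, mvScaleVer, dHorX, dVerX, dHorY, dVerY).
    """
    log2_cb_w = cb_width.bit_length() - 1   # exact Log2 for powers of two
    log2_cb_h = cb_height.bit_length() - 1
    mv_scale_hor = cp_mv[0][0] << 7
    mv_scale_ver = cp_mv[0][1] << 7
    d_hor_x = (cp_mv[1][0] - cp_mv[0][0]) << (7 - log2_cb_w)
    d_ver_x = (cp_mv[1][1] - cp_mv[0][1]) << (7 - log2_cb_w)
    if len(cp_mv) == 3:
        # 6-parameter model: independent vertical gradients from the third control point.
        d_hor_y = (cp_mv[2][0] - cp_mv[0][0]) << (7 - log2_cb_h)
        d_ver_y = (cp_mv[2][1] - cp_mv[0][1]) << (7 - log2_cb_h)
    else:
        # 4-parameter model: rotation and zoom only.
        d_hor_y = -d_ver_x
        d_ver_y = d_hor_x
    return mv_scale_hor, mv_scale_ver, d_hor_x, d_ver_x, d_hor_y, d_ver_y
```

With a 16x16 block and control points (0, 0) and (16, 0), the 4-parameter model gives dHorX = dVerY = 128 and no rotation terms.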
The variable fallbackModeTriggered is set equal to 1 and modified as follows:
- The variables bxWX4, bxHX4, bxWXh, bxHXh, bxWXv and bxHXv are derived as follows:
maxW4=Max(0,Max(4*(2048+dHorX),Max(4*dHorY,4*(2048+dHorX)+4*dHorY))) (8-679)
minW4=Min(0,Min(4*(2048+dHorX),Min(4*dHorY,4*(2048+dHorX)+4*dHorY))) (8-680)
maxH4=Max(0,Max(4*dVerX,Max(4*(2048+dVerY),4*dVerX+4*(2048+dVerY)))) (8-681)
minH4=Min(0,Min(4*dVerX,Min(4*(2048+dVerY),4*dVerX+4*(2048+dVerY)))) (8-682)
bxWX4=((maxW4-minW4)>>11)+9 (8-683)
bxHX4=((maxH4-minH4)>>11)+9 (8-684)
bxWXh=((Max(0,4*(2048+dHorX))-Min(0,4*(2048+dHorX)))>>11)+9 (8-685)
bxHXh=((Max(0,4*dVerX)-Min(0,4*dVerX))>>11)+9 (8-686)
bxWXv=((Max(0,4*dHorY)-Min(0,4*dHorY))>>11)+9 (8-687)
bxHXv=((Max(0,4*(2048+dVerY))-Min(0,4*(2048+dVerY)))>>11)+9 (8-688)
- If inter_pred_idc[xCb][yCb] is equal to PRED_BI and bxWX4*bxHX4 is less than or equal to 225, fallbackModeTriggered is set equal to 0.
- Otherwise, if bxWXh*bxHXh is less than or equal to 165 and bxWXv*bxHXv is less than or equal to 165, fallbackModeTriggered is set equal to 0.
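The reference bounding-box test of equations (8-679) to (8-688), followed by the two decisions that may clear fallbackModeTriggered, can be transcribed as below. This Python sketch assumes dHorX, dVerX, dHorY and dVerY come from the preceding derivation; the nested Max/Min calls are flattened into single `max`/`min` calls, which is equivalent. The function name and the `is_bi` argument (standing in for inter_pred_idc being PRED_BI) are hypothetical.

```python
def fallback_mode_triggered(d_hor_x: int, d_ver_x: int, d_hor_y: int,
                            d_ver_y: int, is_bi: bool) -> int:
    """Return 1 when the sub-block reference bounding box is considered too large."""
    a = 4 * (2048 + d_hor_x)
    b = 4 * d_hor_y
    c = 4 * d_ver_x
    d = 4 * (2048 + d_ver_y)
    max_w4, min_w4 = max(0, a, b, a + b), min(0, a, b, a + b)
    max_h4, min_h4 = max(0, c, d, c + d), min(0, c, d, c + d)
    bx_wx4 = ((max_w4 - min_w4) >> 11) + 9
    bx_hx4 = ((max_h4 - min_h4) >> 11) + 9
    bx_wxh = ((max(0, a) - min(0, a)) >> 11) + 9
    bx_hxh = ((max(0, c) - min(0, c)) >> 11) + 9
    bx_wxv = ((max(0, b) - min(0, b)) >> 11) + 9
    bx_hxv = ((max(0, d) - min(0, d)) >> 11) + 9
    if is_bi and bx_wx4 * bx_hx4 <= 225:
        return 0
    if bx_wxh * bx_hxh <= 165 and bx_wxv * bx_hxv <= 165:
        return 0
    return 1
```

A purely translational model (all four gradients zero) never triggers the fallback, while a large horizontal gradient such as dHorX = 4096 does, for both uni- and bi-prediction.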
For xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1, the following applies:
- The variables xPosCb and yPosCb are derived as follows:
- If fallbackModeTriggered is equal to 1, the following applies:
xPosCb=(cbWidth>>1) (8-689)
yPosCb=(cbHeight>>1) (8-690)
- Otherwise (fallbackModeTriggered is equal to 0), the following applies:
xPosCb=2+(xSbIdx<<2) (8-691)
yPosCb=2+(ySbIdx<<2) (8-692)
- The luma motion vector mvLX[xSbIdx][ySbIdx] is derived as follows:
mvLX[xSbIdx][ySbIdx][0]=(mvScaleHor+dHorX*xPosCb+dHorY*yPosCb) (8-693)
mvLX[xSbIdx][ySbIdx][1]=(mvScaleVer+dVerX*xPosCb+dVerY*yPosCb) (8-694)
- The rounding process for motion vectors as specified in clause 8.5.2.14 is invoked with mvX set equal to mvLX[xSbIdx][ySbIdx], rightShift set equal to 7 and leftShift set equal to 0 as inputs, and the rounded mvLX[xSbIdx][ySbIdx] as output.
- The motion vector mvLX[xSbIdx][ySbIdx] is clipped as follows:
mvLX[xSbIdx][ySbIdx][0]=Clip3(-2^17, 2^17-1, mvLX[xSbIdx][ySbIdx][0]) (8-695)
mvLX[xSbIdx][ySbIdx][1]=Clip3(-2^17, 2^17-1, mvLX[xSbIdx][ySbIdx][1]) (8-696)
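The per-sub-block loop above (evaluate the affine model at the sub-block centre, or at the block centre when the fallback is triggered, round with rightShift = 7, then clip to 18 bits) can be sketched in Python as follows. The rounding helper assumes that clause 8.5.2.14, which is not reproduced here, behaves as in the VVC drafts, i.e. offset = 1 << (rightShift - 1) combined with the term -(mvX >= 0), so exact halves round toward zero; treat that as an assumption. The function names are hypothetical.

```python
def round_mv(v: int, right_shift: int = 7, left_shift: int = 0) -> int:
    """Assumed form of the clause 8.5.2.14 rounding: halves round toward zero."""
    offset = (1 << (right_shift - 1)) if right_shift else 0
    return ((v + offset - (1 if v >= 0 else 0)) >> right_shift) << left_shift


def clip3(lo: int, hi: int, v: int) -> int:
    return max(lo, min(hi, v))


def subblock_mv(params, x_sb_idx: int, y_sb_idx: int,
                cb_width: int, cb_height: int, fallback: int):
    """Eqs. (8-689)..(8-696) for one 4x4 luma sub-block.

    params = (mvScaleHor, mvScaleVer, dHorX, dVerX, dHorY, dVerY).
    """
    mv_scale_hor, mv_scale_ver, d_hor_x, d_ver_x, d_hor_y, d_ver_y = params
    if fallback:
        x_pos, y_pos = cb_width >> 1, cb_height >> 1      # one MV for the whole block
    else:
        x_pos, y_pos = 2 + (x_sb_idx << 2), 2 + (y_sb_idx << 2)  # sub-block centre
    mv_x = mv_scale_hor + d_hor_x * x_pos + d_hor_y * y_pos
    mv_y = mv_scale_ver + d_ver_x * x_pos + d_ver_y * y_pos
    lim = 1 << 17
    return (clip3(-lim, lim - 1, round_mv(mv_x)),
            clip3(-lim, lim - 1, round_mv(mv_y)))
```

Because the rounding is symmetric around zero, a mirrored affine field produces mirrored sub-block motion vectors.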
For xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1, the following applies:
- The average luma motion vector mvAvgLX is derived as follows:
mvAvgLX=mvLX[(xSbIdx>>1<<1)][(ySbIdx>>1<<1)]+mvLX[(xSbIdx>>1<<1)+1][(ySbIdx>>1<<1)+1] (8-697)
mvAvgLX[0]=(mvAvgLX[0]+1-(mvAvgLX[0]>=0))>>1 (8-698)
mvAvgLX[1]=(mvAvgLX[1]+1-(mvAvgLX[1]>=0))>>1 (8-699)
- The derivation process for chroma motion vectors in clause 8.5.2.13 is invoked with mvAvgLX and refIdxLX as inputs, and the chroma motion vector mvCLX[xSbIdx][ySbIdx] as output.
Thus, four 2×2 chroma sub-blocks (forming one 4×4 chroma block) share the same motion vector, derived as the average of two 4×4 luma sub-block motion vectors. In the decoding process, motion compensation is still performed on 2×2 chroma blocks; however, because all chroma MVs inside a 4×4 chroma block are identical, this is equivalent to motion compensation on the 4×4 chroma block. In effect, affine chroma MC is performed on 4×4 chroma blocks.
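Equations (8-697) to (8-699) take the top-left and bottom-right luma sub-block MVs of each 2x2 group of sub-blocks and halve their sum, rounding exact halves toward zero. A Python sketch with hypothetical names follows, where `mv[x][y]` holds the (horizontal, vertical) luma MV of sub-block (x, y); the subsequent conversion to chroma units in clause 8.5.2.13 is not modelled.

```python
def avg_round(s: int) -> int:
    """(s + 1 - (s >= 0)) >> 1, as in (8-698) and (8-699)."""
    return (s + 1 - (1 if s >= 0 else 0)) >> 1


def chroma_source_mv(mv, x_sb_idx: int, y_sb_idx: int):
    """Return mvAvgLX for sub-block (x_sb_idx, y_sb_idx) per (8-697)..(8-699)."""
    x0 = x_sb_idx >> 1 << 1  # even column index of the 2x2 group
    y0 = y_sb_idx >> 1 << 1  # even row index of the 2x2 group
    top_left = mv[x0][y0]
    bottom_right = mv[x0 + 1][y0 + 1]
    return (avg_round(top_left[0] + bottom_right[0]),
            avg_round(top_left[1] + bottom_right[1]))
```

All four indices of one 2x2 group map to the same (x0, y0), which is why the four corresponding 2×2 chroma sub-blocks end up with identical motion vectors.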
8.5.6 Decoding process for inter blocks
8.5.6.1 General
This process is invoked when decoding a coding unit coded in inter prediction mode. The inputs to this process are:
- a luma location (xCb, yCb) specifying the top-left sample of the current coding block relative to the top-left luma sample of the current picture,
- a variable cbWidth specifying the width of the current coding block in luma samples,
- a variable cbHeight specifying the height of the current coding block in luma samples,
- the variables numSbX and numSbY specifying the number of luma coding sub-blocks in the horizontal and vertical direction,
- the motion vectors mvL0[xSbIdx][ySbIdx] and mvL1[xSbIdx][ySbIdx], with xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1,
- the refined motion vectors refMvL0[xSbIdx][ySbIdx] and refMvL1[xSbIdx][ySbIdx], with xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1,
- the reference indices refIdxL0 and refIdxL1,
- the prediction list utilization flags predFlagL0[xSbIdx][ySbIdx] and predFlagL1[xSbIdx][ySbIdx], with xSbIdx = 0..numSbX-1 and ySbIdx = 0..numSbY-1,
- the bi-prediction weight index bcwIdx,
- a variable cIdx specifying the color component index of the current block.
The output of this process is:
- an array predSamples of prediction samples.
Let predSamplesL0_L, predSamplesL1_L and predSamplesIntra_L be (cbWidth)x(cbHeight) arrays of predicted luma sample values, and predSamplesL0_Cb, predSamplesL1_Cb, predSamplesL0_Cr, predSamplesL1_Cr, predSamplesIntra_Cb and predSamplesIntra_Cr be (cbWidth/2)x(cbHeight/2) arrays of predicted chroma sample values.
The variable currPic specifies the current picture, and the variable bdofFlag is derived as follows:
- If all of the following conditions are true, bdofFlag is set equal to TRUE:
- sps_bdof_enabled_flag is equal to 1.
- predFlagL0[xSbIdx][ySbIdx] and predFlagL1[xSbIdx][ySbIdx] are both equal to 1.
- DiffPicOrderCnt(currPic, RefPicList[0][refIdxL0]) * DiffPicOrderCnt(currPic, RefPicList[1][refIdxL1]) is less than 0.
- MotionModelIdc[xCb][yCb] is equal to 0.
- merge_subblock_flag[xCb][yCb] is equal to 0.
- sym_mvd_flag[xCb][yCb] is equal to 0.
- BcwIdx[xCb][yCb] is equal to 0.
- luma_weight_l0_flag[refIdxL0] and luma_weight_l1_flag[refIdxL1] are both equal to 0.
- cbHeight is greater than or equal to 8.
- cIdx is equal to 0.
- Otherwise, bdofFlag is set equal to FALSE.
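The bdofFlag conditions can be collected into a single predicate. This Python sketch is hypothetical: it passes picture order counts directly instead of picture objects, and it assumes the picture order count condition takes the VVC-draft form in which the product of the two POC differences between the current picture and its references must be negative, i.e. one reference precedes and the other follows the current picture in output order.

```python
def bdof_flag(sps_bdof_enabled_flag: int, pred_flag_l0: int, pred_flag_l1: int,
              poc_cur: int, poc_ref0: int, poc_ref1: int,
              motion_model_idc: int, merge_subblock_flag: int, sym_mvd_flag: int,
              bcw_idx: int, luma_weight_l0_flag: int, luma_weight_l1_flag: int,
              cb_height: int, c_idx: int) -> bool:
    """Return True when every listed bdofFlag condition holds."""
    # DiffPicOrderCnt(a, b) is taken as POC(a) - POC(b); a negative product
    # means the two references lie on opposite sides of the current picture.
    opposite_sides = (poc_cur - poc_ref0) * (poc_cur - poc_ref1) < 0
    return (sps_bdof_enabled_flag == 1
            and pred_flag_l0 == 1 and pred_flag_l1 == 1
            and opposite_sides
            and motion_model_idc == 0          # translational motion only
            and merge_subblock_flag == 0
            and sym_mvd_flag == 0
            and bcw_idx == 0                   # equal bi-prediction weights
            and luma_weight_l0_flag == 0 and luma_weight_l1_flag == 0
            and cb_height >= 8
            and c_idx == 0)                    # luma only
```

If either reference shares the current picture's side in output order, the product is non-negative and BDOF is not applied.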
- If numSbY is equal to 1 and numSbX is equal to 1, the following applies:
- When bdofFlag is equal to TRUE, the variables numSbX and numSbY are modified as follows:
numSbX = ( cbWidth > 16 ) ? ( cbWidth >> 4 ) : 1 (8-700)
numSbY = ( cbHeight > 16 ) ? ( cbHeight >> 4 ) : 1 (8-701)
- For X = 0..1, xSbIdx = 0..numSbX - 1 and ySbIdx = 0..numSbY - 1, the following applies:
- predFlagLX[ xSbIdx ][ ySbIdx ] is set equal to predFlagLX[ 0 ][ 0 ].
- refMvLX[ xSbIdx ][ ySbIdx ] is set equal to refMvLX[ 0 ][ 0 ].
- mvLX[ xSbIdx ][ ySbIdx ] is set equal to mvLX[ 0 ][ 0 ].
The width sbWidth and height sbHeight of the current coding sub-block in luma samples are derived as follows:
sbWidth=cbWidth/numSbX (8-702)
sbHeight=cbHeight/numSbY (8-703)
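As a rough illustration of Eqs. (8-700) to (8-703), the Python sketch below computes the sub-block grid used for BDOF when the block was not already split. The function name and its keyword defaults are hypothetical; all sizes are in luma samples.

```python
def bdof_subblock_layout(cb_width, cb_height, bdof_flag, num_sb_x=1, num_sb_y=1):
    """Hypothetical helper returning (numSbX, numSbY, sbWidth, sbHeight)."""
    # Only an unsplit block (numSbX == numSbY == 1) is re-partitioned for BDOF.
    if num_sb_x == 1 and num_sb_y == 1 and bdof_flag:
        # Eqs. (8-700)/(8-701): dimensions above 16 are cut into 16-sample pieces.
        num_sb_x = (cb_width >> 4) if cb_width > 16 else 1
        num_sb_y = (cb_height >> 4) if cb_height > 16 else 1
    # Eqs. (8-702)/(8-703): sub-block size in luma samples.
    sb_width = cb_width // num_sb_x
    sb_height = cb_height // num_sb_y
    return num_sb_x, num_sb_y, sb_width, sb_height


# A 32x8 bi-predicted block is processed as two 16x8 sub-blocks.
print(bdof_subblock_layout(32, 8, True))  # (2, 1, 16, 8)
```

A grid that is already non-trivial (for example from subblock merge) is passed through unchanged, mirroring the "if numSbY is equal to 1 and numSbX is equal to 1" guard.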
If inter_affine_flag[ xCb ][ yCb ] is equal to 1, cIdx is equal to 1 and profFlag is equal to TRUE, the following applies:
- For each coding sub-block at sub-block index ( xSbIdx, ySbIdx ), with xSbIdx = 0..numSbX - 1 and ySbIdx = 0..numSbY - 1, the following applies:
- The luma location ( xSb, ySb ) specifying the top-left sample of the current coding sub-block relative to the top-left luma sample of the current picture is derived as follows:
( xSb, ySb ) = ( xCb + xSbIdx * sbWidth, yCb + ySbIdx * sbHeight ) (8-704)
- For each X being 0 and 1, when predFlagLX[ xSbIdx ][ ySbIdx ] is equal to 1, the following applies:
- The reference picture consisting of an ordered two-dimensional array refPicLXL of luma samples and two ordered two-dimensional arrays refPicLXCb and refPicLXCr of chroma samples is derived by invoking the process specified in clause 8.5.6.2 with X and refIdxLX as inputs.
- The array paddedSamplesX[ xSbIdx ][ ySbIdx ] is derived by invoking the fractional sample interpolation process specified in clause 8.5.6.3 with the luma location ( xSb, ySb ), the coding sub-block width sbWidth and height sbHeight in luma samples, the luma motion vector offset ( 0, 0 ), the refined luma motion vector refMvLX[ xSbIdx ][ ySbIdx ], the reference array refPicLXL, bdofFlag and cIdx as inputs.
- notPaddedSamplesX[ x + xSb ][ y + ySb ] is set equal to predSamplesLX[ x + 1 ][ y + 1 ], with x = 0..sbWidth - 1 and y = 0..sbHeight - 1.
For each coding sub-block at sub-block index ( xSbIdx, ySbIdx ), with xSbIdx = 0..numSbX - 1 and ySbIdx = 0..numSbY - 1, the following applies:
- The luma location ( xSb, ySb ) specifying the top-left sample of the current coding sub-block relative to the top-left luma sample of the current picture is derived as follows:
( xSb, ySb ) = ( xCb + xSbIdx * sbWidth, yCb + ySbIdx * sbHeight ) (8-704)
- For each X being 0 and 1, when predFlagLX[ xSbIdx ][ ySbIdx ] is equal to 1, the following applies:
- The reference picture consisting of an ordered two-dimensional array refPicLXL of luma samples and two ordered two-dimensional arrays refPicLXCb and refPicLXCr of chroma samples is derived by invoking the process specified in clause 8.5.6.2 with X and refIdxLX as inputs.
- The motion vector offset mvOffset is set equal to refMvLX[ xSbIdx ][ ySbIdx ] - mvLX[ xSbIdx ][ ySbIdx ].
- mvOffset[ 0 ] is set equal to 0 when one or more of the following conditions are true:
- xSb is not equal to xCb and mvOffset[ 0 ] is less than 0.
- ( xSb + sbWidth ) is not equal to ( xCb + cbWidth ) and mvOffset[ 0 ] is greater than 0.
- mvOffset[ 1 ] is set equal to 0 when one or more of the following conditions are true:
- ySb is not equal to yCb and mvOffset[ 1 ] is less than 0.
- ( ySb + sbHeight ) is not equal to ( yCb + cbHeight ) and mvOffset[ 1 ] is greater than 0.
- If cIdx is equal to 0, the following applies:
- Otherwise, the array predSamplesLXL is derived by invoking the fractional sample interpolation process specified in clause 8.5.6.3 with the coding sub-block width sbWidth and height sbHeight in luma samples, the coding block width cbWidth and height cbHeight, the luma location ( xSb, ySb ), the sample arrays paddedSamplesX[ xSbIdx ][ ySbIdx ] and notPaddedSamplesX, the luma motion vector offset mvOffset, the refined luma motion vector refMvLX[ xSbIdx ][ ySbIdx ], the reference array refPicLXL, bdofFlag and cIdx as inputs.
- Otherwise, if cIdx is equal to 1, the following applies:
- The array predSamplesLXCb is derived by invoking the fractional sample interpolation process specified in clause 8.5.6.3 with the luma location ( xSb, ySb ), the coding sub-block width sbWidth / 2 and height sbHeight / 2, the chroma motion vector offset mvOffset, the refined chroma motion vector refMvLX[ xSbIdx ][ ySbIdx ], the reference array refPicLXCb, bdofFlag and cIdx as inputs.
- Otherwise (cIdx is equal to 2), the following applies:
- The array predSamplesLXCr is derived by invoking the fractional sample interpolation process specified in clause 8.5.6.3 with the luma location ( xSb, ySb ), the coding sub-block width sbWidth / 2 and height sbHeight / 2, the chroma motion vector offset mvOffset, the refined chroma motion vector refMvLX[ xSbIdx ][ ySbIdx ], the reference array refPicLXCr, bdofFlag and cIdx as inputs.
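The mvOffset restriction above keeps an offset component only when it points out of the coding block through an edge that the sub-block actually lies on; an offset pointing across an interior sub-block edge is zeroed. A minimal Python sketch, with a hypothetical function name and motion vectors as (x, y) tuples in the same units as refMvLX and mvLX:

```python
def clamp_mv_offset(ref_mv, mv, xsb, ysb, sb_w, sb_h, xcb, ycb, cb_w, cb_h):
    """Return mvOffset = refMvLX - mvLX after the edge restriction (hypothetical helper)."""
    off_x = ref_mv[0] - mv[0]
    off_y = ref_mv[1] - mv[1]
    # Negative offsets survive only on the left/top edge of the coding block,
    # positive offsets only on the right/bottom edge.
    if (xsb != xcb and off_x < 0) or (xsb + sb_w != xcb + cb_w and off_x > 0):
        off_x = 0
    if (ysb != ycb and off_y < 0) or (ysb + sb_h != ycb + cb_h and off_y > 0):
        off_y = 0
    return off_x, off_y


# 32x32 block at (0, 0) split into 16x16 sub-blocks; refinement moved the MV by (-5, 3).
print(clamp_mv_offset((-5, 3), (0, 0), 0, 0, 16, 16, 0, 0, 32, 32))    # (-5, 0)
print(clamp_mv_offset((-5, 3), (0, 0), 16, 16, 16, 16, 0, 0, 32, 32))  # (0, 3)
```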
- If bdofFlag is equal to TRUE, the following applies:
- The variable shift is set equal to Max( 2, 14 - BitDepthY ).
- The variables sbDiffThres, bdofBlkDiffThres and sbSumDiff are derived as follows:
sbDiffThres = ( 1 << ( BitDepthY - 8 + shift ) ) * sbWidth * sbHeight (8-705)
bdofBlkDiffThres = 1 << ( BitDepthY - 3 + shift ) (8-706)
sbSumDiff = 0 (8-707)
- For xIdx = 0..( sbWidth >> 2 ) - 1 and yIdx = 0..( sbHeight >> 2 ) - 1, the variable bdofBlkSumDiff and the bi-directional optical flow utilization flag bdofUtilizationFlag[ xIdx ][ yIdx ] are derived as follows:
bdofUtilizationFlag[ xIdx ][ yIdx ] = bdofBlkSumDiff >= bdofBlkDiffThres (8-709)
sbSumDiff += bdofBlkSumDiff (8-710)
- The variable sbBdofFlag is derived as follows:
- If sbSumDiff is less than sbDiffThres, sbBdofFlag is set equal to FALSE.
- Otherwise, sbBdofFlag is set equal to TRUE.
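The early-termination logic can be sketched as follows. Equation (8-708), which defines bdofBlkSumDiff, is not reproduced in the text above, so the sketch assumes it is the sum of absolute differences between the L0 and L1 luma predictions over each 4x4 block. It also indexes plain sbWidth x sbHeight arrays as [x][y], without the one-sample padding border. Function and parameter names are hypothetical.

```python
def bdof_early_termination(pred_l0, pred_l1, sb_width, sb_height, bit_depth):
    """Return (sbBdofFlag, bdofUtilizationFlag) for one luma sub-block (sketch)."""
    shift = max(2, 14 - bit_depth)
    sb_diff_thres = (1 << (bit_depth - 8 + shift)) * sb_width * sb_height  # (8-705)
    blk_diff_thres = 1 << (bit_depth - 3 + shift)                          # (8-706)
    sb_sum_diff = 0                                                        # (8-707)
    util = [[False] * (sb_height >> 2) for _ in range(sb_width >> 2)]
    for x_idx in range(sb_width >> 2):
        for y_idx in range(sb_height >> 2):
            # ASSUMPTION: bdofBlkSumDiff is the 4x4 SAD between the two predictions.
            blk_sum_diff = sum(
                abs(pred_l0[(x_idx << 2) + i][(y_idx << 2) + j]
                    - pred_l1[(x_idx << 2) + i][(y_idx << 2) + j])
                for i in range(4) for j in range(4))
            util[x_idx][y_idx] = blk_sum_diff >= blk_diff_thres           # (8-709)
            sb_sum_diff += blk_sum_diff                                    # (8-710)
    return sb_sum_diff >= sb_diff_thres, util


# Identical L0/L1 predictions: BDOF is skipped for the whole sub-block.
same = [[512] * 8 for _ in range(8)]
print(bdof_early_termination(same, same, 8, 8, 10)[0])  # False
```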
- The array predSamples of predicted samples is derived as follows:
- If cIdx is equal to 0, the prediction samples predSamples[ xL + xSb ][ yL + ySb ] inside the current luma coding sub-block, with xL = 0..sbWidth - 1 and yL = 0..sbHeight - 1, are derived as follows:
- If sbBdofFlag is equal to TRUE, the bi-directional optical flow sample prediction process specified in clause 8.5.6.4 is invoked with nCbW set equal to the luma coding sub-block width sbWidth, nCbH set equal to the luma coding sub-block height sbHeight, the sample arrays predSamplesL0L and predSamplesL1L, and the variables predFlagL0[ xSbIdx ][ ySbIdx ], predFlagL1[ xSbIdx ][ ySbIdx ], refIdxL0, refIdxL1 and bdofUtilizationFlag[ xIdx ][ yIdx ], with xIdx = 0..( sbWidth >> 2 ) - 1 and yIdx = 0..( sbHeight >> 2 ) - 1, as inputs, and predSamples[ xL + xSb ][ yL + ySb ] as outputs.
- Otherwise (sbBdofFlag is equal to FALSE), the weighted sample prediction process specified in clause 8.5.6.5 is invoked with the luma coding sub-block width sbWidth, the luma coding sub-block height sbHeight, the sample arrays predSamplesL0L and predSamplesL1L, and the variables predFlagL0[ xSbIdx ][ ySbIdx ], predFlagL1[ xSbIdx ][ ySbIdx ], refIdxL0, refIdxL1, bcwIdx and cIdx as inputs, and predSamples[ xL + xSb ][ yL + ySb ] as outputs.
- Otherwise, if cIdx is equal to 1, the prediction samples predSamples[ xC + xSb / 2 ][ yC + ySb / 2 ] inside the current Cb coding sub-block, with xC = 0..sbWidth / 2 - 1 and yC = 0..sbHeight / 2 - 1, are derived by invoking the weighted sample prediction process specified in clause 8.5.6.5 with nCbW set equal to sbWidth / 2, nCbH set equal to sbHeight / 2, the sample arrays predSamplesL0Cb and predSamplesL1Cb, and the variables predFlagL0[ xSbIdx ][ ySbIdx ], predFlagL1[ xSbIdx ][ ySbIdx ], refIdxL0, refIdxL1, bcwIdx and cIdx as inputs.
- Otherwise (cIdx is equal to 2), the prediction samples predSamples[ xC + xSb / 2 ][ yC + ySb / 2 ] inside the current Cr coding sub-block, with xC = 0..sbWidth / 2 - 1 and yC = 0..sbHeight / 2 - 1, are derived by invoking the weighted sample prediction process specified in clause 8.5.6.5 with nCbW set equal to sbWidth / 2, nCbH set equal to sbHeight / 2, the sample arrays predSamplesL0Cr and predSamplesL1Cr, and the variables predFlagL0[ xSbIdx ][ ySbIdx ], predFlagL1[ xSbIdx ][ ySbIdx ], refIdxL0, refIdxL1, bcwIdx and cIdx as inputs.
- When cIdx is equal to 0, the following assignments are made for x = 0..sbWidth - 1 and y = 0..sbHeight - 1:
MvL0[xSb+x][ySb+y]=mvL0[xSbIdx][ySbIdx] (8-711)
MvL1[xSb+x][ySb+y]=mvL1[xSbIdx][ySbIdx] (8-712)
MvDmvrL0[xSb+x][ySb+y]=refMvL0[xSbIdx][ySbIdx] (8-713)
MvDmvrL1[xSb+x][ySb+y]=refMvL1[xSbIdx][ySbIdx] (8-714)
RefIdxL0[xSb+x][ySb+y]=refIdxL0 (8-715)
RefIdxL1[xSb+x][ySb+y]=refIdxL1 (8-716)
PredFlagL0[xSb+x][ySb+y]=predFlagL0[xSbIdx][ySbIdx] (8-717)
PredFlagL1[xSb+x][ySb+y]=predFlagL1[xSbIdx][ySbIdx] (8-718)
BcwIdx[xSb+x][ySb+y]=bcwIdx (8-719)
when ciip_flag [ xCb ] [ yCb ] equals 1, the array predSamples of predicted samples is modified as follows:
-if cIdx is equal to 0, the following applies:
- The general intra sample prediction process specified in clause 8.4.5.2.5 is invoked with the location ( xTbCmp, yTbCmp ) set equal to ( xCb, yCb ) and the intra prediction mode predModeIntra set equal to IntraPredModeY[ xCb ][ yCb ] as inputs, and the output is assigned to the (cbWidth)x(cbHeight) array predSamplesIntraL.
- The weighted sample prediction process for combined merge and intra prediction specified in clause 8.5.6.6 is invoked with the location ( xTbCmp, yTbCmp ) set equal to ( xCb, yCb ), the coding block width cbWidth, the coding block height cbHeight, the inter and intra prediction sample arrays set equal to predSamples and predSamplesIntraL, respectively, the intra prediction mode predModeIntra set equal to IntraPredModeY[ xCb ][ yCb ], and the color component index cIdx as inputs, and the output is assigned to the (cbWidth)x(cbHeight) array predSamples.
- Otherwise, if cIdx is equal to 1, the following applies:
- The general intra sample prediction process specified in clause 8.4.5.2.5 is invoked with the location ( xTbCmp, yTbCmp ) set equal to ( xCb / 2, yCb / 2 ) and the intra prediction mode predModeIntra set equal to IntraPredModeY[ xCb ][ yCb ] as inputs, and the output is assigned to the (cbWidth / 2)x(cbHeight / 2) array predSamplesIntraCb.
- The weighted sample prediction process for combined merge and intra prediction specified in clause 8.5.6.6 is invoked with the location ( xTbCmp, yTbCmp ) set equal to ( xCb, yCb ), the coding block width cbWidth / 2, the coding block height cbHeight / 2, the inter and intra prediction sample arrays set equal to predSamplesCb and predSamplesIntraCb, respectively, the intra prediction mode predModeIntra set equal to IntraPredModeY[ xCb ][ yCb ], and the color component index cIdx as inputs, and the output is assigned to the (cbWidth / 2)x(cbHeight / 2) array predSamples.
- Otherwise (cIdx is equal to 2), the following applies:
- The general intra sample prediction process specified in clause 8.4.5.2.5 is invoked with the location ( xTbCmp, yTbCmp ) set equal to ( xCb / 2, yCb / 2 ) and the intra prediction mode predModeIntra set equal to IntraPredModeY[ xCb ][ yCb ] as inputs, and the output is assigned to the (cbWidth / 2)x(cbHeight / 2) array predSamplesIntraCr.
- The weighted sample prediction process for combined merge and intra prediction specified in clause 8.5.6.6 is invoked with the location ( xTbCmp, yTbCmp ) set equal to ( xCb, yCb ), the coding block width cbWidth / 2, the coding block height cbHeight / 2, the inter and intra prediction sample arrays set equal to predSamplesCr and predSamplesIntraCr, respectively, the intra prediction mode predModeIntra set equal to IntraPredModeY[ xCb ][ yCb ], and the color component index cIdx as inputs, and the output is assigned to the (cbWidth / 2)x(cbHeight / 2) array predSamples.
8.5.6.3 Fractional sample interpolation process
8.5.6.3.1 General
The inputs to this process are:
a luma location ( xSb, ySb ) specifying the top-left sample of the current coding sub-block relative to the top-left luma sample of the current picture,
a variable sbWidth specifying the width of the current coding sub-block,
a variable sbHeight specifying the height of the current coding sub-block,
motion vector offset mvOffset,
a refined motion vector refMvLX,
the selected reference picture sample array refPicLX,
the bidirectional optical flow flag bdofFlag,
the variable cIdx specifies the color component index of the current block.
The output of this process is:
- an array predSamplesLX of predicted sample values.
[[ The bi-directional optical flow boundary offset bdofOffset ]] is derived as follows:
-if cIdx is equal to 0, the following applies:
- Let ( xIntL, yIntL ) be a luma location given in full-sample units and ( xFracL, yFracL ) be an offset given in 1/16-sample units. These variables are used only in this clause for specifying fractional-sample locations inside the reference sample array refPicLX.
- For each luma sample location ( xL, yL ) inside the predicted luma sample array predSamplesLX, the corresponding predicted luma sample value predSamplesLX[ xL ][ yL ] is derived as follows:
- The variables xIntL, yIntL, xFracL and yFracL are derived as follows:
xIntL = xSb + ( refMvLX[ 0 ] >> 4 ) + xL (8-721)
yIntL = ySb + ( refMvLX[ 1 ] >> 4 ) + yL (8-722)
xFracL = refMvLX[ 0 ] & 15 (8-723)
yFracL = refMvLX[ 1 ] & 15 (8-724)
- If bdofFlag is equal to TRUE and one or more of the following conditions are true, the predicted luma sample value predSamplesLX[ xL ][ yL ] is derived by invoking the luma integer sample fetching process specified in clause 8.5.6.3.3 with ( xIntL, yIntL ) [[, ( xFracL, yFracL ) ]] and refPicLX as inputs:
- xL is equal to 0.
- xL is equal to sbWidth + 1.
- yL is equal to 0.
- yL is equal to sbHeight + 1.
- Otherwise, the following applies:
- The motion vector mvLX is set equal to ( refMvLX - mvOffset ).
- For dir = 0..1, the list padVal[ dir ] is derived as follows:
- The variable disp is derived as follows:
disp = ( refMvLX[ dir ] >> 4 ) - ( mvLX[ dir ] >> 4 ) + ( dir == 0 ? xL : yL ) (8-725)
- If disp is less than 0, padVal[ dir ] is set equal to disp.
- Otherwise, if disp is greater than ( dir == 0 ? sbWidth : sbHeight ) - 1, padVal[ dir ] is set equal to disp - ( ( dir == 0 ? sbWidth : sbHeight ) - 1 ).
- Otherwise, padVal[ dir ] is set equal to 0.
- The predicted luma sample value predSamplesLX[ xL ][ yL ] is derived by invoking the luma sample 8-tap interpolation filtering process specified in clause 8.5.6.3.2 with ( xIntL, yIntL ), ( xFracL, yFracL ), refPicLX, sbWidth, sbHeight, ( xSb, ySb ) and padVal as inputs.
- Otherwise (cIdx is not equal to 0), the following applies:
- Let ( xIntC, yIntC ) be a chroma location given in full-sample units and ( xFracC, yFracC ) be an offset given in 1/32-sample units. These variables are used only in this clause for specifying general fractional-sample locations inside the reference sample array refPicLX.
- For each chroma sample location ( xC = 0..sbWidth - 1, yC = 0..sbHeight - 1 ) inside the predicted chroma sample array predSamplesLX, the corresponding predicted chroma sample value predSamplesLX[ xC ][ yC ] is derived as follows:
- The variables xIntC, yIntC, xFracC and yFracC are derived as follows:
xIntC = ( xSb / SubWidthC ) + ( mvLX[ 0 ] >> 5 ) + xC (8-726)
yIntC = ( ySb / SubHeightC ) + ( mvLX[ 1 ] >> 5 ) + yC (8-727)
xFracC = mvLX[ 0 ] & 31 (8-728)
yFracC = mvLX[ 1 ] & 31 (8-729)
- The motion vector mvLX is set equal to ( refMvLX - mvOffset ).
- For dir = 0..1, the list padVal[ dir ] is derived as follows:
- The variable disp is derived as follows:
disp = ( refMvLX[ dir ] >> 4 ) - ( mvLX[ dir ] >> 4 ) + ( dir == 0 ? xC : yC ) (8-730)
- If disp is less than 0, padVal[ dir ] is set equal to disp.
- Otherwise, if disp is greater than ( dir == 0 ? sbWidth / SubWidthC : sbHeight / SubHeightC ) - 1, padVal[ dir ] is set equal to disp - ( ( dir == 0 ? sbWidth / SubWidthC : sbHeight / SubHeightC ) - 1 ).
- Otherwise, padVal[ dir ] is set equal to 0.
- The predicted sample value predSamplesLX[ xC ][ yC ] is derived by invoking the process specified in clause 8.5.6.3.4 with ( xIntC, yIntC ), ( xFracC, yFracC ), refPicLX and padVal as inputs.
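The luma steps above can be illustrated with a short Python sketch. The integer/fractional split of Eqs. (8-721) to (8-724) relies on an arithmetic right shift and a bitwise AND that also behave sensibly for negative motion vectors; Python's >> and & on negative integers follow the same two's-complement semantics. padVal then measures how far a sample position falls outside the sub-block footprint addressed by the unrefined vector. The sketch covers only the luma case (1/16-sample vectors, 4 fractional bits); the chroma branch follows the same pattern with >> 5 and & 31 for its position terms. Both function names are hypothetical.

```python
def split_luma_mv(x_sb, y_sb, ref_mv, x_l, y_l):
    """Eqs. (8-721)-(8-724): full-sample position and 1/16-sample phase (sketch)."""
    x_int = x_sb + (ref_mv[0] >> 4) + x_l
    y_int = y_sb + (ref_mv[1] >> 4) + y_l
    x_frac = ref_mv[0] & 15
    y_frac = ref_mv[1] & 15
    return (x_int, y_int), (x_frac, y_frac)


def luma_pad_val(ref_mv, mv_offset, x_l, y_l, sb_width, sb_height):
    """Eq. (8-725) and the padVal rules: distance outside [0, size - 1] per direction."""
    mv = (ref_mv[0] - mv_offset[0], ref_mv[1] - mv_offset[1])
    pad_val = [0, 0]
    for d in range(2):
        disp = (ref_mv[d] >> 4) - (mv[d] >> 4) + (x_l if d == 0 else y_l)
        size = sb_width if d == 0 else sb_height
        if disp < 0:
            pad_val[d] = disp
        elif disp > size - 1:
            pad_val[d] = disp - (size - 1)
    return pad_val


# -3/16 sample left of (8, 8) becomes integer position 7 with phase 13/16.
print(split_luma_mv(8, 8, (-3, 37), 0, 0))           # ((7, 10), (13, 5))
# A two-sample rightward offset pushes the last column 2 samples past the edge.
print(luma_pad_val((32, 0), (32, 0), 15, 0, 16, 16))  # [2, 0]
```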
Padding process for the optical flow process
The inputs to this process are:
- two variables cbWidth and cbHeight specifying the width and height of the current block,
- the luma location ( xSb, ySb ),
- a (sbWidth + 2)x(sbHeight + 2) predicted sample array paddedSamples,
- a (cbWidth)x(cbHeight) predicted sample array notPaddedSamples.
The variable paddingW is derived as Min( 16, cbWidth ).
The variable paddingH is derived as Min( 16, cbHeight ).
The (sbWidth + 2)x(sbHeight + 2) sample array predSamples is derived as follows:
predSamples[ x ][ y ] is set equal to paddedSamples[ x ][ y ], with x = 1..sbWidth and y = 1..sbHeight.
If ySb % paddingH is equal to 0, predSamples[ x ][ 0 ] is set equal to paddedSamples[ x ][ 0 ], with x = 1..sbWidth. Otherwise, predSamples[ x ][ 0 ] is set equal to notPaddedSamples[ x + xSb - 1 ][ ySb - 1 ], with x = 1..sbWidth.
If ( ySb + sbHeight ) % paddingH is equal to 0, predSamples[ x ][ sbHeight + 1 ] is set equal to paddedSamples[ x ][ sbHeight + 1 ], with x = 1..sbWidth. Otherwise, predSamples[ x ][ sbHeight + 1 ] is set equal to notPaddedSamples[ x + xSb - 1 ][ ySb + sbHeight ], with x = 1..sbWidth.
If xSb % paddingW is equal to 0, predSamples[ 0 ][ y ] is set equal to paddedSamples[ 0 ][ y ], with y = 1..sbHeight. Otherwise, predSamples[ 0 ][ y ] is set equal to notPaddedSamples[ xSb - 1 ][ ySb + y - 1 ], with y = 1..sbHeight.
If ( xSb + sbWidth ) % paddingW is equal to 0, predSamples[ sbWidth + 1 ][ y ] is set equal to paddedSamples[ sbWidth + 1 ][ y ], with y = 1..sbHeight. Otherwise, predSamples[ sbWidth + 1 ][ y ] is set equal to notPaddedSamples[ xSb + sbWidth ][ ySb + y - 1 ], with y = 1..sbHeight.
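The border selection above can be sketched in Python as follows: a border row or column is taken from the padded array only where the sub-block edge falls on a paddingW/paddingH boundary, and otherwise from the actual neighbouring predicted samples. Arrays are indexed [x][y] as in the text, and notPaddedSamples is indexed with the same coordinates as xSb and ySb. The text does not say how the four corner samples of predSamples are set, so this sketch simply keeps them from paddedSamples; that choice and the function name are assumptions.

```python
def assemble_border(padded, not_padded, x_sb, y_sb, sb_w, sb_h, cb_w, cb_h):
    """Build the (sbWidth+2)x(sbHeight+2) predSamples array (sketch, [x][y] indexing)."""
    padding_w = min(16, cb_w)
    padding_h = min(16, cb_h)
    # Interior (and, by assumption, the corners) come from the padded array.
    out = [[padded[x][y] for y in range(sb_h + 2)] for x in range(sb_w + 2)]
    for x in range(1, sb_w + 1):
        if y_sb % padding_h != 0:  # top edge is not a padding boundary
            out[x][0] = not_padded[x + x_sb - 1][y_sb - 1]
        if (y_sb + sb_h) % padding_h != 0:  # bottom edge
            out[x][sb_h + 1] = not_padded[x + x_sb - 1][y_sb + sb_h]
    for y in range(1, sb_h + 1):
        if x_sb % padding_w != 0:  # left edge
            out[0][y] = not_padded[x_sb - 1][y_sb + y - 1]
        if (x_sb + sb_w) % padding_w != 0:  # right edge
            out[sb_w + 1][y] = not_padded[x_sb + sb_w][y_sb + y - 1]
    return out


# 8x8 block split into 4x4 sub-blocks; look at the sub-block at (4, 0).
padded = [[-1] * 6 for _ in range(6)]
not_padded = [[10 * x + y for y in range(8)] for x in range(8)]
out = assemble_border(padded, not_padded, 4, 0, 4, 4, 8, 8)
print(out[0][1], out[5][1], out[1][0], out[1][5])  # 30 -1 -1 44
```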
5.2 Working draft for bullet 22
The working draft is based on JVET-O2001.
Changes in JVET-O0070 are shown in bold italics. Deleted text is marked with double brackets (e.g., [[a]] denotes the deleted character "a").
8.5.5.9 Derivation process for motion vector arrays from affine control point motion vectors
……
The variable cbProfFlagLX is derived as follows:
- If one or more of the following conditions are true, cbProfFlagLX is set equal to FALSE:
- affine_prof_enabled_flag is equal to 0.
- fallbackModeTriggered is equal to 1.
- numCpMv is equal to 2, cpMvLX[ 1 ][ 0 ] is equal to cpMvLX[ 0 ][ 0 ] and cpMvLX[ 1 ][ 1 ] is equal to cpMvLX[ 0 ][ 1 ].
- numCpMv is equal to 3, cpMvLX[ 1 ][ 0 ] is equal to cpMvLX[ 0 ][ 0 ], cpMvLX[ 1 ][ 1 ] is equal to cpMvLX[ 0 ][ 1 ], cpMvLX[ 2 ][ 0 ] is equal to cpMvLX[ 0 ][ 0 ] and cpMvLX[ 2 ][ 1 ] is equal to cpMvLX[ 0 ][ 1 ].
- Otherwise, cbProfFlagLX is set equal to TRUE.
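The last two conditions disable PROF when every control-point motion vector equals the first one, i.e. when the affine model degenerates to a pure translation and the per-sample motion vector differences would all be zero. A hedged Python sketch, with control-point motion vectors given as hypothetical (x, y) tuples:

```python
def cb_prof_flag(affine_prof_enabled_flag, fallback_mode_triggered, num_cp_mv, cp_mv):
    """Sketch of cbProfFlagLX; cp_mv[i] is the (x, y) MV of control point i."""
    if affine_prof_enabled_flag == 0 or fallback_mode_triggered == 1:
        return False
    # Identical control-point MVs describe a translation-only block.
    if num_cp_mv == 2 and cp_mv[1] == cp_mv[0]:
        return False
    if num_cp_mv == 3 and cp_mv[1] == cp_mv[0] and cp_mv[2] == cp_mv[0]:
        return False
    return True


print(cb_prof_flag(1, 0, 2, [(4, -8), (4, -8)]))  # False
print(cb_prof_flag(1, 0, 2, [(4, -8), (6, -8)]))  # True
```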
The variables sbWidth and sbHeight are derived as follows:
sbWidth=cbWidth/numSbX
sbHeight=cbHeight/numSbY
The variable bitDepth is set equal to BitDepthY and the variable shift1 is set equal to Max( 6, bitDepth - 6 ).
The variable dmvLimit is set equal to 1 << shift1.
The variables posOffsetX and posOffsetY are derived as follows:
posOffsetX=6*dHorX+6*dVerX
posOffsetY=6*dHorY+6*dVerY
For x = 0..sbWidth - 1 and y = 0..sbHeight - 1, the following applies:
- The following applies:
diffMv[ x ][ y ][ 0 ] = x * ( dHorX << 2 ) + y * ( dVerX << 2 ) - posOffsetX
diffMv[ x ][ y ][ 1 ] = x * ( dHorY << 2 ) + y * ( dVerY << 2 ) - posOffsetY
- For i = 0..1, the following applies:
- The rounding process for motion vectors specified in clause 8.5.2.14 is invoked with mvX set equal to diffMv[ x ][ y ][ i ], rightShift set equal to 7 and leftShift set equal to 0 as inputs, and the rounded diffMv[ x ][ y ][ i ] as output.
diffMv [ x ] [ y ] [ i ] is clipped as follows:
diffMv[x][y][i]=Clip3(-dmvLimit,dmvLimit-1,diffMv[x][y][i])
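The per-sample motion vector difference used by PROF can be sketched in Python as below. dHorX, dHorY, dVerX and dVerY are taken as given integers (their derivation lies outside this excerpt), and the rounding helper follows our reading of the motion vector rounding process in clause 8.5.2.14 (add half, shift, and mirror for negative values, i.e. round half away from zero); treat both as assumptions. Function names are hypothetical.

```python
def round_mv(v, right_shift, left_shift=0):
    """Symmetric rounding (assumed reading of clause 8.5.2.14)."""
    offset = 0 if right_shift == 0 else 1 << (right_shift - 1)
    if v >= 0:
        return ((v + offset) >> right_shift) << left_shift
    return -(((-v + offset) >> right_shift) << left_shift)


def clip3(lo, hi, v):
    return max(lo, min(hi, v))


def prof_diff_mv(d_hor_x, d_hor_y, d_ver_x, d_ver_y, sb_width, sb_height, bit_depth):
    """Return diffMv[x][y] = (dx, dy) for one affine sub-block (sketch)."""
    shift1 = max(6, bit_depth - 6)
    dmv_limit = 1 << shift1
    pos_offset_x = 6 * d_hor_x + 6 * d_ver_x
    pos_offset_y = 6 * d_hor_y + 6 * d_ver_y
    diff = [[(0, 0)] * sb_height for _ in range(sb_width)]
    for x in range(sb_width):
        for y in range(sb_height):
            raw = (x * (d_hor_x << 2) + y * (d_ver_x << 2) - pos_offset_x,
                   x * (d_hor_y << 2) + y * (d_ver_y << 2) - pos_offset_y)
            # Round with rightShift = 7, leftShift = 0, then clip to the dmvLimit range.
            diff[x][y] = tuple(clip3(-dmv_limit, dmv_limit - 1, round_mv(c, 7)) for c in raw)
    return diff


# A purely horizontal gradient gives offsets symmetric about the 4x4 centre.
d = prof_diff_mv(128, 0, 0, 0, 4, 4, 10)
print([d[x][0][0] for x in range(4)])  # [-6, -2, 2, 6]
```

The posOffset terms (6 = 1.5 x 4) make each difference relative to the centre of a 4x4 sub-block, which is why the example output is antisymmetric.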
5.2 Working draft for bullet 23
The working draft is based on JVET-O2001.
Changes in JVET-O0070 are shown in bold italics. Deleted text is marked with double brackets (e.g., [[a]] denotes the deleted character "a").
7.3.2.3 Sequence parameter set RBSP syntax
7.3.6 Slice header syntax
7.3.6.1 General slice header syntax
sps_bdof_prof_dmvr_slice_present_flag equal to 1 specifies that slice_disable_bdof_prof_dmvr_flag is present in the slice headers referring to the SPS. sps_bdof_prof_dmvr_slice_present_flag equal to 0 specifies that slice_disable_bdof_prof_dmvr_flag is not present in the slice headers referring to the SPS. When sps_bdof_prof_dmvr_slice_present_flag is not present, its value is inferred to be equal to 0.
slice_disable_bdof_prof_dmvr_flag equal to 1 specifies that bi-directional optical flow inter prediction, prediction refinement with optical flow and decoder motion vector refinement based inter bi-prediction are all disabled in the current slice. slice_disable_bdof_prof_dmvr_flag equal to 0 specifies that bi-directional optical flow inter prediction, prediction refinement with optical flow, or decoder motion vector refinement based inter bi-prediction may or may not be enabled in the current slice. When slice_disable_bdof_prof_dmvr_flag is not present, its value is inferred to be 0.
8.5.1 General decoding process for coding units coded in inter prediction mode
……
- dmvrFlag is set equal to 1 when all of the following conditions are true:
- sps_dmvr_enabled_flag is equal to 1.
- general_merge_flag[ xCb ][ yCb ] is equal to 1.
- predFlagL0[ 0 ][ 0 ] and predFlagL1[ 0 ][ 0 ] are both equal to 1.
- mmvd_merge_flag[ xCb ][ yCb ] is equal to 0.
- ciip_flag[ xCb ][ yCb ] is equal to 0.
- DiffPicOrderCnt( currPic, RefPicList[ 0 ][ refIdxL0 ] ) is equal to DiffPicOrderCnt( RefPicList[ 1 ][ refIdxL1 ], currPic ).
- BcwIdx[ xCb ][ yCb ] is equal to 0.
- luma_weight_l0_flag[ refIdxL0 ] and luma_weight_l1_flag[ refIdxL1 ] are both equal to 0.
- cbWidth is greater than or equal to 8.
- cbHeight is greater than or equal to 8.
- cbHeight * cbWidth is greater than or equal to 128.
- For each X being 0 and 1, pic_width_in_luma_samples and pic_height_in_luma_samples of the reference picture refPicLX associated with refIdxLX are equal to pic_width_in_luma_samples and pic_height_in_luma_samples, respectively, of the current picture.
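The DMVR enabling check can be summarised as a single predicate. The sketch reads the picture-order-count condition as DiffPicOrderCnt( currPic, RefPicList[ 0 ][ refIdxL0 ] ) equal to DiffPicOrderCnt( RefPicList[ 1 ][ refIdxL1 ], currPic ), with DiffPicOrderCnt( picA, picB ) equal to the POC of picA minus the POC of picB, and the size condition as cbWidth * cbHeight >= 128. Under that reading the two references must be equidistant from, and on opposite sides of, the current picture. The dataclass and its field names are hypothetical stand-ins for the syntax elements and variables listed.

```python
from dataclasses import dataclass


@dataclass
class DmvrInputs:
    """Hypothetical container for the values tested in the dmvrFlag derivation."""
    sps_dmvr_enabled_flag: int
    general_merge_flag: int
    pred_flag_l0: int
    pred_flag_l1: int
    mmvd_merge_flag: int
    ciip_flag: int
    poc_cur: int
    poc_ref_l0: int
    poc_ref_l1: int
    bcw_idx: int
    luma_weight_l0_flag: int
    luma_weight_l1_flag: int
    cb_width: int
    cb_height: int
    refs_same_size: bool  # both reference pictures match the current picture size


def dmvr_flag(p: DmvrInputs) -> bool:
    # DiffPicOrderCnt(cur, refL0) == DiffPicOrderCnt(refL1, cur): equal and opposite distances.
    equidistant = (p.poc_cur - p.poc_ref_l0) == (p.poc_ref_l1 - p.poc_cur)
    return (p.sps_dmvr_enabled_flag == 1 and p.general_merge_flag == 1
            and p.pred_flag_l0 == 1 and p.pred_flag_l1 == 1
            and p.mmvd_merge_flag == 0 and p.ciip_flag == 0
            and equidistant and p.bcw_idx == 0
            and p.luma_weight_l0_flag == 0 and p.luma_weight_l1_flag == 0
            and p.cb_width >= 8 and p.cb_height >= 8
            and p.cb_width * p.cb_height >= 128
            and p.refs_same_size)


base = DmvrInputs(1, 1, 1, 1, 0, 0, 8, 4, 12, 0, 0, 0, 8, 16, True)
print(dmvr_flag(base))  # True: 8x16 block, references at POC 4 and 12
```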
……
8.5.6.1
- bdofFlag is set equal to TRUE if all of the following conditions are true:
- sps_bdof_enabled_flag is equal to 1.
- predFlagL0[ xSbIdx ][ ySbIdx ] and predFlagL1[ xSbIdx ][ ySbIdx ] are both equal to 1.
- DiffPicOrderCnt( currPic, RefPicList[ 0 ][ refIdxL0 ] ) * DiffPicOrderCnt( currPic, RefPicList[ 1 ][ refIdxL1 ] ) is less than 0.
- MotionModelIdc[ xCb ][ yCb ] is equal to 0.
- merge_subblock_flag[ xCb ][ yCb ] is equal to 0.
- sym_mvd_flag[ xCb ][ yCb ] is equal to 0.
- ciip_flag[ xCb ][ yCb ] is equal to 0.
- BcwIdx[ xCb ][ yCb ] is equal to 0.
- luma_weight_l0_flag[ refIdxL0 ] and luma_weight_l1_flag[ refIdxL1 ] are both equal to 0.
- cbWidth is greater than or equal to 8.
- cbHeight is greater than or equal to 8.
- cbHeight * cbWidth is greater than or equal to 128.
- For each X being 0 and 1, pic_width_in_luma_samples and pic_height_in_luma_samples of the reference picture refPicLX associated with refIdxLX are equal to pic_width_in_luma_samples and pic_height_in_luma_samples, respectively, of the current picture.
-cIdx is equal to 0.
Otherwise, bdofFlag is set equal to FALSE.
……
8.5.5.9
……
The variable cbProfFlagLX is derived as follows:
- If one or more of the following conditions are true, cbProfFlagLX is set equal to FALSE:
- affine_prof_enabled_flag is equal to 0.
- fallbackModeTriggered is equal to 1.
- numCpMv is equal to 2, cpMvLX[ 1 ][ 0 ] is equal to cpMvLX[ 0 ][ 0 ] and cpMvLX[ 1 ][ 1 ] is equal to cpMvLX[ 0 ][ 1 ].
- numCpMv is equal to 3, cpMvLX[ 1 ][ 0 ] is equal to cpMvLX[ 0 ][ 0 ], cpMvLX[ 1 ][ 1 ] is equal to cpMvLX[ 0 ][ 1 ], cpMvLX[ 2 ][ 0 ] is equal to cpMvLX[ 0 ][ 0 ] and cpMvLX[ 2 ][ 1 ] is equal to cpMvLX[ 0 ][ 1 ].
- Otherwise, cbProfFlagLX is set equal to TRUE.
……
The examples described above may be incorporated in the context of the methods described below (e.g., methods 2900, 2920, 2940, and 2960) that may be implemented at a video decoder or video encoder.
Fig. 29A shows a flowchart of an exemplary method for video processing. The method 2910 includes, at step 2912, making a first determination regarding a codec mode used for representing a current video block of the video in a codec representation of the video. The method 2910 further includes, at step 2914, making a second determination as to whether to apply the deblocking filter based on the first determination. The method 2910 further includes, at step 2916, performing a conversion between the current video block and the codec representation according to the first determination and the second determination. In some implementations, the codec mode uses affine codec tools and specific motion prediction/compensation tools for the conversion.
Fig. 29B shows a flowchart of an exemplary method for video processing. The method 2920 includes, at step 2922, determining to enable use of the switchable interpolation filter tool as a result of using a particular motion vector precision in the affine codec tool for representing a current video block in a codec representation of the video. The method 2920 further includes, at step 2924, performing a conversion based on the determination, wherein the switchable interpolation filter tool allows switching to another interpolation filter for the current video block that is different from the interpolation filter used to process the previous video block.
Fig. 29C shows a flowchart of an exemplary method for video processing. The method 2930 includes, at step 2932, for a current video block of a video comprising one or more video blocks, making a decision regarding the applicability of bi-directional optical flow (BDOF) and/or prediction refinement with optical flow (PROF) to the current video block, based on use of a switchable interpolation filter tool that allows the current video block and another video block to use different interpolation filters for determining a prediction block. The method 2930 further includes, at step 2934, performing a conversion between the video and the codec representation of the video based on the decision.
Fig. 29D shows a flowchart of an exemplary method for video processing. The method 2940 includes, at step 2942, performing a conversion between video blocks of a video region of the video and a codec representation of the video according to a rule. In some implementations, the rule specifies that a first syntax element, indicating the applicability of a decoder-side motion vector refinement tool or a coding tool based on the optical flow model, is included in the codec representation at a level corresponding to the video region, and the conversion is performed according to a value of the first syntax element.
5. Example implementations of the disclosed technology
Fig. 30A is a block diagram of the video processing apparatus 3000. The apparatus 3000 may be used to implement one or more of the methods described herein. The apparatus 3000 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, or the like. The apparatus 3000 may include one or more processors 3002, one or more memories 3004, and video processing hardware 3006. The processor(s) 3002 may be configured to implement one or more methods described in this document, including but not limited to method 2900. The memory (memories) 3004 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 3006 may be used to implement, in hardware circuitry, some of the techniques described in this document.
Fig. 30B is a block diagram illustrating an example video processing system 4100 in which various techniques disclosed herein may be implemented. Various embodiments may include some or all of the components of system 4100. The system 4100 can include an input 4102 for receiving video content. The video content may be received in a raw or uncompressed format, such as 8- or 10-bit multi-component pixel values, or may be in a compressed or encoded format. Input 4102 can represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as Ethernet, passive optical network (PON), etc., and wireless interfaces such as Wi-Fi or cellular interfaces.
The system 4100 can include a codec component 4104 that can implement various codec or encoding methods described in this document. The codec component 4104 can reduce the average bit rate of the video from the input 4102 to the output of the codec component 4104 to produce a codec representation of the video. Codec techniques are therefore sometimes referred to as video compression or video transcoding techniques. The output of the codec component 4104 can be stored or transmitted via a communication connection as represented by component 4106. The stored or communicatively transmitted bitstream (or codec) representation of the video received at input 4102 may be used by component 4108 to generate pixel values or displayable video transmitted to display interface 4110. The process of generating user-viewable video from a bitstream representation is sometimes referred to as video decompression. Further, while certain video processing operations are referred to as "codec" operations or tools, it will be appreciated that a codec tool or operation is used at the encoder and that a corresponding decoding tool or operation that inverts the codec results will be performed by the decoder.
Examples of a peripheral bus interface or a display interface may include universal serial bus (USB), high-definition multimedia interface (HDMI), DisplayPort, and so on. Examples of storage interfaces include SATA (Serial Advanced Technology Attachment), PCI, and IDE interfaces. The techniques described in this document may be embodied in various electronic devices such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and/or video display.
Some embodiments of the disclosed technology include making a decision or determination to enable a video processing tool or mode. In an example, when a video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of video blocks, but may not necessarily modify the resulting bitstream based on the use of the tool or mode. That is, when a video processing tool or mode is enabled based on a decision or determination, a conversion from a block of video to a bitstream representation of the video will use the video processing tool or mode. In another example, when the video processing tool or mode is enabled, the decoder will process the bitstream with the knowledge that the bitstream has been modified based on the video processing tool or mode. That is, the conversion of the bitstream representation of the video into blocks of the video will be performed using the video processing tool or mode that was enabled based on the decision or determination.
Some embodiments of the disclosed technology include making a decision or determination to disable a video processing tool or mode. In an example, when a video processing tool or mode is disabled, the encoder will not use the tool or mode in the conversion of blocks of video into a bitstream representation of video. In another example, when a video processing tool or mode is disabled, the decoder will process the bitstream with the knowledge that the bitstream is not modified using the video processing tool or mode that is disabled based on the decision or determination.
In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during the conversion from a pixel representation of the video to a corresponding bitstream representation, and vice versa. The bitstream representation of the current video block may, for example, correspond to bits that are co-located or spread in different places within the bitstream, as defined by the syntax. For example, a macroblock may be encoded in terms of transformed and coded error residual values, and also using bits in headers and other fields in the bitstream.
It should be appreciated that by allowing the techniques disclosed in this document to be used, the disclosed methods and techniques will benefit video encoder and/or decoder embodiments incorporated within video processing devices such as smartphones, laptops, desktops, and the like.
In some embodiments, the video encoding method may be implemented using an apparatus implemented on a hardware platform as described with reference to fig. 30A or 30B.
Various techniques and embodiments may be described using the following clause-based format.
The first set of clauses describes certain features and aspects of the disclosed technology listed in the previous section.
1. A method for video processing, comprising: performing a gradient computation in a first region of a current video block, wherein a size (M×N) of the first region is different from the size of the sub-blocks of the current video block used for motion compensation in affine mode, and wherein M and N are positive integers; and performing a conversion between the current video block and a bitstream representation of a video that includes the current video block based on the gradient computation.
2. The method of clause 1, wherein the size of the first region is greater than the size of the sub-block.
3. The method of clause 1 or 2, wherein M and N are predefined positive integers.
4. The method of clause 1 or 2, wherein the size of the first region is based on the size of the sub-block.
5. The method of clause 1, wherein M/N is adaptively changed.
6. The method of clause 1, wherein M and N are based on the dimensions of the current video block.
7. The method of any of clauses 1-6, wherein M has a minimum value Mmin, and wherein N has a minimum value Nmin.
8. The method of clause 7, wherein Mmin = Nmin = 8.
9. The method of any of clauses 1-6, wherein the first region is padded to generate a first padded region having a size of (M+dM)×(N+dN).
10. The method of clause 9, wherein the samples in the first region or the first padded region are derived based on motion compensation with interpolation filtering.
11. The method of clause 1, wherein at least one sample in the first region is omitted when performing the gradient calculation.
12. The method of clause 1, wherein the gradient computation is performed with a first precision in bidirectional optical flow (BDOF) and with a second precision in prediction refinement with optical flow (PROF), and wherein the first precision and the second precision are equal.
13. A method for video processing, comprising: making, based on whether prediction refinement with optical flow (PROF) is selectively applied to a current video block, a decision regarding selectively applying a codec tool to the current video block, wherein the codec tool is different from PROF; and performing a conversion between the current video block and a bitstream representation of a video that includes the current video block based on the decision.
14. The method of clause 13, wherein the PROF is not applied and the codec tool is applied.
15. The method of clause 13, wherein the codec tool comprises generalized bi-directional prediction.
16. The method of clause 15, wherein the PROF is not applied, and wherein the index associated with generalized bi-prediction is non-zero.
17. The method of clause 13, wherein the codec tool is local illumination compensation.
18. The method of clause 13, wherein the codec tool is a Multiple Transform Set (MTS).
19. The method of clause 18, wherein the PROF is applied and only the default transform from the MTS is applied to the current video block.
20. The method of clause 13, wherein the codec tool is weighted prediction.
21. A method for video processing, comprising: making, during a conversion between a current video block and a bitstream representation of a video that includes the current video block, a decision regarding selectively applying a prediction refinement with optical flow (PROF) operation, wherein the decision is based on color information of the current video block.
22. The method of clause 21, wherein the PROF operation is not applied to one or more chroma components of the current video block, and wherein the color information includes a 4:0:0 color format.
23. The method of clause 21, wherein the PROF operation is applied to one or more chroma components of the current video block, and wherein the color information includes a 4:4:4 color format.
24. The method of clause 21, wherein the PROF operation is applied to one or more chroma components of the current video block, and wherein the color information includes a 4:2:0 color format.
25. The method of clause 21, wherein a PROF operation is applied, and wherein the color information includes a plurality of color components.
26. The method of clause 25, wherein the one or more gradients of the PROF operation are calculated independently for each of the plurality of color components.
27. The method of clause 25, wherein the one or more gradients of the PROF operation are calculated for a first color component of the plurality of color components and reused for a second color component of the plurality of color components.
28. The method of clause 26 or 27, wherein the precision of the gradients is based on at least one of the plurality of color components.
29. A method for video processing, comprising: making, based on the height (H) or width (W) of a current video block, a decision regarding selectively applying a prediction refinement with optical flow (PROF) operation; and performing a conversion between the current video block and a bitstream representation of a video that includes the current video block based on the decision.
30. The method of clause 29, wherein the PROF operation is applied to the luminance component of the current video block.
31. The method of clause 29, wherein the current video block is encoded using affine mode.
32. The method of clause 31, wherein the PROF operation is not applied when W ≤ T1 and/or H ≤ T2, and wherein T1 = T2 = 16.
33. The method of clause 31, wherein the PROF operation is not applied when W ≥ T1 and/or H ≥ T2, and wherein T1 = T2 = 64.
34. The method of clause 31, wherein the PROF operation is not applied when W×H ≤ T or max(W, H) ≤ T, and wherein T = 16.
35. The method of clause 31, wherein the PROF operation is not applied when W×H ≥ T or min(W, H) ≥ T, and wherein T = 64.
36. The method of clause 1 or 2, wherein the current video block has a size of W×H, wherein M = min(K, W), and wherein K is an integer.
37. The method of clause 1 or 2, wherein the current video block has a size of W×H, wherein N = min(K, H), and wherein K is an integer.
38. The method of clause 36 or 37, wherein K = 16.
39. The method of clause 1 or 2, further comprising:
performing, before the gradient computation, a padding process on the first region of the current video block.
40. The method of clause 39, wherein performing the padding process comprises deriving one or more motion vectors.
41. The method of clause 40, wherein the one or more motion vectors comprise a motion vector derived from the affine model for a specific position of the first region.
42. The method of clause 40, wherein the one or more motion vectors comprise a motion vector derived from at least one motion vector of at least one sub-block of the first region.
43. The method of clause 39, wherein performing the padding process is based on a height or width of the current video block.
44. The method of clause 39, wherein performing the padding process is based on signaling in a Video Parameter Set (VPS), a Decoder Parameter Set (DPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), an Adaptation Parameter Set (APS), a slice header, a slice group header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
45. The method of clause 5 or 6, wherein M and N are signaled in a Video Parameter Set (VPS), a Decoder Parameter Set (DPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), an Adaptation Parameter Set (APS), a slice header, a slice group header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
46. The method of clause 5 or 6, wherein M and N are specified in a profile, level, or tier of a video codec standard.
47. A method of video processing, comprising: determining, based on a field at a video region level in a codec representation of a video comprising a plurality of video blocks, the applicability of a codec mode, wherein the video region comprises one or more video blocks; and performing a conversion between the codec representation and the plurality of video blocks using a result of the determining, such that the codec mode is selectively used for the video region according to the determined applicability.
48. The method of clause 47, wherein the video region comprises a video slice.
49. The method of any of clauses 47-48, wherein the codec mode comprises a bidirectional optical flow mode, a prediction refinement with optical flow mode, or a decoder-side motion vector refinement mode.
50. The method of any of clauses 47-49, wherein the determining comprises inferring that the codec mode is disabled based on detecting a value of a field or an absence of a field in the codec representation.
51. The method of any of clauses 47-50, wherein the determining further comprises determining the presence of another field of another video region level.
52. The method of any of clauses 47-49, wherein the determining comprises inferring that the codec mode is enabled based on detecting a value of a field or a presence of a field in the codec representation.
53. The method of any of clauses 47 and 49-52, wherein the video region comprises a video picture.
54. The method of any of clauses 47-53, wherein the converting comprises encoding the video to generate a codec representation.
55. The method of any of clauses 47-53, wherein the converting comprises decoding the codec representation to generate the video.
56. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method according to any of clauses 1-55.
57. A video encoder comprising a processor or circuitry configured to implement the method according to one or more of clauses 1-55.
58. A video decoder comprising a processor or circuitry configured to implement the method according to one or more of clauses 1-55.
59. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method according to any one of clauses 1 to 55.
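As a rough illustration of clauses 1-11, 29-35 and 36-39, the Python sketch below computes the gradient-region size M = min(K, W), N = min(K, H) with K = 16, pads the region by one sample on each side (dM = dN = 2), computes horizontal and vertical gradients by central differences, and gates PROF on block size. It is a sketch under stated assumptions, not the patented method: edge replication stands in for the motion-compensated padding of clause 10; the gradients omit the bit-depth-dependent shifts a real codec would apply; the comparisons in clauses 32 and 33 are assumed to be "less than or equal" and "greater than or equal", "and/or" is read as "or", and combining both clauses in one function is an assumption (the clauses present them as alternative embodiments). All function names are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]  # rows of integer prediction samples


def gradient_region_size(width: int, height: int, k: int = 16) -> Tuple[int, int]:
    """Clauses 36-38: M = min(K, W), N = min(K, H), with K = 16 by default."""
    return min(k, width), min(k, height)


def pad_region(region: Block, pad: int = 1) -> Block:
    """Pad a region on every side by replicating its edge samples.

    Assumption: edge replication replaces the motion-compensated padding
    that clause 10 allows.
    """
    rows, cols = len(region), len(region[0])
    padded = []
    for y in range(-pad, rows + pad):
        src_row = region[min(max(y, 0), rows - 1)]  # clamp row index
        padded.append([src_row[min(max(x, 0), cols - 1)]  # clamp column index
                       for x in range(-pad, cols + pad)])
    return padded


def central_gradients(region: Block) -> Tuple[Block, Block]:
    """Return (gx, gy) for each sample using central differences on the padded region."""
    p = pad_region(region, pad=1)  # p[y + 1][x + 1] == region[y][x]
    rows, cols = len(region), len(region[0])
    gx = [[p[y + 1][x + 2] - p[y + 1][x] for x in range(cols)] for y in range(rows)]
    gy = [[p[y + 2][x + 1] - p[y][x + 1] for x in range(cols)] for y in range(rows)]
    return gx, gy


def prof_allowed_for_affine(width: int, height: int,
                            t_small: int = 16, t_large: int = 64) -> bool:
    """Size gate for an affine block: clause 32 (threshold 16) and clause 33 (threshold 64)."""
    if width <= t_small or height <= t_small:
        return False  # block too small for PROF under the clause 32 reading
    if width >= t_large or height >= t_large:
        return False  # block too large for PROF under the clause 33 reading
    return True
```

For a 32×8 block the sketch picks a 16×8 gradient region, so one gradient region can span several 4×4 affine motion-compensation sub-blocks, which is the point of clause 2.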
The second set of clauses describes certain features and aspects of the disclosed technology listed in the previous section (including, for example, example embodiments 18-21, 23, and 24).
1. A video processing method, comprising: making a first determination regarding a codec mode used for representing a current video block of a video in a codec representation of the video; making, based on the first determination, a second determination as to whether to apply a deblocking filter; and performing a conversion between the current video block and the codec representation according to the first determination and the second determination, wherein the codec mode uses an affine codec tool and a specific motion prediction/compensation tool for the conversion.
2. The method of clause 1, wherein the particular motion prediction/compensation tool comprises an interleaved prediction, wherein the interleaved prediction comprises partitioning the current video block into a first set of sub-blocks according to a first pattern, and partitioning the current video block into a second set of sub-blocks according to a second pattern, wherein at least one sub-block in the second set has a different dimension than a sub-block in the first set.
3. The method of clause 1 or 2, wherein the particular motion prediction/compensation tool comprises a phase change affine sub-block motion compensation tool with filters having different phases applied to each row of samples and each column of samples in the sub-block of the current video block.
4. The method of any of clauses 1-3, wherein the particular motion prediction/compensation tool includes a Prediction Refinement Optical Flow (PROF) in which the motion information is refined using optical flow applied to the current video block.
5. The method of any of clauses 2-4, wherein the deblocking filter is not applied to the current video block in the event that at least one of the interleaved prediction, the PROF, or the phase change affine sub-block motion compensation is applied to the current video block.
6. The method of any of clauses 2-4, wherein, in the case where at least one of the interleaved prediction, the PROF, or the phase change affine sub-block motion compensation is applied to the current video block, the deblocking filter is applied to boundaries of sub-blocks of the current video block with a lower strength than that applied to other video blocks.
7. The method of any of clauses 2-4, wherein the deblocking filter is applied to the current video block in the event that none of the interleaved prediction, the PROF, or the phase change affine sub-block motion compensation is applied to the current video block.
8. The method of any of clauses 1-7, wherein performing the conversion comprises generating a codec representation from the video or generating the video from the codec representation.
9. A video processing method, comprising: determining to enable use of the switchable interpolation filter tool as a result of using a particular motion vector precision in the affine codec tool for representing a current video block of the video in a codec representation of the video; and performing a conversion based on the determination, wherein the switchable interpolation filter tool allows switching to another interpolation filter for the current video block that is different from the interpolation filter used to process the previous video block.
10. The method of clause 9, wherein the specific motion vector precision is 1/2 pixel or 1/4 pixel.
11. The method of clause 9 or 10, wherein performing the conversion comprises generating a codec representation from the video or generating the video from the codec representation.
12. A video processing method, comprising: making, for a current video block of a video comprising one or more video blocks, a decision regarding the applicability of bidirectional optical flow (BDOF) and/or Prediction Refinement Optical Flow (PROF), which refines the motion information of the current video block using optical flow, based on the use of a switchable interpolation filter tool that allows the current video block and another video block to use different interpolation filters for determining a prediction block; and performing, based on the decision, a conversion between the video and the codec representation of the video.
13. The method of clause 12, wherein, in the case where the switchable interpolation filter tool is used for the current video block, BDOF is not applied.
14. The method of clause 12, wherein whether BDOF and/or PROF are applied to the current block is based on an interpolation filter for the current video block and/or motion vector accuracy corresponding to the interpolation filter.
15. The method of any of clauses 12 to 14, wherein BDOF is not applied due to the use of a particular motion vector precision for representing a current video block of video.
16. The method of any of clauses 12-14, wherein BDOF is not applied due to the use of a specific interpolation filter, which is an alternative half-pixel precision filter or a default half-pixel precision filter.
17. The method of any of clauses 12-14, wherein PROF is not applied in case a switchable interpolation filter tool is used for the current video block.
18. The method of any of clauses 12-14, wherein PROF is not applied as a result of using a particular motion vector precision for representing a current video block of the video.
19. The method of any of clauses 12-14, wherein PROF is not applied due to the use of a specific interpolation filter, which is an alternative half-pixel precision filter or a default half-pixel precision filter.
20. The method of any of clauses 12-19, wherein performing the conversion includes generating a codec representation from the video or generating the video from the codec representation.
21. A video processing method, comprising: performing a conversion between video blocks of a video region of a video and a codec representation of the video according to a rule, wherein the rule specifies that a first syntax element, indicating the applicability of a codec tool based on an optical flow model or of a decoder-side motion vector refinement tool, is included in the codec representation at a level corresponding to the video region, and wherein the conversion is performed according to a value of the first syntax element.
22. The method of clause 21, wherein the video region comprises a video slice or picture.
23. The method of clause 21, wherein the codec tool comprises at least one of bidirectional optical flow (BDOF), Prediction Refinement Optical Flow (PROF), or decoder-side motion vector refinement (DMVR).
24. The method of any of clauses 21-23, wherein the rule specifies that a codec tool is not to be applied in case the first syntax element is not included in the codec representation.
25. The method of any of clauses 21 to 23, wherein the rule specifies that the first syntax element is only included if a codec tool is applied at a sequence level in the codec representation.
26. The method of any of clauses 21 to 23, wherein the rule further specifies that a second syntax element indicates the presence of the first syntax element.
27. The method of clause 26, wherein the second syntax element is signaled at another video region level to indicate that the first syntax element is included in the codec representation.
28. The method of clause 26, wherein the absence of the second syntax element in the codec representation indicates that the first syntax element is not included in the codec representation.
29. The method of any of clauses 21 to 23, wherein in case the first syntax element indicates that no codec tool is used, no codec tool is applied.
30. The method of any of clauses 21 to 29, wherein performing the conversion comprises generating a codec representation from the video or generating the video from the codec representation.
31. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of clauses 1-30.
32. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method according to any one of clauses 1 to 30.
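The block-level decisions in clauses 5-7, 9-10, 13 and 17 of the second set can be sketched as a few Python predicates. This is an illustrative sketch, not the patented implementation: the `BlockInfo` fields and all function names are hypothetical; the "weaker" deblocking rule (one step below the normal boundary strength) is an assumption, since clause 6 only says "lower strength"; and "the switchable interpolation filter tool is used" is assumed to mean "the alternative half-pixel filter is selected for the block".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class InterpFilter(Enum):
    DEFAULT = "default"            # regular interpolation filter
    ALT_HALF_PEL = "alt_half_pel"  # alternative half-pixel filter (switchable tool)


@dataclass
class BlockInfo:
    affine: bool
    interleaved_prediction: bool
    prof: bool
    phase_change_mc: bool
    interp_filter: InterpFilter


def switchable_filter_allowed(affine: bool, mv_precision: str) -> bool:
    """Clauses 9-10: enable the switchable filter for affine blocks using 1/2- or 1/4-pel MVs."""
    return affine and mv_precision in ("1/2", "1/4")


def deblocking_strength(info: BlockInfo, normal_strength: int, weaker: bool = False) -> int:
    """Clauses 5-7: choose the deblocking strength for sub-block boundaries."""
    refined = info.interleaved_prediction or info.prof or info.phase_change_mc
    if not refined:
        return normal_strength          # clause 7: regular deblocking
    if not weaker:
        return 0                        # clause 5: deblocking not applied
    return max(normal_strength - 1, 0)  # clause 6: hypothetical "one step weaker" rule


def bdof_prof_allowed(info: BlockInfo) -> Tuple[bool, bool]:
    """Clauses 13 and 17: BDOF and PROF are off when the alternative filter is selected."""
    uses_switchable = info.interp_filter is InterpFilter.ALT_HALF_PEL
    return (not uses_switchable, not uses_switchable)
```

Keeping each clause as its own predicate mirrors how the clauses are stated as independent embodiments; an encoder would pick whichever variant it implements.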
From the foregoing it will be appreciated that specific embodiments of the presently disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the presently disclosed technology is not limited except as by the appended claims.
Embodiments of the subject matter and the functional operations described in this patent document may be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on tangible and non-transitory computer readable media for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, an apparatus may include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not require such a device. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
The specification and drawings are to be regarded as exemplary only, where exemplary means an example. As used herein, the use of "or" is intended to include "and/or" unless the context clearly indicates otherwise.
Although this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Furthermore, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only some embodiments and examples are described and other embodiments, enhancements, and variations may be made based on what is described and shown in this patent document.

Claims (30)

1. A video processing method, comprising:
making a first determination regarding a codec mode for representing a current video block of the video in a codec representation of the video;
making, based on the first determination, a second determination as to whether to apply a deblocking filter; and
performing, based on the first determination and the second determination, a conversion between the current video block and the codec representation,
wherein the codec mode uses an affine codec tool and a specific motion prediction/compensation tool for conversion, the specific motion prediction/compensation tool comprising at least one of an interleaved prediction, a prediction refinement optical flow PROF, or a phase change affine sub-block motion compensation, and
wherein,
in the case that at least one of the interleaved prediction, the PROF, or the phase change affine sub-block motion compensation is applied to the current video block, the deblocking filter is not applied to the current video block, or
in the case where at least one of the interleaved prediction, the PROF, or the phase change affine sub-block motion compensation is applied to the current video block, the deblocking filter is applied to the boundaries of the sub-blocks of the current video block with a lower strength than that applied to other video blocks.
2. The method of claim 1, wherein the interleaved prediction comprises partitioning the current video block into a first set of sub-blocks according to a first pattern, and partitioning the current video block into a second set of sub-blocks according to a second pattern, wherein at least one sub-block in the second set has a different dimension than a sub-block in the first set.
3. The method according to claim 1 or 2, wherein in the phase change affine sub-block motion compensation, filters with different phases are applied to each row of samples and each column of samples in a sub-block of a current video block.
4. The method according to claim 1 or 2, wherein in the PROF, the motion information is refined using optical flow applied to the current video block.
5. The method of claim 1 or 2, wherein the deblocking filter is applied to the current video block in the event that at least one of the interleaved prediction, the PROF, or the phase change affine sub-block motion compensation is not applied to the current video block.
6. The method of claim 1 or 2, wherein performing the conversion comprises generating a codec representation from the video or generating the video from the codec representation.
7. The method of claim 1, further comprising:
determining to enable use of the switchable interpolation filter tool as a result of using a particular motion vector precision in the affine codec tool for representing a current video block of the video in a codec representation of the video; and
performing a conversion based on the determination,
wherein the switchable interpolation filter tool allows switching to another interpolation filter for the current video block than the interpolation filter used to process the previous video block.
8. The method of claim 7, wherein the particular motion vector precision is 1/2 pixel or 1/4 pixel.
9. The method of claim 7 or 8, wherein performing the conversion comprises generating a codec representation from the video or generating the video from the codec representation.
10. The method of claim 1, further comprising:
for a current video block of a video comprising one or more video blocks, making a decision regarding the applicability of bidirectional optical flow BDOF and/or prediction refinement optical flow PROF, which refines the motion information of the current video block using optical flow, based on the use of a switchable interpolation filter tool that allows the current video block and another video block to use different interpolation filters for determining the prediction block; and
performing, based on the decision, a conversion between the video and the codec representation of the video.
11. The method of claim 10, wherein, in the case where a switchable interpolation filter tool is used for a current video block, BDOF is not applied.
12. The method of claim 10, wherein whether the BDOF and/or PROF is applied to the current block is based on an interpolation filter for the current video block and/or motion vector accuracy corresponding to the interpolation filter.
13. The method of any of claims 10 to 12, wherein BDOF is not applied due to the use of a particular motion vector precision for representing a current video block of video.
14. The method according to any of claims 10 to 12, wherein BDOF is not applied due to the use of a specific interpolation filter, which is an alternative half-pixel precision filter or a default half-pixel precision filter.
15. The method according to any of claims 10 to 12, wherein PROF is not applied in case a switchable interpolation filter tool is used for the current video block.
16. The method according to any of claims 10 to 12, wherein PROF is not applied as a result of using a specific motion vector precision for representing a current video block of the video.
17. The method according to any of claims 10 to 12, wherein PROF is not applied due to the use of a specific interpolation filter, which is an alternative half-pixel precision filter or a default half-pixel precision filter.
18. The method of any of claims 10 to 12, wherein performing the conversion comprises generating a codec representation from the video or generating the video from the codec representation.
19. The method of claim 1, further comprising:
performing a conversion between video blocks of a video region of the video and the codec representation of the video according to a rule,
wherein the rule specifies that a first syntax element is included in the codec representation at a level corresponding to the video region, the first syntax element indicating the applicability of a codec tool based on an optical flow model or of a decoder-side motion vector refinement tool, and wherein the conversion is performed according to a value of the first syntax element.
20. The method of claim 19, wherein the video region comprises a video slice or picture.
21. The method of claim 19, wherein the codec tool comprises at least one of bidirectional optical flow (BDOF), prediction refinement optical flow (PROF), or decoder-side motion vector refinement (DMVR).
22. The method of any of claims 19 to 21, wherein the rule specifies that no codec tool is applied in case the first syntax element is not included in the codec representation.
23. The method of any of claims 19 to 21, wherein the rule specifies that the first syntax element is only included if a codec tool is applied at a sequence level in the codec representation.
24. The method of any of claims 19-21, wherein the rule further specifies that a second syntax element indicates the presence of the first syntax element.
25. The method of claim 24, wherein the second syntax element is signaled at another video region level to indicate that the first syntax element is included in the codec representation.
26. The method of claim 24, wherein the absence of the second syntax element in the codec representation indicates that the first syntax element is not included in the codec representation.
27. The method of any of claims 19 to 21, wherein, in case the first syntax element indicates that no codec tool is used, no codec tool is applied.
28. The method of any of claims 19 to 21, wherein performing the conversion comprises generating a codec representation from the video or generating the video from the codec representation.
29. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1 to 28.
30. A non-transitory computer readable medium storing code which, when executed by a processor, causes the processor to perform the method of any one of claims 1 to 28.
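The signaling rule in claims 19 to 27, together with the PROF restriction in claim 17, can be illustrated as decoder-side decision logic. The following is a minimal Python sketch under stated assumptions: the class and function names, the one-bit-per-tool layout, the tool order (BDOF, PROF, DMVR), and the reading of a 0 bit as "tool not used" are all hypothetical and do not reproduce the bitstream syntax of this patent or of any standard.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional

# Hypothetical names throughout; not syntax elements from any standard.

@dataclass
class SequenceLevel:
    # Hypothetical sequence-level enable flags (claim 23).
    bdof_enabled: bool
    prof_enabled: bool
    dmvr_enabled: bool

@dataclass
class HigherLevel:
    # Hypothetical "second syntax element" (claims 24-26);
    # None models its absence from the codec representation.
    region_flags_present: Optional[bool]

TOOLS = ("bdof", "prof", "dmvr")

def parse_region_tools(seq: SequenceLevel, hl: HigherLevel,
                       bits: Iterator[int]) -> Dict[str, bool]:
    """Decide, per tool, whether it is applied in the current slice/picture."""
    applied = {}
    for tool in TOOLS:
        seq_on = getattr(seq, f"{tool}_enabled")
        # The first syntax element is read only if the tool is enabled at
        # sequence level (claim 23) and the second syntax element is present
        # and signals its presence (claims 24-26).
        first_present = seq_on and bool(hl.region_flags_present)
        if not first_present:
            # Claim 22: first syntax element absent -> tool not applied.
            applied[tool] = False
            continue
        # Claim 27 (assumed encoding): a 0 bit means the tool is not used.
        applied[tool] = next(bits) == 1
    return applied

def prof_applies_to_block(region_prof_on: bool,
                          specific_filter_used: bool) -> bool:
    # Claim 17: PROF is not applied when the specific interpolation
    # filter is used, even if PROF is on for the region.
    return region_prof_on and not specific_filter_used

if __name__ == "__main__":
    seq = SequenceLevel(bdof_enabled=True, prof_enabled=True, dmvr_enabled=False)
    # Two bits are consumed: one for BDOF, one for PROF; DMVR is off at
    # sequence level, so no bit is read for it.
    print(parse_region_tools(seq, HigherLevel(True), iter([1, 0])))
```

In this sketch the sequence-level and higher-level checks are evaluated before any region-level bit is consumed, so an absent flag never advances the bit reader; this mirrors the conditional-presence structure of claims 22 to 26, not the exact order of any real parsing process.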
CN202080041806.9A 2019-06-05 2020-06-03 Interaction between motion vector refinement and other codec tools Active CN114503596B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
CNPCT/CN2019/090201 2019-06-05
CN2019090201 2019-06-05
CN2019094767 2019-07-04
CNPCT/CN2019/094767 2019-07-04
CNPCT/CN2019/096180 2019-07-16
CN2019096180 2019-07-16
PCT/CN2020/094156 WO2020244545A1 (en) 2019-06-05 2020-06-03 Interaction between motion vector refinements and other coding tools

Publications (2)

Publication Number Publication Date
CN114503596A CN114503596A (en) 2022-05-13
CN114503596B CN114503596B (en) 2023-12-29

Family

ID=73653054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080041806.9A Active CN114503596B (en) 2019-06-05 2020-06-03 Interaction between motion vector refinement and other codec tools

Country Status (2)

Country Link
CN (1) CN114503596B (en)
WO (1) WO2020244545A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022228420A1 (en) * 2021-04-27 2022-11-03 Beijing Bytedance Network Technology Co., Ltd. Method, device, and medium for video processing
EP4258666A1 (en) * 2022-04-07 2023-10-11 Beijing Xiaomi Mobile Software Co., Ltd. Encoding/decoding video picture data

Citations (1)

Publication number Priority date Publication date Assignee Title
CN104396248A (en) * 2012-10-12 2015-03-04 Electronics and Telecommunications Research Institute Image encoding/decoding method and device using same

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10412419B2 (en) * 2013-07-12 2019-09-10 Qualcomm Incorporated Adaptive filtering in video coding
AU2014202921B2 (en) * 2014-05-29 2017-02-02 Canon Kabushiki Kaisha Method, apparatus and system for de-blocking a block of video samples
US10200717B2 (en) * 2014-06-19 2019-02-05 Sharp Kabushiki Kaisha Image decoding device, image coding device, and predicted image generation device
US10341659B2 (en) * 2016-10-05 2019-07-02 Qualcomm Incorporated Systems and methods of switching interpolation filters
WO2019070770A1 (en) * 2017-10-02 2019-04-11 Arris Enterprises Llc System and method for reducing blocking artifacts and providing improved coding efficiency

Non-Patent Citations (1)

Title
BoG report on CE2 sub-block based motion prediction related contributions; Chun-Chi Chen; JVET-N0776-v4, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; Section 1.1.2 *

Also Published As

Publication number Publication date
WO2020244545A1 (en) 2020-12-10
CN114503596A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN113711609B (en) Incremental motion vectors in predictive refinement using optical flow
CN113728630B (en) Region-based gradient computation in different motion vector refinements
CN110933421B (en) Syntax reuse of affine patterns with adaptive motion vector resolution
CN113711608B (en) Suitability of predictive refinement procedure with optical flow
CN113574869B (en) Optical flow-based predictive refinement
CN112956197A (en) Restriction of decoder-side motion vector derivation based on coding information
CN117915083A (en) Interaction between intra copy mode and inter prediction tools
CN110662041B (en) Method and apparatus for video bitstream processing, method of storing video bitstream, and non-transitory computer-readable recording medium
CN113412623A (en) Recording context of affine mode adaptive motion vector resolution
CN113366851A (en) Fast algorithm for symmetric motion vector difference coding and decoding mode
CN115918080A (en) Affine prediction improvement for video coding and decoding
CN114503596B (en) Interaction between motion vector refinement and other codec tools
CN113678444B (en) Entropy coding of affine patterns with adaptive motion vector resolution
CN111010580B (en) Size limitation based on motion information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant