CN116095312A - Video processing method, apparatus and computer readable medium

Info

Publication number: CN116095312A
Application number: CN202310085146.XA
Authority: CN (China)
Legal status: Pending
Prior art keywords: block, current block, motion vector, motion, updated
Other languages: Chinese (zh)
Inventors: Hongbin Liu (刘鸿彬), Li Zhang (张莉), Kai Zhang (张凯), Yue Wang (王悦)
Current assignee: Beijing ByteDance Network Technology Co Ltd; ByteDance Inc
Original assignee: Beijing ByteDance Network Technology Co Ltd; ByteDance Inc
Application filed by Beijing ByteDance Network Technology Co Ltd and ByteDance Inc

Classifications

All classifications fall under H04N19/00 (methods or arrangements for coding, decoding, compressing or decompressing digital video signals):

    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176 Adaptive coding where the coding unit is an image region, the region being a block, e.g. a macroblock
    • H04N19/184 Adaptive coding where the coding unit is bits, e.g. of the compressed video stream
    • H04N19/42 Implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/513 Processing of motion vectors
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/61 Transform coding in combination with predictive coding
    • H04N19/70 Syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/96 Tree coding, e.g. quad-tree coding


Abstract

Embodiments of the present disclosure relate to video processing methods, apparatuses, and computer readable media, and in particular, to restrictions on use of updated motion information. The present disclosure provides a video processing method, including: determining original motion information associated with the current block; generating updated motion information based on the particular prediction mode; and performing a conversion between the current block and a bitstream representation of video data comprising the current block based on the updated motion information, wherein the particular prediction mode comprises one or more of bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques.

Description

Video processing method, apparatus and computer readable medium
The present application is a divisional application of the invention patent application filed on August 5, 2019, with application number 201910718717.2 and entitled "Video processing method, apparatus and computer readable medium".
Technical Field
This patent document relates to video coding techniques, apparatuses, and systems.
Background
Despite advances in video compression, digital video still accounts for the largest bandwidth usage on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video grows, the bandwidth required for digital video usage is expected to continue to increase.
Disclosure of Invention
Devices, systems, and methods related to digital video coding are described, and in particular, motion refinement based on updated motion vectors generated from two-step inter prediction. The described methods may be applied to existing video coding standards, such as High Efficiency Video Coding (HEVC), and to future video coding standards or video codecs.
In one representative aspect, there is provided a video processing method comprising: determining original motion information of a current block; scaling an original motion vector of the original motion information and a derived motion vector derived based on the original motion vector to the same target precision; generating an updated motion vector from the scaled original and derived motion vectors; and performing a conversion between the current block and a bitstream representation of the video including the current block based on the updated motion vector.
In another representative aspect, there is provided a video processing method comprising: determining original motion information of a current block; updating an original motion vector of original motion information of the current block based on a refinement method; clipping the updated motion vector to a range; and performing a conversion between the current block and a bitstream representation of the video including the current block based on the cropped updated motion vector.
In yet another representative aspect, there is provided a video processing method comprising: determining original motion information associated with the current block; generating updated motion information based on the particular prediction mode; and performing a conversion between the current block and a bitstream representation of the video data including the current block based on the updated motion information, wherein the particular prediction mode includes one or more of bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques.
In yet another representative aspect, there is provided a video processing method comprising: determining a Motion Vector Difference (MVD) precision of the current block processed with the affine mode from an MVD precision set; and performing a conversion between the current block and a bitstream representation of the video including the current block based on the determined MVD precision.
In yet another representative aspect, there is provided a video processing method comprising: determining non-updated motion information associated with the current block; updating the non-updated motion information based on a plurality of decoder-side motion vector derivation (DMVD) methods to generate updated motion information for the current block; and performing a conversion between the current block and a bitstream representation of the video including the current block based on the updated motion information.
In yet another representative aspect, the disclosed techniques may be used to provide a method for video encoding. The method comprises receiving a bitstream representation of a current block of video data, generating updated first and second reference motion vectors based on a weighted sum of the first scaled motion vector and first and second scaled reference motion vectors, respectively, wherein the first motion vector is derived based on the first reference motion vector from the first reference block and the second reference motion vector from the second reference block, wherein the current block is associated with the first and second reference blocks, wherein the first scaled motion vector is generated by scaling the first motion vector to a target precision, and wherein the first and second scaled reference motion vectors are generated by scaling the first and second reference motion vectors to the target precision, respectively, and processing the bitstream representation based on the updated first and second reference motion vectors to generate the current block.
In yet another representative aspect, the disclosed techniques may be used to provide a method for video encoding. The method includes generating an intermediate prediction for the current block based on first motion information associated with the current block, updating the first motion information to second motion information, and generating a final prediction for the current block based on the intermediate prediction or the second motion information.
In yet another representative aspect, the disclosed techniques may be used to provide a method for video encoding. The method includes receiving a bitstream representation of a current block of video data, generating intermediate motion information based on motion information associated with the current block, generating updated first and second reference motion vectors based on first and second reference motion vectors, respectively, wherein the current block is associated with the first and second reference blocks, and wherein the first and second reference motion vectors are associated with the first and second reference blocks, respectively, and processing the bitstream representation based on the intermediate motion information or the updated first and second reference motion vectors to generate the current block.
In yet another representative aspect, the disclosed techniques may be used to provide a method for video encoding that includes generating an updated reference block for a bitstream representation of a current block by modifying the reference block associated with the current block; calculating a temporal gradient for bi-directional optical flow (BIO) motion refinement based on the updated reference block; and performing a conversion including BIO motion refinement between the bitstream representation and the current block based on the temporal gradient.
In yet another representative aspect, the disclosed techniques may be used to provide a method for video encoding that includes generating a temporal gradient for bi-directional optical flow (BIO) motion refinement for a bitstream representation of a current block; generating an updated temporal gradient by subtracting a difference of a first mean and a second mean from the temporal gradient, wherein the first mean is a mean of a first reference block, wherein the second mean is a mean of a second reference block, and wherein the first and second reference blocks are associated with the current block; and performing a conversion including BIO motion refinement between the bitstream representation and the current block based on the updated temporal gradient.
In yet another representative aspect, the above-described methods are embodied in the form of processor-executable code and stored in a computer-readable program medium.
In yet another representative aspect, an apparatus configured or operable to perform the above-described method is disclosed. The apparatus may include a processor programmed to implement the method.
In yet another representative aspect, a video decoder device may implement the methods as described herein.
The above and other aspects and features of the disclosed technology are described in more detail in the accompanying drawings, description and claims.
Drawings
Fig. 1 shows an example of constructing a Merge candidate list.
Fig. 2 shows an example of the location of a spatial candidate.
Fig. 3 shows an example of candidate pairs subject to redundancy check of spatial Merge candidates.
Fig. 4A and 4B show examples of the location of a second Prediction Unit (PU) based on the size and shape of a current block.
Fig. 5 shows an example of motion vector scaling for a temporal Merge candidate.
Fig. 6 shows an example of candidate positions for the temporal Merge candidate.
Fig. 7 shows an example of generating combined bi-predictive Merge candidates.
Fig. 8 shows an example of constructing motion vector prediction candidates.
Fig. 9 shows an example of motion vector scaling for spatial motion vector candidates.
Fig. 10 shows an example of motion prediction using an Alternative Temporal Motion Vector Prediction (ATMVP) algorithm for a Coding Unit (CU).
Fig. 11 shows an example of a Coding Unit (CU) with sub-blocks and neighboring blocks used by a spatio-temporal motion vector prediction (STMVP) algorithm.
Fig. 12A and 12B show example snapshots of sub-blocks when using an Overlapped Block Motion Compensation (OBMC) algorithm.
Fig. 13 shows an example of neighboring samples for deriving parameters of a Local Illumination Compensation (LIC) algorithm.
Fig. 14 shows an example of a simplified affine motion model.
Fig. 15 shows an example of affine Motion Vector Field (MVF) of each sub-block.
Fig. 16 shows an example of Motion Vector Prediction (MVP) for the AF_INTER affine motion mode.
Fig. 17A and 17B show example candidates for the AF_MERGE affine motion mode.
Fig. 18 shows an example of bilateral matching in the Pattern Matched Motion Vector Derivation (PMMVD) mode, which is a special Merge mode based on the Frame Rate Up Conversion (FRUC) algorithm.
Fig. 19 shows an example of template matching in a FRUC algorithm.
Fig. 20 shows an example of single-sided motion estimation in a FRUC algorithm.
FIG. 21 shows an example of optical flow trajectories used by a bi-directional optical flow (BIO) algorithm.
FIGS. 22A and 22B illustrate example snapshots using a bi-directional optical flow (BIO) algorithm without block expansion.
Fig. 23 shows an example of a decoder-side motion vector refinement (DMVR) algorithm based on bilateral template matching.
Fig. 24 shows an example of template definition used in transform coefficient context modeling.
Fig. 25 shows different examples of motion vector scaling.
Fig. 26A and 26B show examples of internal and boundary sub-blocks in a PU/CU.
Fig. 27 shows a flowchart of an example method for video encoding in accordance with the presently disclosed technology.
Fig. 28 shows a flow chart of another example method for video encoding in accordance with the presently disclosed technology.
Fig. 29 shows a flowchart of another example method for video encoding in accordance with the presently disclosed technology.
Fig. 30 shows a flow chart of another example method for video encoding in accordance with the presently disclosed technology.
Fig. 31 shows a flowchart of another example method for video encoding in accordance with the presently disclosed technology.
Fig. 32 shows an example of deriving motion vectors in bidirectional optical flow based video coding.
Fig. 33 shows a flow chart of another example method for video encoding in accordance with the presently disclosed technology.
Fig. 34 illustrates a flow chart of another example method for video encoding in accordance with the presently disclosed technology.
Fig. 35 shows a flow chart of another example method for video encoding in accordance with the presently disclosed technology.
Fig. 36 shows a flowchart of another example method for video encoding in accordance with the presently disclosed technology.
Fig. 37 is a block diagram of an example of a hardware platform for implementing the visual media decoding or visual media encoding techniques described in this document.
Fig. 38 shows a flowchart of another example method for video processing in accordance with the presently disclosed technology.
Fig. 39 illustrates a flow chart of another example method for video processing in accordance with the presently disclosed technology.
Fig. 40 shows a flowchart of another example method for video processing in accordance with the presently disclosed technology.
Fig. 41 shows a flowchart of another example method for video processing in accordance with the presently disclosed technology.
Fig. 42 shows a flowchart of another example method for video processing in accordance with the presently disclosed technology.
Detailed Description
Video coding methods and techniques are prevalent in modern technology due to the increasing demand for higher resolution video. Video codecs often include electronic circuitry or software that compresses or decompresses digital video, and they are continually improved to provide higher coding efficiency. A video codec converts uncompressed video into a compressed format and vice versa. There is a complex relationship between video quality, the amount of data used to represent the video (determined by the bit rate), the complexity of the encoding and decoding algorithms, the sensitivity to data loss and errors, the ease of editing, random access, and end-to-end delay. The compression format typically conforms to a standard video compression specification, such as the High Efficiency Video Coding (HEVC) standard (also known as H.265 or MPEG-H Part 2), the Versatile Video Coding standard to be finalized, or other current and/or future video coding standards.
Embodiments of the disclosed technology may be applied to existing video coding standards (e.g., HEVC, h.265) and future standards to improve compression performance. Section headings are used in this document to improve the readability of the description and are not intended to limit the discussion or embodiments (and/or implementations) in any way to only corresponding sections.
Examples of inter prediction in HEVC/H.265
Video coding standards have improved significantly over the years and now provide, in part, high coding efficiency and support for higher resolutions. Recent standards such as HEVC (H.265) are based on a hybrid video coding structure in which temporal prediction plus transform coding is utilized.
1.1. Examples of predictive models
Each inter-predicted PU (prediction unit) has motion parameters for one or two reference picture lists. In some embodiments, the motion parameters include a motion vector and a reference picture index. In other embodiments, the use of one of the two reference picture lists may also be signaled using inter_pred_idc. In yet other embodiments, the motion vector may be explicitly coded as a delta relative to a predictor.
When a CU is encoded in skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta, and no reference picture index. A Merge mode is specified whereby the motion parameters for the current PU, including spatial and temporal candidates, are obtained from neighboring PUs. The Merge mode may be applied to any inter-predicted PU, not only the skip mode. The alternative to the Merge mode is the explicit transmission of motion parameters, in which, for each PU, the motion vector, the corresponding reference picture index for each reference picture list, and the reference picture list usage are signaled explicitly.
When the signaling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is called "uni-prediction". Uni-prediction may be used for both P slices and B slices.
When the signaling indicates that both reference picture lists are to be used, the PU is produced from two blocks of samples. This is called "bi-prediction". Bi-prediction is applicable only to B slices.
1.1.1 embodiments of constructing candidates for Merge mode
When a PU is predicted using the Merge mode, an index pointing to an entry in the Merge candidate list is parsed from the bitstream and used to retrieve the motion information. The construction of this list can be summarized according to the following sequence of steps:
Step 1: Initial candidate derivation
Step 1.1: Spatial candidate derivation
Step 1.2: Redundancy check for spatial candidates
Step 1.3: Temporal candidate derivation
Step 2: Inserting additional candidates
Step 2.1: Creating bi-predictive candidates
Step 2.2: Inserting zero motion candidates
Fig. 1 shows an example of constructing a Merge candidate list based on the sequence of steps summarized above. For spatial Merge candidate derivation, a maximum of four Merge candidates are selected among candidates located at five different positions. For temporal Merge candidate derivation, a maximum of one Merge candidate is selected among two candidates. Since a constant number of candidates is assumed for each PU at the decoder, additional candidates are generated when the number of candidates does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, truncated unary binarization (Truncated Unary binarization, TU) is used to encode the index of the best Merge candidate. If the size of the CU is equal to 8, all PUs of the current CU share a single Merge candidate list, which is identical to the Merge candidate list of the 2N×2N prediction unit.
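The list construction described above can be illustrated with a short sketch. The following Python snippet is a simplified, non-normative illustration; the candidate objects and the exhaustive redundancy check are assumptions made for readability (HEVC only compares the specific candidate pairs shown in fig. 3).

```python
def build_merge_candidate_list(spatial_candidates, temporal_candidate,
                               max_num_merge_cand):
    """Simplified, non-normative sketch of Merge candidate list construction."""
    merge_list = []

    # Step 1.1/1.2: spatial candidates with a redundancy check (up to four kept).
    for cand in spatial_candidates:
        if cand is not None and cand not in merge_list:
            merge_list.append(cand)
        if len(merge_list) == 4:
            break

    # Step 1.3: at most one temporal candidate.
    if temporal_candidate is not None and len(merge_list) < max_num_merge_cand:
        merge_list.append(temporal_candidate)

    # Step 2.1 (combined bi-predictive candidates for B slices) is omitted here.

    # Step 2.2: pad with zero-motion candidates so the list length is constant.
    ref_idx = 0
    while len(merge_list) < max_num_merge_cand:
        merge_list.append({"mv": (0, 0), "ref_idx": ref_idx})
        ref_idx += 1
    return merge_list


# Toy usage: two identical spatial candidates collapse into one entry.
spatial = [{"mv": (3, -1), "ref_idx": 0}, {"mv": (3, -1), "ref_idx": 0}, None]
print(build_merge_candidate_list(spatial, {"mv": (1, 1), "ref_idx": 0}, 5))
```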
1.1.2 construction of spatial Merge candidates
In the derivation of spatial Merge candidates, up to four Merge candidates are selected among candidates located at the positions depicted in fig. 2. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU at position A1, B1, B0 or A0 is not available (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, a redundancy check is performed on the addition of the remaining candidates, which ensures that candidates with the same motion information are excluded from the list, thereby improving coding efficiency. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked by arrows in fig. 3 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check has different motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, fig. 4A and 4B depict the second PU for the cases of N×2N and 2N×N, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction; in some embodiments, adding this candidate may lead to two prediction units having the same motion information, which is redundant to having just one PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
1.1.3 construction of temporal Merge candidates
In this step, only one candidate is added to the list. In particular, in the derivation of this temporal Merge candidate, the scaled motion vector is derived based on co-located PUs belonging to the picture within the given reference picture list that has the smallest POC difference from the current picture. The reference picture list to be used for deriving co-located PUs is explicitly signaled in the slice header.
Fig. 5 shows an example of the derivation of the scaled motion vector for a temporal Merge candidate (shown as a dashed line), which is scaled from the motion vector of the co-located PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal Merge candidate is set equal to zero. For a B slice, two motion vectors are obtained, one for reference picture list 0 and the other for reference picture list 1, and the two motion vectors are combined to obtain the bi-predictive Merge candidate.
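The POC-based scaling described above can be sketched as follows. This is a simplified floating-point illustration; the normative HEVC process uses a fixed-point approximation of tb/td with clipping.

```python
def scale_mv(mv, tb, td):
    """Scale a co-located motion vector by the POC distance ratio tb/td.

    Simplified floating-point version; the actual HEVC process uses a
    fixed-point approximation with clipping of the intermediate values.
    """
    if td == 0:
        return mv
    scale = tb / td
    return (int(round(mv[0] * scale)), int(round(mv[1] * scale)))


# Example: current picture POC 8, current reference POC 4 (tb = 4),
# co-located picture POC 9, its reference POC 3 (td = 6).
print(scale_mv((12, -6), tb=4, td=6))  # -> (8, -4)
```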
In the co-located PU (Y) belonging to the reference frame, the position of the temporal candidate is selected between candidates C0 and C1, as shown in fig. 6. If the PU at position C0 is not available, is intra coded, or is outside the current CTU, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
1.1.4 construction of additional types of Merge candidates
In addition to the space-time Merge candidates, there are two additional types of Merge candidates: combined bi-predictive Merge candidates and zero Merge candidates. Combined bi-predictive Merge candidates are generated by utilizing the space-time Merge candidates. The combined bi-predictive Merge candidate is used only for B slices. A combined bi-predictive candidate is generated by combining the first reference picture list motion parameters of an original candidate with the second reference picture list motion parameters of another candidate. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate.
Fig. 7 shows an example of this procedure, where two candidates in the original list (710 on the left) have mvL0 and refIdxL0 or mvL1 and refIdxL1, which are used to create a combined bi-prediction Merge candidate that is added to the final list (right).
Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list until the MaxNumMergeCand capacity is reached. These candidates have zero spatial displacement and a reference picture index that starts from zero and is increased every time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni-directional and bi-directional prediction, respectively. In some embodiments, no redundancy check is performed on these candidates.
1.1.5 examples of motion estimation regions for parallel processing
To speed up the encoding process, motion estimation may be performed in parallel, whereby the motion vectors of all prediction units inside a given region are derived at the same time. Deriving Merge candidates from a spatial neighborhood may interfere with parallel processing, because one prediction unit cannot derive motion parameters from a neighboring PU until its associated motion estimation is completed. To mitigate the trade-off between coding efficiency and processing latency, a motion estimation region (Motion Estimation Region, MER) may be defined, whose size is signaled in the Picture Parameter Set (PPS) using the "log2_parallel_merge_level_minus2" syntax element. When an MER is defined, Merge candidates falling into the same region are marked as unavailable and are therefore not considered in the list construction.
1.2 embodiment of Advanced Motion Vector Prediction (AMVP)
AMVP exploits the spatio-temporal correlation of a motion vector with neighboring PUs, which is used for explicit transmission of motion parameters. A motion vector candidate list is constructed by first checking the availability of the left and above temporally neighboring PU positions, removing redundant candidates, and adding zero vectors so that the candidate list has a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to Merge index signaling, the index of the best motion vector candidate is encoded using truncated unary. The maximum value to be encoded in this case is 2 (see fig. 8). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
1.2.1 examples of constructing motion vector prediction candidates
Fig. 8 summarizes the derivation process of the motion vector prediction candidates, and may be implemented with an index as an input for each reference picture list.
In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For spatial motion vector candidate derivation, two motion vector candidates are ultimately derived based on the motion vector of each PU located at five different locations previously shown in fig. 2.
For temporal motion vector candidate derivation, one motion vector candidate is selected from the two candidates, which is derived based on two different co-located positions. After generating the first list of spatio-temporal candidates, the repeated motion vector candidates in the list are removed. If the number of potential candidates is greater than 2, motion vector candidates within the associated reference picture list whose reference picture index is greater than 1 are removed from the list. If the number of space-time motion vector candidates is less than 2, additional zero motion vector candidates are added to the list.
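A non-normative sketch of the AMVP list construction described above is shown below; the candidate representation (plain MV tuples) and the inputs are assumptions made for illustration.

```python
def build_amvp_candidate_list(spatial_cands, temporal_cands, max_cands=2):
    """Sketch of AMVP predictor list construction (illustrative, not normative).

    spatial_cands: up to two spatial MV candidates already derived (or None).
    temporal_cands: the two co-located candidates; at most one is kept.
    """
    amvp_list = [c for c in spatial_cands if c is not None]

    # At most one temporal candidate is appended.
    for c in temporal_cands:
        if c is not None:
            amvp_list.append(c)
            break

    # Remove duplicated motion vector candidates.
    deduped = []
    for c in amvp_list:
        if c not in deduped:
            deduped.append(c)

    # Pad with zero motion vectors so the list always has max_cands entries.
    while len(deduped) < max_cands:
        deduped.append((0, 0))
    return deduped[:max_cands]


print(build_amvp_candidate_list([(2, 1), None], [(4, -2), (0, 3)]))  # [(2, 1), (4, -2)]
```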
1.2.2 construction of spatial motion vector candidates
In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located at the positions previously shown in fig. 2; those positions are the same as those of motion Merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. Therefore, for each side there are four cases that can be used as motion vector candidates, two of which do not require spatial scaling and two of which use spatial scaling. The four different cases are summarized as follows:
No spatial scaling
- (1) Same reference picture list, and same reference picture index (same POC)
- (2) Different reference picture list, but same reference picture (same POC)
Spatial scaling
- (3) Same reference picture list, but different reference picture (different POC)
- (4) Different reference picture list, and different reference picture (different POC)
The cases without spatial scaling are checked first, followed by the cases that require spatial scaling. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are not available or are intra coded, scaling of the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
As shown in the example of fig. 9, for the spatial scaling case, the motion vector of the neighboring PU is scaled in a similar manner as for temporal scaling. One difference is that the reference picture list and the index of the current PU are given as inputs; the actual scaling process is the same as that of temporal scaling.
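The four cases listed above, and whether they trigger spatial scaling, can be summarized in a small hypothetical helper; the inputs (reference picture list index and reference POC of the neighboring and current PU) are assumptions made for illustration.

```python
def classify_spatial_candidate(neigh_list, neigh_ref_poc, cur_list, cur_ref_poc):
    """Return (case_number, needs_spatial_scaling) for a neighboring MV.

    Mirrors the four cases listed above; illustrative only.
    """
    same_list = neigh_list == cur_list
    same_poc = neigh_ref_poc == cur_ref_poc
    if same_list and same_poc:
        return 1, False      # same list, same reference picture
    if not same_list and same_poc:
        return 2, False      # different list, same reference picture
    if same_list and not same_poc:
        return 3, True       # same list, different reference picture
    return 4, True           # different list, different reference picture


print(classify_spatial_candidate(0, 16, 1, 8))  # -> (4, True)
```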
1.2.3 construction of temporal motion vector candidates
All procedures for deriving temporal Merge candidates are the same as those for deriving spatial motion vector candidates (as shown in the example of fig. 6), except for the reference picture index derivation. In some embodiments, the reference picture index is signaled to the decoder.
2. Inter prediction method example in Joint Exploration Model (JEM)
In some embodiments, future video coding techniques are explored using reference software called Joint Exploration Model (JEM). In JEM, sub-block based predictions such as affine prediction, alternative Temporal Motion Vector Prediction (ATMVP), space-time motion vector prediction (STMVP), bi-directional optical flow (BIO), frame rate up-conversion (FRUC), local Adaptive Motion Vector Resolution (LAMVR), overlapped Block Motion Compensation (OBMC), local Illumination Compensation (LIC), and decoder side motion vector refinement (DMVR) are employed in several coding tools.
2.1 example of sub-CU based motion vector prediction
In JEM with a quadtree plus binary tree (QTBT), each CU may have at most one set of motion parameters for each prediction direction. In some embodiments, two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all sub-CUs of the large CU. The alternative temporal motion vector prediction (Alternative Temporal Motion Vector Prediction, ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the space-time motion vector prediction (STMVP) method, the motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and the spatial neighboring motion vectors. In some embodiments, to preserve a more accurate motion field for sub-CU motion prediction, motion compression of the reference frames may be disabled.
2.1.1 example of Alternative Temporal Motion Vector Prediction (ATMVP)
In the ATMVP method, a Temporal Motion Vector Prediction (TMVP) method is modified by extracting a plurality of sets of motion information (including a motion vector and a reference index) from a block smaller than a current CU.
Fig. 10 shows an example of the ATMVP motion prediction process for a CU 1000. The ATMVP method predicts the motion vectors of the sub-CUs 1001 within the CU 1000 in two steps. The first step is to identify the corresponding block 1051 in a reference picture 1050 with a temporal vector. The reference picture 1050 is also referred to as the motion source picture. The second step is to split the current CU 1000 into sub-CUs 1001 and obtain the motion vector as well as the reference index of each sub-CU from the block corresponding to that sub-CU.
In the first step, the reference picture 1050 and the corresponding block are determined from the motion information of the spatially neighboring blocks of the current CU 1000. To avoid a repetitive scanning process of the neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU 1000 is used. The first available motion vector and its associated reference index are set to be the temporal vector and the index of the motion source picture. In this way, the corresponding block (sometimes called the collocated block) can be identified more accurately than with TMVP, where the corresponding block is always located at a bottom-right or center position relative to the current CU.
In the second step, the corresponding block of a sub-CU is identified by the temporal vector in the motion source picture 1050, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (e.g., the smallest motion grid covering the center sample) is used to derive the motion information of the sub-CU. After the motion information of a corresponding N×N block is identified, it is converted into the reference indices and motion vectors of the current sub-CU, in the same way as the TMVP of HEVC, where motion scaling and other procedures also apply. For example, the decoder checks whether the low-delay condition is fulfilled (e.g., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses the motion vector MVx (e.g., the motion vector corresponding to reference picture list X) to predict the motion vector MVy (e.g., with X equal to 0 or 1 and Y equal to 1-X) for each sub-CU.
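The two-step ATMVP derivation described above can be sketched as follows. The helper fetch_motion_grid and the data layout are hypothetical; they stand in for the decoder's access to the stored motion field of the motion source picture.

```python
def atmvp_sub_cu_motion(first_merge_cand, cu_pos, cu_size, sub_size,
                        fetch_motion_grid):
    """Sketch of the two-step ATMVP derivation (helpers are hypothetical).

    first_merge_cand: (temporal_vector, motion_source_ref_idx), taken from the
        first candidate of the current CU's Merge candidate list.
    fetch_motion_grid(pos, ref_idx): assumed helper that returns the motion
        info of the smallest motion grid covering `pos` in the motion source
        picture.
    """
    temporal_vector, src_ref_idx = first_merge_cand
    sub_cu_motion = {}
    for dy in range(0, cu_size[1], sub_size):
        for dx in range(0, cu_size[0], sub_size):
            # Center sample of this sub-CU, displaced by the temporal vector.
            pos = (cu_pos[0] + dx + sub_size // 2 + temporal_vector[0],
                   cu_pos[1] + dy + sub_size // 2 + temporal_vector[1])
            sub_cu_motion[(dx, dy)] = fetch_motion_grid(pos, src_ref_idx)
    return sub_cu_motion


# Toy usage with a dummy motion grid that returns a constant motion vector.
print(atmvp_sub_cu_motion(((4, -2), 0), (64, 32), (16, 16), 8,
                          lambda pos, ref: {"mv": (1, 0), "ref_idx": ref}))
```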
2.1.2 example of space-time motion vector prediction (STMVP)
In the STMVP method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Fig. 11 shows an example of one CU with four sub-blocks and its neighboring blocks. Consider an 8×8 CU 1100 that contains four 4×4 sub-CUs A (1101), B (1102), C (1103), and D (1104). The neighboring 4×4 blocks in the current frame are labeled a (1111), b (1112), c (1113), and d (1114).
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A (1101), that is, block c (1113). If block c (1113) is not available or is intra coded, the other N×N blocks above sub-CU A (1101) are checked (from left to right, starting at block c 1113). The second neighbor is the block to the left of sub-CU A (1101), that is, block b (1112). If block b (1112) is not available or is intra coded, the other blocks to the left of sub-CU A (1101) are checked (from top to bottom, starting at block b 1112). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the temporal motion vector predictor (Temporal Motion Vector Predictor, TMVP) of sub-block A (1101) is derived by following the same procedure as the TMVP derivation specified in HEVC. The motion information of the collocated block at D (1104) is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
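The per-sub-CU averaging step of STMVP can be illustrated with a small sketch; the inputs are assumed to be already scaled and set to None when the corresponding neighbor or TMVP is unavailable.

```python
def stmvp_sub_cu_mv(above_mv, left_mv, tmvp_mv):
    """Average the available spatial and temporal predictors for one sub-CU.

    Inputs are assumed to be already scaled to the first reference frame of
    the list, or None when the corresponding predictor is unavailable.
    """
    available = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
    if not available:
        return None
    n = len(available)
    return (sum(mv[0] for mv in available) / n,
            sum(mv[1] for mv in available) / n)


# Example for sub-CU A: neighbor c above, neighbor b to the left, no TMVP.
print(stmvp_sub_cu_mv((4, 0), (2, 2), None))  # -> (3.0, 1.0)
```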
2.1.3 examples of sub-CU motion prediction mode Signaling
In some embodiments, the sub-CU mode is enabled as an additional Merge candidate and no additional syntax elements are needed to signal the mode. Two additional Merge candidates are added to the Merge candidate list for each CU to represent ATMVP mode and STMVP mode. In some embodiments, if the sequence parameter set indicates that ATMVP and STMVP are enabled, a maximum of seven Merge candidates may be used. The coding logic of the extra Merge candidate is the same as the Merge candidate in the HM, which means that for each CU in the P or B slice, two extra Merge candidates may require two more RD checks. In some embodiments, for example, in JEM, all bins (bins) of the Merge index are context coded by CABAC (context-based adaptive binary arithmetic coding). In other embodiments, for example, in HEVC, only the first bin is context coded, while the remaining bins are context bypass coded.
2.2 adaptive motion vector differential resolution
In some embodiments, when use_integer_mv_flag in the slice header is equal to 0, the motion vector difference (Motion Vector Difference, MVD) between the motion vector of a PU and the predicted motion vector is signaled in units of quarter luma samples. In JEM, a locally adaptive motion vector resolution (Locally Adaptive Motion Vector Resolution, LAMVR) is introduced. In JEM, the MVD may be coded in units of quarter luma samples, integer luma samples, or four luma samples. The MVD resolution is controlled at the coding unit (CU) level, and an MVD resolution flag is conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter-luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter-luma sample MV precision is not used, another flag is signaled to indicate whether integer-luma sample MV precision or four-luma sample MV precision is used.
When the first MVD resolution flag of a CU is zero or is not coded for the CU (meaning that all MVDs in the CU are zero), the quarter luma sample MV resolution is used for the CU. When a CU uses integer-luminance sample MV precision or four-luminance sample MV precision, the MVPs in the AMVP candidate list of the CU are rounded to the corresponding precision.
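The rounding of AMVP predictors to the selected MVD precision can be sketched as follows; the precision-to-shift mapping is an assumption made for illustration (MVs stored in quarter-luma-sample units).

```python
def round_mv_to_precision(mv, precision_shift):
    """Round a quarter-luma-sample MV predictor to a coarser MVD precision.

    precision_shift: 0 for quarter-luma, 2 for integer-luma, 4 for
    four-luma-sample precision (assumed mapping, MVs stored in
    quarter-sample units).
    """
    def round_comp(v):
        if precision_shift == 0:
            return v
        offset = 1 << (precision_shift - 1)
        return ((v + offset) >> precision_shift) << precision_shift

    return (round_comp(mv[0]), round_comp(mv[1]))


print(round_mv_to_precision((13, -7), 2))  # -> (12, -8), i.e. integer-sample MVP
```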
In the encoder, a CU-level RD check is used to determine which MVD resolution to use for a CU. That is, the CU-level RD check is performed three times for each CU, once for each MVD resolution. To accelerate the encoder, the following encoding scheme is applied in JEM.
During RD checking of a CU with normal quarter-luma sample MVD resolution, the motion information (integer luma sample accuracy) of the current CU is stored. The stored motion information (after rounding) is used as a starting point for further small range motion vector refinement during RD-checking for the same CU with integer luma samples and 4 luma sample MVD resolution, so that the time-consuming motion estimation process is not repeated three times.
The RD check of a CU with 4 luma sample MVD resolution is invoked conditionally. For a CU, when the RD cost of the integer luma sample MVD resolution is much greater than that of the quarter luma sample MVD resolution, the RD check of the 4 luma sample MVD resolution for the CU is skipped.
2.3 example of higher motion vector storage accuracy
In HEVC, motion vector accuracy is one-quarter pixel (one-quarter luma samples and one-eighth chroma samples of 4:2:0 video). In JEM, the accuracy of the internal motion vector store and the Merge candidate increases to 1/16 pixel. Higher motion vector accuracy (1/16 pixel) is used for motion compensated inter prediction of CUs encoded in skip/Merge mode. For CUs encoded using normal AMVP mode, integer-pixel or quarter-pixel motion is used.
An SHVC upsampling interpolation filter having the same filter length and normalization factor as the HEVC motion compensation interpolation filter is used as the motion compensation interpolation filter for the additional fractional pixel positions. The chrominance component motion vector accuracy in JEM is 1/32 sample, and the additional interpolation filter for the 1/32 pixel fractional position is derived by using the average of the filters for two adjacent 1/16 pixel fractional positions.
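The averaging used to obtain the extra 1/32-pel filters can be illustrated as below; the coefficient values in the example are placeholders, not the actual SHVC/JEM filter taps.

```python
def filter_for_1_32_position(coeffs_left_1_16, coeffs_right_1_16):
    """Derive a 1/32-pel interpolation filter by averaging the filters of the
    two neighboring 1/16-pel positions. The coefficients used below are
    placeholders, not the actual SHVC/JEM filter taps."""
    return [(a + b) // 2 for a, b in zip(coeffs_left_1_16, coeffs_right_1_16)]


print(filter_for_1_32_position([-2, 10, 58, -2], [-4, 18, 52, -2]))
```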
2.4 example of Overlapped Block Motion Compensation (OBMC)
In JEM, OBMC can be turned on and off using CU-level syntax. When OBMC is used in JEM, OBMC is performed on all motion compensated (Motion Compensation, MC) block boundaries except the right and lower boundaries of the CU. In addition, it is also applied to luminance and chrominance components. In JEM, MC blocks correspond to coded blocks. When a CU is encoded with sub-CU modes (including sub-CU Merge, affine, and FRUC modes), each sub-block of the CU is a MC block. To process CU boundaries in a unified manner, OBMC is performed at the sub-block level for all MC block boundaries, with the sub-block size set equal to 4 x 4, as shown in fig. 12A and 12B.
Fig. 12A shows sub-blocks at the CU/PU boundary, the shaded sub-blocks being the locations of the OBMC applications. Similarly, fig. 12B shows sub-blocks in ATMVP mode.
When OBMC is applied to the current sub-block, the motion vectors of the four connected neighboring sub-blocks (if available and different from the current motion vector) are also used to derive the prediction block of the current sub-block, in addition to the current motion vector. These multiple prediction blocks based on multiple motion vectors are combined to generate a final prediction signal for the current sub-block.
The prediction block based on the motion vector of a neighboring sub-block is denoted as PN, where N indicates an index for the neighboring above, below, left, and right sub-blocks, and the prediction block based on the motion vector of the current sub-block is denoted as PC. When PN is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed from PN. Otherwise, every sample of PN is added to the same sample in PC, i.e., four rows/columns of PN are added to PC. The weighting factors {1/4, 1/8, 1/16, 1/32} are used for PN and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for PC. The exception is small MC blocks (i.e., when the height or width of the coding block is equal to 4 or a CU is coded with a sub-CU mode), for which only two rows/columns of PN are added to PC. In this case, the weighting factors {1/4, 1/8} are used for PN and the weighting factors {3/4, 7/8} are used for PC. For PN generated based on the motion vector of a vertically (horizontally) neighboring sub-block, the samples in the same row (column) of PN are added to PC with the same weighting factor.
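The row/column-wise blending described above can be sketched as follows, using floating-point weights for readability; a real codec would use the equivalent integer arithmetic.

```python
def obmc_blend_row(pc_row, pn_row, row_index, small_mc_block=False):
    """Blend one row of the neighboring-MV prediction PN into PC.

    row_index counts rows away from the shared sub-block boundary; weights
    follow the {1/4, 1/8, 1/16, 1/32} / {3/4, 7/8, 15/16, 31/32} pattern
    described above. Floating-point for readability.
    """
    pn_weights = [1 / 4, 1 / 8, 1 / 16, 1 / 32]
    num_rows = 2 if small_mc_block else 4
    if row_index >= num_rows:
        return pc_row                       # this row is left unchanged
    w_n = pn_weights[row_index]
    w_c = 1.0 - w_n
    return [w_c * c + w_n * n for c, n in zip(pc_row, pn_row)]


print(obmc_blend_row([100, 100, 100, 100], [80, 80, 80, 80], row_index=0))
```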
In JEM, for CUs of size less than or equal to 256 luma samples, a CU level flag is signaled to indicate whether OBMC is applied to the current CU. For CUs that are over 256 luma samples in size or are not encoded using AMVP mode, OBMC is applied by default. At the encoder, when OBMC is applied to the CU, its effect is taken into account during the motion estimation phase. The prediction signal formed by the OBMC using the motion information of the upper and left neighboring blocks is used to compensate the upper and left boundaries of the original signal of the current CU, and then a normal motion estimation process is applied.
2.5 example of Local Illumination Compensation (LIC)
Local illumination compensation (LIC) is based on a linear model of illumination changes, using a scaling factor a and an offset b. It is enabled or disabled adaptively for each inter-mode coded coding unit (CU).
When LIC is applied to a CU, a least-squares error method is employed to derive the parameters a and b by using the neighboring samples of the current CU and their corresponding reference samples. Fig. 13 shows an example of the neighboring samples used to derive the parameters of the IC algorithm. More specifically, as shown in fig. 13, sub-sampled (2:1 sub-sampling) neighboring samples of the CU and the corresponding samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used. The IC parameters are derived and applied separately for each prediction direction.
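As an illustration of the least-squares fit just described, the following sketch derives a and b from two 1-D arrays of collected neighboring samples and applies the resulting linear model; the function names and the floating-point arithmetic are assumptions for illustration and do not reflect the codec's actual integer implementation.

```python
import numpy as np

def derive_lic_params(neighbor_cur, neighbor_ref):
    """Least-squares fit of neighbor_cur ~ a * neighbor_ref + b."""
    x = np.asarray(neighbor_ref, dtype=np.float64)
    y = np.asarray(neighbor_cur, dtype=np.float64)
    n = x.size
    denom = n * np.sum(x * x) - np.sum(x) ** 2
    if denom == 0:
        return 1.0, 0.0  # degenerate neighborhood: fall back to the identity model
    a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom
    b = (np.sum(y) - a * np.sum(x)) / n
    return a, b

def apply_lic(pred, a, b):
    """Apply the linear illumination model to a motion-compensated prediction."""
    return a * np.asarray(pred, dtype=np.float64) + b
```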
When a CU is encoded with the Merge mode, the LIC flag is copied from the neighboring block in a similar manner to the motion information copy in the Merge mode; otherwise, an LIC flag is signaled to the CU to indicate whether LIC is applied.
When LIC is enabled for a picture, an additional CU-level RD check is needed to determine whether LIC is applied to a CU. When LIC is enabled for a CU, the mean-removed sum of absolute differences (MR-SAD) and the mean-removed sum of absolute Hadamard-transformed differences (MR-SATD) are used, instead of SAD and SATD, for the integer-pel motion search and the fractional-pel motion search, respectively.
In order to reduce the coding complexity, the following coding scheme is applied in JEM.
The LIC is disabled for the entire picture when there is no apparent change in illumination between the current picture and its reference picture. To identify this, a histogram of the current picture and each reference picture of the current picture is calculated at the encoder. Disabling the LIC for the current picture if the histogram difference between the current picture and each reference picture of the current picture is less than a given threshold; otherwise, LIC is enabled for the current picture.
2.6 example of affine motion compensated prediction
In HEVC, only a translational motion model is applied for motion compensation prediction (MCP). However, there may be many kinds of motion of the camera and the objects, such as zoom in/out, rotation, perspective motion, and/or other irregular motions. In JEM, on the other hand, a simplified affine transform motion compensation prediction is applied. Fig. 14 shows an example in which the affine motion field of a block 1400 is described by two control point motion vectors V0 and V1. The motion vector field (MVF) of block 1400 is described by the following equation:

  vx = ((v1x − v0x) / w) · x − ((v1y − v0y) / w) · y + v0x
  vy = ((v1y − v0y) / w) · x + ((v1x − v0x) / w) · y + v0y    (1)

As shown in fig. 14, (v0x, v0y) is the motion vector of the top-left corner control point, (v1x, v1y) is the motion vector of the top-right corner control point, and w is the width of the block.
To further simplify the motion compensation prediction, sub-block based affine transform prediction may be applied. The sub-block size M×N is derived as follows:

  M = clip3(4, w, (w × MvPre) / max(abs(v1x − v0x), abs(v1y − v0y)))
  N = clip3(4, h, (h × MvPre) / max(abs(v2x − v0x), abs(v2y − v0y)))    (2)

Here, MvPre is the motion vector fraction accuracy (e.g., 1/16 in JEM), and (v2x, v2y) is the motion vector of the bottom-left control point, calculated according to equation (1). If needed, M and N can be adjusted downward to be divisors of w and h, respectively.
Fig. 15 shows an example of the affine MVF for each sub-block of a block 1500. To derive the motion vector of each M×N sub-block, the motion vector of the center sample of each sub-block is calculated according to equation (1) and rounded to the motion vector fraction accuracy (e.g., 1/16 in JEM). Then, motion compensation interpolation filters are applied to generate the prediction of each sub-block with the derived motion vector.
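The following sketch derives one motion vector per sub-block by evaluating the four-parameter model of equation (1) at each sub-block center and rounding to 1/16-pel accuracy; the function name, input conventions (MVs in pixel units), and the default 4 x 4 sub-block size are illustrative assumptions.

```python
def affine_subblock_mvs(v0, v1, w, h, sub_w=4, sub_h=4, prec=16):
    """Return {(x, y): (mvx, mvy)} per sub-block, rounded to 1/prec-pel accuracy.

    v0 = (v0x, v0y): top-left control point MV, v1 = (v1x, v1y): top-right,
    both in pixel units; w, h: block width/height in pixels.
    """
    v0x, v0y = v0
    v1x, v1y = v1
    mvs = {}
    for y in range(0, h, sub_h):
        for x in range(0, w, sub_w):
            cx, cy = x + sub_w / 2.0, y + sub_h / 2.0  # sub-block center sample
            mvx = (v1x - v0x) / w * cx - (v1y - v0y) / w * cy + v0x
            mvy = (v1y - v0y) / w * cx + (v1x - v0x) / w * cy + v0y
            mvs[(x, y)] = (round(mvx * prec) / prec, round(mvy * prec) / prec)
    return mvs

# Degenerate check: identical control point MVs give a purely translational field.
print(affine_subblock_mvs((1.0, 0.5), (1.0, 0.5), 16, 16)[(12, 8)])  # (1.0, 0.5)
```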
After MCP, the high accuracy motion vector for each sub-block is rounded and saved with the same accuracy as the normal motion vector.
In JEM, there are two affine motion modes: AF_INTER mode and AF_MERGE mode. For CUs with both width and height larger than 8, AF_INTER mode may be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF_INTER mode is used. In AF_INTER mode, a candidate list with motion vector pairs {(v0, v1) | v0 = {vA, vB, vC}, v1 = {vD, vE}} is constructed using the neighboring blocks.

Fig. 16 shows an example of motion vector prediction (MVP) for a block 1600 in AF_INTER mode. As shown in fig. 16, v0 is selected from the motion vectors of sub-blocks A, B, or C. The motion vectors from the neighboring blocks may be scaled according to the reference list. The motion vector from a neighboring block may also be scaled according to the relationship among the picture order count (POC) of the reference of the neighboring block, the POC of the reference of the current CU, and the POC of the current CU. The approach for selecting v1 from the neighboring sub-blocks D and E is similar. If the number of candidates in the list is smaller than 2, the list may be padded with motion vector pairs composed by duplicating each of the AMVP candidates. When the candidate list is larger than 2, the candidates may first be sorted according to the neighboring motion vectors (e.g., based on the similarity of the two motion vectors in a candidate pair). In some embodiments, the first two candidates are kept. In some embodiments, a rate-distortion (RD) cost check is used to determine which motion vector pair candidate is selected as the control point motion vector prediction (CPMVP) of the current CU. An index indicating the position of the CPMVP in the candidate list may be signaled in the bitstream. After the CPMVP of the current affine CU is determined, affine motion estimation is applied and the control point motion vector (CPMV) is found. The difference between the CPMV and the CPMVP is then signaled in the bitstream.

When a CU is coded in AF_MERGE mode, it gets the first block coded with affine mode from the valid neighboring reconstructed blocks. Fig. 17A shows an example of the selection order of candidate blocks for a current CU 1700. As shown in fig. 17A, the selection order may be from left (1701), above (1702), above-right (1703), below-left (1704) to above-left (1705) of the current CU 1700. Fig. 17B shows another example of candidate blocks for the current CU 1700 in AF_MERGE mode. If the neighboring lower-left block 1701 is coded in affine mode, as shown in fig. 17B, the motion vectors v2, v3, and v4 of the top-left, top-right, and bottom-left corners of the CU containing block A are derived. The motion vector v0 of the top-left corner of the current CU 1700 is calculated based on v2, v3, and v4. The motion vector v1 of the top-right of the current CU may be calculated accordingly.

After the CPMVs v0 and v1 of the current CU are derived, the MVF of the current CU may be generated according to the affine motion model in equation (1). In order to identify whether the current CU is coded in AF_MERGE mode, an affine flag may be signaled in the bitstream when there is at least one neighboring block coded in affine mode.
2.7 example of pattern matched motion vector derivation (PMMVD)
The PMMVD mode is a special Merge mode based on the frame-rate up conversion (FRUC) method. With this mode, the motion information of a block is derived at the decoder side, instead of being signaled.
When the Merge flag of the CU is true, the FRUC flag may be signaled to the CU. When the FRUC flag is false, the Merge index may be signaled and the regular Merge mode may be used. When the FRUC flag is true, additional FRUC mode flags may be signaled to indicate which method (e.g., bilateral matching or template matching) to use to derive motion information for the block.
At the encoder side, the decision as to whether or not to use FRUC Merge mode for the CU is based on RD cost selection made for the normal Merge candidate. For example, multiple matching patterns (e.g., bilateral matching and template matching) of the CU are verified by using RD cost selection. The matching pattern that results in the least cost is further compared to other CU patterns. If the FRUC match pattern is the most efficient pattern, then the FRUC flag is set to true for the CU and the relevant match pattern is used.
Typically, the motion derivation process in FRUC Merge mode has two steps: CU-level motion search is performed first, and then sub-CU-level motion refinement is performed. At the CU level, the original motion vector for the entire CU is derived based on bilateral matching or template matching. First, a MV candidate list is generated and the candidate that gives the smallest matching cost is selected as the starting point for further CU-level refinement. Then, a local search based on bilateral matching or template matching is performed near the start point. The MV result of the minimum matching cost is taken as the MV of the entire CU. Subsequently, the motion information is further refined at the sub-CU level, starting from the derived CU motion vector.
For example, the following derivation process is performed for a W×H CU motion information derivation. In the first stage, the MV for the whole W×H CU is derived. In the second stage, the CU is further split into M×M sub-CUs. The value of M is calculated as shown in (3), where D is a predefined splitting depth, which is set to 3 by default in JEM. Then the MV of each sub-CU is derived.

  M = max{ 4, min{ W, H } / 2^D }    (3)
Fig. 18 shows an example of bilateral matching used in the Frame Rate Up Conversion (FRUC) method. The bilateral matching is used to derive the motion information of the current CU by finding the closest match between the two blocks along the motion trajectory of the current CU (1800) in two different reference pictures (1810, 1811). Under the assumption of a continuous motion trajectory, motion vectors MV0 (1801) and MV1 (1802) pointing to two reference blocks are proportional to temporal distances between a current picture and two reference pictures, for example, TD0 (1803) and TD1 (1804). In some embodiments, bilateral matching becomes a mirror-based bi-directional MV when the current picture 1800 is temporally between two reference pictures (1810, 1811) and the temporal distance from the current picture to the two reference pictures is the same.
Fig. 19 shows an example of template matching used in the Frame Rate Up Conversion (FRUC) method. Template matching is used to derive motion information for the current CU 1900 by finding the closest match between the template in the current picture 1910 (top and/or left neighboring block of the current CU) and the block in the reference picture (e.g., the same size as the template). Template matching may also be applied to AMVP mode in addition to FRUC Merge mode described above. As is done in both JEM and HEVC, AMVP has two candidates. New candidates are derived by a template matching method. If the candidate newly derived from the template matching is different from the first existing AMVP candidate, it is inserted at the very beginning of the AMVP candidate list, and then the list size is set to 2 (e.g., by removing the second existing AMVP candidate). When applied to AMVP mode, only CU-level search is applied.
The MV candidate set at the CU level may include: (1) the original AMVP candidates if the current CU is in AMVP mode, (2) all Merge candidates, (3) several MVs from the interpolated MV field (described later), and (4) the top and left neighboring motion vectors.
When bilateral matching is used, each valid MV of a Merge candidate may be used as an input to generate an MV pair under the assumption of bilateral matching. For example, one valid MV of a Merge candidate is (MVa, refa) in reference list A. Then, the reference picture refb of its paired bilateral MV is found in the other reference list B, such that refa and refb are temporally on different sides of the current picture. If such a refb is not available in reference list B, refb is determined as a reference that is different from refa and whose temporal distance to the current picture is the minimal one in list B. After refb is determined, MVb is derived by scaling MVa based on the temporal distances between the current picture and refa, refb.
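A small sketch of this MV-pair construction, assuming the POC values of the current picture and of refa/refb are known; the mirroring is simply the ratio of the two temporal distances, which keeps both MVs on one motion trajectory.

```python
def mirror_mv_for_bilateral(mva, poc_cur, poc_refa, poc_refb):
    """Scale (MVa, refa) to the paired reference refb for bilateral matching."""
    td_a = poc_cur - poc_refa          # temporal distance to refa
    td_b = poc_cur - poc_refb          # temporal distance to refb
    scale = td_b / td_a
    return (mva[0] * scale, mva[1] * scale)

# refa one picture in the past, refb one picture in the future: the MV is mirrored.
print(mirror_mv_for_bilateral((4, -2), poc_cur=8, poc_refa=7, poc_refb=9))  # (-4.0, 2.0)
```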
In some embodiments, four MVs from the interpolated MV field may also be added to the CU-level candidate list. More specifically, the interpolated MVs at positions (0, 0), (W/2, 0), (0, H/2), and (W/2, H/2) of the current CU are added. When FRUC is applied in AMVP mode, the original AMVP candidates are also added to the CU-level MV candidate set. In some embodiments, at the CU level, 15 MVs are added to the candidate list for AMVP CUs and 13 MVs are added to the candidate list for Merge CUs.

The MV candidate set at the sub-CU level includes: (1) the MV determined from the CU-level search, (2) the top, left, top-left, and top-right neighboring MVs, (3) scaled versions of collocated MVs from reference pictures, (4) one or more ATMVP candidates (e.g., up to 4), and (5) one or more STMVP candidates (e.g., up to 4). The scaled MVs from the reference pictures are derived as follows. The reference pictures in both lists are traversed. The MVs at the collocated position of the sub-CU in a reference picture are scaled to the reference of the starting CU-level MV. The ATMVP and STMVP candidates may be limited to the first four. At the sub-CU level, one or more MVs (e.g., up to 17) are added to the candidate list.
Generation of the interpolated MV field. Before coding a frame, an interpolated motion field is generated for the whole picture based on unilateral (single-sided) ME. The motion field may then be used later as CU-level or sub-CU-level MV candidates.
In some embodiments, the motion field of each reference picture in the two reference lists is traversed at a 4 x 4 block level. Fig. 20 shows an example of single-sided Motion Estimation (ME) 2000 in the FRUC method. For each 4 x 4 block, if the motion associated with the block passes through the 4 x 4 block in the current picture and the block is not assigned any interpolation motion, the motion of the reference block is scaled to the current picture according to temporal distances TD0 and TD1 (in the same manner as the MV scaling of TMVP in HEVC) and the scaled motion is assigned to the block in the current frame. If no scaled MVs are assigned to a 4X 4 block, the motion of the block is marked as unusable in the interpolation motion field.
Interpolation and matching cost. When a motion vector points to a fractional sample position, motion compensated interpolation is needed. To reduce complexity, bilinear interpolation instead of the regular 8-tap HEVC interpolation may be used for both bilateral matching and template matching.
The calculation of the matching cost is a bit different at different steps. When selecting a candidate from the candidate set at the CU level, the matching cost may be the sum of absolute differences (SAD) of bilateral matching or template matching. After the starting MV is determined, the matching cost C of bilateral matching at the sub-CU level search is calculated as follows:

  C = SAD + w · (|MVx − MVx^s| + |MVy − MVy^s|)    (4)

Here, w is a weighting factor. In some embodiments, w may be set to 4. MV and MV^s indicate the current MV and the starting MV, respectively. SAD may still be used as the matching cost of template matching at the sub-CU level search.
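A minimal sketch of this cost, combining the SAD between the two matched blocks with the MV regularization term of equation (4); array contents and MV units (e.g., quarter-pel) are illustrative assumptions.

```python
import numpy as np

def fruc_subcu_cost(block0, block1, mv, mv_start, w=4):
    """C = SAD + w * (|MVx - MVx_s| + |MVy - MVy_s|) for the sub-CU level search."""
    sad = np.abs(np.asarray(block0, dtype=np.int64)
                 - np.asarray(block1, dtype=np.int64)).sum()
    reg = w * (abs(mv[0] - mv_start[0]) + abs(mv[1] - mv_start[1]))
    return sad + reg
```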
In FRUC mode, MVs are derived by using luminance samples only. The derived motion will be used for both luminance and chrominance of the MC inter prediction. After the MV is determined, the final MC is performed using an 8-tap interpolation filter for luminance and a 4-tap interpolation filter for chrominance.
MV refinement is a pattern-based MV search, using a bilateral matching cost or a template matching cost as the criterion. In JEM, two search patterns are supported - an unrestricted center-biased diamond search (UCBDS) and an adaptive cross search, for MV refinement at the CU level and the sub-CU level, respectively. For both CU-level and sub-CU-level MV refinement, the MV is first searched directly at quarter luma sample MV accuracy, followed by one-eighth luma sample MV refinement. The search range of the MV refinement for the CU and sub-CU steps is set equal to 8 luma samples.
In bilateral matching Merge mode, bi-prediction is applied because the motion information of a CU is derived based on the closest match between two blocks along the current CU's motion trajectory in two different reference pictures. In the template matching Merge mode, the encoder may select for the CU from among unidirectional prediction in list 0, unidirectional prediction in list 1, or bi-prediction. The template matching cost may be chosen based on:
If costBi <= factor * min(cost0, cost1),
then bi-prediction is used;
otherwise, if cost0 <= cost1,
uni-prediction from list 0 is used;
otherwise,
uni-prediction from list 1 is used.

Here, cost0 is the SAD of the list 0 template matching, cost1 is the SAD of the list 1 template matching, and costBi is the SAD of the bi-prediction template matching. For example, a factor value equal to 1.25 means that the selection process is biased toward bi-prediction. The inter prediction direction selection may be applied to the CU-level template matching process.
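The selection rule above can be written directly as code; the cost values are assumed to be precomputed template-matching SADs.

```python
def select_prediction_direction(cost0, cost1, cost_bi, factor=1.25):
    """Choose among list-0 uni-, list-1 uni-, and bi-prediction as described above."""
    if cost_bi <= factor * min(cost0, cost1):
        return "bi-prediction"
    elif cost0 <= cost1:
        return "uni-prediction from list 0"
    else:
        return "uni-prediction from list 1"

print(select_prediction_direction(100, 110, 120))  # bi-prediction (120 <= 1.25 * 100)
```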
2.8 examples of bidirectional optical flow (BIO)
In BIO, motion compensation is first performed to generate a first prediction (in each prediction direction) of the current block. The first prediction is used to push the spatial gradient, temporal gradient, and optical flow of each sub-block/pixel within the block, which is then used to generate a second prediction, e.g., a final prediction of the sub-block/pixel. Details are described below.
The Bi-directional optical flow (Bi-directional Optical flow, BIO) method is a sample-wise motion refinement that is performed on top of block-wise motion compensation for Bi-directional prediction. In some embodiments, sample-level motion refinement does not use signaling.
Let I^(k) be the luma value from reference k (k = 0, 1) after block motion compensation, and let ∂I^(k)/∂x and ∂I^(k)/∂y be the horizontal and vertical components of the gradient of I^(k), respectively. Assuming the optical flow is valid, the motion vector field (vx, vy) is given by:

  ∂I^(k)/∂t + vx · ∂I^(k)/∂x + vy · ∂I^(k)/∂y = 0

Combining this optical flow equation with Hermite interpolation of the motion trajectory of each sample yields a unique third-order polynomial that matches both the function values I^(k) and the derivatives ∂I^(k)/∂x, ∂I^(k)/∂y at the ends. The value of this polynomial at t = 0 is the BIO prediction:

  pred_BIO = 1/2 · ( I^(0) + I^(1) + vx/2 · (τ1·∂I^(1)/∂x − τ0·∂I^(0)/∂x) + vy/2 · (τ1·∂I^(1)/∂y − τ0·∂I^(0)/∂y) )
Fig. 21 shows an example optical flow trajectory in the bi-directional optical flow (BIO) method. Here, τ0 and τ1 denote the distances to the reference frames, as shown in fig. 21. The distances τ0 and τ1 are calculated based on the POC of Ref0 and Ref1: τ0 = POC(current) − POC(Ref0), τ1 = POC(Ref1) − POC(current). If both predictions come from the same temporal direction (either both from the past or both from the future), the signs are different, i.e., τ0 · τ1 < 0. In this case, BIO is applied only if the predictions are not from the same time instant (i.e., τ0 ≠ τ1), both referenced regions have non-zero motion (MVx0, MVy0, MVx1, MVy1 ≠ 0), and the block motion vectors are proportional to the temporal distances (MVx0/MVx1 = MVy0/MVy1 = −τ0/τ1).
The motion vector field (vx, vy) is determined by minimizing the difference Δ between the values at points A and B. Fig. 9 shows an example of the intersection of a motion trajectory and the reference frame planes. The model uses only the first linear term of the local Taylor expansion of Δ:

  Δ = I^(0) − I^(1) + vx · (τ1·∂I^(1)/∂x + τ0·∂I^(0)/∂x) + vy · (τ1·∂I^(1)/∂y + τ0·∂I^(0)/∂y)

All values in the above equation depend on the sample position, denoted (i′, j′). Assuming the motion is consistent in the local surrounding area, Δ is minimized inside a (2M+1) × (2M+1) square window Ω centered on the currently predicted point, where M is equal to 2:

  (vx, vy) = argmin Σ over [i′, j′] ∈ Ω of Δ²[i′, j′]

For this optimization problem, JEM uses a simplified approach that first performs a minimization in the vertical direction and then in the horizontal direction. This results in:

  vx = (s1 + r) > m ? clip3( −thBIO, thBIO, −s3 / (s1 + r) ) : 0    (9)
  vy = (s5 + r) > m ? clip3( −thBIO, thBIO, −(s6 − vx·s2/2) / (s5 + r) ) : 0    (10)

where

  s1 = Σ over Ω of (τ1·∂I^(1)/∂x + τ0·∂I^(0)/∂x)²
  s2 = Σ over Ω of (τ1·∂I^(1)/∂x + τ0·∂I^(0)/∂x) · (τ1·∂I^(1)/∂y + τ0·∂I^(0)/∂y)
  s3 = Σ over Ω of (I^(1) − I^(0)) · (τ1·∂I^(1)/∂x + τ0·∂I^(0)/∂x)
  s5 = Σ over Ω of (τ1·∂I^(1)/∂y + τ0·∂I^(0)/∂y)²
  s6 = Σ over Ω of (I^(1) − I^(0)) · (τ1·∂I^(1)/∂y + τ0·∂I^(0)/∂y)
to avoid division by zero or very small values, regularization parameters r and m may be introduced in equations 9 and 10.
  r = 500 · 4^(d−8)    (12)
  m = 700 · 4^(d−8)    (13)
Where d is the bit depth of the video samples.
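A floating-point sketch of the relationships above, assuming the window sums s1, s2, s3, s5, s6 have already been accumulated over Ω; the regularization follows equations (12)-(13), the clipping threshold follows the description further below, and the integer shifts of the actual JEM implementation are intentionally omitted, so this is illustrative only.

```python
def bio_refine(s1, s2, s3, s5, s6, bit_depth=10, all_refs_one_direction=False):
    """Derive the (vx, vy) motion refinement for one BIO window (illustrative)."""
    d = bit_depth
    r = 500 * 4 ** (d - 8)   # regularization parameter, equation (12)
    m = 700 * 4 ** (d - 8)   # regularization parameter, equation (13)
    # Threshold on the refinement magnitude (see the clipping description below).
    th = 12 * 2 ** ((14 if all_refs_one_direction else 13) - d)

    clip = lambda v: max(-th, min(th, v))
    vx = clip(-s3 / (s1 + r)) if (s1 + r) > m else 0.0
    vy = clip(-(s6 - vx * s2 / 2.0) / (s5 + r)) if (s5 + r) > m else 0.0
    return vx, vy
```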
In order to keep the memory access for BIO the same as for conventional bi-predictive motion compensation, all prediction and gradient values I^(k), ∂I^(k)/∂x, ∂I^(k)/∂y are calculated only for positions inside the current block. Fig. 22A shows an example of access positions outside of a block 2200. As shown in fig. 22A, the (2M+1) × (2M+1) square window Ω centered on a currently predicted point on the boundary of the predicted block needs to access positions outside the block. In JEM, the values of I^(k), ∂I^(k)/∂x, ∂I^(k)/∂y outside of the block are set equal to the nearest available values inside the block. This may be implemented, for example, as a padding region 2201, as shown in fig. 22B.
With BIO, the motion field can be refined for each sample. To reduce the computational complexity, however, a block-based BIO design is used in JEM. The motion refinement may be calculated based on 4 x 4 blocks. In the block-based BIO, the values of sn in equation (9) for all samples in a 4 x 4 block are aggregated, and the aggregated values of sn are then used to derive the BIO motion vector offset for the 4 x 4 block. More specifically, the following formula may be used for the block-based BIO derivation:

  s_{n,bk} = Σ over (x, y) ∈ bk of s_n(x, y)

Here, bk denotes the set of samples belonging to the k-th 4 x 4 block of the predicted block. sn in equations (9) and (10) is replaced by ((s_{n,bk}) >> 4) to derive the associated motion vector offsets.
In some scenarios, the MV refinement of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a threshold. The threshold is determined based on whether the reference pictures of the current picture are all from one direction. For example, if all the reference pictures of the current picture are from one direction, the value of the threshold is set to 12 × 2^(14−d); otherwise, it is set to 12 × 2^(13−d).
The gradients for BIO may be calculated at the same time as the motion compensation interpolation, using operations consistent with the HEVC motion compensation process, e.g., a 2D separable finite impulse response (FIR) filter. In some embodiments, the input to this 2D separable FIR is the same reference frame samples as for the motion compensation process, and the fractional position (fracX, fracY) according to the fractional part of the block motion vector. For the horizontal gradient ∂I/∂x, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with de-scaling shift d−8; the gradient filter BIOfilterG is then applied in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18−d. For the vertical gradient ∂I/∂y, the gradient filter is first applied vertically using BIOfilterG corresponding to the fractional position fracY with de-scaling shift d−8; the signal displacement is then performed using BIOfilterS in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18−d. The length of the interpolation filter for gradient calculation, BIOfilterG, and for signal displacement, BIOfilterS, may be shorter (e.g., 6 taps) in order to maintain reasonable complexity. Table 1 shows an example of filters that may be used for gradient calculation at different fractional positions of the block motion vector in BIO. Table 2 shows an example of interpolation filters that may be used for prediction signal generation in BIO.
Table 1: exemplary Filter for gradient computation in BIO
Fractional pixel location Gradient interpolation filter (BIOfilterG)
0 {8,-39,-3,46,-17,5}
1/16 {8,-32,-13,50,-18,5}
1/8 {7,-27,-20,54,-19,5}
3/16 {6,-21,-29,57,-18,5}
1/4 {4,-17,-36,60,-15,4}
5/16 {3,-9,-44,61,-15,4}
3/8 {1,-4,-48,61,-13,3}
7/16 {0,1,-54,60,-9,2}
1/2 {-1,4,-57,57,-4,1}
Table 2: exemplary interpolation Filter for prediction Signal Generation in BIO
Fractional pixel location Interpolation filter of prediction signal (BIOfilter S)
0 {0,0,64,0,0,0}
1/16 {1,-3,64,4,-2,0}
1/8 {1,-6,62,9,-3,1}
3/16 {2,-8,60,14,-5,1}
1/4 {2,-9,57,19,-7,2}
5/16 {3,-10,53,24,-8,2}
3/8 {3,-11,50,29,-9,2}
7/16 {3,-11,44,35,-10,3}
1/2 {3,-10,35,44,-11,3}
In JEM, BIO can be applied to all bi-predicted blocks when the two predictions come from different reference pictures. The BIO may be disabled when Local Illumination Compensation (LIC) is enabled for the CU.
In some embodiments, OBMC is applied to the block after the normal MC process. To reduce computational complexity, no BIO may be applied during OBMC. This means that BIO is applied to the MC process of a block only when its own MV is used, and is not applied to the MC process when MVs of neighboring blocks are used in the OBMC process.
2.9 example of decoder-side motion vector refinement (DMVR)
In the bi-prediction operation, for prediction of one block region, two prediction blocks formed using a Motion Vector (MV) of list0 and an MV of list1, respectively, are combined to form a single prediction signal. In the Decoder-side motion vector refinement (DMVR) method, two motion vectors of bi-prediction are further refined by a bilateral template matching process. Bilateral template matching is applied in the decoder to perform a distortion-based search between the bilateral template and reconstructed samples in the reference picture in order to obtain refined MVs without the need to transmit additional motion information.
In DMVR, the bilateral templates are generated as weighted combinations (i.e., averages) of two prediction blocks from the original MV0 of list 0 and MV1 of list 1, respectively, as shown in fig. 23. The template matching operation includes calculating a cost metric between the generated template and sample regions (around the original prediction block) in the reference picture. For each of the two reference pictures, the MV that yields the smallest template cost is considered as the updated MV for the list to replace the original MV. In JEM, nine MV candidates are searched for each list. The nine MV candidates include an original MV and 8 surrounding MVs having one luminance sample offset in the horizontal or vertical direction or both directions from the original MV. Finally, two new MVs, i.e., MV0 'and MV1' as shown in fig. 23, are used to generate the final bi-prediction result. The Sum of Absolute Differences (SAD) is used as a cost metric.
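A sketch of the DMVR candidate search around one original MV: the bilateral template is the average of the two initial predictions, and for each list the candidate (the original MV plus its eight one-sample neighbors) with the lowest SAD against the template is kept. `get_prediction` is a hypothetical motion-compensation helper, not an actual codec API, and the whole sketch is illustrative only.

```python
import numpy as np

def dmvr_refine_one_list(template, mv, get_prediction):
    """Search the original MV and its 8 one-sample neighbors; return the best MV."""
    candidates = [(mv[0] + dx, mv[1] + dy)
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    def sad(cand):
        pred = get_prediction(cand)  # hypothetical MC call for this candidate MV
        return np.abs(template.astype(np.int64) - pred.astype(np.int64)).sum()
    return min(candidates, key=sad)

def dmvr(pred0, pred1, mv0, mv1, get_pred0, get_pred1):
    """Refine MV0 and MV1 against the averaged bilateral template."""
    template = (pred0.astype(np.int64) + pred1.astype(np.int64) + 1) >> 1
    return (dmvr_refine_one_list(template, mv0, get_pred0),
            dmvr_refine_one_list(template, mv1, get_pred1))
```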
DMVR is applied to the Merge mode of bi-prediction, where one MV is from a past reference picture and another MV is from a future reference picture without transmitting additional syntax elements. In JEM, DMVR is not applied when LIC, affine motion, FRUC, or sub-CU Merge candidates are enabled for the CU.
3. Examples of CABAC modifications
In JEM, compared to the design in HEVC, CABAC contains the following three main changes:

modified context modeling for transform coefficients;

multi-hypothesis probability estimation with context-dependent update rates;

adaptive initialization of the context models.
3.1 examples of context modeling for transform coefficients
In HEVC, the transform coefficients of a coding block are coded using non-overlapping coefficient groups (CGs), and each CG contains the coefficients of a 4 x 4 block of the coding block. The CGs inside a coding block, and the transform coefficients within a CG, are coded according to predefined scan orders. The coding of the transform coefficient levels of a CG with at least one non-zero transform coefficient may be separated into multiple scan passes. In the first pass, the first binary symbol (denoted by bin0, also referred to as significant_coeff_flag, indicating that the magnitude of the coefficient is larger than 0) is coded. Next, two scan passes for context coding the second/third binary symbols (bins) may be applied (denoted by bin1 and bin2, respectively, also referred to as coeff_abs_greater1_flag and coeff_abs_greater2_flag). Finally, two more scan passes, for coding the sign information and the remaining values of the coefficient levels (also referred to as coeff_abs_level_remaining), are invoked if necessary. Only the binary symbols in the first three scan passes are coded in the regular mode, and those binary symbols are referred to as regular binary symbols in the following description.
In JEM, the context modeling for the regular binary symbols is changed. When coding the binary symbol in the i-th scan pass (i being 0, 1, 2), the context index depends on the values of the i-th binary symbols of previously coded coefficients in the neighborhood covered by a local template. Specifically, the context index is determined based on the sum of the i-th binary symbols of the neighboring coefficients.

As shown in fig. 24, the local template contains up to five spatially neighboring transform coefficients, where x denotes the position of the current transform coefficient and xi (i being 0 to 4) indicates its five neighbors. To capture the characteristics of transform coefficients at different frequencies, one coding block may be split into up to three regions, and the splitting method is fixed regardless of the coding block size. For example, when coding bin0 of the luma transform coefficients, as shown in fig. 24, one coding block is split into three regions marked with different colors, and the context indices assigned to each region are listed. The luma and chroma components are treated in a similar way, but with separate sets of context models. Moreover, the context model selection for bin0 (e.g., the significant flag) of the luma component further depends on the transform size.
3.2 example of multiple hypothesis probability estimation
The binary arithmetic coder applies a "multi-hypothesis" probability update model based on two probability estimates P0 and P1 that are associated with each context model and are updated independently at different adaptation rates as follows:

  P_new = P_old + ((2^k − P_old) >> M), if the coded binary symbol is '1'
  P_new = P_old − (P_old >> M),         if the coded binary symbol is '0'    (15)

where P_old and P_new represent a probability estimate before and after decoding a binary symbol, respectively, and where the two estimates P0 and P1 use different adaptation rates M. The variable Mi (being 4, 5, 6, or 7) is a parameter that controls the probability update rate of the context model with index equal to i; and k represents the precision of the probability (here equal to 15).
The probability estimate P for interval subdivision in a binary arithmetic encoder is the mean of the estimates from two hypotheses:
  P = (P0_new + P1_new) / 2    (16)
in JEM, a parameter M used in equation (15) controlling the probability update speed of each context model is allocated as follows i Is a value of (2).
On the encoder side, the encoding dibits associated with each context model are recorded. After encoding a stripe, the computation uses a different M for each context model with index equal to i i The rate cost of the value (4, 5,6, 7) and the one that provides the smallest rate cost is selected. For simplicity, this selection process is only performed when a new combination of stripe type and stripe level quantization parameters is encountered.
Signaling a 1-bit flag to indicate M for each context model i i Whether different from the default value of 4. When the flag is 1, two bits are used to indicate M i Whether equal to 5, 6 or 7.
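A sketch of the two-hypothesis update and the averaging of equation (16), with k = 15; the particular choice of adaptation rates below (one controlled by Mi, the other a slower fixed rate) is an illustrative assumption rather than the exact JEM assignment.

```python
K = 15  # probability precision

def update_hypothesis(p, bin_val, rate):
    """One exponentially-weighted probability update step, per equation (15)."""
    if bin_val == 1:
        return p + (((1 << K) - p) >> rate)
    return p - (p >> rate)

def update_context(p0, p1, bin_val, m_i):
    """Update both hypotheses and return the probability used for interval subdivision."""
    p0 = update_hypothesis(p0, bin_val, m_i)  # adaptation rate M_i in {4, 5, 6, 7}
    p1 = update_hypothesis(p1, bin_val, 8)    # assumed slower fixed rate (illustrative)
    p = (p0 + p1) // 2                        # equation (16): mean of the two estimates
    return p0, p1, p
```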
3.3 initialization of context models
Instead of using a fixed table for context model initialization as in HEVC, the initial probability states of the context models for inter-coded slices may be initialized by copying states from previously coded pictures. More specifically, after coding the centrally located CTU of each picture, the probability states of all context models are stored for use as the initial states of the corresponding context models in subsequent pictures. In JEM, the set of initial states for each inter-coded slice is copied from the stored states of a previously coded picture that has the same slice type and the same slice-level QP as the current slice. This lacks loss robustness, but is used for coding-efficiency experimental purposes in the current JEM scheme.
4. Examples of related embodiments and methods
Methods related to the disclosed technology include extended LAMVR where supported motion vector resolutions range from 1/4 pixel to 4 pixel (1/4 pixel, 1/2 pixel, 1-pixel, 2-pixel, and 4 pixel). When the MVD information is signaled, information on the resolution of the motion vector is signaled at the CU level.
Depending on the resolution of a CU, the motion vector (MV) and the motion vector predictor (MVP) of the CU are adjusted. If the applied motion vector resolution is denoted as R (R may be 1/4, 1/2, 1, 2, 4), the MV (MVx, MVy) and the MVP (MVPx, MVPy) are expressed as follows:

  (MVx, MVy) = ( Round(MVx / (R*4)) * (R*4), Round(MVy / (R*4)) * (R*4) )    (17)

  (MVPx, MVPy) = ( Round(MVPx / (R*4)) * (R*4), Round(MVPy / (R*4)) * (R*4) )    (18)

Since both the motion vector predictor and the MV are adjusted by the adaptive resolution, the MVD (MVDx, MVDy) is also aligned with the resolution and is signaled according to the resolution as follows:

  (MVDx, MVDy) = ( (MVx − MVPx) / (R*4), (MVy − MVPy) / (R*4) )    (19)
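A small sketch of equations (17)-(19), assuming MV components are stored in quarter-pel units (hence the step of R*4) and that Round() rounds to the nearest integer; the tie-breaking behavior is an assumption for illustration.

```python
def round_to_resolution(val, r):
    """Round a quarter-pel MV component to resolution r (r in {1/4, 1/2, 1, 2, 4})."""
    step = int(r * 4)                     # step size in quarter-pel units
    return int(round(val / step)) * step

def adjust_mv_and_mvd(mv, mvp, r):
    """Apply equations (17)-(19): align MV and MVP to r, then form the MVD."""
    mv_r = tuple(round_to_resolution(c, r) for c in mv)
    mvp_r = tuple(round_to_resolution(c, r) for c in mvp)
    step = int(r * 4)
    mvd = tuple((a - b) // step for a, b in zip(mv_r, mvp_r))
    return mv_r, mvp_r, mvd

# 1-pel resolution (r = 1) corresponds to a step of 4 quarter-pel units.
print(adjust_mv_and_mvd((9, -3), (5, 1), 1))  # ((8, -4), (4, 0), (1, -1))
```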
in this proposal, a motion vector resolution index (MVR index) indicates a MVP index as well as a motion vector resolution. As a result, the proposed method has no MVP index signaling. The following table shows what each value of the MVR index represents.
Table 3: examples of MVR index representations
In the case of bi-prediction, the AMVR has 3 modes for each resolution. The AMVR Bi-Index (Bi-Index) indicates whether to signal the MVDx, MVDy for each reference list (list 0 or list 1). An example definition of an AMVR bi-directional index is as follows.
Table 4: examples of AMVP bidirectional index
AMVR bidirectional index List 0 (MVD x ,MVD y ) List 1 (MVD x ,MVD y )
0 Signaling notification Signaling notification
1 Without signalling Signaling notification
2 Signaling notification Without signalling
5. Disadvantages of the prior embodiments
In one existing implementation that uses BIO, the motion vector (vx, vy) between the reference block/sub-block in list 0 (denoted by refblk0) and the reference block/sub-block in list 1 (denoted by refblk1) is used only for motion compensation of the current block/sub-block, and is not used for motion prediction, deblocking, OBMC, etc. of blocks coded afterwards, which may be inefficient. For example, (vx, vy) may be generated for each sub-block/pixel of the block, and equation (7) may be used to generate a second prediction of the sub-block/pixel. However, (vx, vy) is not used for motion compensation of the sub-block/pixel, which may also be inefficient.
In another existing implementation that uses DMVR and BIO for bi-predictive PU, first, DMVR is performed. Thereafter, the motion information of the PU is updated. Then, BIO is performed using the updated motion information. That is, the input of the BIO depends on the output of the DMVR.
In yet another existing implementation that uses OBMC, for AMVP mode, whether OBMC is enabled for a small block (width × height <= 256) is determined at the encoder and signaled to the decoder. This increases the encoder complexity. Meanwhile, for a given block/sub-block, when OBMC is enabled, it is always applied to both luma and chroma, which may lead to a loss of coding efficiency.

In yet another existing implementation that uses the AF_INTER mode, an MVD needs to be coded; however, it can only be coded with 1/4-pel precision, which may be inefficient.
6. Example method for two-step inter prediction for visual media coding
Embodiments of the presently disclosed technology overcome the shortcomings of existing implementations and provide additional solutions to provide video coding with higher coding efficiency. Based on the disclosed techniques, two-step inter prediction may enhance existing and future video coding standards, as set forth in the examples described below for the various embodiments. The examples of the disclosed technology provided below illustrate general concepts and are not meant to be construed as limiting. In the examples, various features described in these examples may be combined unless explicitly indicated to the contrary.
With respect to terminology, the reference pictures of the current picture from list 0 and list 1 are denoted Ref0 and Ref1, respectively. Denote τ0 = POC(current) − POC(Ref0) and τ1 = POC(Ref1) − POC(current), and denote the reference blocks of the current block from Ref0 and Ref1 as refblk0 and refblk1, respectively. For a sub-block in the current block, the MV of its corresponding sub-block in refblk0 pointing toward refblk1 is denoted by (vx, vy); in BIO, (vx, vy) denotes the MV derived from the original MVs. The MVs of the sub-block referring to Ref0 and Ref1 are denoted by (mvL0x, mvL0y) and (mvL1x, mvL1y), respectively. As described in this patent document, the methods for motion prediction based on updated motion vectors may be extended to existing and future video coding standards.
Example 1. The MV (vx, vy) and the MV (mvLXx, mvLXy), with X = 0 or 1, should be scaled to the same precision prior to the addition operation, such as prior to performing the techniques in Example 1(e) and/or Example 2 (an illustrative sketch is provided after this example).

(a) In one example, the target precision (to be scaled to) is set to the higher (for better performance) or the lower (for lower complexity) precision between the MV (vx, vy) and the MV (mvLXx, mvLXy). Alternatively, the target precision (to be scaled to) is set to a fixed value (e.g., 1/32-pel precision) regardless of the precision of the two MVs.

(b) In one example, the original MV (mvLXx, mvLXy) is scaled to the higher precision before the addition operation; for example, it may be scaled from 1/4-pel precision to 1/16-pel precision. In this case, mvLXx = sign(mvLXx) * (abs(mvLXx) << N), mvLXy = sign(mvLXy) * (abs(mvLXy) << N), where the function sign(·) returns the sign of the input parameter (as shown below), the function abs(·) returns the absolute value of the input parameter, and N = log2(curr_mv_precision / targ_mv_precision), with curr_mv_precision and targ_mv_precision being the current MV precision and the target MV precision, respectively. For example, if the MV is scaled from 1/4-pel precision to 1/16-pel precision, N = log2((1/4) / (1/16)) = 2.

  sign(x) = 1 if x >= 0, and sign(x) = −1 otherwise.
(i) Alternatively, mvLXx = mvLXx << N, mvLXy = mvLXy << N.

(ii) Alternatively, mvLXx = mvLXx << (N + K), mvLXy = mvLXy << (N + K).

(iii) Alternatively, mvLXx = sign(mvLXx) * (abs(mvLXx) << (N + K)), mvLXy = sign(mvLXy) * (abs(mvLXy) << (N + K)).

(iv) Similarly, if the MV (vx, vy) needs to be scaled to a lower precision, the scaling process specified in Example 1(d) may be applied.

(c) In one example, if the precision of the MV (vx, vy) is lower/higher than that of the MV (mvLXx, mvLXy), the MV (vx, vy) is scaled to the finer/coarser precision. For example, if the MV (mvLXx, mvLXy) has 1/16-pel precision, the MV (vx, vy) is also scaled to 1/16-pel precision.

(d) If (vx, vy) needs to be right-shifted (i.e., scaled to a lower precision) by N bits to achieve the same precision as (mvLXx, mvLXy), then vx = (vx + offset) >> N, vy = (vy + offset) >> N, where, for example, offset = 1 << (N − 1).

(i) Alternatively, vx = sign(vx) * ((abs(vx) + offset) >> N), vy = sign(vy) * ((abs(vy) + offset) >> N).

(ii) Similarly, if the MV (mvLXx, mvLXy) needs to be scaled to a higher precision, the scaling process specified in Example 1(b) above may be applied.
(e) In one example, the MV (vx, vy) is scaled and added to the original MV (mvLXx, mvLXy) (X = 0 or 1) of the current block/sub-block. The updated MVs are calculated as: mvL0'x = −vx * (τ0 / (τ0 + τ1)) + mvL0x, mvL0'y = −vy * (τ0 / (τ0 + τ1)) + mvL0y, and mvL1'x = vx * (τ1 / (τ0 + τ1)) + mvL1x, mvL1'y = vy * (τ1 / (τ0 + τ1)) + mvL1y.

(i) In one example, the updated MVs are used for future motion prediction (as in AMVP, Merge, and affine modes), deblocking, OBMC, and the like.

(ii) Alternatively, the updated MVs may only be used for motion prediction of CUs/PUs that do not immediately follow the current block in decoding order.

(iii) Alternatively, the updated MVs may only be used as TMVP in AMVP, Merge, or affine mode.
(f) If (vx, vy) needs to be right-shifted (i.e., scaled to a lower precision) by N bits to achieve the same precision as (mvLXx, mvLXy), then vx = (vx + offset) >> (N + K), vy = (vy + offset) >> (N + K), where, for example, offset = 1 << (N + K − 1). K is an integer, for example, K is equal to 1, 2, 3, −2, −1, or 0.

(i) Alternatively, vx = sign(vx) * ((abs(vx) + offset) >> (N + K)), vy = sign(vy) * ((abs(vy) + offset) >> (N + K)), where, for example, offset = 1 << (N + K − 1).
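A sketch of the precision alignment and MV update of Example 1(b), 1(d), and 1(e). Precisions are expressed as fractions of a pel (e.g., 1/4, 1/16); the function names and the mixing of integer shifts with floating-point POC weighting are illustrative assumptions, not a normative implementation.

```python
import math

def sign(x):
    return 1 if x >= 0 else -1

def scale_up(v, curr_prec, targ_prec):
    """Example 1(b): scale an MV component to a finer precision by left shift."""
    n = int(math.log2(curr_prec / targ_prec))  # e.g. log2((1/4) / (1/16)) = 2
    return sign(v) * (abs(v) << n)

def scale_down(v, n):
    """Example 1(d): scale an MV component to a coarser precision with rounding."""
    offset = 1 << (n - 1)
    return sign(v) * ((abs(v) + offset) >> n)

def update_mvs(vx, vy, mvl0, mvl1, tau0, tau1):
    """Example 1(e): add the scaled refinement (vx, vy) to the original MVs."""
    f0, f1 = tau0 / (tau0 + tau1), tau1 / (tau0 + tau1)
    mvl0_upd = (-vx * f0 + mvl0[0], -vy * f0 + mvl0[1])
    mvl1_upd = (vx * f1 + mvl1[0], vy * f1 + mvl1[1])
    return mvl0_upd, mvl1_upd
```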
Example 2. Instead of considering the POC distances (e.g., as in the calculation of τ0 and τ1 described above), the scaling method of the MVs invoked in the BIO process may be simplified.

(a) mvL0'x = −vx / S0 + mvL0x, mvL0'y = −vy / S0 + mvL0y, and/or mvL1'x = vx / S1 + mvL1x, mvL1'y = vy / S1 + mvL1y. In one example, S0 and/or S1 is set to 2. In one example, this is invoked under certain conditions (such as τ0 > 0 and τ1 > 0).

(i) Alternatively, an offset may be added during the division process. For example, mvL0'x = (−vx + offset0) / S0 + mvL0x, mvL0'y = −(vy + offset0) / S0 + mvL0y, and mvL1'x = (vx + offset1) / S1 + mvL1x, mvL1'y = (vy + offset1) / S1 + mvL1y. In one example, offset0 is set to S0/2 and offset1 is set to S1/2.

(ii) In one example, mvL0'x = ((−vx + 1) >> 1) + mvL0x, mvL0'y = ((−vy + 1) >> 1) + mvL0y, and/or mvL1'x = ((vx + 1) >> 1) + mvL1x, mvL1'y = ((vy + 1) >> 1) + mvL1y.

(b) mvL0'x = −SF0 * vx + mvL0x, mvL0'y = −vy * SF0 + mvL0y, and/or mvL1'x = −SF1 * vx + mvL1x, mvL1'y = −SF1 * vy + mvL1y. In one example, SF0 is set to 2 and/or SF1 is set to 1. In one example, this is invoked under certain conditions (such as τ0 > 0 and τ1 < 0 and τ0 > |τ1|), as shown in (b) of fig. 25.

(c) mvL0'x = SFACT0 * vx + mvL0x, mvL0'y = SFACT0 * vy + mvL0y, and/or mvL1'x = SFACT1 * vx + mvL1x, mvL1'y = SFACT1 * vy + mvL1y. In one example, SFACT0 is set to 1 and/or SFACT1 is set to 2. In one example, this is invoked under certain conditions (such as τ0 > 0 and τ1 < 0 and τ0 < |τ1|), as shown in (c) of fig. 25.
Example 3. When τ0 > 0 and τ1 > 0, the scaling of (vx, vy) and the update of (mvLXx, mvLXy) may be performed together in order to keep high precision (an illustrative sketch is provided after this example).

(a) In one example, if (vx, vy) needs to be right-shifted (i.e., scaled to a lower precision) by N bits to achieve the same precision as (mvLXx, mvLXy), then mvL0'x = ((−vx + offset) >> (N + 1)) + mvL0x, mvL0'y = ((−vy + offset) >> (N + 1)) + mvL0y, mvL1'x = ((vx + offset) >> (N + 1)) + mvL1x, mvL1'y = ((vy + offset) >> (N + 1)) + mvL1y, where, for example, offset = 1 << N.

(b) In one example, if (vx, vy) needs to be right-shifted (i.e., scaled to a lower precision) by N bits to achieve the same precision as (mvLXx, mvLXy), then mvL0'x = ((−vx + offset) >> (N + K + 1)) + mvL0x, mvL0'y = ((−vy + offset) >> (N + K + 1)) + mvL0y, mvL1'x = ((vx + offset) >> (N + K + 1)) + mvL1x, mvL1'y = ((vy + offset) >> (N + K + 1)) + mvL1y, where, for example, offset = 1 << (N + K). K is an integer, for example, K is equal to 1, 2, 3, −2, −1, or 0.

(c) Alternatively, mvL0'x = −sign(vx) * ((abs(vx) + offset) >> (N + 1)) + mvL0x, mvL0'y = −sign(vy) * ((abs(vy) + offset) >> (N + 1)) + mvL0y, mvL1'x = sign(vx) * ((abs(vx) + offset) >> (N + 1)) + mvL1x, mvL1'y = sign(vy) * ((abs(vy) + offset) >> (N + 1)) + mvL1y.

(d) Alternatively, mvL0'x = −sign(vx) * ((abs(vx) + offset) >> (N + K + 1)) + mvL0x, mvL0'y = −sign(vy) * ((abs(vy) + offset) >> (N + K + 1)) + mvL0y, mvL1'x = sign(vx) * ((abs(vx) + offset) >> (N + K + 1)) + mvL1x, mvL1'y = sign(vy) * ((abs(vy) + offset) >> (N + K + 1)) + mvL1y, where, for example, offset = 1 << (N + K). K is an integer, for example, K is equal to 1, 2, 3, −2, −1, or 0.
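A sketch of Example 3(a): the division by 2 (for the case τ0 > 0 and τ1 > 0) and the right shift by N bits are merged into a single shift by N + 1 with rounding offset 1 << N; variable names are illustrative.

```python
def combined_scale_and_update(vx, vy, mvl0, mvl1, n):
    """Example 3(a): scale (vx, vy) and update both list MVs in one shift by N + 1."""
    offset = 1 << n
    mvl0_upd = (((-vx + offset) >> (n + 1)) + mvl0[0],
                ((-vy + offset) >> (n + 1)) + mvl0[1])
    mvl1_upd = (((vx + offset) >> (n + 1)) + mvl1[0],
                ((vy + offset) >> (n + 1)) + mvl1[1])
    return mvl0_upd, mvl1_upd
```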
Example 4. A clipping operation may further be applied to the updated MVs employed in BIO and/or DMVR, or in other kinds of coding methods that may require updated MVs.

(a) In one example, the updated MVs are clipped in the same way as other conventional MVs, e.g., within a certain range relative to the picture boundaries.

(b) Alternatively, the updated MVs are clipped within a certain range (or multiple ranges for different sub-blocks) relative to the MVs used in the MC process. That is, the difference between an MV used in MC and the corresponding updated MV is clipped within a certain range (or multiple ranges for different sub-blocks).
Example 5. The use of the updated MVs invoked in BIO and/or other kinds of coding methods that may require updated MVs may be constrained.
(a) In one example, the updated MVs are used for future motion prediction (as in AMVP, merge, and/or affine modes), deblocking, OBMC, and the like. Alternatively, the updated MV may be used for the first module, while the original MV may be used for the second module. For example, the first module is motion prediction and the second module is deblocking.
(i) In one example, future motion prediction refers to motion prediction in a block to be encoded/decoded after a current block in a current picture or slice.
(ii) Alternatively, future motion prediction refers to motion prediction in a picture or slice to be encoded/decoded after the current picture or slice.
(b) Alternatively, the updated MVs may only be used for motion prediction of its non-immediately following CU/PU in decoding order.
(c) The updated MVs are not applied to the motion prediction of their next CU/PU in decoding order.
(d) Alternatively, the updated MVs may only be used as predictors for encoding subsequent pictures/slices, such as TMVP in AMVP, and/or Merge and/or affine modes.
(e) Alternatively, the updated MVs may only be used as predictors for encoding subsequent pictures/slices, e.g. ATMVP and/or STMVP, etc.
Example 6. In one example, a two-step inter prediction process is proposed, wherein a first step is performed to generate some intermediate predictions (first predictions) based on the signaled/derived motion information associated with the current block, and a second step is performed to derive the final prediction (second prediction) of the current block based on updated motion information that may depend on the intermediate predictions.
(a) In one example, the BIO process (i.e., using signaled/derived motion information that is used to generate a first prediction and spatial gradient, temporal gradient, and optical flow for each sub-block/pixel within the block) is used only to derive updated MVs as specified in example 1 (and equation (7) is not applied to generate a second prediction), then motion compensation is performed using the updated MVs and a second prediction (i.e., a final prediction) for each sub-block/pixel within the block is generated.
(b) In one example, a different interpolation filter may be used in the first or/and second step than the interpolation filter of the inter-coded block encoded without this approach to reduce memory bandwidth.
(i) In one example, a shorter tap filter (e.g., a 6 tap filter, a 4 tap filter, or a bilinear filter) may be used.
(ii) Alternatively, the filters (such as filter taps, filter coefficients) utilized in the first/second steps may be predefined.
(iii) Alternatively, in addition, the filter taps selected for the first and/or second step may depend on coding information, such as block size/block shape (square, non-square, etc.)/slice type/prediction direction (uni- or bi-prediction or multi-hypothesis, forward or backward).
(iv) Alternatively, in addition, different blocks may select different filters for the first/second steps. In one example, one or more candidate sets for the plurality of filters may be predefined or signaled. The block may be selected from the candidate set. The selected filter may be indicated by a signaled index or may be derived on the fly without signaling.
(c) In one example, only integer MVs are used when generating the first prediction, and no interpolation filtering process is applied in the first step.
(i) In one example, the fractional MVs are rounded to the nearest integer MVs.
(1) If there is more than one nearest integer MV, the fractional MVs are rounded to smaller nearest integer MVs.
(2) If there is more than one nearest integer MV, the fractional MVs are rounded to the larger nearest integer MVs.
(3) If there is more than one nearest integer MV, the fractional MVs are rounded to the nearest MVs that are closer to zero.
(ii) In one example, the fractional MV is rounded to the nearest integer MV that is not less than the fractional MV.
(iii) In one example, the fractional MV is rounded to the nearest integer MV that is not greater than the fractional MV.
(d) The use of this method may be signaled in SPS, PPS, slice header, CTU or CU or CTU group.
(e) The use of such a method may also depend on coding information such as block size/block shape (square, non-square, etc)/slice type/prediction direction (unidirectional or bi-predictive or multi-hypothesis, forward or backward).
(i) In one example, such a method may be automatically disabled under certain conditions, for example, when the current block is encoded with affine patterns.
(ii) In one example, this approach may be applied automatically under certain conditions, such as when the block is encoded with bi-prediction and the block size is greater than a threshold (e.g., more than 16 samples).
Example 7. In one example, it is proposed that the reference blocks (or prediction blocks) may first be modified before the temporal gradient is calculated in BIO, and that the calculation of the temporal gradient is based on the modified reference blocks (an illustrative sketch is provided after this example).
(a) In one example, the mean is removed for all reference blocks.
(i) For example, for reference block X (x=0 or 1), first, a mean value (represented by MeanX) is calculated for the block, and then MeanX is subtracted from each pixel in the reference block.
(ii) Alternatively, for a different reference picture list, a decision may be made whether to remove the mean.
For example, for one reference block/sub-block, the mean is removed before the time gradient is calculated, while for another reference block/sub-block, the mean is not removed.
(iii) Alternatively, a different reference block (e.g., 3 or 4 reference blocks utilized in multi-hypothesis prediction) may choose whether to modify first.
(b) In one example, the mean is defined as the average of the selected samples in the reference block.
(c) In one example, all pixels in reference block X or sub-blocks of reference block X are used to calculate MeanX.
(d) In one example, only a portion of the pixels in reference block X or sub-blocks of the reference block are used to calculate MeanX. For example, only pixels per second row/column are used.
(i) Alternatively, in an example, meanX is calculated using only every fourth row/column of pixels.
(ii) Alternatively, meanX is calculated using only four corner pixels.
(iii) Alternatively, meanX is calculated using only four corner pixels and a center pixel, e.g., a pixel at position (W/2, H/2), where w×h is the reference block size.
(e) In one example, the reference block may be first filtered before being used to derive the temporal gradient.
(i) In one example, a smoothing filtering method may be first applied to the reference block.
(ii) In one example, pixels at block boundaries are first filtered.
(iii) In one example, overlapped Block Motion Compensation (OBMC) is first applied before deriving the temporal gradient.
(iv) In one example, illumination Compensation (IC) is first applied before deriving the time gradient.
(v) In one example, weighted prediction is first applied before deriving the temporal gradient.
(f) In one example, the temporal gradient is calculated first and then modified. For example, the difference between Mean0 and Mean1 is further subtracted from the temporal gradient.
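A sketch of the mean-removal variants of Example 7(a) and 7(f), where the temporal gradient is taken as the sample difference between the two reference blocks; using all pixels for the mean corresponds to item (c), and the sign convention of the subtracted mean difference in the second function is an assumption for illustration.

```python
import numpy as np

def temporal_gradient_mean_removed(ref0, ref1):
    """Example 7(a): remove each block's mean, then form the temporal gradient."""
    r0 = np.asarray(ref0, dtype=np.float64)
    r1 = np.asarray(ref1, dtype=np.float64)
    return (r1 - r1.mean()) - (r0 - r0.mean())   # means over all pixels, item (c)

def temporal_gradient_then_adjust(ref0, ref1):
    """Example 7(f): form the gradient first, then subtract the mean difference."""
    r0 = np.asarray(ref0, dtype=np.float64)
    r1 = np.asarray(ref1, dtype=np.float64)
    return (r1 - r0) - (r1.mean() - r0.mean())   # equivalent to the form above
```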
Example 8. In one example, whether to update the MVs for BIO-coded blocks and/or how to use the updated MVs for future motion prediction may be signaled from the encoder to the decoder, such as in a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), a slice, a CTU, or a CU.
Example 9. In one example, it is proposed to add constraints to the motion vectors utilized in the BIO process.

(a) In one example, (vx, vy) is constrained to a given range, −Mx < vx < Nx and −My < vy < Ny, where Mx, Nx, My, Ny are non-negative integers and may, for example, be equal to 32.

(b) In one example, the updated MVs of BIO-coded sub-blocks/BIO-coded blocks are constrained to a given range, such as −ML0x < mvL0'x < NL0x and/or −ML1x < mvL1'x < NL1x, −ML0y < mvL0'y < NL0y and/or −ML1y < mvL1'y < NL1y, where ML0x, NL0x, ML1x, NL1x, ML0y, NL0y, ML1y, NL1y are non-negative integers and may, for example, be equal to 1024, 2048, etc.
Example 10. It is proposed that for BIO, DMVR, FRUC, template matching, or other methods that update the MVs (or the motion information, including MVs and/or reference pictures) derived from the bitstream, the use of the updated motion information may be constrained.
(a) In one example, even if the motion information is updated at the block level, the updated and non-updated motion information may be stored differently for different sub-blocks. In one example, updated motion information for some sub-blocks may be stored, and for other remaining sub-blocks, non-updated motion information is stored.
(b) In one example, if MVs (or motion information) are updated at the sub-block/block level, the updated MVs are stored only for the inner sub-blocks (i.e., sub-blocks that are not at the PU/CU/CTU boundary) and then used for motion prediction, deblocking, OBMC, etc., as shown in fig. 26A and 26B.
Alternatively, updated MVs are stored only for boundary sub-blocks.
(c) In one example, if the neighboring block and the current block are not in the same CTU or the same region having a size such as 64×64 or 32×32, updated motion information from the neighboring block is not used.
(i) In one example, a neighboring block is considered "unavailable" if it is not in the same CTU or the same region having a size such as 64 x 64 or 32 x 32.
(ii) Alternatively, if the neighboring block and the current block are not in the same CTU or the same region having a size such as 64×64 or 32×32, the current block uses motion information without an update procedure.
(d) In one example, if the neighboring block and the current block are not in the same CTU row or the same row of regions having a size such as 64×64 or 32×32, then the updated MVs from the neighboring block are not used.
(i) In one example, a neighboring block is considered "unavailable" if it is not in the same CTU row or the same row of regions having a size such as 64 x 64 or 32 x 32.
(ii) Alternatively, if the neighboring block and the current block are not in the same CTU row or the same row having an area of a size such as 64×64 or 32×32, the current block uses motion information without an update process.
(e) In one example, if the bottom row of the block is a CTU or the bottom row of an area having a size such as 64×64 or 32×32, the motion information of the block is not updated.
(f) In one example, if the rightmost column of the block is a CTU or the rightmost column of an area having a size such as 64×64 or 32×32, the motion information of the block is not updated.
(g) In one example, refined motion information from some neighboring CTUs or regions is used for the current CTU, and non-refined motion information from other neighboring CTUs or regions is used for the current CTU.
(i) In one example, refined motion information from the left CTU or left region is used for the current CTU.
(ii) Alternatively, in addition, refined motion information from the upper left CTU or upper left region is used for the current CTU.
(iii) Alternatively, in addition, refined motion information from the upper CTU or upper region is used for the current CTU.
(iv) Alternatively, in addition, refined motion information from the upper right CTU or upper right region is used for the current CTU.
(v) In one example, the region has a size such as 64×64 or 32×32.
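The following Python sketch illustrates the availability check described in items (c) through (f) of this example; the data layout, function names, and the 64×64 region size are assumptions made for illustration only.

```python
# Illustrative sketch (not from the patent text) of the availability check:
# updated motion information from a neighboring block is used only when the
# neighbor lies in the same CTU (or same fixed-size region, e.g. 64x64) as the
# current block. Coordinates are in luma samples.

def same_region(x0, y0, x1, y1, region_size=64, row_only=False):
    """Return True if two positions fall in the same region (or region row)."""
    if row_only:
        return y0 // region_size == y1 // region_size
    return (x0 // region_size == x1 // region_size and
            y0 // region_size == y1 // region_size)

def motion_for_prediction(neighbor, current_pos, region_size=64):
    """Pick updated or non-updated motion info from a neighboring block.

    `neighbor` is assumed to carry its top-left position and both the updated
    and the non-updated motion information (hypothetical structure).
    """
    nx, ny = neighbor["pos"]
    cx, cy = current_pos
    if same_region(nx, ny, cx, cy, region_size):
        return neighbor["updated_mv"]      # refinement result may be used
    return neighbor["non_updated_mv"]      # fall back to unrefined motion

# Example usage: neighbor in a different 64x64 region than the current block.
neighbor = {"pos": (60, 60), "updated_mv": (5, -3), "non_updated_mv": (4, -2)}
print(motion_for_prediction(neighbor, current_pos=(64, 64)))  # -> (4, -2)
```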
Example 11. In one example, it is proposed that different MVD precisions may be used in AF_INTER mode, and a syntax element may be signaled to indicate the MVD precision of each block/CU/PU. A precision set comprising a plurality of different MVD precisions that form a geometric sequence is allowed (a sketch of the precision selection follows this example).
(a) In one example, {1/4,1,4} pixel MVD precision is allowed.
(b) In one example, {1/4,1/2,1,2,4} pixel MVD precision is allowed.
(c) In one example, {1/16,1/8,1/4} pixel MVD precision is allowed.
(d) In one example, the syntax element is present only under further conditions, such as when a non-zero MVD component of the block/CU/PU is present.
(e) In one example, MVD precision information is always signaled regardless of whether there are any non-zero MVD components.
(f) Alternatively, for the 4-parameter/6-parameter AF_INTER mode, in which 2/3 MVDs are coded, a different MVD precision may be used for each of the 2/3 MVDs (1 MVD per control point in uni-directional prediction, 2 MVDs per control point in bi-directional prediction, i.e., 1 MVD per control point in each prediction direction), and the 2/3 control points may be associated with different MVD precisions. In this case, furthermore, 2/3 syntax elements may be signaled to indicate the MVD precisions.
(g) In one example, the method described in PCT/CN2018/091792 may be used to encode the MVD precision in AF_INTER mode.
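A hedged Python sketch of the precision selection in this example follows; the precision sets match items (a)-(c) above, while the index-based signaling and the rule for choosing a precision are illustrative assumptions rather than the normative coding of the syntax element.

```python
# Hedged sketch: an AF_INTER block selects its MVD precision from a set that
# forms a geometric sequence, and a syntax element (here a plain index)
# indicates the choice. The index coding rule is illustrative only.

ALLOWED_PRECISION_SETS = {
    "quarter_one_four": [1 / 4, 1, 4],
    "five_step":        [1 / 4, 1 / 2, 1, 2, 4],
    "fine":             [1 / 16, 1 / 8, 1 / 4],
}

def quantize_mvd(mvd_in_pel, precision):
    """Represent an MVD (in pel units) as an integer multiple of `precision`."""
    return round(mvd_in_pel / precision)

def code_block_mvd(mvd_in_pel, precision_set, has_nonzero_mvd=True):
    """Pick the coarsest precision that represents the MVD exactly.

    Returns (precision_index, quantized_mvd); the index would be signaled as a
    syntax element, here only when a non-zero MVD component is present.
    """
    if not has_nonzero_mvd:
        return None, 0
    for idx, prec in sorted(enumerate(precision_set),
                            key=lambda e: -e[1]):   # try coarse precisions first
        q = quantize_mvd(mvd_in_pel, prec)
        if abs(q * prec - mvd_in_pel) < 1e-9:
            return idx, q
    return 0, quantize_mvd(mvd_in_pel, precision_set[0])

# Example: an MVD of 2.0 pel in the {1/4, 1, 4} set codes as precision 1 pel.
print(code_block_mvd(2.0, ALLOWED_PRECISION_SETS["quarter_one_four"]))  # (1, 2)
```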
Example 12. In one example, it is proposed that if more than one decoder-side motion vector derivation (DMVD) method, such as BIO, DMVR, FRUC, or template matching, is performed on a block (e.g., a PU), the different DMVD methods operate independently, i.e., the input of one DMVD method does not depend on the output of another DMVD method.
(a) In one example, furthermore, a prediction block and/or an updated set of motion information (e.g., motion vectors and reference pictures for each prediction direction) is generated from a set of motion information derived by multiple DMVD methods.
(b) In one example, motion compensation is performed with the motion information derived by each DMVD method, and the resulting predictions are averaged, weighted-averaged, or filtered (e.g., by a median filter) to generate the final prediction.
(c) In one example, all DMVD method derived motion information is averaged or weighted averaged or filtered (e.g., by a median filter) to generate final motion information. Alternatively, different priorities are assigned to the different DMVD methods, and the motion information derived by the method having the highest priority is selected as the final motion information. For example, when BIO and DMVR are performed on a PU, then the motion information generated by the DMVR is used as final motion information.
(d) In one example, no more than N DMVD methods are allowed for the PU, where N >= 1.
(i) Different priorities are assigned to the different DMVD methods, and the valid methods with the N highest priorities are performed.
(e) In one example, the DMVD methods are performed in a simultaneous manner: the updated MV of one DMVD method is not used as the starting point of the next DMVD method, and the non-updated MV is used as the search starting point for all DMVD methods. Alternatively, the DMVD methods are performed in a cascaded fashion: the updated MV of one DMVD method is used as the search starting point of the next DMVD method.
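The sketch below illustrates, in Python, the simultaneous mode with priority-based selection described in this example; the refinement functions are stand-ins, since the actual BIO/DMVR derivations are outside the scope of this illustration.

```python
# Illustrative sketch: several DMVD methods run independently from the same
# non-updated motion, and the final motion is taken from the highest-priority
# method (they could equally be averaged). The method functions are stand-ins,
# not the actual BIO/DMVR/FRUC derivations.

def run_dmvd_independently(initial_mv, methods):
    """Run every DMVD method from the same starting MV (simultaneous mode)."""
    return {name: refine(initial_mv) for name, refine in methods}

def select_by_priority(results, priority_order):
    """Keep the result of the highest-priority method that produced an MV."""
    for name in priority_order:
        if name in results:
            return results[name]
    raise ValueError("no DMVD result available")

# Hypothetical refinement stubs standing in for DMVR and BIO.
methods = [
    ("DMVR", lambda mv: (mv[0] + 1, mv[1])),
    ("BIO",  lambda mv: (mv[0], mv[1] - 1)),
]

results = run_dmvd_independently((8, -4), methods)
# DMVR is given higher priority than BIO, as in the example above.
print(select_by_priority(results, priority_order=["DMVR", "BIO"]))  # (9, -4)
```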
Other embodiments
This section describes a method of MV refinement for a BIO-coded block and of storing the refined MVs for further use. The refined MVs may be used for motion vector prediction of subsequent blocks within the current slice/CTU row/tile, and/or for filtering processes (e.g., deblocking filtering), and/or for motion vector prediction of blocks located in different pictures.
As shown in fig. 32, the motion vector from the sub-block in reference block 0 to the sub-block in reference block 1 (denoted by (DMV_x, DMV_y)) is used to further improve the prediction of the current sub-block.
It is proposed to further refine the motion vector of each sub-block by using the motion vector derived in BIO. Denote the POC distance (e.g., the absolute POC difference) between the LX reference picture and the current picture as deltaPOCX, and denote (MVLX_x, MVLX_y) and (MVLX'_x, MVLX'_y) as the signaled and the refined motion vector of the current sub-block, respectively, where X = 0 or 1. Then (MVLX'_x, MVLX'_y) is calculated as follows:
[Four equations for (MVLX'_x, MVLX'_y), rendered as images in the original document (labels BDA0004068675780000391-394), derive the refined motion vectors from the signaled motion vectors, (DMV_x, DMV_y), and the POC distances; they are not reproduced here.]
However, multiplication and division are required in the above equations. To solve this problem, the derivation of the refined motion vectors is simplified as follows:
MVL0'_x = MVL0_x - ((DMV_x + 1) >> 1)
MVL0'_y = MVL0_y - ((DMV_y + 1) >> 1)
MVL1'_x = MVL1_x + ((DMV_x + 1) >> 1)
MVL1'_y = MVL1_y + ((DMV_y + 1) >> 1)
In some embodiments, this approach is only employed when the current CU is predicted from one preceding picture and one following picture, so it operates only in the Random Access (RA) configuration.
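A minimal Python sketch of the simplified derivation above is shown next; the MV values are arbitrary, and the right shift follows the (DMV + 1) >> 1 form of the equations.

```python
# Minimal sketch of the simplified refinement above: the BIO-derived offset
# (DMV_x, DMV_y) is halved with a rounding offset and applied with opposite
# signs to the list-0 and list-1 motion vectors of the sub-block.

def refine_sub_block_mv(mvl0, mvl1, dmv):
    """Return refined (mvL0', mvL1') from signaled MVs and derived (DMVx, DMVy)."""
    half_x = (dmv[0] + 1) >> 1
    half_y = (dmv[1] + 1) >> 1
    mvl0_refined = (mvl0[0] - half_x, mvl0[1] - half_y)
    mvl1_refined = (mvl1[0] + half_x, mvl1[1] + half_y)
    return mvl0_refined, mvl1_refined

# Example with arbitrary integer MVs (in internal MV units).
print(refine_sub_block_mv((10, 4), (-6, 2), dmv=(3, -2)))
# -> ((8, 5), (-4, 1))
```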
Example 13. The proposed methods may be applied under certain conditions, such as block sizes and slice/picture/tile types.
(a) In one example, the above methods are not allowed when the block contains fewer than M×H samples, e.g., 16, 32, or 64 luma samples.
(b) Alternatively, the above methods are not allowed when the minimum of the width and height of the block is smaller than, or not greater than, X. In one example, X is set to 8.
(c) Alternatively, the above methods are not allowed when the width of the block is greater than (or not smaller than) th1 and/or the height of the block is greater than (or not smaller than) th2. In one example, th1 and th2 are set to 8.
(d) Alternatively, the above methods are not allowed when the width of the block is smaller than (or not greater than) th1 and/or the height of the block is smaller than (or not greater than) th2. In one example, th1 and th2 are set to 8.
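The following Python sketch shows one possible form of the size conditions in this example; the thresholds and the exact comparison operators are assumptions, since the example above allows several variants.

```python
# Hedged sketch: the refinement is enabled only when the block satisfies size
# conditions. Thresholds (64 samples, minimum dimension 8) are taken from the
# examples above; the exact comparison operators may vary.

def refinement_allowed(width, height, min_samples=64, min_dim=8,
                       max_width=None, max_height=None):
    """Return True if the proposed methods may be applied to a W x H block."""
    if width * height < min_samples:
        return False                      # too few luma samples
    if min(width, height) < min_dim:
        return False                      # one dimension too small
    if max_width is not None and width > max_width:
        return False                      # optional upper bound on width
    if max_height is not None and height > max_height:
        return False                      # optional upper bound on height
    return True

print(refinement_allowed(4, 8))    # False: 32 samples < 64
print(refinement_allowed(16, 16))  # True
```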
Example 14. The above methods may be applied at the sub-block level.
(a) In one example, a BIO update procedure, or a two-step inter prediction procedure, or the temporal gradient derivation method described in example 7 may be invoked for each sub-block.
(b) In one example, a block may be partitioned into multiple sub-blocks when the width or the height of the block, or both, are greater than (or equal to) a threshold L. Each sub-block is then processed in the same manner as a normal coded block whose size equals the sub-block size.
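A small Python sketch of the sub-block partitioning in (b) follows; the threshold value and the tiling scheme are illustrative assumptions.

```python
# Illustrative sketch: a block whose width or height exceeds a threshold L is
# split into sub-blocks, and each sub-block is then processed like a normal
# coded block of that size. The threshold value is a placeholder.

def split_into_sub_blocks(width, height, threshold=64):
    """Return (x, y, w, h) tuples for the sub-blocks of a W x H block."""
    sub_w = min(width, threshold)
    sub_h = min(height, threshold)
    return [(x, y, sub_w, sub_h)
            for y in range(0, height, sub_h)
            for x in range(0, width, sub_w)]

# A 128x64 block with threshold 64 becomes two 64x64 sub-blocks.
print(split_into_sub_blocks(128, 64))  # [(0, 0, 64, 64), (64, 0, 64, 64)]
```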
Example 15. The threshold may be predefined or signaled at the SPS/PPS/picture/slice level.
(a) Alternatively, the threshold may depend on certain coding information, such as block size, picture type, temporal layer index, etc.
The above described examples may be incorporated in the context of the methods described below, e.g., methods 2700-3100, 3300-3600, and 3800-4200, which may be implemented at a video decoder.
Fig. 27 shows a flow chart of an example method for video decoding. Method 2700 includes, at step 2710, receiving a bitstream representation of a current block of video data.
The method 2700 includes, at step 2720, generating updated first and second reference motion vectors based on weighted sums of the first scaled motion vector and the first and second scaled reference motion vectors, respectively. In some embodiments, the first scaled motion vector is generated by scaling the first motion vector to the target precision, and wherein the first and second scaled reference motion vectors are generated by scaling the first and second reference motion vectors to the target precision, respectively. In some embodiments, the first motion vector is derived based on a first reference motion vector from the first reference block and a second reference motion vector from the second reference block, and wherein the current block is associated with the first and second reference blocks.
In some embodiments, the indication of the target precision is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
In some embodiments, the first motion vector has a first precision and the first and second reference motion vectors have a reference precision. In some embodiments, the first precision may be higher or lower than the reference precision. In other embodiments, the target precision may be set to the first precision, the reference precision, or a fixed (or predetermined) precision that is independent of the first precision and the reference precision.
In some embodiments, the first motion vector is derived based on bi-directional optical flow (BIO) refinement using the first and second reference motion vectors.
The method 2700 includes, at step 2730, processing the bitstream representation based on the updated first and second reference motion vectors to generate a current block. In some embodiments, the processing is based on bi-directional optical flow (BIO) refinement or decoder-side motion vector refinement (DMVR), and wherein the updated first and second reference motion vectors are clipped prior to processing.
In some embodiments, the processing is based on bi-directional optical flow (BIO) refinement, and the updated first and second reference motion vectors are constrained to a predetermined range of values prior to processing.
In some embodiments, processing is based on bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques. In one example, updated first and second reference motion vectors are generated for inner sub-blocks that are not on the boundary of the current block. In another example, updated first and second reference motion vectors are generated for a subset of sub-blocks of the current block.
In some embodiments, the processing is based on at least two techniques, which may include bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame Rate Up Conversion (FRUC) techniques, or template matching techniques. In one example, processing is performed for each of at least two techniques to generate multiple result sets, which may be averaged or filtered to generate the current block. In another example, processing is performed in a cascaded manner for each of at least two techniques to generate a current block.
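As an illustration of step 2720, the Python sketch below scales motion vectors to a common target precision and forms a weighted sum with a reference motion vector; the precisions, weights, and function names are assumptions and do not reproduce the exact normative computation.

```python
# Hedged sketch: the derived motion vector and the reference motion vectors are
# scaled to a common target precision and combined by a weighted sum.
# Precisions are expressed in fractional-pel units (e.g. 1/16 pel); the weight
# and sign conventions here are illustrative placeholders.

def scale_mv(mv, from_precision, to_precision):
    """Rescale an MV given as integer units of `from_precision` pel."""
    factor = from_precision / to_precision
    return tuple(round(c * factor) for c in mv)

def update_reference_mv(ref_mv, derived_mv, ref_prec, der_prec,
                        target_prec, weight=0.5, sign=+1):
    """Weighted sum of a scaled reference MV and the scaled derived MV."""
    r = scale_mv(ref_mv, ref_prec, target_prec)
    d = scale_mv(derived_mv, der_prec, target_prec)
    return tuple(round(rc + sign * weight * dc) for rc, dc in zip(r, d))

# Example: reference MVs in 1/4 pel, derived MV in 1/16 pel, target 1/16 pel.
mv_l0 = update_reference_mv((3, -2), (8, 4), 1 / 4, 1 / 16, 1 / 16, sign=-1)
mv_l1 = update_reference_mv((-1, 5), (8, 4), 1 / 4, 1 / 16, 1 / 16, sign=+1)
print(mv_l0, mv_l1)  # (8, -10) (0, 22)
```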
Fig. 28 shows a flow chart of an example method for video decoding. The method 2800 includes, at step 2810, generating an intermediate prediction for a current block based on first motion information associated with the current block. In some embodiments, generating the intermediate prediction includes a first interpolation filtering process. In some embodiments, generating the intermediate prediction is also based on signaling in a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a Coding Tree Unit (CTU), a slice header, a Coding Unit (CU), or a group of CTUs.
The method 2800 includes, at step 2820, updating the first motion information to the second motion information. In some embodiments, updating the first motion information includes using bi-directional optical flow (BIO) refinement.
The method 2800 includes, at step 2830, generating a final prediction of the current block based on the intermediate prediction or the second motion information. In some embodiments, generating the final prediction includes a second interpolation filtering process.
In some embodiments, the first interpolation filtering process uses a first set of filters that is different from a second set of filters used by the second interpolation filtering process. In some embodiments, at least one filter tap of the first or second interpolation filtering process is based on a dimension, a prediction direction, or a prediction type of the current block.
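The Python sketch below illustrates the two interpolation filtering processes with different filter sets, as described for method 2800; the filter taps are placeholders and not the actual codec filters, and the motion refinement step is only indicated by a comment.

```python
# Illustrative sketch of the two-step prediction: an intermediate prediction
# uses a first (e.g. shorter) interpolation filter, the motion would then be
# refined, and the final prediction uses a second filter set. The taps below
# are placeholders, not the actual codec filters.

FIRST_FILTER = [1, 2, 1]             # hypothetical short smoothing filter
SECOND_FILTER = [-1, 4, 10, 4, -1]   # hypothetical longer filter

def interpolate(samples, taps):
    """1-D interpolation of a sample row with the given (unnormalized) taps."""
    half = len(taps) // 2
    out = []
    for i in range(half, len(samples) - half):
        acc = sum(t * samples[i - half + k] for k, t in enumerate(taps))
        out.append(acc / sum(taps))
    return out

row = [10, 12, 14, 18, 24, 30, 34]
intermediate = interpolate(row, FIRST_FILTER)   # first interpolation filtering
# ... motion information would be refined here (e.g. by BIO) ...
final = interpolate(row, SECOND_FILTER)         # second interpolation filtering
print(intermediate, final)
```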
Fig. 29 shows a flow chart of another example method for video decoding. This example includes some features and/or steps similar to those shown in fig. 28 and described above. At least some of these features and/or components may not be separately described in this section.
Method 2900 includes, at step 2910, receiving a bitstream representation of a current block of video data. In some embodiments, step 2910 includes receiving a bitstream representation from a memory location or buffer in a video encoder or decoder. In other embodiments, step 2910 includes receiving, at the video decoder, the bitstream representation over a wireless or wired channel. In other embodiments, step 2910 includes receiving a bitstream representation from a different module, unit, or processor, which may implement one or more methods as described in embodiments in this document, but is not limited to such.
Method 2900 includes, at step 2920, generating intermediate motion information based on motion information associated with the current block.
Method 2900 includes, at step 2930, generating updated first and second reference motion vectors based on the first and second reference motion vectors, respectively. In some embodiments, the current block is associated with the first and second reference blocks. In some embodiments, the first and second reference motion vectors are associated with first and second reference blocks, respectively.
Method 2900 includes, at step 2940, processing the bitstream representation based on the intermediate motion information or the updated first and second reference motion vectors to generate a current block.
In some embodiments of method 2900, generating the updated first and second reference motion vectors is based on a weighted sum of the first scaled motion vector and the first and second scaled reference motion vectors, respectively. In some embodiments, the first motion vector is derived based on the first reference motion vector and the second reference motion vector, the first scaled motion vector is generated by scaling the first motion vector to the target precision, and the first and second scaled reference motion vectors are generated by scaling the first and second reference motion vectors to the target precision, respectively.
In some embodiments, the indication of the target precision is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
In some embodiments, the first motion vector has a first precision and the first and second reference motion vectors have a reference precision. In some embodiments, the first precision may be higher or lower than the reference precision. In other embodiments, the target precision may be set to the first precision, the reference precision, or a fixed (or predetermined) precision that is independent of the first precision and the reference precision.
In some embodiments, the first motion vector is derived based on bi-directional optical flow (BIO) refinement using the first and second reference motion vectors.
In some embodiments, the processing is based on bi-directional optical flow (BIO) refinement, and the updated first and second reference motion vectors are constrained to a predetermined range of values prior to processing.
In some embodiments, the processing is based on bi-directional optical flow (BIO) refinement or decoder-side motion vector refinement (DMVR), and wherein the updated first and second reference motion vectors are clipped prior to processing.
In some embodiments, processing is based on bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques. In one example, updated first and second reference motion vectors are generated for inner sub-blocks that are not on the boundary of the current block. In another example, updated first and second reference motion vectors are generated for a subset of sub-blocks of the current block.
In some embodiments, the processing is based on at least two techniques, which may include bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame Rate Up Conversion (FRUC) techniques, or template matching techniques. In one example, processing is performed for each of at least two techniques to generate multiple result sets, which may be averaged or filtered to generate the current block. In another example, processing is performed in a cascaded manner for each of at least two techniques to generate a current block.
Fig. 30 shows a flow chart of an example method for video decoding. The method 3000 includes, at step 3010, generating an intermediate prediction for the current block based on the first motion information associated with the current block. In some embodiments, generating the intermediate prediction includes a first interpolation filtering process. In some embodiments, generating the intermediate prediction is also based on signaling in a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a Coding Tree Unit (CTU), a slice header, a Coding Unit (CU), or a group of CTUs.
The method 3000 includes, at step 3020, updating the first motion information to the second motion information. In some embodiments, updating the first motion information includes using bi-directional optical flow (BIO) refinement.
The method 3000 includes, at step 3030, generating a final prediction of the current block based on the intermediate prediction or the second motion information. In some embodiments, generating the final prediction includes a second interpolation filtering process.
In some embodiments, the first interpolation filtering process uses a first set of filters that is different from a second set of filters used by the second interpolation filtering process. In some embodiments, at least one filter tap of the first or second interpolation filtering process is based on a dimension, a prediction direction, or a prediction type of the current block.
Fig. 31 shows a flow chart of another example method for video decoding. This example includes some features and/or steps similar to those shown in fig. 30 described above. At least some of these features and/or components may not be described separately in this section.
Method 3100 includes, at step 3110, receiving a bitstream representation of a current block of video data. In some embodiments, step 3110 includes receiving the bitstream representation from a memory location or buffer in a video encoder or decoder. In other embodiments, step 3110 includes receiving the bitstream representation at the video decoder over a wireless or wired channel. In other embodiments, step 3110 includes receiving the bitstream representation from a different module, unit, or processor, which may implement one or more methods as described in embodiments herein, but is not limited to such.
The method 3100 includes, at step 3120, generating intermediate motion information based on the motion information associated with the current block.
The method 3100 includes, at step 3130, generating updated first and second reference motion vectors based on the first and second reference motion vectors, respectively. In some embodiments, the current block is associated with the first and second reference blocks. In some embodiments, the first and second reference motion vectors are associated with first and second reference blocks, respectively.
The method 3100 includes, at step 3140, processing the bitstream representation based on the intermediate motion information or the updated first and second reference motion vectors to generate a current block.
In some embodiments of method 3100, generating the updated first and second reference motion vectors is based on a weighted sum of the first scaled motion vector and the first and second scaled reference motion vectors, respectively. In some embodiments, the first motion vector is derived based on the first reference motion vector and the second reference motion vector, the first scaled motion vector is generated by scaling the first motion vector to the target precision, and the first and second scaled reference motion vectors are generated by scaling the first and second reference motion vectors, respectively, to the target precision.
In some embodiments, the indication of the target precision is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
In some embodiments, the first motion vector has a first precision and the first and second reference motion vectors have a reference precision. In some embodiments, the first precision may be higher or lower than the reference precision. In other embodiments, the target precision may be set to the first precision, the reference precision, or a fixed (or predetermined) precision that is independent of the first precision and the reference precision.
In some embodiments, the first motion vector is derived based on bi-directional optical flow (BIO) refinement using the first and second reference motion vectors.
In some embodiments, the processing is based on bi-directional optical flow (BIO) refinement, and the updated first and second reference motion vectors are constrained to a predetermined range of values prior to processing.
In some embodiments, the processing is based on bi-directional optical flow (BIO) refinement or decoder-side motion vector refinement (DMVR), and wherein the updated first and second reference motion vectors are clipped prior to processing.
In some embodiments, processing is based on bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques. In one example, updated first and second reference motion vectors are generated for inner sub-blocks that are not on the boundary of the current block. In another example, updated first and second reference motion vectors are generated for a subset of sub-blocks of the current block.
In some embodiments, the processing is based on at least two techniques, which may include bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame Rate Up Conversion (FRUC) techniques, or template matching techniques. In one example, processing is performed for each of at least two techniques to generate multiple result sets, which may be averaged or filtered to generate the current block. In another example, processing is performed in a cascaded manner for each of at least two techniques to generate a current block.
Fig. 33 shows a flow chart of an example method for video decoding. Method 3300 includes, at step 3310, generating an intermediate prediction for the current block based on first motion information associated with the current block. In some embodiments, generating the intermediate prediction includes a first interpolation filtering process. In some embodiments, generating the intermediate prediction is also based on signaling in a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a Coding Tree Unit (CTU), a slice header, a Coding Unit (CU), or a group of CTUs.
Method 3300 includes, at step 3320, updating the first motion information to the second motion information. In some embodiments, updating the first motion information includes using bi-directional optical flow (BIO) refinement.
Method 3300 includes, at step 3330, generating a final prediction of the current block based on the intermediate prediction or the second motion information. In some embodiments, generating the final prediction includes a second interpolation filtering process.
In some embodiments, the first interpolation filtering process uses a first set of filters that is different from a second set of filters used by the second interpolation filtering process. In some embodiments, at least one filter tap of the first or second interpolation filtering process is based on a dimension, a prediction direction, or a prediction type of the current block.
Fig. 34 shows a flow chart of another example method for video decoding. This example includes some features and/or steps similar to those shown in fig. 33 described above. At least some of these features and/or components may not be described separately in this section.
The method 3400 includes, at step 3410, receiving a bitstream representation of a current block of video data. In some embodiments, step 3410 includes receiving the bitstream representation from a memory location or buffer in a video encoder or decoder. In other embodiments, step 3410 includes receiving the bitstream representation at the video decoder over a wireless or wired channel. In other embodiments, step 3410 includes receiving a bitstream representation from a different module, unit, or processor, which may implement one or more methods as described in embodiments in this document, but is not limited to such.
The method 3400 includes, at step 3420, generating intermediate motion information based on motion information associated with the current block.
The method 3400 includes, at step 3430, generating updated first and second reference motion vectors based on the first and second reference motion vectors, respectively. In some embodiments, the current block is associated with the first and second reference blocks. In some embodiments, the first and second reference motion vectors are associated with first and second reference blocks, respectively.
The method 3400 includes, at step 3440, processing the bitstream representation based on the intermediate motion information or the updated first and second reference motion vectors to generate a current block.
In some embodiments of the method 3400, generating updated first and second reference motion vectors is based on a weighted sum of the first scaled motion vector and the first and second scaled reference motion vectors, respectively. In some embodiments, the first motion vector is derived based on the first reference motion vector and the second reference motion vector, the first scaled motion vector is generated by scaling the first motion vector to the target precision, and the first and second scaled reference motion vectors are generated by scaling the first and second reference motion vectors to the target precision, respectively.
In some embodiments, the indication of the target precision is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
In some embodiments, the first motion vector has a first precision and the first and second reference motion vectors have a reference precision. In some embodiments, the first precision may be higher or lower than the reference precision. In other embodiments, the target precision may be set to the first precision, the reference precision, or a fixed (or predetermined) precision that is independent of the first precision and the reference precision.
In some embodiments, the first motion vector is derived based on bi-directional optical flow (BIO) refinement using the first and second reference motion vectors.
In some embodiments, the processing is based on bi-directional optical flow (BIO) refinement, and the updated first and second reference motion vectors are constrained to a predetermined range of values prior to processing.
In some embodiments, the processing is based on bi-directional optical flow (BIO) refinement or decoder-side motion vector refinement (DMVR), and wherein the updated first and second reference motion vectors are clipped prior to processing.
In some embodiments, processing is based on bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques. In one example, updated first and second reference motion vectors are generated for inner sub-blocks that are not on the boundary of the current block. In another example, updated first and second reference motion vectors are generated for a subset of sub-blocks of the current block.
In some embodiments, the processing is based on at least two techniques, which may include bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame Rate Up Conversion (FRUC) techniques, or template matching techniques. In one example, processing is performed for each of at least two techniques to generate multiple result sets, which may be averaged or filtered to generate the current block. In another example, processing is performed in a cascaded manner for each of at least two techniques to generate a current block.
Fig. 35 shows a flow chart of an example method for video decoding. The method 3500 includes, at step 3510, generating an updated reference block for the bitstream representation of the current block by modifying the reference block associated with the current block.
In some embodiments, method 3500 further includes the step of filtering the reference block using a smoothing filter.
In some embodiments, method 3500 further includes a step of filtering pixels at a block boundary of the reference block.
In some embodiments, method 3500 further comprises the step of applying Overlapped Block Motion Compensation (OBMC) to the reference block.
In some embodiments, method 3500 further includes the step of applying Illumination Compensation (IC) to the reference block.
In some embodiments, method 3500 further includes the step of applying weighted prediction to the reference block.
Method 3500 includes, at step 3520, calculating a temporal gradient for bi-directional optical flow (BIO) motion refinement based on the updated reference block.
Method 3500 includes, at step 3530, performing a conversion between the bitstream representation and the current block that includes BIO motion refinement based on the temporal gradient. In some embodiments, the conversion generates the current block from the bitstream representation (e.g., as may be implemented in a video decoder). In other embodiments, the conversion generates a bitstream representation from the current block (e.g., as may be implemented in a video encoder).
In some embodiments, method 3500 further comprises calculating a mean of the reference block and subtracting the mean from each pixel of the reference block. In one example, the calculated mean is based on all pixels of the reference block. In another example, the calculated mean is based on all pixels in a sub-block of the reference block.
In some embodiments, the calculated mean is based on a subset of pixels (in other words, not all pixels) of the reference block. In one example, the subset of pixels includes pixels in every fourth row or column of the reference block. In another example, the subset of pixels includes four corner pixels. In yet another example, the subset of pixels includes four corner pixels and a center pixel.
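A short Python sketch of the subset-based mean removal is given below; the choice of the four corners plus the center pixel follows one of the subsets mentioned above, while the block contents and function names are illustrative.

```python
# Hedged sketch of the mean-removal step: the mean of a reference block is
# computed from the four corner pixels plus the center pixel (one of the
# subsets mentioned above) and subtracted from every pixel before the temporal
# gradient is calculated. The block contents are arbitrary.

def corner_and_center_mean(block):
    """Mean over the four corners and the center pixel of a 2-D block."""
    h, w = len(block), len(block[0])
    picks = [block[0][0], block[0][w - 1], block[h - 1][0],
             block[h - 1][w - 1], block[h // 2][w // 2]]
    return sum(picks) / len(picks)

def remove_mean(block):
    """Return the block with its (subset-based) mean subtracted from each pixel."""
    m = corner_and_center_mean(block)
    return [[p - m for p in row] for row in block]

ref_block = [[100, 102, 104, 106],
             [101, 103, 105, 107],
             [ 98, 100, 102, 104],
             [ 99, 101, 103, 105]]
print(corner_and_center_mean(ref_block))  # -> 102.4
print(remove_mean(ref_block)[0])          # first row after mean removal
```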
Fig. 36 shows a flow chart of another example method for video decoding. This example includes some features and/or steps similar to those shown in fig. 35 described above. At least some of these features and/or components may not be described separately in this section.
Method 3600 includes, at step 3610, generating a temporal gradient for bi-directional optical flow (BIO) motion refinement for a bitstream representation of a current block.
The method 3600 includes, at step 3620, generating an updated temporal gradient by subtracting a difference of a first mean value and a second mean value from the temporal gradient, wherein the first mean value is a mean value of a first reference block, the second mean value is a mean value of a second reference block, and the first and second reference blocks are associated with a current block.
In some embodiments, the mean is based on all pixels of the corresponding reference block (e.g., the first mean is calculated as the mean of all pixels of the first reference block). In another example, the mean is calculated based on all pixels in the sub-block of the corresponding reference block.
In some embodiments, the mean is based on a subset of pixels (in other words, not all pixels) of the corresponding reference block. In one example, the subset of pixels includes pixels in every fourth row or column of the corresponding reference block. In another example, the subset of pixels includes four corner pixels. In yet another example, the subset of pixels includes four corner pixels and a center pixel.
Method 3600 includes, at step 3630, performing a conversion between a bitstream representation and a current block that includes BIO motion refinement based on an updated temporal gradient. In some embodiments, the conversion generates the current block from the bitstream representation (e.g., as may be implemented in a video decoder). In other embodiments, the conversion generates a bitstream representation from the current block (e.g., as may be implemented in a video encoder).
Fig. 38 shows a flowchart of an example method for video processing. The method 3800 includes: in step 3810, original motion information of the current block is determined; scaling the original motion vector of the original motion information and the derived motion vector derived based on the original motion vector to the same target precision at step 3820; generating an updated motion vector from the scaled original and derived motion vectors at step 3830; and performing a transition between the current block and a bitstream representation of the video including the current block based on the updated motion vector at step 3840.
Fig. 39 shows a flowchart of an example method for video processing. The method 3900 includes: at step 3910, determining original motion information for a current block; in step 3920, updating an original motion vector of original motion information of a current block based on a refinement method; clipping the updated motion vector to a range in step 3930; and at step 3940, performing a transition between the current block and a bitstream representation of the video including the current block based on the cropped updated motion vector.
Fig. 40 shows a flowchart of an example method for video processing. The method 4000 comprises: in step 4010, original motion information associated with the current block is determined; in step 4020, generating updated motion information based on the particular prediction mode; and at step 4030, performing a conversion between the current block and a bitstream representation of the video data comprising the current block based on the updated motion information, wherein the particular prediction mode includes one or more of bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques.
Fig. 41 shows a flowchart of an example method for video processing. The method 4100 comprises: in step 4110, determining from a Motion Vector Difference (MVD) precision set the MVD precision of the current block processed with affine mode; in step 4120, a conversion between the current block and a bitstream representation of the video including the current block is performed based on the determined MVD precision.
Fig. 42 shows a flowchart of an example method for video processing. The method 4200 includes: determining non-updated motion information associated with the current block in step 4210; updating the non-updated motion information based on a plurality of decoder side motion vector derivation (DMVD) methods to generate updated motion information for the current block in step 4220; and at step 4230, performing a transition between the current block and a bitstream representation of the video including the current block based on the updated motion information.
7. Example implementations of the disclosed technology
Fig. 37 is a block diagram of a video processing apparatus 3700. Apparatus 3700 can be used to implement one or more methods described herein. The apparatus 3700 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, or the like. The apparatus 3700 can include one or more processors 3702, one or more memories 3704, and video processing hardware 3706. The processor 3702 may be configured to implement one or more of the methods described herein (including, but not limited to, methods 2700-3100, 3300-3600, and 3800-4200). Memory(s) 3704 may be used to store data and code for implementing the methods and techniques described herein. Video processing hardware 3706 may be used to implement some of the techniques described in this document in hardware circuitry.
In some embodiments, the video encoding method may be implemented using an apparatus implemented on a hardware platform as described with respect to fig. 37.
The following clause-based format may be used to describe the various embodiments and techniques described throughout this document.
1.1. A video processing method, comprising:
determining original motion information of a current block;
scaling an original motion vector of the original motion information and a derived motion vector derived based on the original motion vector to the same target precision;
Generating an updated motion vector from the scaled original and derived motion vectors; and
based on the updated motion vector, a transition between the current block and a bitstream representation of the video comprising the current block is performed.
1.2. The method of example 1.1, wherein the original motion vector has a first precision, the derived motion vector has a second precision different from the first precision, and the target precision is set to the higher or the lower of the first precision and the second precision.
1.3. The method of example 1.1, wherein the target precision is set to a fixed precision.
1.4. The method of example 1.1, wherein the target precision is higher than the precision of the original motion vector.
1.5. The method of example 1.4, wherein the scaling of the original motion vector is performed as:
mvLX'_x = sign(mvLX_x) * (abs(mvLX_x) << N),
mvLX'_y = sign(mvLX_y) * (abs(mvLX_y) << N),
wherein (mvLX_x, mvLX_y) is the original motion vector, (mvLX'_x, mvLX'_y) is the scaled original motion vector, the function sign(.) returns the sign of the input parameter, the function abs(.) returns the absolute value of the input parameter, N = log2(curr_mv_precision/targ_mv_precision), and wherein curr_mv_precision is the precision of the original motion vector and targ_mv_precision is the precision of the derived motion vector, which is taken as the target precision.
1.6. The method of example 1.1, wherein the target precision is the same as the precision of the original motion vector.
1.7. The method of example 1.1, wherein the original motion vector has a first precision, the derived motion vector has a second precision different from the first precision, and the target precision is set to the first precision.
1.8. The method of example 1.7, wherein when the derived motion vector is right-shifted by N to achieve the target precision, the derived motion vector is scaled as:
v'_x = (v_x + offset) >> N, v'_y = (v_y + offset) >> N; or
v'_x = sign(v_x) * ((abs(v_x) + offset) >> N), v'_y = sign(v_y) * ((abs(v_y) + offset) >> N),
wherein (v_x, v_y) is the derived motion vector, (v'_x, v'_y) is the scaled derived motion vector, offset is an offset applied to the derived motion vector to achieve the target precision, the function sign(.) returns the sign of the input parameter, the function abs(.) returns the absolute value of the input parameter, and N = log2(curr_mv_precision/targ_mv_precision), where curr_mv_precision is the first precision and targ_mv_precision is the second precision.
1.9. The method of example 1.1, wherein the generating of the scaled and updated motion vectors is performed as:
mvL0'_x = -v_x/S_0 + mvL0_x, mvL0'_y = -v_y/S_0 + mvL0_y; and/or
mvL1'_x = v_x/S_1 + mvL1_x, mvL1'_y = v_y/S_1 + mvL1_y,
wherein (mvL0_x, mvL0_y) and (mvL1_x, mvL1_y) are the original motion vectors, (mvL0'_x, mvL0'_y) and (mvL1'_x, mvL1'_y) are the updated motion vectors, (v_x, v_y) is the derived motion vector, and S_0 and S_1 are scaling factors.
1.10. The method of example 1.1, wherein the generating of the scaled and updated motion vectors is performed as:
mvL0'_x = (-v_x + offset0)/S_0 + mvL0_x, mvL0'_y = -(v_y + offset0)/S_0 + mvL0_y; and
mvL1'_x = (v_x + offset1)/S_1 + mvL1_x, mvL1'_y = (v_y + offset1)/S_1 + mvL1_y,
wherein (mvL0_x, mvL0_y) and (mvL1_x, mvL1_y) are the original motion vectors, (mvL0'_x, mvL0'_y) and (mvL1'_x, mvL1'_y) are the updated motion vectors, (v_x, v_y) is the derived motion vector, offset0 and offset1 are offsets, and S_0 and S_1 are scaling factors.
1.11. The method of example 1.1, wherein the generating of the scaled and updated motion vectors is performed as:
mvL0'_x = ((-v_x + 1) >> 1) + mvL0_x, mvL0'_y = (-(v_y + 1) >> 1) + mvL0_y; and/or
mvL1'_x = ((v_x + 1) >> 1) + mvL1_x, mvL1'_y = ((v_y + 1) >> 1) + mvL1_y,
wherein (mvL0_x, mvL0_y) and (mvL1_x, mvL1_y) are the original motion vectors, (mvL0'_x, mvL0'_y) and (mvL1'_x, mvL1'_y) are the updated motion vectors, and (v_x, v_y) is the derived motion vector.
1.12. The method of any of examples 1.9 to 1.11, wherein the generating of the scaled and updated motion vectors is performed when τ_0 > 0 and τ_1 > 0, where τ_0 = POC(current) - POC(Ref_0) and τ_1 = POC(Ref_1) - POC(current), and wherein POC(current), POC(Ref_0), and POC(Ref_1) are the picture order counts of the current block, the first reference block, and the second reference block, respectively.
1.13. The method of example 1.1, wherein the generating of the scaled and updated motion vectors is performed as:
mvL0'_x = -SF_0*v_x + mvL0_x, mvL0'_y = -SF_0*v_y + mvL0_y; and/or
mvL1'_x = -SF_1*v_x + mvL1_x, mvL1'_y = -SF_1*v_y + mvL1_y,
wherein (mvL0_x, mvL0_y) and (mvL1_x, mvL1_y) are the original motion vectors, (mvL0'_x, mvL0'_y) and (mvL1'_x, mvL1'_y) are the updated motion vectors, (v_x, v_y) is the derived motion vector, and SF_0 and SF_1 are scaling factors.
1.14. The method of example 1.13, wherein the scaling and the generating of the updated motion vectors are performed when τ_0 > 0, τ_1 < 0, and τ_0 > |τ_1|, where τ_0 = POC(current) - POC(Ref_0) and τ_1 = POC(Ref_1) - POC(current), and wherein POC(current), POC(Ref_0), and POC(Ref_1) are the picture order counts of the current block, the first reference block, and the second reference block, respectively.
1.15. The method of example 1.1, wherein the generating of the scaled and updated motion vectors is performed as:
mvL0'_x = SFACT_0*v_x + mvL0_x, mvL0'_y = SFACT_0*v_y + mvL0_y; and
mvL1'_x = SFACT_1*v_x + mvL1_x, mvL1'_y = SFACT_1*v_y + mvL1_y,
wherein (mvL0_x, mvL0_y) and (mvL1_x, mvL1_y) are the original motion vectors, (mvL0'_x, mvL0'_y) and (mvL1'_x, mvL1'_y) are the updated motion vectors, (v_x, v_y) is the derived motion vector, and SFACT_0 and SFACT_1 are scaling factors.
1.16. The method of example 1.15, wherein the scaling and the generating of the updated motion vectors are performed when τ_0 > 0, τ_1 < 0, and τ_0 < |τ_1|, where τ_0 = POC(current) - POC(Ref_0) and τ_1 = POC(Ref_1) - POC(current), and wherein POC(current), POC(Ref_0), and POC(Ref_1) are the picture order counts of the current block, the first reference block, and the second reference block, respectively.
1.17. The method of example 1.1, wherein the derivation of the derived motion vector and the generation of the updated motion vectors are performed together when τ_0 > 0, τ_1 > 0, and τ_0 > |τ_1|, where τ_0 = POC(current) - POC(Ref_0) and τ_1 = POC(Ref_1) - POC(current), and wherein POC(current), POC(Ref_0), and POC(Ref_1) are the picture order counts of the current block, the first reference block, and the second reference block, respectively.
1.18. The method of example 1.17, wherein when the derived motion vector is right-shifted by N to achieve the target precision, the scaling and the generating of the updated motion vectors are performed as:
mvL0'_x = ((-v_x + offset1) >> (N+1)) + mvL0_x, mvL0'_y = ((-v_y + offset1) >> (N+1)) + mvL0_y, mvL1'_x = ((v_x + offset2) >> (N+1)) + mvL1_x, mvL1'_y = ((v_y + offset2) >> (N+1)) + mvL1_y,
wherein (mvL0_x, mvL0_y) and (mvL1_x, mvL1_y) are the original motion vectors, (mvL0'_x, mvL0'_y) and (mvL1'_x, mvL1'_y) are the updated motion vectors, (v_x, v_y) is the derived motion vector, offset1 and offset2 are offsets, N = log2(curr_mv_precision/targ_mv_precision), and wherein curr_mv_precision is the precision of the original motion vector and targ_mv_precision is the precision of the derived motion vector.
1.19. The method of example 1.17, wherein the original motion vector has a first precision, the derived motion vector has a second precision different from the first precision, and the original motion vector is shifted left by N to achieve the target precision as the second precision.
1.20. The method of example 1.17, wherein the original motion vector is shifted left by K and the derived motion vector is shifted right by N-K to achieve the target accuracy.
1.21. The method of example 1.17, wherein the generating of the scaled and updated motion vectors is performed as:
mvL0'_x = -sign(v_x) * ((abs(v_x) + offset0) >> (N+1)) + mvL0_x,
mvL0'_y = -sign(v_y) * ((abs(v_y) + offset0) >> (N+1)) + mvL0_y,
mvL1'_x = sign(v_x) * ((abs(v_x) + offset1) >> (N+1)) + mvL1_x,
mvL1'_y = sign(v_y) * ((abs(v_y) + offset1) >> (N+1)) + mvL1_y,
wherein (mvL0_x, mvL0_y) and (mvL1_x, mvL1_y) are the original motion vectors, (mvL0'_x, mvL0'_y) and (mvL1'_x, mvL1'_y) are the updated motion vectors, (v_x, v_y) is the derived motion vector, offset0 and offset1 are offsets, the function sign(.) returns the sign of the input parameter, the function abs(.) returns the absolute value of the input parameter, N = log2(curr_mv_precision/targ_mv_precision), curr_mv_precision is the precision of the original motion vector, and targ_mv_precision is the precision of the derived motion vector.
1.22. The method of example 1.1, wherein updating the first and second reference motion vectors includes refining using bi-directional optical flow (BIO).
1.23. The method of any of examples 1.1-1.22, wherein the method is not applied if the current block satisfies a particular condition.
1.24. The method of example 1.23, wherein the particular condition specifies at least one of: the size of the current block, the slice type of the current block, the picture type of the current block, and the slice type of the current block.
1.25. The method of example 1.23, wherein the particular condition specifies that the current block contains a number of samples less than a first threshold.
1.26. The method of example 1.23, wherein the particular condition specifies that a minimum size of a width and a height of the current block is less than or not greater than a second threshold.
1.27. The method of example 1.23, wherein the particular condition specifies that a width of the current block is less than or not greater than a third threshold and/or a height of the current block is less than or not greater than a fourth threshold.
1.28. The method of example 1.23, wherein the particular condition specifies that the width of the current block is greater than or not less than a third threshold and/or the height of the current block is greater than or not less than a fourth threshold.
1.29. The method of example 1.23, wherein the method is applied at a sub-block level in a case where a width and/or a height of a block to which the sub-block belongs is equal to or greater than a fifth threshold.
1.30. The method of example 1.29, wherein the current block is partitioned into a plurality of sub-blocks, and each of the plurality of sub-blocks is further subjected to a bi-directional optical flow (BIO) process in the same manner as a normal encoded block having a size equal to the sub-block.
1.31. The method of any of examples 1.25-1.29, wherein each of the first through fifth thresholds is predefined or signaled in a Sequence Parameter Set (SPS) level, or a Picture Parameter Set (PPS) level, or a picture level, or a slice level.
1.32. The method of example 1.31, wherein each of the first through fifth thresholds is defined according to encoding information including at least one of a block size, a picture type, and a temporal layer index.
1.33. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any one of examples 1.1 to 1.32.
1.34. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing a method as in any of examples 1.1 to 1.32.
2.1. A video processing method, comprising:
determining original motion information of a current block;
updating an original motion vector of the original motion information of the current block based on a refinement method;
clipping the updated motion vector to a range; and
based on the cropped updated motion vector, a transition between the current block and the bitstream representation of the video comprising the current block is performed.
2.2. The method of example 2.1, wherein the refinement method includes bi-directional optical flow BIO refinement, decoder-side motion vector refinement DMVR, frame rate up-conversion FRUC, or template matching.
2.3. The method of example 2.1, wherein the updated motion vector is clipped to the same range as the original motion vector allows.
2.4. The method of example 2.1, wherein a difference between the updated motion vector and the original motion vector is clipped to a same range or a different sub-block is clipped to a different range.
2.5. The method of example 2.1, wherein the refinement method includes bi-directional optical flow (BIO) refinement, and the motion vector derived from the original motion vector in the BIO refinement is constrained to a first range as follows:
-M_x < v_x < N_x and/or -M_y < v_y < N_y,
wherein (v_x, v_y) is the derived motion vector, and M_x, N_x, M_y, and N_y are non-negative integers.
2.6. The method of example 2.1, wherein the refinement method includes bi-directional optical flow (BIO) refinement and the updated motion vectors are constrained to a second range as follows:
-M_L0x < mvL0'_x < N_L0x and -M_L1x < mvL1'_x < N_L1x, and
-M_L0y < mvL0'_y < N_L0y and -M_L1y < mvL1'_y < N_L1y,
wherein (mvL0'_x, mvL0'_y) and (mvL1'_x, mvL1'_y) are the updated motion vectors for the different reference lists, and M_L0x, N_L0x, M_L1x, N_L1x, M_L0y, N_L0y, M_L1y, and N_L1y are non-negative integers.
2.7. The method of any of examples 2.1-2.6, wherein the method is not applied if the current block satisfies a particular condition.
2.8. The method of example 2.7, wherein the particular condition specifies at least one of: the size of the current block, the slice type of the current block, the picture type of the current block, and the slice type of the current block.
2.9. The method of example 2.7, wherein the particular condition specifies that the current block contains a number of samples less than a first threshold.
2.10. The method of example 2.7, wherein the particular condition specifies that a minimum size of a width and a height of the current block is less than or not greater than a second threshold.
2.11. The method of example 2.7, wherein the particular condition specifies that a width of the current block is less than or not greater than a third threshold and/or a height of the current block is less than or not greater than a fourth threshold.
2.12. The method of example 2.7, wherein the particular condition specifies that a width of the current block is greater than or not less than a third threshold and/or a height of the current block is greater than or not less than a fourth threshold.
2.13. The method of example 2.7, wherein the method is applied at a sub-block level in a case where a width and/or a height of a block to which the sub-block belongs is equal to or greater than a fifth threshold.
2.14. The method of example 2.13, wherein the current block is partitioned into a plurality of sub-blocks, and each of the plurality of sub-blocks is further subjected to a bi-directional optical flow, BIO, process in the same manner as a normal encoded block having a size equal to the sub-block.
2.15. The method of any of examples 2.9-2.13, wherein each of the first through fifth thresholds is predefined or signaled at a sequence parameter set, SPS, level, or picture parameter set, PPS, level, or slice level.
2.16. The method of example 2.15, wherein each of the first through fifth thresholds is defined according to encoded information including at least one of a block size, a picture type, and a temporal layer index.
2.17. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of examples 2.1 to 2.16.
2.18. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of any one of examples 2.1 to 2.16.
3.1. A video processing method, comprising:
determining original motion information associated with the current block;
generating updated motion information based on the particular prediction mode; and
based on the updated motion information, a transition between the current block and a bitstream representation of video data comprising the current block is performed, wherein the particular prediction mode comprises one or more of bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques.
3.2. The method of example 3.1, wherein the updated motion information comprises an updated motion vector.
3.3. The method of example 3.1, wherein the updated motion vector is used for motion prediction for encoding a subsequent video block; or the updated motion vectors are used for filtering or Overlapped Block Motion Compensation (OBMC).
3.4. The method of example 3.3, wherein the updated motion vector is used for motion prediction in Advanced Motion Vector Prediction (AMVP) mode, merge mode, and/or affine mode.
3.5. The method of example 3.3, wherein the filtering comprises deblocking filtering.
3.6. The method of any of examples 3.1-3.5, wherein the updated motion information is for a first module and the original motion information is for a second module.
3.7. The method of example 3.6, wherein the first module is a motion prediction module and the second module is a deblocking module.
3.8. The method of any of examples 3.2-3.7, wherein the motion prediction is to process a block subsequent to the current block in a current picture or slice.
3.9. The method of any of examples 3.2-3.7, wherein the motion prediction is to process a picture or a slice to be processed after a current picture or slice comprising the current block.
3.10. The method of any of examples 3.1-3.9, wherein the updated motion vector is used only for motion information prediction of a Coding Unit (CU) or a Prediction Unit (PU) that does not immediately follow the current block in processing order.
3.11. The method of any of examples 3.1 to 3.10, wherein the updated motion vector is not used for motion prediction of a CU/PU that immediately follows the current block in processing order.
3.12. The method of any of examples 3.1-3.11, wherein the updated motion vector is used only as a predictor for processing subsequent pictures/slices.
3.13. The method of example 3.12, wherein the updated motion vector is used as Temporal Motion Vector Prediction (TMVP) in Advanced Motion Vector Prediction (AMVP) mode, merge mode, or affine mode.
3.14. The method of example 3.12, wherein the updated motion vector is used only as a predictor for processing subsequent pictures/slices in an Alternative Temporal Motion Vector Prediction (ATMVP) mode and/or a space-time motion vector prediction (STMVP) mode.
3.15. The method of any of examples 3.1-3.14, wherein signaling information from the encoder to the decoder includes whether to update MVs for the BIO coding block and/or whether to use the updated MVs for motion prediction and/or how to use the updated MVs for motion prediction.
3.16. The method of example 3.15, further comprising: the information is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a Coding Tree Unit (CTU), or a CU.
3.17. The method of example 3.1, further comprising: updating motion information, which includes updating motion vectors and reference pictures for each prediction direction at the block level.
3.18. The method of example 3.1 or 3.17, wherein within a block, for some sub-blocks, the updated motion information is stored, and for other remaining sub-blocks, non-updated motion information is stored.
3.19. The method of example 3.1 or 3.17, wherein the updated motion vector is stored only for internal sub-blocks that are not at the PU/CU/CTU boundary.
3.20. The method of example 3.19, further comprising: the updated motion vectors of the inner sub-blocks are used for motion prediction, deblocking or OBMC.
3.21. The method of example 3.1 or 3.17, wherein the updated motion vector is stored only for boundary sub-blocks at the PU/CU/CTU boundary.
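By way of illustration only, a minimal sketch of how the selective storage in examples 3.18 to 3.21 could be realized (here following the variant of example 3.19); the 4 x 4 sub-block size and the dictionary-based motion vector field are assumptions, not part of the examples:

```python
# Sketch: keep the refined MV only for inner sub-blocks; sub-blocks on the
# PU/CU/CTU boundary keep the non-updated MV (example 3.19's variant).
def store_subblock_mvs(block_w, block_h, updated_mv, original_mv, sub=4):
    mv_field = {}
    for y in range(0, block_h, sub):
        for x in range(0, block_w, sub):
            on_boundary = (x == 0 or y == 0 or
                           x + sub >= block_w or y + sub >= block_h)
            # inner sub-blocks store the refined MV, boundary ones the original
            mv_field[(x, y)] = original_mv if on_boundary else updated_mv
    return mv_field
```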
3.22. The method of example 3.1 or 3.17, wherein if a neighboring block and the current block are not in the same CTU or in the same region having a size of 64 x 64 or 32 x 32, updated motion information from the neighboring block is not used.
3.23. The method of example 3.22, wherein the neighboring block is marked as unavailable if the neighboring block and the current block are not in the same CTU or in the same region having a size of 64 x 64 or 32 x 32.
3.24. The method of example 3.22, wherein the current block uses non-updated motion information if the neighboring block and the current block are not in the same CTU or in the same region having a size of 64 x 64 or 32 x 32.
3.25. The method of example 3.17, wherein if a neighboring block and the current block are not in the same CTU row or the same row of regions having a size of 64 x 64 or 32 x 32, then the updated motion vector from the neighboring block is not used.
3.26. The method of example 3.25, wherein the neighboring block is marked as unavailable if the neighboring block and the current block are not in the same CTU row or the same row of regions having a size of 64 x 64 or 32 x 32.
3.27. The method of example 3.25, wherein the current block uses non-updated motion information from the neighboring block if the neighboring block and the current block are not in the same CTU row or the same row of regions having a size of 64 x 64 or 32 x 32.
3.28. The method of example 3.17, wherein if a bottom row of a block is a bottom row of a CTU or of an area having a size of 64 x 64 or 32 x 32, motion information of the block is not updated.
3.29. The method of example 3.17, wherein if a rightmost column of a block is a rightmost column of a CTU or of an area having a size of 64 x 64 or 32 x 32, motion information of the block is not updated.
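By way of illustration only, a minimal sketch of the region-based restriction in examples 3.22 to 3.29; the 64 x 64 region size, the coordinate convention (top-left luma sample positions) and the helper names are assumptions:

```python
# Sketch: the refined MV of a neighbouring block is used only when the
# neighbour lies in the same CTU/region as the current block; otherwise the
# non-updated MV is used (examples 3.22 to 3.24).
def same_region(cur_pos, nb_pos, region=64):
    (cx, cy), (nx, ny) = cur_pos, nb_pos
    return cx // region == nx // region and cy // region == ny // region

def neighbour_mv_for_prediction(cur_pos, nb_pos, nb_updated_mv, nb_original_mv):
    if same_region(cur_pos, nb_pos):
        return nb_updated_mv      # same CTU/region: refined MV may be used
    return nb_original_mv         # otherwise fall back to non-updated MV
```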
3.30. The method of example 3.1 or 3.17, further comprising predicting motion information of blocks/CUs within a current CTU based on updated or non-updated motion information of neighboring CTUs or regions.
3.31. The method of example 3.30, wherein the updated motion information from a left CTU or left region is used for the current CTU.
3.32. The method of example 3.30 or 3.31, wherein the updated motion information from an upper left CTU or upper left region is used for the current CTU.
3.33. The method of any of examples 3.30-3.32, wherein the updated motion information from an upper CTU or upper region is used for the current CTU.
3.34. The method of any of examples 3.30-3.33, wherein the updated motion information from an upper right CTU or upper right region is used for the current CTU.
3.35. The method of any of examples 3.30-3.34, wherein each of the one or more regions has a size of 64 x 64 or 32 x 32.
3.36. The method of any of examples 3.1-3.35, wherein the method is not applied in case the current block satisfies a specific condition.
3.37. The method of example 3.36, wherein the particular condition specifies at least one of: the size of the current block, the slice type of the current block, the picture type of the current block, and the tile type of the current block.
3.38. The method of example 3.36, wherein the particular condition specifies that the current block contains a number of samples less than a first threshold.
3.39. The method of example 3.36, wherein the particular condition specifies that a minimum size of a width and a height of the current block is less than or not greater than a second threshold.
3.40. The method of example 3.36, wherein the particular condition specifies that a width of the current block is less than or not greater than a third threshold and/or a height of the current block is less than or not greater than a fourth threshold.
3.41. The method of example 3.36, wherein the particular condition specifies that a width of the current block is greater than or not less than a third threshold and/or a height of the current block is greater than or not less than a fourth threshold.
3.42. The method of example 3.36, wherein the method is applied at the sub-block level in a case where a width and/or a height of a block to which the sub-block belongs is equal to or greater than a fifth threshold.
3.43. The method of example 3.42, wherein the current block is partitioned into a plurality of sub-blocks, and each of the plurality of sub-blocks further performs bidirectional optical flow (BIO) processing in the same manner as a normal encoded block having a size equal to the sub-block size.
3.44. The method of any of examples 3.38-3.42, wherein each of the first through fifth thresholds is predefined or signaled in a Sequence Parameter Set (SPS) level, or a Picture Parameter Set (PPS) level, or a picture level, or a slice level.
3.45. The method of example 3.44, wherein each of the first through fifth thresholds is defined according to encoded information including at least one of a block size, a picture type, and a temporal layer index.
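By way of illustration only, a minimal sketch of the applicability condition in examples 3.36 to 3.44; the concrete threshold values below are placeholder assumptions, since the examples leave them predefined or signaled at the SPS/PPS/picture/slice level:

```python
# Sketch: skip the refinement for small blocks.
SAMPLE_THRESHOLD = 64    # first threshold (assumed value)
MIN_DIM_THRESHOLD = 8    # second threshold (assumed value)

def refinement_applies(width, height):
    if width * height < SAMPLE_THRESHOLD:          # example 3.38
        return False
    if min(width, height) <= MIN_DIM_THRESHOLD:    # example 3.39
        return False
    return True
```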
3.46. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of examples 3.1 to 3.45.
3.47. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of any one of examples 3.1 to 3.45.
4.1. A video processing method, comprising:
determining from a Motion Vector Difference (MVD) precision set MVD precision for a current block processed in affine mode;
based on the determined MVD precision, a transition between the current block and a bitstream representation of a video that includes the current block is performed.
4.2. The method of example 4.1, wherein the MVD represents a difference between a predicted motion vector and an actual motion vector used during the motion compensation process.
4.3. The method of example 4.2, wherein the set of MVD precision comprises a plurality of different MVD precisions forming a geometric sequence.
4.4. The method of example 4.3, wherein the set of MVD precision comprises 1/4, 1, and 4 pixel MVD precision.
4.5. The method of example 4.3, wherein the set of MVD precision comprises 1/4, 1/2, 1, 2, and 4 pixel MVD precision.
4.6. The method of example 4.3, wherein the set of MVD precision comprises 1/16, 1/8, and 1/4 pixel MVD precision.
4.7. The method of example 4.1, wherein the current block is a coding unit or a prediction unit.
4.8. The method of any one of examples 4.1-4.7, wherein determining the MVD precision further comprises:
the MVD precision of the current block is determined based on a syntax element indicating the MVD precision.
4.9. The method of example 4.8, wherein the syntax element is present when a non-zero MVD component of the current block is present.
4.10. The method of example 4.8, wherein the syntax element is not present when there is no non-zero MVD component of the current block.
4.11. The method of example 4.8, wherein the syntax element is present regardless of whether there are any non-zero MVD components of the current block.
4.12. The method of example 4.1, wherein the current block is processed using affine inter mode or affine Advanced Motion Vector Prediction (AMVP) mode.
4.13. The method of example 4.12, wherein different MVDs of the current block are associated with different MVD precision.
4.14. The method of example 4.13, wherein the affine inter mode is a 4-parameter affine inter mode having 2 control points, and one MVD is used for each control point in each prediction direction.
4.15. The method of example 4.14, wherein the 2 control points are associated with different MVD accuracies.
4.16. The method of example 4.13, wherein the affine inter mode is a 6-parameter affine inter mode having 3 control points, and one MVD is used for each control point in each prediction direction.
4.17. The method of example 4.16, wherein the 3 control points are associated with different MVD accuracies.
4.18. The method of example 4.15, wherein there are two syntax elements to indicate different MVD precision of the 2 control points.
4.19. The method of example 4.17, wherein there are three syntax elements to indicate different MVD precision of the 3 control points.
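By way of illustration only, a minimal sketch of how a signaled MVD precision could be applied per affine control point, as described in examples 4.8 to 4.19; the precision set is the one in example 4.4, while the helper name and data layout are assumptions:

```python
# Sketch: scale each control-point MVD from units of its signalled precision
# to luma-sample units; each control point may carry its own precision index.
MVD_PRECISION_SET = (0.25, 1.0, 4.0)   # 1/4-, 1- and 4-pel (example 4.4)

def scale_control_point_mvds(coded_mvds, precision_indices):
    # coded_mvds: one (dx, dy) pair per control point and prediction direction
    scaled = []
    for (dx, dy), idx in zip(coded_mvds, precision_indices):
        step = MVD_PRECISION_SET[idx]
        scaled.append((dx * step, dy * step))
    return scaled
```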
4.20. The method of example 4.1, wherein the set of MVD precision is determined based on coding information of the current block.
4.21. The method of example 4.20, wherein the encoding information includes a quantization level of the current block.
4.22. The method of example 4.21, wherein a coarser set of MVD precision is selected for a larger quantization level.
4.23. The method of example 4.21, wherein a finer set of MVD precision is selected for a smaller quantization level.
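By way of illustration only, a minimal sketch of examples 4.20 to 4.23, where a coarser precision set is chosen for larger quantization levels; the QP cut-off is an assumption, and the two sets are taken from examples 4.4 and 4.6:

```python
# Sketch: pick the MVD precision set from the block's quantization level.
def mvd_precision_set_for_qp(qp):
    if qp >= 37:                       # "larger quantization level" (assumed cut-off)
        return (0.25, 1.0, 4.0)        # coarser set (example 4.4)
    return (1.0 / 16, 1.0 / 8, 0.25)   # finer set (example 4.6)
```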
4.24. The method of any of examples 4.1-4.23, wherein the method is not applied if the current block satisfies a particular condition.
4.25. The method of example 4.24, wherein the particular condition specifies at least one of: the size of the current block, the slice type of the current block, the picture type of the current block, and the tile type of the current block.
4.26. The method of example 4.24, wherein the particular condition specifies that the current block contains a number of samples less than a first threshold.
4.27. The method of example 4.24, wherein the particular condition specifies that a minimum size of a width and a height of the current block is less than or not greater than a second threshold.
4.28. The method of example 4.24, wherein the particular condition specifies that a width of the current block is less than or not greater than a third threshold and/or a height of the current block is less than or not greater than a fourth threshold.
4.29. The method of example 4.24, wherein the particular condition specifies that a width of the current block is greater than or not less than a third threshold and/or a height of the current block is greater than or not less than a fourth threshold.
4.30. The method of example 4.24, wherein the method is applied at the sub-block level in case the width and/or height of the block to which the sub-block belongs is equal to or greater than a fifth threshold.
4.31. The method of example 4.30, wherein the current block is partitioned into a plurality of sub-blocks, and each of the plurality of sub-blocks is further subjected to a bi-directional optical flow (BIO) process in the same manner as a normal encoded block having a size equal to the sub-block size.
4.32. The method of any of examples 4.26-4.30, wherein each of the first through fifth thresholds is predefined or signaled in a Sequence Parameter Set (SPS) level, or a Picture Parameter Set (PPS) level, or a picture level, or a slice level.
4.33. The method of example 4.32, wherein each of the first through fifth thresholds is defined according to encoding information including at least one of a block size, a picture type, and a temporal layer index.
4.34. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any one of examples 4.1 to 4.33.
4.35. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of any one of examples 4.1 to 4.33.
5.1. A video processing method, comprising:
determining non-updated motion information associated with the current block;
updating the non-updated motion information based on a plurality of decoder-side motion vector derivation (DMVD) methods to generate updated motion information for the current block; and
based on the updated motion information, a transition between the current block and a bitstream representation of a video comprising the current block is performed.
5.2. The method of example 5.1, wherein the plurality of DMVD methods includes at least two of: bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, and template matching techniques.
5.3. The method of example 5.2, wherein the plurality of DMVD methods are performed on the non-updated motion information of the current block in a simultaneous manner, and a non-updated motion vector of the non-updated motion information is input as a search start point of each of the plurality of DMVD methods.
5.4. The method of example 5.2, wherein the plurality of DMVD methods are performed on the non-updated motion information of the current block in a cascade manner, and an updated motion vector of the updated motion information generated by one DMVD method is input as a search start point of a next DMVD method.
5.5. The method of example 5.4, wherein the one DMVD method is a DMVR and the next DMVD method is a BIO, wherein DMVR is performed on the non-updated motion information of the current block to generate the updated motion information, and the updated motion vector of the updated motion information is input as a search start point of the BIO.
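By way of illustration only, a minimal sketch of the cascade in examples 5.4 and 5.5, where DMVR runs first on the non-updated motion vector and its output is the search start point of BIO; the two refine functions below are placeholder stubs standing in for the actual DMVD processes:

```python
# Sketch: cascade of two DMVD methods (DMVR followed by BIO).
def dmvr_refine(mv, ref_pics):
    # placeholder: real DMVR searches around mv using bilateral matching
    return mv

def bio_refine(mv, ref_pics):
    # placeholder: real BIO derives a correction from sample gradients
    return mv

def cascaded_dmvd(non_updated_mv, ref_pics):
    mv_after_dmvr = dmvr_refine(non_updated_mv, ref_pics)   # first DMVD method
    return bio_refine(mv_after_dmvr, ref_pics)              # seeded by DMVR output
```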
5.6. The method of any of examples 5.1 to 5.5, wherein the updating the non-updated motion information based on a plurality of decoder-side motion vector derivation (DMVD) methods to generate updated motion information for the current block further comprises:
deriving a plurality of sets of updated motion information by the plurality of DMVD methods,
generating a final set of updated motion information from the plurality of sets of updated motion information.
5.7. The method of example 5.6, wherein the generating the final set of updated motion information from the plurality of sets of updated motion information further comprises:
the updated motion information of the final set is generated based on an average or a weighted average of the plurality of sets of updated motion information.
5.8. The method of example 5.6, wherein the generating the final set of updated motion information from the plurality of sets of updated motion information further comprises:
the updated motion information of the final set is generated by filtering the plurality of sets of updated motion information using a median filter.
5.9. The method of example 5.6, wherein the generating the final set of updated motion information from the plurality of sets of updated motion information further comprises:
the plurality of DMVD methods are assigned different priorities,
a set of updated motion information derived by the DMVD method with the highest priority is selected as the final set of updated motion information.
5.10. The method of example 5.9, wherein the highest priority is assigned to the decoder side motion vector refinement (DMVR).
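By way of illustration only, a minimal sketch of examples 5.7 to 5.10, combining the motion vector sets produced by several DMVD methods by averaging, median filtering, or priority selection with DMVR ranked first; the priority order after DMVR is an assumption:

```python
# Sketch: three ways to form the final MV from per-method refined MVs.
from statistics import median

def combine_average(mvs):
    n = len(mvs)
    return (sum(mv[0] for mv in mvs) / n, sum(mv[1] for mv in mvs) / n)

def combine_median(mvs):
    return (median(mv[0] for mv in mvs), median(mv[1] for mv in mvs))

def combine_priority(mv_by_method):
    for method in ("DMVR", "BIO", "FRUC", "template_matching"):
        if method in mv_by_method:
            return mv_by_method[method]   # example 5.10: DMVR has highest priority
    return None
```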
5.11. The method of any of examples 5.1 to 5.5, wherein the performing a transition between the current block and a bitstream representation of a video comprising the current block based on the updated motion information further comprises:
performing motion compensation using sets of updated motion information derived from the plurality of DMVD methods, respectively, to obtain sets of motion compensation results,
the current block is generated based on an average or weighted average of the plurality of sets of motion compensation results.
5.12. The method of any of examples 5.1 to 5.5, wherein the performing a transition between the current block and a bitstream representation of a video comprising the current block based on the updated motion information further comprises:
performing motion compensation using sets of updated motion information derived from the plurality of DMVD methods, respectively, to obtain sets of motion compensation results,
the current block is generated by filtering the plurality of sets of motion compensation results using a median filter.
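By way of illustration only, a minimal sketch of examples 5.11 and 5.12, where motion compensation is run once per DMVD method and the current block is formed from the resulting predictions, here by a weighted average; equal weights and the nested-list sample layout are assumptions:

```python
# Sketch: blend per-method motion-compensation results into one prediction.
def blend_predictions(pred_blocks, weights=None):
    n = len(pred_blocks)
    weights = weights or [1.0 / n] * n
    rows, cols = len(pred_blocks[0]), len(pred_blocks[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for pred, w in zip(pred_blocks, weights):
        for y in range(rows):
            for x in range(cols):
                out[y][x] += w * pred[y][x]
    return out
```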
5.13. The method of any of examples 5.1 to 5.5, wherein the updating the non-updated motion information based on a plurality of decoder-side motion vector derivation (DMVD) methods to generate updated motion information for the current block further comprises:
the plurality of DMVD methods are assigned different priorities,
selecting, from the plurality of DMVD methods, N DMVD methods that have the highest priorities and are effective, N being an integer and N >= 1, and
generating updated motion information for the current block based on the N selected DMVD methods.
5.14. The method of any of examples 5.1 to 5.13, wherein the current block is a prediction unit.
5.15. The method of any of examples 5.1 to 5.14, wherein the non-updated motion information includes a non-updated motion vector and a reference picture for each prediction direction.
5.16. The method of any of examples 5.1-5.15, wherein the method is not applied if the current block satisfies a particular condition.
5.17. The method of example 5.16, wherein the particular condition specifies at least one of: the size of the current block, the slice type of the current block, the picture type of the current block, and the tile type of the current block.
5.18. The method of example 5.16, wherein the particular condition specifies that the current block contains a number of samples less than a first threshold.
5.19. The method of example 5.16, wherein the particular condition specifies that a minimum size of a width and a height of the current block is less than or not greater than a second threshold.
5.20. The method of example 5.16, wherein the particular condition specifies that a width of the current block is less than or not greater than a third threshold and/or a height of the current block is less than or not greater than a fourth threshold.
5.21. The method of example 5.16, wherein the particular condition specifies that a width of the current block is greater than or not less than a third threshold and/or a height of the current block is greater than or not less than a fourth threshold.
5.22. The method of example 5.16, wherein the method is applied at the sub-block level in case the width and/or height of the block to which the sub-block belongs is equal to or greater than a fifth threshold.
5.23. The method of example 5.22, wherein the current block is partitioned into a plurality of sub-blocks, and each of the plurality of sub-blocks is further subjected to a bi-directional optical flow (BIO) process in the same manner as a normal encoded block having a size equal to the sub-block size.
5.24. The method of any of examples 5.18-5.22, wherein each of the first to fifth thresholds is predefined or signaled in a Sequence Parameter Set (SPS) level, or Picture Parameter Set (PPS) level, or picture level, or slice level.
5.25. The method of example 5.24, wherein each of the first through fifth thresholds is defined according to encoded information including at least one of a block size, a picture type, and a temporal layer index.
5.26. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of examples 5.1 to 5.25.
5.27. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of any one of examples 5.1 to 5.25.
From the foregoing it will be appreciated that specific embodiments of the presently disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the presently disclosed technology is not limited except as by the appended claims.
The implementations and functional operations of the subject matter described in this patent document may be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. The disclosed and other aspects of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium, for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The terms "data processing unit" and "data processing apparatus" encompass all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, an apparatus can include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such a device. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
It is intended that the specification, together with the accompanying drawings, be considered exemplary only, where exemplary means an example. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, the use of "or" is intended to include "and/or", unless the context clearly indicates otherwise.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various functions that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Also, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few embodiments and examples have been described, and other embodiments, enhancements, and variations may be made based on what is described and illustrated in this patent document.

Claims (27)

1. A video processing method, comprising:
determining original motion information of a current block;
updating an original motion vector of the original motion information based on a decoder-side motion vector refinement DMVR;
clipping the updated motion vector into a first range; and
performing a conversion between the current block and a bitstream representation of the video comprising the current block based on the clipped updated motion vector.
2. The method of claim 1, wherein the first range is the same as the range allowed by the original motion vector.
3. The method of claim 1, wherein performing the conversion comprises:
determining a reference sample point based on the updated motion vector; and
adjusting the reference sample point based on a motion vector field (v_x, v_y) derived during bi-directional optical flow refinement;
wherein in the bi-directional optical flow refinement, at least one of v_x and v_y in the motion vector field is clipped into a second range.
4. The method according to claim 3, wherein v_x and v_y are constrained to the second range as follows:
-M_x < v_x < N_x and/or -M_y < v_y < N_y,
wherein M_x, N_x, M_y and N_y are non-negative integers,
wherein the motion vector field (v_x, v_y) is determined based on at least one gradient value corresponding to the reference sample point.
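By way of illustration only and not as part of the claims, a minimal sketch of the clipping described in claims 1 to 4; all numeric bounds below are placeholder assumptions (the claims only state that M_x, N_x, M_y, N_y are non-negative integers), and inclusive clipping is used here for simplicity:

```python
# Sketch: clip the DMVR-updated MV to a first range and the BIO motion vector
# field (v_x, v_y) to a second range.
def clip(value, lo, hi):
    return max(lo, min(hi, value))

def clip_updated_mv(mv, lo=-(1 << 17), hi=(1 << 17) - 1):
    # first range; per claim 2 it may equal the range allowed for the original MV
    return (clip(mv[0], lo, hi), clip(mv[1], lo, hi))

def clip_bio_field(vx, vy, mx=32, nx=32, my=32, ny=32):
    # second range for the optical-flow motion vector field (placeholder bounds)
    return (clip(vx, -mx, nx), clip(vy, -my, ny))
```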
5. The method of claim 1, wherein the method is not applied in case the current block satisfies a specific condition; wherein the specific condition specifies at least one of: the size of the current block, the slice type of the current block, the picture type of the current block and the tile type of the current block,
wherein the specific condition includes at least one of the following conditions:
the number of samples contained in the current block is smaller than a first threshold value;
the width of the current block is smaller than or not larger than a second threshold value;
the height of the current block is smaller than or not larger than a third threshold value;
wherein at least one of the first threshold value, the second threshold value or the third threshold value is predefined.
6. The method of claim 1, wherein said updating an original motion vector and said cropping said updated motion vector are performed for each sub-block of said current block,
wherein the current block is divided into a plurality of sub-blocks in case that the width and/or the height of the current block is greater than a fourth threshold, wherein the sub-blocks belong to the current block,
wherein each of the plurality of sub-blocks is subjected to decoder-side motion vector refinement and/or bi-directional optical flow in the same manner as a normal video block having a size equal to the sub-block size.
7. The method of claim 1, wherein determining the original motion information comprises:
constructing a motion candidate list of the current block; and
the original motion information is determined from the motion candidate list.
8. A video encoding and decoding method, comprising:
for a current block encoded and decoded in affine inter mode, determining MVD precision from a motion vector difference MVD precision set; and
based on the determined MVD precision, the current block of the video is encoded and decoded,
wherein the set of MVD precision comprises a plurality of different MVD precisions.
9. The method of claim 8, wherein at least one syntax element is optionally present to indicate MVD precision of the current block,
wherein the number of syntax elements is 2,
wherein the presence of the at least one syntax element is based at least on the presence of a non-zero MVD component of the current block.
10. The method according to claim 8, wherein the affine inter-frame mode is a 4-parameter affine inter-frame mode having 2 control points or a 6-parameter affine inter-frame mode having 3 control points, and one MVD is used for each control point in each prediction direction.
11. The method of claim 8, wherein the method is applied if the current block meets a particular condition, wherein the particular condition specifies at least one of: the size of the current block or the slice type of the current block,
wherein the specific condition specifies that the width of the current block is greater than or not less than a fifth threshold value and the height of the current block is greater than or not less than a sixth threshold value,
wherein each of the fifth threshold and the sixth threshold is predefined.
12. The method of claim 8, wherein the current block is divided into a plurality of sub-blocks, and each of the plurality of sub-blocks is further subjected to a bi-directional optical flow BIO process in the same manner as a normal codec block having a size equal to the sub-block size.
13. The method of claim 8, wherein the set of MVD accuracies comprises at least one of: 1/16 luminance sample point, 1/8 luminance sample point, 1 luminance sample point, 2 luminance sample point, 4 luminance sample point, 1/4 luminance sample point, and 1/2 luminance sample point.
14. The method of claim 8, wherein the encoding and decoding comprises:
determining at least one MVD based on the MVD precision;
deriving at least one motion vector based on the at least one MVD;
and encoding and decoding the current block based on the at least one motion vector.
15. A video processing method, comprising:
determining original motion information associated with the current block;
generating updated motion information based on the particular prediction mode; and
performing a transition between the current block and a bitstream representation of video data comprising the current block based on the updated motion information, wherein the particular prediction mode comprises one or more of bi-directional optical flow (BIO) refinement, decoder-side motion vector refinement (DMVR), frame rate up-conversion (FRUC) techniques, or template matching techniques,
wherein the updated motion information comprises an updated motion vector.
16. The method of claim 1, wherein the updated motion vector is used for motion prediction for encoding a subsequent video block; or the updated motion vectors are used for filtering or Overlapped Block Motion Compensation (OBMC),
wherein the updated motion vector is used for motion prediction in Advanced Motion Vector Prediction (AMVP) mode, merge mode and/or affine mode,
Wherein the filtering comprises deblocking filtering.
17. The method of any one of claims 15 or 16, wherein the updated motion information is for a first module and the original motion information is for a second module,
wherein the first module is a motion prediction module and the second module is a deblocking module,
wherein the motion prediction is used to process a block following the current block in a current picture or slice, or the motion prediction is used to process a picture or slice to be processed following a current picture or slice comprising the current block.
18. The method of any of claims 15-17, wherein the updated motion vector is used only for motion information prediction of Coding Units (CUs) or Prediction Units (PUs) that do not immediately follow the current block in processing order,
alternatively, the updated motion vector is not used for motion prediction of the CU/PU following the current block in processing order,
alternatively, the updated motion vector is used only as a predictor for processing subsequent pictures/slices,
alternatively, the updated motion vector is used as Temporal Motion Vector Prediction (TMVP) in Advanced Motion Vector Prediction (AMVP) mode, merge mode or affine mode,
Alternatively, the updated motion vector is used only as a predictor for processing subsequent pictures/slices in an Alternative Temporal Motion Vector Prediction (ATMVP) mode and/or a space-time motion vector prediction (STMVP) mode.
19. The method of claim 15, further comprising: updating motion information, which includes updating motion vectors and reference pictures for each prediction direction at a block level,
wherein if the bottom row of a block is the bottom row of a CTU or of an area having a size of 64 x 64 or 32 x 32, the motion information of the block is not updated,
alternatively, if the rightmost column of a block is the rightmost column of a CTU or of an area having a size of 64 x 64 or 32 x 32, the motion information of the block is not updated.
20. The method of any one of claims 15 or 19, wherein if a neighboring block and the current block are not in the same CTU or in the same region having a size of 64 x 64 or 32 x 32, updated motion information from the neighboring block is not used,
or, if the neighboring block and the current block are not in the same CTU or in the same region having a size of 64 x 64 or 32 x 32, the neighboring block is marked as unavailable,
or, if the neighboring block and the current block are not in the same CTU or the same region having a size of 64 x 64 or 32 x 32, the current block uses the motion information that is not updated,
Or if the neighboring block and the current block are not in the same CTU row or the same row with a region of 64 x 64 or 32 x 32 size, then the updated motion vector from the neighboring block is not used,
or, if the neighboring block and the current block are not in the same CTU row or the same row of regions having a size of 64 x 64 or 32 x 32, the neighboring block is marked as unavailable,
alternatively, if the neighboring block and the current block are not in the same CTU row or the same row of regions having a size of 64×64 or 32×32, the current block uses the non-updated motion information from the neighboring block.
21. The method of any of claims 15 or 19, further comprising predicting motion information of blocks/CUs within a current CTU based on updated or non-updated motion information of neighboring CTUs or regions,
wherein the updated motion information from the left CTU or left region is used for the current CTU,
or, using the updated motion information from the upper left CTU or upper left region for the current CTU,
or, using the updated motion information from the upper CTU or upper region for the current CTU,
Or, using the updated motion information from the upper right CTU or upper right region for the current CTU,
alternatively, each of the one or more regions has a size of 64 x 64 or 32 x 32,
alternatively, the method is not applied in case the current block satisfies a specific condition.
22. The method of any of claims 1-16, 19, wherein performing the conversion comprises decoding the current block from the bitstream representation.
23. The method of any of claims 1-16, 19, wherein performing the conversion comprises encoding the current block into the bitstream representation.
24. A video processing apparatus comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1-23.
25. A non-transitory computer readable storage medium storing instructions that cause a processor to implement the method of any one of claims 1-23.
26. A non-transitory computer readable recording medium storing a bitstream of video generated by the method of any one of claims 1-23, performed by a video processing device.
27. A method for storing a bitstream of video, comprising:
generating the bitstream based on the method of any of claims 1-23; and
the bit stream is stored in a non-transitory computer readable recording medium.
CN202310085146.XA 2018-08-04 2019-08-05 Video processing method, apparatus and computer readable medium Pending CN116095312A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
CN2018098691 2018-08-04
CNPCT/CN2018/098691 2018-08-04
CNPCT/CN2018/109250 2018-10-06
CN2018109250 2018-10-06
CN201910718717.2A CN110809155B (en) 2018-08-04 2019-08-05 Video processing method, device and computer readable medium

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910718717.2A Division CN110809155B (en) 2018-08-04 2019-08-05 Video processing method, device and computer readable medium

Publications (1)

Publication Number Publication Date
CN116095312A true CN116095312A (en) 2023-05-09

Family

ID=68072859

Family Applications (6)

Application Number Title Priority Date Filing Date
CN201910718733.1A Active CN110809156B (en) 2018-08-04 2019-08-05 Interaction between different decoder-side motion vector derivation modes
CN202210943677.3A Pending CN115842912A (en) 2018-08-04 2019-08-05 Interaction between different decoder-side motion vector derivation modes
CN201910718739.9A Active CN110809165B (en) 2018-08-04 2019-08-05 Affine motion vector difference accuracy
CN202310085146.XA Pending CN116095312A (en) 2018-08-04 2019-08-05 Video processing method, apparatus and computer readable medium
CN201910718717.2A Active CN110809155B (en) 2018-08-04 2019-08-05 Video processing method, device and computer readable medium
CN201910718738.4A Active CN110809159B (en) 2018-08-04 2019-08-05 Clipping of updated or derived MVs

Family Applications Before (3)

Application Number Title Priority Date Filing Date
CN201910718733.1A Active CN110809156B (en) 2018-08-04 2019-08-05 Interaction between different decoder-side motion vector derivation modes
CN202210943677.3A Pending CN115842912A (en) 2018-08-04 2019-08-05 Interaction between different decoder-side motion vector derivation modes
CN201910718739.9A Active CN110809165B (en) 2018-08-04 2019-08-05 Affine motion vector difference accuracy

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN201910718717.2A Active CN110809155B (en) 2018-08-04 2019-08-05 Video processing method, device and computer readable medium
CN201910718738.4A Active CN110809159B (en) 2018-08-04 2019-08-05 Clipping of updated or derived MVs

Country Status (5)

Country Link
US (5) US11109055B2 (en)
CN (6) CN110809156B (en)
GB (2) GB2590228B (en)
TW (4) TWI753281B (en)
WO (4) WO2020031061A2 (en)

Families Citing this family (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019234598A1 (en) 2018-06-05 2019-12-12 Beijing Bytedance Network Technology Co., Ltd. Interaction between ibc and stmvp
CN116347099A (en) * 2018-06-19 2023-06-27 北京字节跳动网络技术有限公司 Motion vector difference accuracy without selection of motion vector prediction cut-off
WO2019244117A1 (en) 2018-06-21 2019-12-26 Beijing Bytedance Network Technology Co., Ltd. Unified constrains for the merge affine mode and the non-merge affine mode
GB2589223B (en) 2018-06-21 2023-01-25 Beijing Bytedance Network Tech Co Ltd Component-dependent sub-block dividing
US11533471B2 (en) * 2018-06-22 2022-12-20 Sony Corporation Image processing apparatus and image processing method
TWI744661B (en) 2018-06-29 2021-11-01 大陸商北京字節跳動網絡技術有限公司 Number of motion candidates in a look up table to be checked according to mode
CN110662056B (en) 2018-06-29 2022-06-07 北京字节跳动网络技术有限公司 Which lookup table needs to be updated or not
KR20240005239A (en) 2018-06-29 2024-01-11 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Interaction between lut and amvp
BR112020024142A2 (en) 2018-06-29 2021-03-02 Beijing Bytedance Network Technology Co., Ltd. method for video processing, apparatus for encoding video data, non-transitory computer-readable storage medium and recording medium
CN114885173A (en) 2018-06-29 2022-08-09 抖音视界(北京)有限公司 Checking order of motion candidates in LUT
CN114845108A (en) 2018-06-29 2022-08-02 抖音视界(北京)有限公司 Updating the lookup table: FIFO, constrained FIFO
TWI731364B (en) 2018-07-02 2021-06-21 大陸商北京字節跳動網絡技術有限公司 Hmvp+ non-adjacent motion
WO2020031061A2 (en) * 2018-08-04 2020-02-13 Beijing Bytedance Network Technology Co., Ltd. Mvd precision for affine
GB2590310B (en) 2018-09-12 2023-03-22 Beijing Bytedance Network Tech Co Ltd Conditions for starting checking HMVP candidates depend on total number minus K
TW202025737A (en) 2018-09-19 2020-07-01 大陸商北京字節跳動網絡技術有限公司 Fast algorithms for adaptive motion vector resolution in affine mode
CN110944196B (en) 2018-09-24 2023-05-30 北京字节跳动网络技术有限公司 Simplified history-based motion vector prediction
CN111010569B (en) 2018-10-06 2023-02-28 北京字节跳动网络技术有限公司 Improvement of temporal gradient calculation in BIO
CN112970262B (en) 2018-11-10 2024-02-20 北京字节跳动网络技术有限公司 Rounding in trigonometric prediction mode
WO2020098808A1 (en) 2018-11-17 2020-05-22 Beijing Bytedance Network Technology Co., Ltd. Construction of merge with motion vector difference candidates
US11290743B2 (en) * 2018-12-08 2022-03-29 Qualcomm Incorporated Interaction of illumination compensation with inter-prediction
CN113196747B (en) 2018-12-21 2023-04-14 北京字节跳动网络技术有限公司 Information signaling in current picture reference mode
KR20240010576A (en) 2019-01-10 2024-01-23 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Invoke of lut updating
WO2020143824A1 (en) 2019-01-13 2020-07-16 Beijing Bytedance Network Technology Co., Ltd. Interaction between lut and shared merge list
WO2020147773A1 (en) 2019-01-16 2020-07-23 Beijing Bytedance Network Technology Co., Ltd. Inserting order of motion candidates in lut
EP3895429A4 (en) 2019-01-31 2022-08-10 Beijing Bytedance Network Technology Co., Ltd. Context for coding affine mode adaptive motion vector resolution
EP3942822A1 (en) * 2019-03-16 2022-01-26 Vid Scale, Inc. Inter prediction memory access bandwidth reduction method with optical flow compensation
US11343525B2 (en) 2019-03-19 2022-05-24 Tencent America LLC Method and apparatus for video coding by constraining sub-block motion vectors and determining adjustment values based on constrained sub-block motion vectors
CN113615193A (en) 2019-03-22 2021-11-05 北京字节跳动网络技术有限公司 Merge list construction and interaction between other tools
JP7239732B2 (en) * 2019-04-02 2023-03-14 北京字節跳動網絡技術有限公司 Video encoding and decoding based on bidirectional optical flow
CN113711609B (en) 2019-04-19 2023-12-01 北京字节跳动网络技术有限公司 Incremental motion vectors in predictive refinement using optical flow
WO2020211866A1 (en) 2019-04-19 2020-10-22 Beijing Bytedance Network Technology Co., Ltd. Applicability of prediction refinement with optical flow process
CN113728630B (en) 2019-04-19 2023-11-17 北京字节跳动网络技术有限公司 Region-based gradient computation in different motion vector refinements
CN117676134A (en) 2019-04-25 2024-03-08 北京字节跳动网络技术有限公司 Constraint on motion vector differences
CN117201791A (en) 2019-04-28 2023-12-08 北京字节跳动网络技术有限公司 Symmetric motion vector difference codec
WO2020233662A1 (en) 2019-05-21 2020-11-26 Beijing Bytedance Network Technology Co., Ltd. Syntax signaling for optical-flow based inter coding
EP3973705A4 (en) 2019-06-25 2022-09-28 Beijing Bytedance Network Technology Co., Ltd. Restrictions on motion vector difference
US20220264146A1 (en) * 2019-07-01 2022-08-18 Interdigital Vc Holdings France, Sas Bi-prediction refinement in affine with optical flow
US11272203B2 (en) 2019-07-23 2022-03-08 Tencent America LLC Method and apparatus for video coding
CN114365490A (en) 2019-09-09 2022-04-15 北京字节跳动网络技术有限公司 Coefficient scaling for high precision image and video coding and decoding
BR112022005133A2 (en) 2019-09-21 2022-10-11 Beijing Bytedance Network Tech Co Ltd VIDEO DATA PROCESSING METHOD AND APPARATUS, AND NON-TRANSITORY COMPUTER-READABLE STORAGE AND RECORDING MEDIA
JP7391203B2 (en) 2019-10-12 2023-12-04 北京字節跳動網絡技術有限公司 Use and signaling to refine video coding tools
EP4032290A4 (en) 2019-10-18 2022-11-30 Beijing Bytedance Network Technology Co., Ltd. Syntax constraints in parameter set signaling of subpictures
US11575926B2 (en) * 2020-03-29 2023-02-07 Alibaba Group Holding Limited Enhanced decoder side motion vector refinement
CN115443653A (en) 2020-04-07 2022-12-06 抖音视界有限公司 Signaling of inter prediction in high level syntax
WO2021204233A1 (en) 2020-04-09 2021-10-14 Beijing Bytedance Network Technology Co., Ltd. Constraints on adaptation parameter set based on color format
WO2021204251A1 (en) 2020-04-10 2021-10-14 Beijing Bytedance Network Technology Co., Ltd. Use of header syntax elements and adaptation parameter set
CN115868159A (en) 2020-04-17 2023-03-28 抖音视界有限公司 Presence of adaptive parameter set units
WO2021222036A1 (en) 2020-04-26 2021-11-04 Bytedance Inc. Conditional signaling of video coding syntax elements
CN111679971B (en) * 2020-05-20 2021-07-20 北京航空航天大学 Adaboost-based software defect prediction method
CN111654708B (en) * 2020-06-07 2022-08-23 咪咕文化科技有限公司 Motion vector obtaining method and device and electronic equipment
CN116671101A (en) 2020-06-22 2023-08-29 抖音视界有限公司 Signaling of quantization information in a codec video
CN111901590B (en) * 2020-06-29 2023-04-18 北京大学 Refined motion vector storage method and device for inter-frame prediction
US11936899B2 (en) * 2021-03-12 2024-03-19 Lemon Inc. Methods and systems for motion candidate derivation
US11671616B2 (en) 2021-03-12 2023-06-06 Lemon Inc. Motion candidate derivation
EP4352960A1 (en) * 2021-05-17 2024-04-17 Beijing Dajia Internet Information Technology Co., Ltd. Geometric partition mode with motion vector refinement
CN113743357B (en) * 2021-09-16 2023-12-05 京东科技信息技术有限公司 Video characterization self-supervision contrast learning method and device
US20230128502A1 (en) * 2021-10-21 2023-04-27 Tencent America LLC Schemes for Adjusting Adaptive Resolution for Motion Vector Difference
WO2023140883A1 (en) * 2022-01-18 2023-07-27 Tencent America LLC Interdependence between adaptive resolution of motion vector difference and signaling/derivation of motion vector-related parameters
WO2023153893A1 (en) * 2022-02-13 2023-08-17 엘지전자 주식회사 Image encoding/decoding method and device, and recording medium storing bitstream
WO2023195824A1 (en) * 2022-04-08 2023-10-12 한국전자통신연구원 Method, device, and recording medium for image encoding/decoding

Family Cites Families (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3681342B2 (en) * 2000-05-24 2005-08-10 三星電子株式会社 Video coding method
CN101340578A (en) * 2007-07-03 2009-01-07 株式会社日立制作所 Motion vector estimating apparatus, encoder and camera
CN101272450B (en) 2008-05-13 2010-11-10 浙江大学 Global motion estimation exterior point removing and kinematic parameter thinning method in Sprite code
JP2010016453A (en) * 2008-07-01 2010-01-21 Sony Corp Image encoding apparatus and method, image decoding apparatus and method, and program
EP2343901B1 (en) * 2010-01-08 2017-11-29 BlackBerry Limited Method and device for video encoding using predicted residuals
JP5786478B2 (en) 2011-06-15 2015-09-30 富士通株式会社 Moving picture decoding apparatus, moving picture decoding method, and moving picture decoding program
RS64604B1 (en) * 2011-06-16 2023-10-31 Ge Video Compression Llc Entropy coding of motion vector differences
AU2012323631B2 (en) * 2011-10-11 2015-09-17 Mediatek Inc. Method and apparatus of motion and disparity vector derivation for 3D video coding and HEVC
WO2013068566A1 (en) * 2011-11-11 2013-05-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Adaptive partition coding
US20150016530A1 (en) * 2011-12-19 2015-01-15 James M. Holland Exhaustive sub-macroblock shape candidate save and restore protocol for motion estimation
US9325990B2 (en) 2012-07-09 2016-04-26 Qualcomm Incorporated Temporal motion vector prediction in video coding extensions
US9749642B2 (en) 2014-01-08 2017-08-29 Microsoft Technology Licensing, Llc Selection of motion vector precision
US10531116B2 (en) 2014-01-09 2020-01-07 Qualcomm Incorporated Adaptive motion vector resolution signaling for video coding
US10200713B2 (en) 2015-05-11 2019-02-05 Qualcomm Incorporated Search region determination for inter coding within a particular picture of video data
CN109005407B (en) * 2015-05-15 2023-09-01 华为技术有限公司 Video image encoding and decoding method, encoding device and decoding device
CN106331722B (en) * 2015-07-03 2019-04-26 华为技术有限公司 Image prediction method and relevant device
US10999595B2 (en) 2015-11-20 2021-05-04 Mediatek Inc. Method and apparatus of motion vector prediction or merge candidate derivation for video coding
KR20170059718A (en) 2015-11-23 2017-05-31 삼성전자주식회사 Decoding apparatus and decoding method thereof
CN114793279A (en) 2016-02-03 2022-07-26 Oppo广东移动通信有限公司 Moving image decoding device, encoding device, and predicted image generation device
US11109061B2 (en) 2016-02-05 2021-08-31 Mediatek Inc. Method and apparatus of motion compensation based on bi-directional optical flow techniques for video coding
WO2017147765A1 (en) * 2016-03-01 2017-09-08 Mediatek Inc. Methods for affine motion compensation
US10397569B2 (en) * 2016-06-03 2019-08-27 Mediatek Inc. Method and apparatus for template-based intra prediction in image and video coding
EP3264769A1 (en) 2016-06-30 2018-01-03 Thomson Licensing Method and apparatus for video coding with automatic motion information refinement
EP3264768A1 (en) 2016-06-30 2018-01-03 Thomson Licensing Method and apparatus for video coding with adaptive motion information refinement
US10631002B2 (en) * 2016-09-30 2020-04-21 Qualcomm Incorporated Frame rate up-conversion coding mode
US10448010B2 (en) * 2016-10-05 2019-10-15 Qualcomm Incorporated Motion vector prediction for affine motion models in video coding
BR112019012582A8 (en) * 2016-12-22 2023-02-07 Mediatek Inc MOTION REFINEMENT METHOD AND APPARATUS FOR VIDEO CODING
US10911761B2 (en) 2016-12-27 2021-02-02 Mediatek Inc. Method and apparatus of bilateral template MV refinement for video coding
US20180192071A1 (en) * 2017-01-05 2018-07-05 Mediatek Inc. Decoder-side motion vector restoration for video coding
US20180199057A1 (en) 2017-01-12 2018-07-12 Mediatek Inc. Method and Apparatus of Candidate Skipping for Predictor Refinement in Video Coding
US10701366B2 (en) 2017-02-21 2020-06-30 Qualcomm Incorporated Deriving motion vector information at a video decoder
US10491917B2 (en) 2017-03-22 2019-11-26 Qualcomm Incorporated Decoder-side motion vector derivation
US10595035B2 (en) 2017-03-22 2020-03-17 Qualcomm Incorporated Constraining motion vector information derived by decoder-side motion vector derivation
US10904565B2 (en) * 2017-06-23 2021-01-26 Qualcomm Incorporated Memory-bandwidth-efficient design for bi-directional optical flow (BIO)
US10757442B2 (en) * 2017-07-05 2020-08-25 Qualcomm Incorporated Partial reconstruction based template matching for motion vector derivation
CN107396102B (en) * 2017-08-30 2019-10-08 中南大学 A kind of inter-frame mode fast selecting method and device based on Merge technological movement vector
CN111630859B (en) 2017-12-14 2024-04-16 Lg电子株式会社 Method and apparatus for image decoding based on inter prediction in image coding system
US11265551B2 (en) 2018-01-18 2022-03-01 Qualcomm Incorporated Decoder-side motion vector derivation
CN112369021A (en) 2018-06-29 2021-02-12 韩国电子通信研究院 Image encoding/decoding method and apparatus for throughput enhancement and recording medium storing bitstream
WO2020031061A2 (en) 2018-08-04 2020-02-13 Beijing Bytedance Network Technology Co., Ltd. Mvd precision for affine
CN111010569B (en) 2018-10-06 2023-02-28 北京字节跳动网络技术有限公司 Improvement of temporal gradient calculation in BIO
WO2020084461A1 (en) 2018-10-22 2020-04-30 Beijing Bytedance Network Technology Co., Ltd. Restrictions on decoder side motion vector derivation based on coding information
CN112913249B (en) 2018-10-22 2022-11-08 北京字节跳动网络技术有限公司 Simplified coding and decoding of generalized bi-directional prediction index

Also Published As

Publication number Publication date
GB2590228A (en) 2021-06-23
US11109055B2 (en) 2021-08-31
GB2590222B (en) 2023-02-01
US20210067783A1 (en) 2021-03-04
CN110809165B (en) 2022-07-26
TW202025735A (en) 2020-07-01
GB2590228B (en) 2023-04-05
CN110809156A (en) 2020-02-18
WO2020031061A2 (en) 2020-02-13
TWI750494B (en) 2021-12-21
CN110809159B (en) 2022-06-07
CN110809156B (en) 2022-08-12
US20210185347A1 (en) 2021-06-17
US11330288B2 (en) 2022-05-10
TW202025734A (en) 2020-07-01
TW202013975A (en) 2020-04-01
CN110809165A (en) 2020-02-18
WO2020031058A1 (en) 2020-02-13
CN110809155A (en) 2020-02-18
WO2020031061A3 (en) 2020-04-02
TWI752341B (en) 2022-01-11
GB2590222A (en) 2021-06-23
US20200221117A1 (en) 2020-07-09
US20210185348A1 (en) 2021-06-17
US11451819B2 (en) 2022-09-20
CN110809155B (en) 2023-01-31
TWI753281B (en) 2022-01-21
GB202100505D0 (en) 2021-03-03
US20220272376A1 (en) 2022-08-25
CN110809159A (en) 2020-02-18
TW202025765A (en) 2020-07-01
WO2020031059A1 (en) 2020-02-13
CN115842912A (en) 2023-03-24
WO2020031062A1 (en) 2020-02-13
US11470341B2 (en) 2022-10-11
GB202100379D0 (en) 2021-02-24
TWI735929B (en) 2021-08-11

Similar Documents

Publication Publication Date Title
CN110809155B (en) Video processing method, device and computer readable medium
CN111010569B (en) Improvement of temporal gradient calculation in BIO
CN110933420B (en) Fast algorithm for adaptive motion vector resolution in affine mode
CN110620929B (en) Selected motion vector difference accuracy without motion vector prediction truncation
CN110581998B (en) Video processing method, apparatus and computer-readable recording medium
CN110740332B (en) Motion prediction based on updated motion vectors
CN110881124B (en) Two-step inter prediction
CN113678444B (en) Entropy coding of affine patterns with adaptive motion vector resolution
CN111010580B (en) Size limitation based on motion information
CN110809164B (en) MV precision in BIO

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination