CN113632477A - Derivation of transformed uni-directional prediction candidates - Google Patents
- Publication number
- CN113632477A (application CN202080023944.4A)
- Authority
- CN
- China
- Prior art keywords
- block
- prediction
- video
- motion vector
- current block
- Prior art date
- Legal status
- Pending
Classifications
- All classifications fall under H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals. The leaf classifications are:
- H04N19/54—Motion estimation other than block-based using feature points or meshes
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
- H04N19/176—The coding unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
- H04N19/186—The coding unit being a colour or a chrominance component
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
- H04N19/593—Predictive coding involving spatial prediction techniques
- H04N19/70—Characterised by syntax aspects related to video coding, e.g. related to compression standards
Abstract
Techniques for video processing are described. In one example embodiment, a method of video processing includes: for a conversion between a current block of video and a bitstream representation of the video, determining a unidirectional motion vector from a bidirectional motion vector when a condition on block size is satisfied. The unidirectional motion vector is then used as a Merge candidate for the conversion. The method also includes performing the conversion based on the determination.
Description
Cross Reference to Related Applications
Under the applicable patent law and/or rules pursuant to the Paris Convention, the present application claims the priority to and benefit of International Patent Application No. PCT/CN2019/079397, filed on March 24, 2019. The entire disclosure of the above application is incorporated by reference as part of the disclosure of the present application for all purposes under the law of the United States.
Technical Field
This patent document relates to image and video encoding and decoding.
Background
Digital video accounts for the largest bandwidth usage on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.
Disclosure of Invention
This document discloses various video processing techniques that may be used by video encoders and decoders during encoding and decoding operations.
In one example aspect, a video processing method is disclosed. The method includes determining, for a conversion between a current block of video and a bitstream representation of the video using an affine encoding tool, that a first motion vector of a sub-block of the current block and a second motion vector, which is a representative motion vector of the current block, comply with a size constraint. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining an affine model including six parameters for a conversion between a current block of video and a bitstream representation of the video. The affine model inherits from affine coding information of neighboring blocks of the current block. The method also includes performing a transformation based on the affine model.
In another example aspect, a video processing method is disclosed. The method includes determining whether a bi-predictive coding technique is applicable to a block for a transition between the block of video and a bitstream representation of the video based on a block size having a width W and a height H, W and H being positive integers. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a transition between a block of video and a bitstream representation of the video, whether a coding tree splitting process is applicable to the block based on a size of a sub-block, the sub-block being a child coding unit of the block according to the coding tree splitting process. The sub-blocks have a width W and a height H, W and H being positive integers. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a transition between a current block of video and a bitstream representation of the video, whether to derive an index for bi-prediction with coding unit level weight (BCW) coding mode based on a rule regarding a location of the current block. In the BCW encoding mode, a weight set including a plurality of weights is used to generate a bidirectional predictor of a current block. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining an intra-prediction mode for a current block independent of intra-prediction modes of neighboring blocks for a transition between the current block of video encoded using a Combined Inter and Intra Prediction (CIIP) encoding technique and a bitstream representation of the video. CIIP encoding techniques use the inter and intra prediction values to derive a final prediction value for the current block. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a transition between a current block of video encoded using a Combined Inter and Intra Prediction (CIIP) encoding technique and a bitstream representation of the video, an intra prediction mode for the current block based on a first intra prediction mode for a first neighboring block and a second intra prediction mode for a second neighboring block. The first neighboring block is encoded using an intra prediction encoding technique and the second neighboring block is encoded using a CIIP encoding technique. The first intra prediction mode is given a different priority than the second intra prediction mode. CIIP encoding techniques use the inter and intra prediction values to derive a final prediction value for the current block. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a conversion between a current block of video and a bitstream representation of the video, whether a Combined Inter and Intra Prediction (CIIP) process is applicable to a color component of the current block based on a size of the current block. CIIP encoding techniques use the inter and intra prediction values to derive a final prediction value for the current block. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a transition between a current block of video and a bitstream representation of the video, whether to apply a Combined Inter and Intra Prediction (CIIP) encoding technique to the current block based on a characteristic of the current block. CIIP encoding techniques use the inter and intra prediction values to derive a final prediction value for the current block. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a transition between a current block of video and a bitstream representation of the video, whether to disable an encoding tool for the current block based on whether the current block is encoded in a Combined Inter and Intra Prediction (CIIP) encoding technique. The encoding tool includes at least one of bi-directional optical flow (BDOF), Overlapped Block Motion Compensation (OBMC), or decoder-side motion vector refinement procedure (DMVR). The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a transition between a block of video and a bitstream representation of the video, a first precision P1 of a motion vector for spatial motion prediction and a second precision P2 of the motion vector for temporal motion prediction. P1 and/or P2 are fractions and neither P1 nor P2 are signaled in the bitstream representation. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method comprises determining motion vectors (MVx, MVy) with precision (Px, Py) for a conversion between a block of the video and a bitstream representation of the video. Px is associated with MVx and Py is associated with MVy. MVx and MVy are stored as integers each having N bits, and MinX ≦ MVx ≦ MaxX and MinY ≦ MVy ≦ MaxY, MinX, MaxX, MinY, and MaxY being real numbers. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a transition between a current block of video and a bitstream representation of the video, whether a shared Merge list is applicable to the current block based on an encoding mode of the current block. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes, for a conversion between a current block of video of size WxH and a bitstream representation of the video, determining a second block of size (W+N-1)x(H+N-1) used for motion compensation during the conversion. The second block is determined based on a reference block of size (W+N-1-PW)x(H+N-1-PH). N denotes the filter size; W, H, N, PW and PH are non-negative integers, and PW and PH are not both equal to 0. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes, for a conversion between a current block of video of size WxH and a bitstream representation of the video, determining a second block of size (W+N-1)x(H+N-1) used for motion compensation during the conversion. W and H are non-negative integers, and N is a non-negative integer based on the filter size. During the conversion, a refined motion vector is determined based on a multi-point search according to a motion vector refinement operation on an original motion vector, and pixels along the reference block boundary are determined by repeating one or more non-boundary pixels. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining a positional predictor in a block based on a weighted sum of the positional inter and intra predictor values in the block for a transition between the block of video and a bitstream representation of the video encoded using a Combined Inter and Intra Prediction (CIIP) encoding technique. The weighted sum is based on adding an offset to an initial sum obtained based on the inter prediction value and the intra prediction value, and the offset is added before performing a right shift operation to determine the weighted sum. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes, for a transition between a current block of video and a bitstream representation of the video, determining a manner in which to represent encoding information of the current block in the bitstream representation based in part on whether a condition associated with a size of the current block is satisfied. The method also includes performing a conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method comprises for a conversion between a current block of the video and a bitstream representation of the video, determining a set of modified motion vectors, and performing the conversion based on the set of modified motion vectors. Since the current block satisfies the condition, the modified set of motion vectors is a modified version of the set of motion vectors associated with the current block.
In another example aspect, a video processing method is disclosed. The method includes for a transition between a current block of video and a bitstream representation of the video, determining a unidirectional motion vector from a bidirectional motion vector if a condition of block size is satisfied, wherein a decoder-side motion vector refinement (DMVR) process is subsequently disabled based on the determination to use the unidirectional motion vector during the transition; and performing the conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method comprises determining a unidirectional motion vector from a bidirectional motion vector for a conversion between a current block of video and a bitstream representation of the video, if a condition on block size is met. The unidirectional motion vector is then used as a Merge candidate for the conversion. The method also includes performing the conversion in accordance with the determination.
In another example aspect, a video processing method is disclosed. The method includes determining, for a transition between a current block of video and a bitstream representation of the video, that a motion candidate for the current block is restricted to a uni-directional prediction candidate based on a size of the current block. The method also includes performing the conversion based on the determination.
In another example aspect, a video processing method is disclosed. The method includes determining a size constraint between a representative motion vector of an affine-encoded current video block and a motion vector of a sub-block of the current video block, and performing a conversion between a bitstream representation of the current video block or sub-block and pixel values by using the size constraint.
In another example aspect, a video processing method is disclosed. The method includes determining one or more sub-blocks of an affine encoded current video block, wherein each sub-block has a size of MxN pixels, where M and N are multiples of 2 or 4, conforming the motion vectors of the sub-blocks to a size limit, and conditionally performing a conversion between a bitstream representation and pixel values of the current video block based on a trigger, using the size limit.
In yet another example aspect, a video processing method is disclosed. The method includes determining that a current video block satisfies a size condition, and based on the determination, performing a conversion between a bitstream representation of the video block and pixel values by excluding a bi-predictive coding mode of the current video block.
In yet another example aspect, a video processing method is disclosed. The method includes determining that a current video block satisfies a size condition, and based on the determination, performing a transition between a bitstream representation and pixel values of the current video block, wherein an inter prediction mode is signaled in the bitstream representation according to the size condition.
In yet another example aspect, a video processing method is disclosed. The method comprises determining that a current video block satisfies a size condition, and based on the determination, performing a conversion between a bitstream representation and pixel values of the current video block, wherein generation of a Merge candidate list during the conversion depends on the size condition.
In yet another example aspect, a video processing method is disclosed. The method includes determining that a child coding unit of a current video block satisfies a size condition, and based on the determination, performing a conversion between a bitstream representation and pixel values of the current video block, wherein a coding tree partitioning process used to generate the child coding unit depends on the size condition.
In yet another example aspect, a video processing method is disclosed. The method includes determining a weight index of a generalized bi-directional prediction (GBi) process for a current video block based on a location of the current video block, and performing a conversion between the current video block and a bit stream representation thereof using the weight index to implement the GBi process.
In yet another example aspect, a video processing method is disclosed. The method includes determining that a current video block is encoded as an Intra Inter Prediction (IIP) encoded block, and performing a conversion between the current video block and its bit stream representation using a simplified rule for determining an intra prediction mode or Most Probable Mode (MPM) for the current video block.
In yet another example aspect, a video processing method is disclosed. The method includes determining that a current video block satisfies a simplification criterion, and performing a transition between the current video block and a bitstream representation by disabling an inter intra prediction mode for the transition or by disabling additional coding tools for the transition.
In yet another example aspect, a video processing method is disclosed. The method includes performing a conversion between a current video block and a bitstream representation of the current video block using a motion vector based encoding process, wherein: (a) during the conversion process, precision P1 is used to store spatial motion predictors and precision P2 is used to store temporal motion predictors, where P1 and P2 are fractions, or (b) precision Px is used to store x motion vectors and precision Py is used to store y motion vectors, where Px and Py are fractions.
In yet another example aspect, a video processing method is disclosed. The method includes interpolating a small sub-block of size W1xH1 within a large sub-block of size W2xH2 of a current video block by fetching a (W2+N-1-PW)x(H2+N-1-PH) block, padding the fetched block with pixels, performing boundary pixel repetition on the padded block to obtain the pixel values of the small sub-block, wherein W1, W2, H1, H2, PW and PH are integers, and performing a conversion between the current video block and a bitstream representation of the current video block using the interpolated pixel values of the small sub-block.
In yet another example aspect, a video processing method is disclosed. The method includes performing a motion compensation operation during a conversion between a current video block of size WxH and a bitstream representation of the current video block by fetching (W+N-1-PW)x(H+N-1-PH) reference pixels and padding reference pixels outside the fetched region during the motion compensation operation, and performing the conversion between the current video block and the bitstream representation of the current video block using a result of the motion compensation operation, wherein W, H, N, PW and PH are integers.
In yet another example aspect, a video processing method is disclosed. The method includes determining that bi-prediction or uni-directional prediction of a current video block is not allowed based on a size of the current video block, and performing a conversion between a bitstream representation and pixel values of the current video block by disabling a bi-prediction or uni-directional prediction mode based on the determination.
In yet another example aspect, a video encoder apparatus is disclosed. The video encoder comprises a processor configured to implement the above-described method.
In yet another example aspect, a video decoder apparatus is disclosed. The video decoder comprises a processor configured to implement the above-described method.
In yet another example aspect, a computer-readable medium having code stored thereon is disclosed. The code embodies one of the methods described herein in the form of processor executable code.
These and other features are described throughout this document.
Drawings
Fig. 1 shows an example of sub-block based prediction.
FIG. 2A shows a 4-parameter affine model.
FIG. 2B shows a 6-parameter affine model.
Fig. 3 shows an example of an affine motion vector field for each sub-block.
Fig. 4A shows an example of a candidate for AF _ MERGE.
Fig. 4B shows another example of a candidate for AF _ MERGE.
Fig. 5 shows candidate positions of the affine Merge mode.
Fig. 6 shows an example of constrained sub-block motion vectors of a Coding Unit (CU) of an affine mode.
Fig. 7A shows an example of a 135 degree partition of a CU into two triangle prediction units.
Fig. 7B shows an example of a 45-degree partition pattern that partitions a CU into two triangle prediction units.
Fig. 8 shows an example of the positions of neighboring blocks.
Fig. 9 shows an example of repeated boundary pixels of a reference block before interpolation.
Fig. 10 shows an example of Coding Tree Unit (CTU) and CTU (region) rows. Shaded CTUs (regions) are located in one CTU (region) row, and unshaded CTUs (regions) are located in another CTU (region) row.
Fig. 11 is a block diagram of an example of a hardware platform for implementing a video decoder or video encoder apparatus described herein.
FIG. 12 is a flow diagram of an example method of video processing.
Fig. 13 shows an example of a motion vector difference MVD (0, 1) mirrored between list 0 and list 1 in the DMVR.
Fig. 14 shows an exemplary MV that can be examined in one iteration.
Fig. 15 shows the filled reference samples and boundaries used for the calculation.
FIG. 16 is a block diagram of an example video processing system in which the disclosed technology may be implemented.
Fig. 17 is a flow chart representing a method for video processing according to the present disclosure.
Fig. 18 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 19 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 20 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 21 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 22 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 23 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 24 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 25 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 26 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 27 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 28 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 29 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 30 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 31 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 32 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 33 is a flow chart representing another method for video processing according to the present disclosure.
FIG. 34 is a flow chart representing another method for video processing according to the present disclosure.
Fig. 35 is a flow chart representing another method for video processing according to the present disclosure.
FIG. 36 is a flow chart representing another method for video processing according to the present disclosure.
FIG. 37 is a flow chart representing yet another method for video processing according to the present disclosure.
Detailed Description
The section headings are used in this document for ease of understanding and do not limit the applicability of the techniques and embodiments disclosed in each section to that section only.
1. Overview
This patent document relates to video/image coding techniques. In particular, it relates to reducing the bandwidth and line buffers of several coding tools in video/image coding. It can be applied to existing video coding standards (e.g., HEVC), or to standards to be finalized (Versatile Video Coding). It may also be applicable to future video/image coding standards or video/image codecs.
2. Background of the invention
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure in which temporal prediction and transform coding are utilized. To explore future video coding technologies beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, JVET has adopted many new methods and introduced them into reference software named the "Joint Exploration Model" (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the Versatile Video Coding (VVC) standard, targeting a 50% bitrate reduction compared to HEVC.
2.1 inter prediction in HEVC/VVC
Interpolation filter
In HEVC, luminance subsamples are generated by an 8-tap interpolation filter and chrominance subsamples are generated by a 4-tap interpolation filter.
The filter is separable in two dimensions. Samples are first filtered horizontally and then vertically.
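The separable structure can be pictured with a short sketch. The following is a minimal illustration rather than codec source: it applies the HEVC 8-tap half-pel luma filter first along rows and then along columns, and omits the intermediate clipping and bit-depth handling of a real codec.

# Minimal sketch of separable sub-sample interpolation. The tap set is the
# HEVC 8-tap half-pel luma filter; intermediate clipping is omitted.
HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]  # taps sum to 64

def filter_1d(samples, taps):
    # Apply an N-tap FIR filter; output is shorter by len(taps) - 1.
    n = len(taps)
    return [sum(t * samples[i + k] for k, t in enumerate(taps))
            for i in range(len(samples) - n + 1)]

def interpolate_half_pel(block):
    # Filter a 2-D sample block horizontally, then vertically.
    horiz = [filter_1d(row, HALF_PEL_TAPS) for row in block]
    vert = [filter_1d(list(col), HALF_PEL_TAPS) for col in zip(*horiz)]
    # Transpose back and normalize the two 64x filter gains
    # (a real codec would round and clip here).
    return [[v // (64 * 64) for v in row] for row in zip(*vert)]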
2.2 sub-block based prediction techniques
Sub-block based prediction was first introduced into video coding standards by HEVC Annex I (3D-HEVC). With sub-block based prediction, a block such as a Coding Unit (CU) or a Prediction Unit (PU) is divided into several non-overlapping sub-blocks. Different sub-blocks may be assigned different motion information, such as reference indices or motion vectors (MVs), and motion compensation (MC) is performed separately for each sub-block. Fig. 1 demonstrates the concept of sub-block based prediction.
To explore future video coding technologies beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, JVET has adopted many new methods and introduced them into reference software named the "Joint Exploration Model" (JEM).
In JEM, sub-block based prediction is employed in several coding tools, such as affine prediction, alternative temporal motion vector prediction (ATMVP), spatial-temporal motion vector prediction (STMVP), bi-directional optical flow (BIO), and Frame Rate Up-Conversion (FRUC). Affine prediction has also been adopted into VVC.
2.3 affine prediction
In HEVC, only the translational motion model is applied to Motion Compensated Prediction (MCP). In the real world, there are various motions such as zoom-in/zoom-out, rotation, perspective motion, and other irregular motions. In VVC, a simplified affine transform motion compensated prediction is applied. As shown in fig. 2A-2B, the affine motion field of a block is described by two (in a 4-parameter affine model) or three (in a 6-parameter affine model) control point motion vectors.
The Motion Vector Field (MVF) of the block is described by a 4-parameter affine model in equation (1) (in which the 4 parameters are defined as the variables a, b, e and f) and a 6-parameter affine model in equation (2) (in which the 6 parameters are defined as the variables a, b, c, d, e and f), respectively:

mv^h(x,y) = ax - by + e = ((mv1^h - mv0^h)/w)x - ((mv1^v - mv0^v)/w)y + mv0^h
mv^v(x,y) = bx + ay + f = ((mv1^v - mv0^v)/w)x + ((mv1^h - mv0^h)/w)y + mv0^v      Equation (1)

mv^h(x,y) = ax + cy + e = ((mv1^h - mv0^h)/w)x + ((mv2^h - mv0^h)/h)y + mv0^h
mv^v(x,y) = bx + dy + f = ((mv1^v - mv0^v)/w)x + ((mv2^v - mv0^v)/h)y + mv0^v      Equation (2)

Here, (mv0^h, mv0^v) is the motion vector of the top-left corner control point (CP), (mv1^h, mv1^v) is the motion vector of the top-right corner control point, and (mv2^h, mv2^v) is the motion vector of the bottom-left corner control point; (x, y) represents the coordinate of a representative point relative to the top-left sample within the current block. The three motion vectors are called Control Point Motion Vectors (CPMVs). The CP motion vectors may be signaled (as in affine AMVP mode) or derived on the fly (as in affine Merge mode). w and h are the width and height of the current block. In practice, the division is implemented by right-shift with a rounding operation. In VTM, the representative point is defined to be the center position of a sub-block, e.g., when the coordinate of the top-left corner of a sub-block relative to the top-left sample within the current block is (xs, ys), the coordinate of the representative point is defined to be (xs+2, ys+2).
In a division-free design, equations (1) and (2) are implemented as:

iDMvHorX = (mv1^h - mv0^h) << (S - log2(w))
iDMvHorY = (mv1^v - mv0^v) << (S - log2(w))      Equation (3)

For the 4-parameter affine model shown in equation (1):

iDMvVerX = -iDMvHorY
iDMvVerY = iDMvHorX      Equation (4)

For the 6-parameter affine model shown in equation (2):

iDMvVerX = (mv2^h - mv0^h) << (S - log2(h))
iDMvVerY = (mv2^v - mv0^v) << (S - log2(h))      Equation (5)

Finally, the motion vector of a representative point (x, y) is computed as:

mv^h(x,y) = Normalize(iDMvHorX * x + iDMvVerX * y + (mv0^h << S), S)
mv^v(x,y) = Normalize(iDMvHorY * x + iDMvVerY * y + (mv0^v << S), S)      Equation (6)

Normalize(Z, S) = (Z + Off) >> S if Z >= 0, and -((-Z + Off) >> S) otherwise      Equation (7)

Off = 1 << (S-1)

where S represents the calculation precision. For example, in VVC, S = 7. In VVC, the MV used in MC for a sub-block whose top-left sample is located at (xs, ys) is calculated by equation (6) with x = xs+2 and y = ys+2.
As shown in fig. 3, to derive the motion vector of each 4 × 4 sub-block, the motion vector of the center sample point of each sub-block is calculated according to equation (1) or equation (2) and rounded to 1/16 fractional precision. A motion compensated interpolation filter is then applied to generate a prediction for each sub-block having a derived motion vector.
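As an illustration of this derivation, the following sketch computes the per-4x4-sub-block MVs from the CPMVs with the division-free formulation above, assuming S = 7, power-of-two block dimensions, CPMVs in 1/16-pel units, and the sub-block center (xs+2, ys+2) as the representative point. It is an illustrative sketch, not VTM source code.

# Sketch: per-4x4-sub-block MV derivation from CPMVs (division-free, S = 7).
# mv0, mv1, mv2 are the top-left/top-right/bottom-left CPMVs in 1/16-pel units.
def subblock_mvs(mv0, mv1, mv2, w, h, six_param, S=7):
    log2w, log2h = w.bit_length() - 1, h.bit_length() - 1  # w, h powers of 2
    dhx = (mv1[0] - mv0[0]) << (S - log2w)  # iDMvHorX
    dhy = (mv1[1] - mv0[1]) << (S - log2w)  # iDMvHorY
    if six_param:
        dvx = (mv2[0] - mv0[0]) << (S - log2h)
        dvy = (mv2[1] - mv0[1]) << (S - log2h)
    else:  # 4-parameter model: vertical terms mirror the horizontal ones
        dvx, dvy = -dhy, dhx
    off = 1 << (S - 1)
    def norm(z):  # Normalize() of equation (7)
        return (z + off) >> S if z >= 0 else -((-z + off) >> S)
    mvs = {}
    for ys in range(0, h, 4):
        for xs in range(0, w, 4):
            x, y = xs + 2, ys + 2  # center of the sub-block
            mvs[(xs, ys)] = (norm(dhx * x + dvx * y + (mv0[0] << S)),
                             norm(dhy * x + dvy * y + (mv0[1] << S)))
    return mvs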
Affine models can be inherited from spatially neighboring affine coded blocks, such as the left, above, above-right, below-left and above-left neighboring blocks shown in Fig. 4A. For example, if the below-left neighboring block A in Fig. 4A is coded in affine mode, as denoted by A0 in Fig. 4B, the Control Point (CP) motion vectors mv0^N, mv1^N and mv2^N of the top-left, top-right and bottom-left corners of the neighboring CU/PU that contains block A are fetched. The motion vectors mv0^C, mv1^C and mv2^C of the top-left/top-right/bottom-left corner of the current CU/PU are calculated based on mv0^N, mv1^N and mv2^N (only for the 6-parameter affine model). It should be noted that in VTM-2.0, if the current block is affine coded, the sub-block (e.g., a 4x4 block in VTM) LT stores mv0 and RT stores mv1. If the current block is coded with a 6-parameter affine model, LB stores mv2; otherwise (coded with a 4-parameter affine model), LB stores mv2'. The other sub-blocks store the MVs used for MC.
It should be noted that when a CU is coded in affine Merge mode, e.g., in AF_MERGE mode, it obtains the first block coded in affine mode from the valid neighboring reconstructed blocks. As shown in Fig. 4A, the candidate blocks are selected in order from left, above, above-right, below-left to above-left.
The derived CP MVs mv0^C, mv1^C and mv2^C of the current block in affine Merge mode can be used as the CP MVs. Alternatively, they may be used as MVPs for the affine inter mode in VVC. It should be noted that for the Merge mode, if the current block is coded in affine mode, after deriving the CP MVs of the current block, the current block may be further split into multiple sub-blocks, and each sub-block derives its motion information based on the derived CP MVs of the current block.
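The inheritance step above can be pictured with a small floating-point sketch: the neighboring block's affine model is evaluated at the corners of the current block to obtain mv0^C, mv1^C and mv2^C. The function below is a hypothetical illustration; real codecs use the fixed-point arithmetic described earlier in this section.

# Sketch: derive the current block's CPMVs by evaluating the neighboring
# block's affine model (CPMVs n_mv0/n_mv1/n_mv2, top-left corner at n_pos,
# size n_w x n_h) at the corners of the current block.
def inherit_cpmvs(n_mv0, n_mv1, n_mv2, n_pos, n_w, n_h, cur_pos, cur_w, cur_h):
    ax = (n_mv1[0] - n_mv0[0]) / n_w; ay = (n_mv1[1] - n_mv0[1]) / n_w
    bx = (n_mv2[0] - n_mv0[0]) / n_h; by = (n_mv2[1] - n_mv0[1]) / n_h
    def mv_at(x, y):
        dx, dy = x - n_pos[0], y - n_pos[1]
        return (n_mv0[0] + ax * dx + bx * dy, n_mv0[1] + ay * dx + by * dy)
    x0, y0 = cur_pos
    return (mv_at(x0, y0),              # mv0^C: top-left
            mv_at(x0 + cur_w, y0),      # mv1^C: top-right
            mv_at(x0, y0 + cur_h))      # mv2^C: bottom-left (6-param only)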
2.4 Example embodiment in JVET
In VTM, only one affine spatial neighboring block can be used to derive the affine motion of a block. Unlike in VTM, in some embodiments a separate list of affine candidates is constructed for the AF_MERGE mode.
1) Inserting inherited affine candidates into a candidate list
Inherited affine candidates refer to candidates derived from valid neighbor reconstruction blocks encoded in affine mode.
As shown in Fig. 5, the scan order of the candidate blocks is A1, B1, B0, A0 and B2. When a block (e.g., A1) is selected, a two-step procedure is performed:
First, the two/three control points of the current block are derived using the three corner motion vectors of the CU covering the block; and
Second, the sub-block motion of each sub-block within the current block is derived based on the control points of the current block.
2) Insertion-built affine candidates
If the number of candidates in the affine Merge candidate list is less than MaxNumAffineCand, constructed affine candidates are inserted into the candidate list.
The constructed affine candidates refer to candidates constructed by combining neighbor motion information of each control point.
First, motion information of the control points is derived from the specified spatial and temporal neighbors shown in Fig. 5. CPk (k = 1, 2, 3, 4) denotes the k-th control point. A0, A1, A2, B0, B1, B2 and B3 are the spatial positions used to predict CPk (k = 1, 2, 3); T is the temporal position used to predict CP4.
The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.
Motion information for each control point is obtained according to the following priority order:
a. For CP1, the checking priority is B2 -> B3 -> A2. If B2 is available, B2 is used. Otherwise, if B2 is unavailable, B3 is used. If both B2 and B3 are unavailable, A2 is used. If none of the three candidates is available, no motion information can be obtained for CP1.
b. For CP2, the checking priority is B1 -> B0;
c. For CP3, the checking priority is A1 -> A0;
d. For CP4, T is used.
Second, a motion model is constructed using a combination of control points.
The motion vectors of three control points are needed to compute the transform parameters of the 6-parameter affine model. The three control points can be selected from one of the following four combinations ({CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}). For example, the CP1, CP2 and CP3 control points are used to construct a 6-parameter affine motion model, denoted as Affine (CP1, CP2, CP3).
The motion vectors of two control points are needed to compute the transform parameters of the 4-parameter affine model. The two control points can be selected from one of the following six combinations ({CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, {CP3, CP4}). For example, the CP1 and CP2 control points are used to construct a 4-parameter affine motion model, denoted as Affine (CP1, CP2).
The combinations of constructed affine candidates are inserted into the candidate list in the following order: {CP1, CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1, CP2}, {CP1, CP3}, {CP2, CP3}, {CP1, CP4}, {CP2, CP4}, {CP3, CP4}.
3) Inserting zero motion vectors
If the number of candidates in the affine Merge candidate list is less than MaxNumAffineCand, zero motion vectors are inserted into the candidate list until the list is full.
2.5 affine Merge candidate list
2.5.1 affine Merge mode
In the affine Merge mode of VTM-2.0.1, only the first available affine neighbors can be used to derive motion information for the affine Merge mode. In some embodiments, the candidate list of affine Merge patterns is constructed by searching for valid affine neighbors and combining the neighbor motion information for each control point.
The affine Merge candidate list is constructed as the following steps:
1) inserting inherited affine candidates
Inherited affine candidates refer to candidates that are derived from the affine motion models of their valid neighboring affine coded blocks. As shown in Fig. 5, the scan order of the candidate positions is: A1, B1, B0, A0 and B2.
After deriving the candidates, a full pruning process is performed to check if the same candidates have been inserted into the list. If the same candidate exists, the derived candidate is discarded.
2) Insertion-built affine candidates
If the number of candidates in the affine Merge candidate list is less than MaxNumAffineCand (set to 5 in this contribution), constructed affine candidates are inserted into the candidate list. Constructed affine candidates refer to candidates constructed by combining the neighboring motion information of each control point.
First, motion information of the control points is derived from the specified spatial and temporal neighbors. CPk (k = 1, 2, 3, 4) denotes the k-th control point. A0, A1, A2, B0, B1, B2 and B3 are the spatial positions used to predict CPk (k = 1, 2, 3); T is the temporal position used to predict CP4.
The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.
Motion information for each control point is obtained according to the following priority order:
for CP1, the checking priority is B2->B3->A2. If B is present2Can be used, then B is used2. Otherwise, if B2Not available, then B is used3. If B is present2And B3Are all unusable, use A2. If none of these three candidates are available, no motion information for CP1 is available.
For CP2, check priority B1- > B0;
for CP3, check priority a1- > a 0;
for CP4, T is used.
Second, a motion model is constructed using a combination of control points.
Motion information of three control points is needed to construct a 6-parameter affine candidate. The three control points can be selected from one of the following four combinations ({CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}). The combinations {CP1, CP2, CP4}, {CP2, CP3, CP4}, {CP1, CP3, CP4} are converted to a 6-parameter motion model represented by the top-left, top-right and bottom-left control points.
Motion information of two control points is needed to construct a 4-parameter affine candidate. The two control points can be selected from one of the following six combinations ({CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, {CP3, CP4}). The combinations {CP1, CP4}, {CP2, CP3}, {CP2, CP4}, {CP1, CP3}, {CP3, CP4} are converted to a 4-parameter motion model represented by the top-left and top-right control points.
Inserting the combination of the constructed affine candidates into the candidate list in the following order:
{CP1,CP2,CP3},{CP1,CP2,CP4},{CP1,CP3,CP4},{CP2,CP3,CP4},{CP1,CP2},{CP1,CP3},{CP2,CP3},{CP1,CP4},{CP2,CP4},{CP3,CP4}
For the combination's reference list X (X being 0 or 1), the reference index with the highest usage rate among the control points is selected as the reference index of list X, and motion vectors pointing to other reference pictures are scaled.
After deriving the candidates, a full pruning process is performed to check if the same candidates have been inserted into the list. If the same candidate exists, the derived candidate is discarded.
3) Filling with zero motion vectors
If the number of candidates in the affine Merge candidate list is less than 5, a zero motion vector with a zero reference index is inserted in the candidate list until the list is full.
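Putting the three steps together, a compact sketch of the list assembly order follows; the inherited_cands and constructed_cands inputs stand in for the outputs of the neighbor-scanning and control-point-combination logic described above (hypothetical names, not reference-software identifiers).

# Sketch of the affine Merge list construction: inherited candidates with
# full pruning, then constructed candidates, then zero-MV padding.
MAX_NUM_AFFINE_CAND = 5

def build_affine_merge_list(inherited_cands, constructed_cands):
    cand_list = []
    # 1) Inherited candidates (scan order A1, B1, B0, A0, B2), full pruning.
    for cand in inherited_cands:
        if cand not in cand_list:
            cand_list.append(cand)
    # 2) Constructed candidates, in the combination order given above.
    for cand in constructed_cands:
        if len(cand_list) >= MAX_NUM_AFFINE_CAND:
            break
        if cand not in cand_list:
            cand_list.append(cand)
    # 3) Pad with zero motion vectors (zero reference index) until full.
    while len(cand_list) < MAX_NUM_AFFINE_CAND:
        cand_list.append(('zero_mv', 0))
    return cand_list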
2.5.2 Example of affine Merge mode
In some embodiments, the affine Merge mode may be simplified as follows:
1) The pruning process for inherited affine candidates is simplified by comparing the coding units covering the neighboring positions, instead of comparing the derived affine candidates as in VTM-2.0.1. At most 2 inherited affine candidates are inserted into the affine Merge list. The pruning process for constructed affine candidates is removed entirely.
2) The MV scaling operations in the constructed affine candidates are removed. If the reference indices of the control points are different, the constructed motion model is discarded.
3) The number of constructed affine candidates is reduced from 10 to 6.
4) In some embodiments, other Merge candidates with sub-block prediction (such as ATMVP) are also put into the affine Merge candidate list. In that case, the affine Merge candidate list may be renamed with some other name such as the sub-block Merge candidate list.
2.6 example of control Point MV offset for affine Merge mode
New affine Merge candidates are generated based on the CPMV offsets of the first affine Merge candidate. If the first affine Merge candidate enables the 4-parameter affine model, 2 CPMVs are derived for each new affine Merge candidate by offsetting the 2 CPMVs of the first affine Merge candidate. Otherwise (the 6-parameter affine model is enabled), 3 CPMVs are derived for each new affine Merge candidate by offsetting the 3 CPMVs of the first affine Merge candidate. In uni-directional prediction, the CPMV offset is applied to the CPMVs of the first candidate. In bi-directional prediction with list 0 and list 1 in the same direction, the CPMV offsets are applied to the first candidate as follows:
MVnew(L0),i = MVold(L0) + MVoffset(i)      Equation (8)
MVnew(L1),i = MVold(L1) + MVoffset(i)      Equation (9)
In bi-directional prediction with list 0 and list 1 in opposite directions, the CPMV offset is applied to the first candidate as follows:
MVnew(L0),i = MVold(L0) + MVoffset(i)      Equation (10)
MVnew(L1),i = MVold(L1) - MVoffset(i)      Equation (11)
Various offset directions with various offsets may be used to generate new affine Merge candidates. Two implementations have been tested:
(1) Generate 16 new affine Merge candidates with 8 different offset directions and 2 different offset magnitudes, as shown in the following offset set:
The offset set is {(4, 0), (0, 4), (-4, 0), (0, -4), (-4, -4), (4, -4), (4, 4), (-4, 4), (8, 0), (0, 8), (-8, 0), (0, -8), (-8, -8), (8, -8), (8, 8), (-8, 8)}.
For this design, the affine Merge list is increased to 20. The total number of potential affine Merge candidates is 31.
(2) Generate 4 new affine Merge candidates with 4 different offset directions and 1 offset magnitude, as shown in the following offset set:
the offset set is { (4, 0), (0, 4), (-4, 0), (0, -4) }.
As in VTM-2.0.1, the affine Merge list size remains 5. The four temporally constructed affine Merge candidates are removed to keep the number of potential affine Merge candidates unchanged, e.g., 15 in total. Let the coordinates of CPMV1, CPMV2, CPMV3 and CPMV4 be (0, 0), (W, 0), (0, H) and (W, H). Note that CPMV4 is derived from the temporal MV as shown in Fig. 6. The removed candidates are the following four temporally related constructed affine Merge candidates: {CP2, CP3, CP4}, {CP1, CP4}, {CP2, CP4} and {CP3, CP4}.
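A sketch of the offset-based candidate generation of equations (8)-(11) follows, under the assumption that CPMVs are (x, y) pairs in the same units as the offsets; it uses the 4-offset set of design (2) for brevity.

# Sketch: generate new affine Merge candidates by offsetting the CPMVs of
# the first candidate, per equations (8)-(11).
OFFSETS = [(4, 0), (0, 4), (-4, 0), (0, -4)]  # offset set of design (2)

def offset_candidates(cpmvs_l0, cpmvs_l1, same_direction):
    new_cands = []
    for ox, oy in OFFSETS:
        l0 = [(x + ox, y + oy) for x, y in cpmvs_l0]       # equations (8)/(10)
        if same_direction:   # list 0 and list 1 point in the same direction
            l1 = [(x + ox, y + oy) for x, y in cpmvs_l1]   # equation (9)
        else:                # opposite directions: mirror the offset
            l1 = [(x - ox, y - oy) for x, y in cpmvs_l1]   # equation (11)
        new_cands.append((l0, l1))
    return new_cands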
2.7 Bandwidth problem for affine motion compensation
Since the current block is divided into 4x4 sub-blocks for the luma component and 2x2 sub-blocks for the two chroma components for motion compensation, the total bandwidth requirement is much higher than for non-sub-block inter prediction. To address the bandwidth problem, several methods are proposed.
2.7.1 example 1
The 4x4 block serves as the sub-block size of the uni-directional affine coded CU, while the 8x4/4x8 block serves as the sub-block size of the bi-directional affine coded CU.
2.7.2 example 2
For affine mode, the sub-block motion vectors of an affine CU are constrained to be within a predefined motion vector field. Suppose the motion vector of the first (top-left) sub-block is (v0x, v0y) and the motion vector of the i-th sub-block is (vix, viy). Then the values of vix and viy satisfy the following constraints:
vix ∈ [v0x - H, v0x + H]      Equation (12)
viy ∈ [v0y - V, v0y + V]      Equation (13)
If the motion vector of any sub-block exceeds the predefined motion vector field, the motion vector is clipped. An illustration of the idea of constraining the sub-block motion vectors is given in fig. 6.
Assuming that memory is retrieved per CU instead of per sub-block, the values H and V are chosen such that the worst-case memory bandwidth of an affine CU does not exceed that of normal inter MC of an 8x8 bi-prediction block. Note that the values of H and V are adaptive to the CU size and to whether uni-directional or bi-directional prediction is used.
2.7.3 example 3
To reduce the memory bandwidth requirement in affine prediction, each 8x8 block within the block is treated as a basic unit. The MVs of all four 4x4 sub-blocks inside an 8x8 block are constrained such that the maximum difference between the integer parts of the four 4x4 sub-block MVs is no more than 1 pixel. The bandwidth is then (8+7+1) x (8+7+1) / (8 x 8) = 4 samples/pixel.
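The bandwidth figure can be checked with a one-line computation; the helper below generalizes it to other block sizes (7 extra samples come from the 8-tap filter, 1 from the allowed 1-pixel MV spread). It is a worked check, not part of any codec.

# Worked check of the 4 samples/pixel figure above.
def samples_per_pixel(block, filter_extra=7, mv_spread=1):
    fetched = (block + filter_extra + mv_spread) ** 2
    return fetched / (block * block)

print(samples_per_pixel(8))  # (8+7+1)*(8+7+1)/(8*8) = 4.0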
In some cases, after calculating the MVs of all sub-blocks within the current block using the affine model, the MVs of the sub-blocks containing a control point are first replaced with the corresponding control point MVs. This means that the MVs of the upper left, upper right and lower left sub-blocks are replaced by upper left, upper right and lower left control points MV, respectively. Then, for each 8x8 block within the current block, the MVs of all four 4x4 sub-blocks are clipped to ensure that the maximum difference between the integer parts of the four MVs does not exceed 1 pixel. It should be noted here that the sub-blocks containing control points (top left, top right and bottom left sub-blocks) participate in the MV clipping process using the respective control points MV. The MV at the upper right control point remains unchanged during clipping.
The clipping process applied to each 8x8 block is described as follows:
1. The minimum and maximum values MVminx, MVminy, MVmaxx, MVmaxy of the MV components are first determined for each 8x8 block, as follows:
a) Obtain the minimum MV components among the four 4x4 sub-block MVs
MVminx=min(MVx0,MVx1,MVx2,MVx3)
MVminy=min(MVy0,MVy1,MVy2,MVy3)
b) Using the integer part of MVminx and MVminy as the minimum MV component
MVminx=MVminx>>MV_precision<<MV_precision
MVminy=MVminy>>MV_precision<<MV_precision
c) The maximum MV component is calculated as follows:
MVmaxx=MVminx+(2<<MV_precision)–1
MVmaxy=MVminy+(2<<MV_precision)–1
d) if the upper right control point is in the current 8x8 block
If (MV1x > MVmaxx)
MVminx=(MV1x>>MV_precision<<MV_precision)–(1<<MV_precision)
MVmaxx=MVminx+(2<<MV_precision)–1
If (MV1y > MVmaxy)
MVminy=(MV1y>>MV_precision<<MV_precision)–(1<<MV_precision)
MVmaxy=MVminy+(2<<MV_precision)–1
2. The MV components of each 4x4 block inside the 8x8 block are clipped as follows:
MVxi=max(MVminx,min(MVmaxx,MVxi))
MVyi=max(MVminy,min(MVmaxy,MVyi))
where (MVxi, MVyi) is the MV of the i-th sub-block within an 8x8 block, with i = 0, 1, 2, 3; (MV1x, MV1y) is the MV of the top-right control point; MV_precision is equal to 4, corresponding to 1/16 motion vector fractional precision. Since the difference between the integer parts of MVminx and MVmaxx (and of MVminy and MVmaxy) is 1 pixel, the maximum difference between the integer parts of the four 4x4 sub-block MVs is no more than 1 pixel.
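For illustration, the clipping process above can be written as a runnable sketch, assuming 1/16-pel MVs (MV_precision = 4); mvs holds the four 4x4 sub-block MVs of one 8x8 block, and mv1 is the top-right control point MV.

# Runnable sketch of the per-8x8-block MV clipping described above.
MV_PRECISION = 4  # 1/16 motion vector fractional precision

def clip_8x8_block(mvs, mv1, tr_cp_inside):
    mvminx = (min(mv[0] for mv in mvs) >> MV_PRECISION) << MV_PRECISION
    mvminy = (min(mv[1] for mv in mvs) >> MV_PRECISION) << MV_PRECISION
    mvmaxx = mvminx + (2 << MV_PRECISION) - 1
    mvmaxy = mvminy + (2 << MV_PRECISION) - 1
    if tr_cp_inside:  # keep the top-right control point MV unchanged
        if mv1[0] > mvmaxx:
            mvminx = ((mv1[0] >> MV_PRECISION) << MV_PRECISION) - (1 << MV_PRECISION)
            mvmaxx = mvminx + (2 << MV_PRECISION) - 1
        if mv1[1] > mvmaxy:
            mvminy = ((mv1[1] >> MV_PRECISION) << MV_PRECISION) - (1 << MV_PRECISION)
            mvmaxy = mvminy + (2 << MV_PRECISION) - 1
    # Clip each 4x4 sub-block MV into [min, max] per component.
    return [(max(mvminx, min(mvmaxx, x)), max(mvminy, min(mvmaxy, y)))
            for x, y in mvs]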
In some embodiments, a similar approach may also be used to handle planar modes.
2.7.4 example 4
In some embodiments, a constraint on affine mode is used for worst-case bandwidth reduction. To ensure that the worst-case bandwidth of an affine block is no worse than that of an INTER_4x8/INTER_8x4 block or even an INTER_9x9 block, the motion vector differences between affine control points are used to decide whether the sub-block size of the affine block is 4x4 or 8x8.
General affine constraints for worst case bandwidth reduction
The memory bandwidth reduction of the affine mode is controlled by constraining the motion vector differences between the affine control points (also called the control point differences). In general, if the control point differences satisfy the constraints below, the affine motion uses 4x4 sub-blocks (i.e., the 4x4 affine mode). Otherwise, it uses 8x8 sub-blocks (the 8x8 affine mode). The constraints for the 6-parameter and 4-parameter models are given below.
To derive the constraints for different block sizes (WxH), the motion vector differences of the control points are normalized as:
Norm(v2x - v0x) = (v2x - v0x) * 128 / h    equation (14)
In the 4-parameter affine model, (v2x - v0x) and (v2y - v0y) are set as follows:
(v2x - v0x) = -(v1y - v0y)
(v2y - v0y) = (v1x - v0x)    equation (15)
Hence, the norms of (v2x - v0x) and (v2y - v0y) are:
Norm(v2x - v0x) = -Norm(v1y - v0y)
Norm(v2y - v0y) = Norm(v1x - v0x)    equation (16)
The constraint to ensure that the worst-case bandwidth reaches that of INTER_4x8 or INTER_8x4 is:
|Norm(v1x - v0x) + Norm(v2x - v0x) + 128| +
|Norm(v1y - v0y) + Norm(v2y - v0y) + 128| +
|Norm(v1x - v0x) - Norm(v2x - v0x)| +
|Norm(v1y - v0y) - Norm(v2y - v0y)|
< 128 * 3.25    equation (17)
The left-hand side of equation (17) represents the shrink or span level of the sub affine blocks, while 3.25 indicates a 3.25-pixel shift.
The constraint to ensure that the worst-case bandwidth reaches that of INTER_9x9 is:
(4*Norm(v1x - v0x) > -4*pel && 4*Norm(v1x - v0x) < pel) &&
(4*Norm(v1y - v0y) > -pel && 4*Norm(v1y - v0y) < pel) &&
(4*Norm(v2x - v0x) > -pel && 4*Norm(v2x - v0x) < pel) &&
(4*Norm(v2y - v0y) > -4*pel && 4*Norm(v2y - v0y) < pel) &&
((4*Norm(v1x - v0x) + 4*Norm(v2x - v0x) > -4*pel) &&
(4*Norm(v1x - v0x) + 4*Norm(v2x - v0x) < pel)) &&
((4*Norm(v1y - v0y) + 4*Norm(v2y - v0y) > -4*pel) &&
(4*Norm(v1y - v0y) + 4*Norm(v2y - v0y) < pel))    equation (18)
where pel = 128 * 16 (128 and 16 indicate the normalization factor and the motion vector precision, respectively).
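Both constraints reduce to a handful of integer comparisons. The following Python sketch is an illustration under the definitions above: norm() follows equation (14), the arguments n1x, n1y, n2x and n2y stand for Norm(v1x - v0x), Norm(v1y - v0y), Norm(v2x - v0x) and Norm(v2y - v0y), and all names are assumptions rather than specification text.

def norm(diff, dim):
    # Equation (14): normalize a control point difference by 128 / dim.
    return diff * 128 // dim

def within_4x8_bound(n1x, n1y, n2x, n2y):
    # Equation (17): worst case no worse than INTER_4x8 / INTER_8x4.
    s = (abs(n1x + n2x + 128) + abs(n1y + n2y + 128)
         + abs(n1x - n2x) + abs(n1y - n2y))
    return s < 128 * 3.25

def within_9x9_bound(n1x, n1y, n2x, n2y, pel=128 * 16):
    # Equation (18): worst case no worse than INTER_9x9.
    return (-4 * pel < 4 * n1x < pel and
            -pel < 4 * n1y < pel and
            -pel < 4 * n2x < pel and
            -4 * pel < 4 * n2y < pel and
            -4 * pel < 4 * n1x + 4 * n2x < pel and
            -4 * pel < 4 * n1y + 4 * n2y < pel)

# 4x4 sub-blocks are used when the relevant bound holds; 8x8 otherwise.
subblock_size = 4 if within_4x8_bound(10, -5, 3, 7) else 8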
2.8. Generalized bi-directional prediction improvement
Some embodiments improve the gain-complexity trade-off for GBi and were adopted into BMS2.1. GBi, also known as bi-prediction with CU-level weights (BCW), applies unequal weights to the predictors from L0 and L1 in bi-prediction mode. In inter prediction mode, multiple weight pairs, including the equal weight pair (1/2, 1/2), are evaluated based on rate-distortion optimization (RDO), and the GBi index of the selected weight pair is signaled to the decoder. In Merge mode, the GBi index is inherited from a neighboring CU. The predictor generation of BMS2.1 GBi in bi-prediction mode is shown in equation (19).
PGBi = (w0 * PL0 + w1 * PL1 + RoundingOffsetGBi) >> shiftNumGBi    equation (19)
where PGBi is the final predictor of GBi; w0 and w1 are the selected GBi weight pair, applied to the predictors of list 0 (L0) and list 1 (L1), respectively; and RoundingOffsetGBi and shiftNumGBi are used to normalize the final predictor in GBi. The supported w1 weight set is {-1/4, 3/8, 1/2, 5/8, 5/4}, in which the five weights correspond to one equal weight pair and four unequal weight pairs. The blending gain, i.e., the sum of w1 and w0, is fixed to 1.0, so the corresponding w0 weight set is {5/4, 5/8, 1/2, 3/8, -1/4}. The weight pair selection is at the CU level.
For non-low-delay pictures, the weight set size is reduced from five to three, with the w1 weight set being {3/8, 1/2, 5/8} and the w0 weight set being {5/8, 1/2, 3/8}. In this contribution, the weight set size reduction for non-low-delay pictures is applied to BMS2.1 GBi and all the GBi tests.
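Equation (19) is a fixed-point weighted blend. A minimal sketch, assuming the weights are represented in units of 1/8 (so w0 + w1 = 8, and, e.g., w1 = 5/8 becomes w1 = 5) and that shiftNumGBi folds the weight normalization into the usual rounding shift; all names are illustrative.

def gbi_predict(p_l0, p_l1, w0, w1, shift_num, rounding_offset):
    # p_l0 / p_l1: co-located predictor samples from list 0 / list 1.
    return (w0 * p_l0 + w1 * p_l1 + rounding_offset) >> shift_num

# Example: the (3/8, 5/8) weight pair in 1/8 units is (w0, w1) = (3, 5);
# with shift_num = 3 the rounding offset is 1 << (3 - 1) = 4.
pred = gbi_predict(100, 120, w0=3, w1=5, shift_num=3, rounding_offset=4)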
In some embodiments, the following modifications are made on the basis of the existing GBi design in BMS2.1 to further improve GBi performance.
2.8.1 GBi encoder bug fix
To reduce the GBi encoding time, in the current encoder design, the encoder stores the uni-prediction motion vectors estimated with GBi weight 4/8 and reuses them for the uni-prediction search of the other GBi weights. This fast encoding method is applied to both the translational motion model and the affine motion model. In VTM 2.0, a 6-parameter affine model was adopted in addition to the 4-parameter affine model. The BMS2.1 encoder does not distinguish between the 4-parameter and 6-parameter affine models when it stores the uni-prediction affine MVs estimated with GBi weight 4/8. As a result, 4-parameter affine MVs may be overwritten by 6-parameter affine MVs after encoding with GBi weight 4/8, and the stored 6-parameter affine MVs may then be used for the 4-parameter affine ME of the other GBi weights, or the stored 4-parameter affine MVs may be used for the 6-parameter affine ME. The proposed GBi encoder bug fix is to store the 4-parameter and 6-parameter affine MVs separately. When the GBi weight is 4/8, the encoder stores these affine MVs according to the affine model type and, for the other GBi weights, reuses the corresponding affine MVs based on the affine model type.
2.8.2 CU size constraint for GBi
In this approach, GBi is disabled for small CUs. In inter prediction mode, if bi-prediction is used and the CU area is smaller than 128 luma samples, GBi is disabled without any signaling.
2.8.3 Merge mode with GBi
In Merge mode, the GBi index is not signaled. Instead, it is inherited from the neighboring block from which the Merge candidate is derived. When a TMVP candidate is selected, GBi is turned off for the block.
2.8.4 affine prediction with GBi
GBi may be used when the current block is coded with affine prediction. For the affine inter mode, the GBi index is signaled. For the affine Merge mode, the GBi index is inherited from the neighboring block from which the Merge candidate is derived. If a constructed affine Merge candidate is selected, GBi is turned off for the block.
2.9 example of Inter-intra prediction mode (IIP)
In inter-intra prediction mode, multi-hypothesis prediction combines one intra prediction and one Merge index prediction. Such a block is considered a special inter-coded block.
In a Merge CU, a flag is signaled for the Merge mode; when the flag is true, an intra mode is selected from an intra candidate list. For the luma component, the intra candidate list is derived from 4 intra prediction modes, including the DC, planar, horizontal and vertical modes, and the size of the intra candidate list can be 3 or 4 depending on the block shape. When the CU width is larger than twice the CU height, the horizontal mode is excluded from the intra mode list, and when the CU height is larger than twice the CU width, the vertical mode is removed from the intra mode list. One intra prediction mode selected by an intra mode index and one Merge-indexed prediction selected by a Merge index are combined using a weighted average. For the chroma component, DM is always applied without extra signaling.
The weights for the combined prediction are described as follows. Equal weight is applied when the DC or planar mode is selected, or when the CB width or height is smaller than 4. For a CB whose width and height are both greater than or equal to 4, when the horizontal/vertical mode is selected, the CB is first split vertically/horizontally into four equal-area regions. Each weight set, denoted as (w_intra_i, w_inter_i) with i from 1 to 4, where (w_intra_1, w_inter_1) = (6, 2), (w_intra_2, w_inter_2) = (5, 3), (w_intra_3, w_inter_3) = (3, 5) and (w_intra_4, w_inter_4) = (2, 6), is applied to the corresponding region. (w_intra_1, w_inter_1) is for the region closest to the reference samples, and (w_intra_4, w_inter_4) is for the region farthest from the reference samples. The combined prediction is then calculated by summing the two weighted predictions and right-shifting by 3 bits. Moreover, the intra prediction mode of the intra hypothesis of the predictor can be saved for reference by subsequent neighboring CUs.
Assume the intra and inter prediction values are PIntra and PInter, and the weighting factors are w_intra and w_inter, respectively. The prediction value at position (x, y) is calculated as (PIntra(x, y) * w_intra(x, y) + PInter(x, y) * w_inter(x, y)) >> N, where w_intra(x, y) + w_inter(x, y) = 2^N.
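The position-dependent blend can be sketched as follows, with N = 3 so that the weights sum to 8. Which region index applies to a sample depends on its distance to the intra reference samples; the region lookup here is a stand-in for that mapping, not part of the described design.

# (w_intra_i, w_inter_i) for the four equal-area regions, i = 1..4.
IIP_WEIGHTS = [(6, 2), (5, 3), (3, 5), (2, 6)]

def iip_blend(p_intra, p_inter, region):
    # region: 0 for the area closest to the reference samples, 3 for
    # the farthest; equal weight (4, 4) would be used for DC/planar.
    w_intra, w_inter = IIP_WEIGHTS[region]
    return (p_intra * w_intra + p_inter * w_inter) >> 3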
Signaling intra prediction modes in IIP coded blocks
When the inter-intra mode is used, one of the four allowed intra prediction modes, namely DC, planar, horizontal and vertical, is selected and signaled. Three most probable modes (MPMs) are constructed from the left and above neighboring blocks. The intra prediction mode of an intra-coded neighboring block or an IIP-coded neighboring block is treated as one MPM. If the intra prediction mode is not one of the four allowed intra prediction modes, it is rounded to the vertical or horizontal mode according to the angular difference. The neighboring block must be located in the same CTU row as the current block.
Assume that the width and height of the current block are W and H. If W >2 × H or H >2 × W, only one of the three MPMs can be used in inter-intra mode. Otherwise, all four valid intra prediction modes can be used in inter-intra mode.
It should be noted that intra prediction modes in inter-intra mode cannot be used to predict intra prediction modes in normal intra coded blocks.
Inter-intra prediction can be used only when W x H >= 64.
2.10 examples of trigonometric prediction modes
The concept of the triangle prediction mode (TPM) is to introduce a new triangular partition for motion compensated prediction. As shown in figs. 7A and 7B, it splits a CU into two triangular prediction units in the diagonal or anti-diagonal direction. Each triangular prediction unit in the CU is inter-predicted using its own uni-directional prediction motion vector and a reference frame index derived from a uni-directional prediction candidate list. An adaptive weighting process is performed on the diagonal edge after the triangular prediction units have been predicted. Then, the transform and quantization process is applied to the whole CU. Note that this mode applies only to the skip and Merge modes.
2.10.1 unidirectional prediction candidate list for TPM
The uni-directional prediction candidate list consists of five uni-directional prediction motion vector candidates. As shown in fig. 8, it is derived from seven neighboring blocks, including five spatially neighboring blocks (1 to 5) and two temporally co-located blocks (6 to 7). The motion vectors of the seven neighboring blocks are collected and put into the uni-directional prediction candidate list in the following order: uni-directional prediction motion vectors, the L0 motion vectors of bi-prediction motion vectors, the L1 motion vectors of bi-prediction motion vectors, and the averages of the L0 and L1 motion vectors of bi-prediction motion vectors. If the number of candidates is less than five, zero motion vectors are added to the list. The motion candidates added to this list are referred to as TPM motion candidates.
More specifically, it relates to the following steps:
1) Motion candidates are obtained from A1, B1, B0, A0, B2, Col and Col2 (corresponding to blocks 1-7 in fig. 8) without any pruning operation.
2) Set the variable numCurrMergeCand = 0.
3) For each motion candidate derived from A1, B1, B0, A0, B2, Col and Col2, if numCurrMergeCand is less than 5 and the motion candidate is uni-directionally predicted (from list 0 or list 1), it is added to the Merge list and numCurrMergeCand is increased by 1. Such added motion candidates are referred to as "original uni-prediction candidates".
Full pruning is applied.
4) For each motion candidate derived from A1, B1, B0, A0, B2, Col and Col2, if numCurrMergeCand is less than 5 and the motion candidate is bi-predicted, the motion information of list 0 is added to the Merge list (i.e., modified to uni-prediction from list 0) and numCurrMergeCand is increased by 1. Such added motion candidates are referred to as "truncated list-0-predicted candidates".
Full pruning is applied.
5) For each motion candidate derived from A1, B1, B0, A0, B2, Col and Col2, if numCurrMergeCand is less than 5 and the motion candidate is bi-predicted, the motion information of list 1 is added to the Merge list (i.e., modified to uni-prediction from list 1) and numCurrMergeCand is increased by 1. Such added motion candidates are referred to as "truncated list-1-predicted candidates".
Full pruning is applied.
6) For each motion candidate derived from A1, B1, B0, A0, B2, Col and Col2, if numCurrMergeCand is less than 5 and the motion candidate is bi-predicted:
If the slice QP of the list 0 reference picture is smaller than that of the list 1 reference picture, the motion information of list 1 is first scaled to the list 0 reference picture, and the average of the two MVs (one from the original list 0 and the other the scaled MV from list 1) is added to the Merge list as an averaged uni-prediction from list 0 motion candidate, and numCurrMergeCand is increased by 1.
Otherwise, the motion information of list 0 is first scaled to the list 1 reference picture, and the average of the two MVs (one from the original list 1 and the other the scaled MV from list 0) is added to the Merge list as an averaged uni-prediction from list 1 motion candidate, and numCurrMergeCand is increased by 1.
Full pruning is applied.
7) If numCurrMergeCand is less than 5, zero motion vector candidates are added.
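Steps 1)-7) can be condensed into the following Python sketch. A candidate is modeled as a dict with optional 'l0'/'l1' motion entries; full pruning and MV scaling are stubbed out (prune, scale_to) because they follow the usual Merge rules, and the QP-based choice of the scaling direction in step 6 is simplified to always scaling list 1 onto list 0. All names are illustrative.

def build_tpm_list(candidates, prune, scale_to):
    tpm = []  # entries are (mv, ref_list) pairs

    def add(mv, ref_list):
        if len(tpm) < 5 and not prune(tpm, (mv, ref_list)):
            tpm.append((mv, ref_list))

    # Step 3: original uni-prediction candidates.
    for c in candidates:
        if ('l0' in c) != ('l1' in c):
            add(c.get('l0') or c.get('l1'), 0 if 'l0' in c else 1)
    # Step 4: truncated list-0-predicted candidates.
    for c in candidates:
        if 'l0' in c and 'l1' in c:
            add(c['l0'], 0)
    # Step 5: truncated list-1-predicted candidates.
    for c in candidates:
        if 'l0' in c and 'l1' in c:
            add(c['l1'], 1)
    # Step 6: averaged uni-prediction candidates (QP rule simplified).
    for c in candidates:
        if 'l0' in c and 'l1' in c:
            scaled = scale_to(c['l1'], target_list=0)
            add(((c['l0'][0] + scaled[0]) // 2,
                 (c['l0'][1] + scaled[1]) // 2), 0)
    # Step 7: pad with zero motion vectors.
    while len(tpm) < 5:
        tpm.append(((0, 0), 0))
    return tpm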
2.11 Decoder-side motion vector refinement (DMVR) in VVC
For DMVR in VVC, MVD mirroring between list 0 and list 1 is assumed, as shown in fig. 13, and bilateral matching is performed to refine the MVs, e.g., to find the best MVD among several MVD candidates. Denote the MVs of the two reference picture lists by MVL0(L0X, L0Y) and MVL1(L1X, L1Y). The MVD of list 0, denoted by (MvdX, MvdY), is defined as the optimal MVD that minimizes a cost function (e.g., SAD). The SAD function is defined as the SAD between the list 0 reference block, derived with the motion vector (L0X + MvdX, L0Y + MvdY) in list 0, and the list 1 reference block, derived with the motion vector (L1X - MvdX, L1Y - MvdY) in list 1.
The motion vector refinement process may iterate twice. In each iteration, up to 6 MVDs (with integer pixel precision) can be checked in two steps, as shown in fig. 14. In a first step, the MVD (0, 0), (-1,0), (1,0), (0, -1), (0,1) is checked. In a second step, one of the MVDs (-1, -1), (-1, 1), (1, -1) or (1, 1) may be selected and further examined. Assume that the function Sad (x, y) returns the Sad value of MVD (x, y). The MVD represented by (MvdX, MvdY) examined in the second step is decided as follows:
MvdX = -1;
MvdY = -1;
if (Sad(1, 0) < Sad(-1, 0))
    MvdX = 1;
if (Sad(0, 1) < Sad(0, -1))
    MvdY = 1;
In the first iteration, the starting point is the signaled MV, and in the second iteration, the starting point is the signaled MV plus the best MVD selected in the first iteration. DMVR applies only when one reference picture is a preceding picture and the other is a following picture, and both reference pictures have the same picture order count distance to the current picture.
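One integer-pel iteration therefore checks at most 6 MVDs. A sketch under the assumption that sad() returns the bilateral SAD for a list-0 MVD (with the mirrored MVD applied to list 1), as defined above:

def dmvr_integer_iteration(sad):
    # Step 1: center plus the four cross positions.
    costs = {mvd: sad(mvd) for mvd in
             [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]}
    # Step 2: the diagonal between the two cheaper cross arms.
    mvd_x = 1 if costs[(1, 0)] < costs[(-1, 0)] else -1
    mvd_y = 1 if costs[(0, 1)] < costs[(0, -1)] else -1
    costs[(mvd_x, mvd_y)] = sad((mvd_x, mvd_y))
    return min(costs, key=costs.get)  # best MVD of the checked set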
To further simplify the process of DMVR, the following main features may be implemented in some embodiments:
1. Early termination occurs when the SAD between list 0 and list 1 at the (0, 0) position is smaller than a threshold.
2. For some positions, early termination occurs when the SAD between List 0 and List 1 is zero.
3. Block size condition for DMVR: W*H >= 64 && H >= 8, where W and H are the width and height of the block.
4. For DMVR with CU size >16 × 16, the CU is divided into a plurality of 16 × 16 sub-blocks. If only the width or height of a CU is greater than 16, it is only divided in the vertical or horizontal direction.
5. The reference block size is (W+7) x (H+7) (for luma).
6. 25-point SAD-based integer-pel search (i.e., (+-)2 refinement search range, single stage).
7. DMVR based on bilinear interpolation.
8. Sub-pixel refinement based on "parametric error surface equation". This process is only performed when the minimum SAD cost is not equal to zero and the optimal MVD was (0, 0) in the last MV refinement iteration.
9. Luma/chroma MC with reference block padding (if needed).
10. The refined MV is used only for MC and TMVP.
2.11.1 Use of DMVR
DMVR may be enabled when all of the following conditions are true:
- The DMVR enabling flag in the SPS (e.g., sps_dmvr_enabled_flag) is equal to 1.
- The TPM flag, inter affine flag, sub-block Merge flag (ATMVP or affine Merge) and MMVD flag are all equal to 0.
The Merge flag is equal to 1.
-the current block is bi-predicted, and a Picture Order Count (POC) distance between a reference picture in list 1 and the current picture is equal to a POC distance between a reference picture in list 0 and the current picture.
-the height of the current CU is greater than or equal to 8.
- The number of luma samples (CU width * height) is greater than or equal to 64.
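Collected into a single predicate, the gating might look as follows; all inputs are assumed to be the decoded or derived values named in the bullets above.

def dmvr_enabled(sps_dmvr_flag, tpm_flag, affine_flag, sbm_flag,
                 mmvd_flag, merge_flag, is_bi,
                 poc_cur, poc_ref0, poc_ref1, cu_w, cu_h):
    return (sps_dmvr_flag == 1
            and tpm_flag == 0 and affine_flag == 0
            and sbm_flag == 0 and mmvd_flag == 0
            and merge_flag == 1
            and is_bi
            # equal POC distances, one past and one future reference
            and (poc_cur - poc_ref0) == (poc_ref1 - poc_cur)
            and cu_h >= 8
            and cu_w * cu_h >= 64)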
2.11.2 Sub-pel refinement based on the "parametric error surface equation"
The method is summarized as follows:
1. The parametric error surface fit is computed only when the center position is the best cost position in a given iteration.
2. The cost of the center position and the costs of the positions at distances (-1, 0), (0, -1), (1, 0) and (0, 1) from the center are used to fit a two-dimensional parabolic error surface equation of the form
E(x, y) = A(x - x0)^2 + B(y - y0)^2 + C
where (x0, y0) corresponds to the position with the lowest cost and C corresponds to the minimum cost value. By solving the 5 equations in 5 unknowns, (x0, y0) is computed as:
x0 = (E(-1, 0) - E(1, 0)) / (2 * (E(-1, 0) + E(1, 0) - 2 * E(0, 0)))
y0 = (E(0, -1) - E(0, 1)) / (2 * (E(0, -1) + E(0, 1) - 2 * E(0, 0)))
3. (x0, y0) can be computed to any required sub-pel precision by adjusting the precision at which the divisions are performed (e.g., how many bits of quotient are computed). For 1/16-pel accuracy, only 4 bits in the absolute value of the quotient need to be computed, which lends itself to a fast shift-and-subtract based implementation of the 2 divisions required per CU.
4. The computed (x0, y0) is added to the integer-distance refinement MV to obtain the sub-pel accurate refinement delta MV.
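A sketch of steps 2-4 in Python, where costs maps the five checked MVD positions to their SAD values. The 4 fractional quotient bits give the 1/16-pel output described above; the exact division and rounding behavior of a real implementation is simplified here.

def error_surface_offset(costs):
    e = lambda x, y: costs[(x, y)]
    num_x = e(-1, 0) - e(1, 0)
    den_x = 2 * (e(-1, 0) + e(1, 0) - 2 * e(0, 0))
    num_y = e(0, -1) - e(0, 1)
    den_y = 2 * (e(0, -1) + e(0, 1) - 2 * e(0, 0))
    # Multiply by 16 before dividing: 4 fractional bits => 1/16-pel units.
    x0 = (num_x * 16) // den_x if den_x else 0
    y0 = (num_y * 16) // den_y if den_y else 0
    return x0, y0  # added to the integer-distance refinement MV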
2.11.3 Reference samples required in DMVR
For a block of size W x H, assuming the maximum allowed MVD value is +-offset (e.g., 2 in VVC) and the filter size is filterSize (e.g., 8 for luma and 4 for chroma in VVC), (W + 2*offset + filterSize - 1) * (H + 2*offset + filterSize - 1) reference samples are needed. To reduce the memory bandwidth, only the center (W + filterSize - 1) * (H + filterSize - 1) reference samples are fetched, and the remaining pixels are generated by repeating the boundaries of the fetched samples. An example for an 8x8 block is shown in fig. 15: 15x15 reference samples are fetched (labeled 1502), and the boundaries of the fetched samples are repeated (labeled 1504) to generate the 17x17 region (labeled 1506).
During motion vector refinement, bilinear motion compensation is performed using these reference samples. At the same time, final motion compensation is also performed using these reference samples.
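For the 8x8 example above, the reduced fetch plus boundary repetition can be sketched as follows; numpy is used only for concise edge padding, and the array contents are dummy values.

import numpy as np

def pad_reference(center):
    # center: the fetched (H + filterSize - 1) x (W + filterSize - 1)
    # samples; repeat each boundary once to regain the full area.
    return np.pad(center, 1, mode='edge')

fetched = np.zeros((15, 15))   # 15x15 fetched for an 8x8 luma block
area = pad_reference(fetched)  # 17x17 region used for interpolation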
2.12 Bandwidth calculation for different block sizes
Based on the current 8-tap luma interpolation filter and 4-tap chroma interpolation filter, the memory bandwidth for each block unit (4:2:0 color format; one MxN luma block with two M/2 x N/2 chroma blocks) is listed in Table 1 below.
TABLE 1 memory Bandwidth example
Similarly, the memory bandwidth for each MxN luma block unit, based on the current 8-tap luma interpolation filter and 4-tap chroma interpolation filter, is listed in Table 2 below.
TABLE 2 memory Bandwidth example
Thus, regardless of the color format, the bandwidth requirements for each block size are arranged in descending order:
4 x 4 bi-directional >4 x 8 bi-directional >4 x 16 bi-directional >4 x 4 uni-directional >8 x 8 bi-directional >4 x 32 bi-directional >4 x 64 bi-directional >4 x 128 bi-directional >8 x 16 bi-directional >4 x 8 uni-directional >8 x 32 bi-directional.
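Although the table bodies are not reproduced here, the per-pixel figures can be recomputed from the filter sizes given above. A sketch for the 4:2:0 case, under the assumption that bandwidth is counted as reference samples fetched per reference list divided by the number of luma pixels in the block:

def bandwidth(w, h, bi):
    luma = (w + 7) * (h + 7)                  # 8-tap luma filter
    chroma = 2 * (w // 2 + 3) * (h // 2 + 3)  # two 4-tap chroma blocks
    return (2 if bi else 1) * (luma + chroma) / (w * h)

print(bandwidth(4, 4, bi=True))   # worst case: 4x4 bi-prediction
print(bandwidth(8, 8, bi=True))   # 8x8 bi-prediction for comparison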
2.13 Motion vector accuracy problem in VTM-3.0
In VTM-3.0, the MV precision is 1/16 luma pixels in storage. When the MV is signaled, the highest precision is 1/4 luma pixels.
3. Examples of problems addressed by the disclosed embodiments
1. The bandwidth control method for affine prediction is not clear enough and should be more flexible.
2. Note that in the HEVC design, even though coding units (CUs) may be partitioned with asymmetric prediction modes (such as splitting one 16x16 CU into two PUs of size 4x16 and 12x16), the worst-case memory bandwidth requirement is that of 8x8 bi-prediction. In VVC, due to the new QTBT partition structure, a CU may be of size 4x16 and bi-prediction may be enabled. A bi-predicted 4x16 CU requires even higher memory bandwidth than a bi-predicted 8x8 CU. How to handle block sizes (such as 4x16 or 16x4) that require higher bandwidth is unknown.
3. New coding tools (e.g., GBi) introduce more line buffer problems.
4. Inter-intra mode requires more memory and logic to signal the intra prediction mode used in inter-coded blocks.
5. The 1/16 luma pixel MV precision requires more memory storage.
6. Interpolating four 4x4 blocks within one 8x8 block requires fetching (8+7+1) x (8+7+1) reference pixels, about 14% more pixels than for an 8x8 block in non-affine/non-planar mode.
7. The averaging operation in hybrid intra and inter prediction should remain consistent with other coding tools, such as weighted prediction, local illumination compensation, OBMC, and triangle prediction, where an offset is added before shifting.
4. Examples of the embodiments
The techniques disclosed herein may reduce the bandwidth and line buffers required in affine prediction and other new encoding tools.
The following description should be considered as an example to explain the general concept, and should not be interpreted in a narrow sense. Furthermore, the embodiments may be combined in any manner.
In the following discussion, the width and height of the affine-coded current CU are w and h, respectively. Assume that the interpolation filter tap (in motion compensation) is N (e.g., 8, 6, 4, or 2) and the current block size is WxH.
Bandwidth control for affine prediction
Example 1: Suppose the motion vector of a sub-block SB in an affine-coded block is MV_SB (denoted as (MVx, MVy)). MV_SB may be constrained to be within a certain range relative to a representative motion vector MV' (MV'x, MV'y).
In some embodiments, MVx >= MV'x - DH0 and MVx <= MV'x + DH1 and MVy >= MV'y - DV0 and MVy <= MV'y + DV1, where MV' = (MV'x, MV'y). In some embodiments, DH0 may be equal to or not equal to DH1; DV0 may be equal to or not equal to DV1. In some embodiments, DH0 may or may not be equal to DV0; DH1 may or may not be equal to DV1. In some embodiments, DH0 may not be equal to DH1; DV0 may not be equal to DV1. In some embodiments, DH0, DH1, DV0, and DV1 may be signaled from the encoder to the decoder, such as in the VPS/SPS/PPS/slice header/slice group header/slice/CTU/CU/PU. In some embodiments, DH0, DH1, DV0, and DV1 may be specified differently for different standard profiles/levels/tiers. In some embodiments, DH0, DH1, DV0, and DV1 may depend on the width and height of the current block. In some embodiments, DH0, DH1, DV0, and DV1 may depend on whether the current block is uni-directionally or bi-directionally predicted. In some embodiments, DH0, DH1, DV0, and DV1 may depend on the position of the sub-block SB. In some embodiments, DH0, DH1, DV0, and DV1 may depend on how MV' is obtained.
In some embodiments, MV' may be a CPMV, such as MV0, MV1, or MV2.
In some embodiments, MV' may be the MV used in the MC of one of the corner sub-blocks, such as MV0', MV1', or MV2' in fig. 3.
In some embodiments, MV' may be an MV derived for any position inside or outside the current block using the affine model of the current block. For example, it may be derived for the center position of the current block (e.g., x = w/2 and y = h/2).
In some embodiments, MV' may be the MV used in the MC of any sub-block of the current block, such as one of the center sub-blocks (C0, C1, C2, or C3 shown in fig. 3).
In some embodiments, if MV_SB does not satisfy the constraint, MV_SB should be clipped to the valid range. In some embodiments, the clipped MV_SB is stored in the MV buffer and will be used to predict the MVs of subsequent coded blocks. In some embodiments, MV_SB is stored in the MV buffer before being clipped.
In some embodiments, if MV_SB does not satisfy the constraint, the bitstream is considered non-conforming (invalid). In one example, it may be specified in a standard that MV_SB must or should satisfy the constraint; any conforming encoder should obey the constraint, otherwise the encoder is considered non-conforming.
In some embodiments, MV_SB and MV' may be represented with the signaled MV precision (such as quarter-pel precision). In some embodiments, MV_SB and MV' may be represented with the stored MV precision (such as 1/16-pel precision). In some embodiments, MV_SB and MV' may be rounded to a precision (such as integer precision) different from the signaled or stored precision.
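The range constraint of Example 1 reduces to a component-wise clip around the representative MV. A minimal sketch, with DH0/DH1/DV0/DV1 as the (possibly signaled) bounds; all names are illustrative.

def constrain_subblock_mv(mv_sb, mv_rep, dh0, dh1, dv0, dv1):
    mvx, mvy = mv_sb
    rx, ry = mv_rep
    # Clip MV_SB into [MV'x - DH0, MV'x + DH1] x [MV'y - DV0, MV'y + DV1].
    return (max(rx - dh0, min(rx + dh1, mvx)),
            max(ry - dv0, min(ry + dv1, mvy)))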
Example 2: For an affine-coded block, each MxN block (such as 8x4, 4x8, or 8x8) within it is treated as a base unit. The MVs of all four 4x4 sub-blocks within an MxN block are constrained such that the maximum difference between the integer parts of the four 4x4 sub-block MVs does not exceed K pixels.
In some embodiments, whether and how this constraint is applied depends on whether the current block applies bi-directional prediction or uni-directional prediction. For example, the constraint is only applicable to bi-directional prediction, not to uni-directional prediction. As another example, M, N and K are different for bi-directional prediction and uni-directional prediction.
In some embodiments, M, N and K may depend on the width and height of the current block.
In some embodiments, whether constraints are applied may be signaled from the encoder to the decoder, such as in VPS/SPS/PPS/slice header/slice group header/slice/CTU/CU/PU. For example, an on/off flag is signaled to indicate whether a constraint is imposed. As another example, M, N and K are signaled.
In some embodiments, M, N and K may be specified differently for different standard profiles/levels/hierarchies.
Example 3: the width and height of the sub-blocks may be calculated in different ways for different affine coded blocks.
In some embodiments, the calculation method is different for affine coded blocks using unidirectional prediction and bidirectional prediction. In one example, the sub-block size is fixed (such as 4 × 4, 4 × 8, or 8 × 4) for blocks using unidirectional prediction. In another example, the sub-block size is calculated for a block using bi-prediction. In this case, the sub-block size may be different for two different bi-directionally predicted affine blocks.
In some embodiments, for bi-predictive affine blocks, the width and/or height of the sub-block from reference list 0 may be different from the width and/or height of the sub-block from reference list 1. In one example, assume that the width and height of the sub-block from reference list 0 are derived as Wsb0 and Hsb0, respectively; the width and height of the sub-blocks from reference list 1 are derived as Wsb1 and Hsb1, respectively. The final width and height of the sub-block are calculated as Max (Wsb0, Wsb1) and Max (Hsb0, Hsb1), respectively, for reference list 0 and reference list 1.
In some embodiments, the calculated width and height of the sub-blocks apply only to the luma component. For the chroma component, the sub-block size is always fixed, such as a 4x4 chroma sub-block, which corresponds to an 8x8 luma block in the 4:2:0 color format.
In some embodiments, MVx - MV'x and MVy - MV'y are calculated to determine the width and height of the sub-block, where (MVx, MVy) and (MV'x, MV'y) are defined in Example 1.
In some embodiments, the MVs involved in the calculation may be represented in signaling MV precision (such as quarter-pixel precision). In one example, the MVs may be represented with a stored MV precision (such as 1/16 precision). As another example, the MVs may be rounded to a different precision (such as integer precision) than the signaling or storage precision.
In some embodiments, the thresholds used in the calculations to determine the width and height of the sub-blocks may be signaled from encoder to decoder, such as in VPS/SPS/PPS/slice header/slice group header/slice/CTU/CU/PU.
In some embodiments, the thresholds used in the calculations to determine the width and height of the sub-blocks may be different for different standard profiles/levels/hierarchies.
Example 4: To interpolate a W1xH1 sub-block within a W2xH2 sub-block/block, a (W2 + N - 1 - PW) * (H2 + N - 1 - PH) block is fetched first, and then the pixel padding method described in Example 6 (e.g., the boundary pixel repetition method) is applied to generate a larger block, which is then used to interpolate the W1xH1 sub-block. For example, W2 = H2 = 8, W1 = H1 = 4, and PW = PH = 0.
In some embodiments, the integer part of the MV of any W1xH1 sub-block may be used to extract the entire W2xH2 sub-block/block, and thus a different boundary pixel repetition method may be required. For example, if the maximum difference between the integer parts of all W1xH1 sub-blocks MV is no more than 1 pixel, the integer part of MV of the top left W1xH1 sub-block is used to extract the entire W2xH2 sub-block/block. The right and lower boundaries of the reference block are repeated once. As another example, if the maximum difference between the integer parts of all W1xH1 sub-blocks MV is no more than 1 pixel, the integer parts of the MVs of the lower right W1xH1 sub-block are used to extract the entire W2xH2 sub-block/block. The left and upper boundaries of the reference block are repeated once.
In some embodiments, the MV of any W1xH1 sub-block may be modified first and then used to extract the entire W2xH2 sub-block/block, and thus a different boundary pixel repetition method may be required. For example, if the maximum difference between the integer parts of all W1xH1 sub-blocks MV is no more than 2 pixels, the integer part of MV of the top left W1xH1 sub-block may be added to (1, 1) (where 1 represents a 1 integer-pixel distance) and then used to extract the entire W2xH2 sub-block/block. In this case, the left, right, upper and lower boundaries of the reference block are repeated once. As another example, if the maximum difference between the integer parts of all W1xH1 sub-blocks MV is no more than 2 pixels, the integer parts of the MVs of the lower right W1xH1 sub-block may be added to (-1, -1) (where 1 represents 1 integer pixel distance) and then used to extract the entire W2xH2 sub-block/block. In this case, the left, right, upper and lower boundaries of the reference block are repeated once.
Bandwidth control for specific block sizes
Example 5: Bi-directional prediction is not allowed if w and h of the current block satisfy one or more of the following conditions:
A. w is equal to T1 and h is equal to T2, or h is equal to T1 and w is equal to T2. In one example, T1 = 4 and T2 = 16.
B. w is equal to T1 and h is not greater than T2, or h is equal to T1 and w is not greater than T2. In one example, T1 = 4 and T2 = 16.
C. w is not greater than T1 and h is not greater than T2, or h is not greater than T1 and w is not greater than T2. In one example, T1 = 8 and T2 = 8. In another example, T1 = 8 and T2 = 4. In yet another example, T1 = 4 and T2 = 4.
In some embodiments, bi-prediction may be disabled for 4x8 blocks. In some embodiments, bi-prediction may be disabled for 8x4 blocks. In some embodiments, bi-prediction may be disabled for 4x16 blocks. In some embodiments, bi-prediction may be disabled for 16x4 blocks. In some embodiments, bi-prediction may be disabled for 4x8 and 8x4 blocks. In some embodiments, bi-prediction may be disabled for 4x16 and 16x4 blocks. In some embodiments, bi-prediction may be disabled for 4x8 and 16x4 blocks. In some embodiments, bi-prediction may be disabled for 4x16 and 8x4 blocks. In some embodiments, bi-prediction may be disabled for 4xN blocks, e.g., N <= 16. In some embodiments, bi-prediction may be disabled for Nx4 blocks, e.g., N <= 16. In some embodiments, bi-prediction may be disabled for 8xN blocks, e.g., N <= 16. In some embodiments, bi-prediction may be disabled for Nx8 blocks, e.g., N <= 16. In some embodiments, bi-prediction may be disabled for 4x8, 8x4, and 4x16 blocks. In some embodiments, bi-prediction may be disabled for 4x8, 8x4, and 16x4 blocks. In some embodiments, bi-prediction may be disabled for 8x4, 4x16, and 16x4 blocks. In some embodiments, bi-prediction may be disabled for 4x8, 8x4, 4x16, and 16x4 blocks.
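As one concrete instance of Example 5, a checker for the embodiment that disables bi-prediction for 4x8 and 8x4 luma blocks might look as follows; the restricted-size set is an assumption chosen for illustration, and other embodiments above use different sets.

RESTRICTED_SIZES = {(4, 8), (8, 4)}  # illustrative choice

def bi_prediction_allowed(w, h):
    return (w, h) not in RESTRICTED_SIZES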
In some embodiments, block sizes disclosed herein may refer to one color component, such as a luma component, and the decision as to whether to disable bi-prediction may apply to all color components. For example, if bi-prediction is disabled according to the block size of the luminance component of the block, bi-prediction will also be disabled for corresponding blocks of other color components. In some embodiments, the block size disclosed herein may refer to one color component, such as a luminance component, and the decision as to whether to disable bi-prediction applies only to that color component.
In some embodiments, if bi-prediction is disabled for a block and the selected Merge candidate is bi-predicted, only one MV from either reference list 0 or reference list 1 of the Merge candidate is assigned to the block.
In some embodiments, if bi-prediction is disabled for a block, the Triangle Prediction Mode (TPM) is not allowed for the block.
In some embodiments, how the prediction direction (uni-prediction from list 0/1, or bi-prediction) is signaled may depend on the block size. In one example, an indication of uni-prediction from list 0/1 may be signaled when 1) block width * block height < 64, or 2) block width * block height == 64 but width is not equal to height. As another example, an indication of bi-prediction or uni-prediction from list 0/1 may be signaled when 1) block width * block height > 64, or 2) block width * block height == 64 and width is equal to height.
In some embodiments, both uni-prediction and bi-prediction may be disabled for 4x4 blocks. In some embodiments, this may be applied to affine-coded blocks. Alternatively, it may be applied to non-affine-coded blocks. In some embodiments, the indications of the quadtree partitioning of an 8x8 block, the binary tree partitioning of an 8x4 or 4x8 block, and the ternary tree partitioning of a 4x16 or 16x4 block may be skipped. In some embodiments, a 4x4 block must be coded as an intra block. In some embodiments, the MVs of a 4x4 block must have integer precision. For example, the IMV flag of a 4x4 block must be 1. As another example, the MVs of a 4x4 block must be rounded to integer precision.
In some embodiments, bi-directional prediction is allowed. However, assuming the interpolation filter has N taps, only (W + N - 1 - PW) * (H + N - 1 - PH) reference pixels are fetched instead of (W + N - 1) * (H + N - 1) reference pixels. Meanwhile, the pixels at the reference block boundaries (top, left, bottom and right boundaries) are repeated to generate the (W + N - 1) * (H + N - 1) block used for the final interpolation, as shown in fig. 9. In some embodiments, PH is zero, and only the left or/and right boundaries are repeated. In some embodiments, PW is zero, and only the top or/and bottom boundaries are repeated. In some embodiments, PW and PH are both greater than zero, and the left or/and right boundaries are repeated first, followed by the top or/and bottom boundaries. In some embodiments, PW and PH are both greater than zero, and the top or/and bottom boundaries are repeated first, followed by the left and right boundaries. In some embodiments, the left boundary is repeated M1 times and the right boundary is repeated PW - M1 times. In some embodiments, the top boundary is repeated M2 times and the bottom boundary is repeated PH - M2 times. In some embodiments, this boundary pixel repetition method may be applied to some or all of the reference blocks. In some embodiments, PW and PH may be different for different color components, such as Y, Cb and Cr.
Fig. 9 shows an example of repeating the boundary pixels of the reference block before interpolation.
Example 6: In some embodiments, (W + N - 1 - PW) * (H + N - 1 - PH) reference pixels (instead of (W + N - 1) * (H + N - 1) reference pixels) may be fetched for the motion compensation of a WxH block. The samples that are out of the (W + N - 1 - PW) * (H + N - 1 - PH) range but within the (W + N - 1) * (H + N - 1) range are padded before the interpolation process is performed. In one padding method, the pixels at the reference block boundaries (top, left, bottom and right boundaries) are repeated to generate the (W + N - 1) * (H + N - 1) block used for the final interpolation, as shown in fig. 11.
In some embodiments, PH is zero and only the left or/and right boundaries are repeated.
In some embodiments, PW is zero, and only the upper or/and lower bounds are repeated.
In some embodiments, PW and PH are both greater than zero, and the left or/and right boundaries are repeated first, and then the upper or/and lower boundaries are repeated.
In some embodiments, PW and PH are both greater than zero, and the upper or/and lower bounds are repeated first, and then the left and right bounds are repeated.
In some embodiments, the left border is repeated M1 times and the right border is repeated PW-M1 times.
In some embodiments, the upper border is repeated M2 times and the lower border is repeated PH-M2 times.
In some embodiments, this boundary pixel repetition method may be applied to some or all of the reference blocks.
In some embodiments, PW and PH may be different for different color components, such as Y, Cb and Cr.
In some embodiments, the PW and PH may be different for different block sizes or shapes.
In some embodiments, PW and PH may be different for unidirectional prediction and bi-directional prediction.
In some embodiments, padding may not be performed in affine mode.
In some embodiments, the samples that are out of the (W + N - 1 - PW) * (H + N - 1 - PH) range but within the (W + N - 1) * (H + N - 1) range are set to a single value. In some embodiments, the single value is 1 << (BD - 1), where BD is the bit depth of the samples, such as 8 or 10. In some embodiments, the single value is signaled from the encoder to the decoder in the VPS/SPS/PPS/slice header/slice group header/slice/CTU row/CTU/CU/PU. In some embodiments, the single value is derived from the samples within the (W + N - 1 - PW) * (H + N - 1 - PH) range.
Example 7: In DMVR, (W + filterSize - 1 - PW) * (H + filterSize - 1 - PH) reference samples may be fetched instead of (W + filterSize - 1) * (H + filterSize - 1) reference samples, where PW >= 0 and PH >= 0, and all other required samples may be generated by repeating the boundaries of the fetched reference samples.
In some embodiments, the method set forth in example 6 may be used to fill in unextracted samples.
In some embodiments, in the final motion compensation of the DMVR, padding may no longer be performed.
In some embodiments, whether the above method is applied may depend on the block size.
Example 8: The signaling method of inter_pred_idc may depend on whether w and h satisfy the conditions in Example 5. An example is shown in Table 3 below:
TABLE 3
Another example is shown in table 4 below:
TABLE 4
Another example is shown in table 5 below:
TABLE 5
Example 9: The Merge candidate list construction process may depend on whether w and h satisfy the conditions in Example 5. The following embodiments describe the case when w and h satisfy the conditions.
In some embodiments, if one Merge candidate uses bi-directional prediction, only the prediction from reference list 0 is retained, and the Merge candidate is considered as uni-directional prediction with reference to reference list 0.
In some embodiments, if one Merge candidate uses bi-directional prediction, only the prediction from reference List 1 is retained, and the Merge candidate is considered as a uni-directional prediction with reference to reference List 1.
In some embodiments, a Merge candidate is considered unavailable if it uses bi-prediction. That is, such a Merge candidate is removed from the Merge list.
In some embodiments, the Merge candidate list construction process for the triangle prediction mode is used instead.
Example 10: The coding tree partitioning process may depend on whether the width and height of the sub-CUs after partitioning satisfy the conditions in Example 5.
In some embodiments, if the width and height of a sub-CU after partitioning satisfy the conditions in Example 5, the partitioning is not allowed. In some embodiments, the signaling of the coding tree partitioning may depend on whether a partitioning is allowed. In one example, if a partitioning is not allowed, the codeword representing it is omitted.
Example 11: The signaling of the skip flag or/and the intra block copy (IBC) flag may depend on whether the width and/or height of the block satisfy certain conditions (e.g., the conditions described in Example 5).
In some embodiments, the condition is that the luma block contains no more than X samples, e.g., X = 16.
In some embodiments, the condition is that the luma block contains X samples, e.g., X = 16.
In some embodiments, the condition is that the width and height of the luma block are both equal to X, e.g., X = 4.
in some embodiments, when one or some of the above conditions are true, inter mode and/or IBC mode are not allowed for such blocks.
In some embodiments, if the inter mode is not allowed for a block, the skip flag may not be signaled for it. Alternatively, in addition, the skip flag may be inferred to be false.
In some embodiments, if inter and IBC modes are not allowed for a block, then a skip flag may not be signaled for it and may be implicitly derived as false (e.g., a block is derived to be encoded in a non-skip mode).
In some embodiments, if inter mode is not allowed for a block but IBC mode is allowed for the block, the skip flag may still be signaled. In some embodiments, if the block is encoded in skip mode, the IBC flag may not be signaled and may be implicitly derived as true (e.g., the block is derived as encoded in IBC mode).
Example 12: The signaling of the prediction mode may depend on whether the width and/or height of the block satisfy certain conditions (e.g., the conditions described in Example 5).
In some embodiments, the condition is that the luma block contains no more than X samples, e.g., X = 16.
In some embodiments, the condition is that the luma block contains X samples, e.g., X = 16.
In some embodiments, the condition is that the width and height of the luma block are both equal to X, e.g., X = 4.
In some embodiments, when one or some of the above conditions are true, inter mode and/or IBC mode are not allowed for such blocks.
In some embodiments, the signaling of indications of certain modes may be skipped.
In some embodiments, if inter and IBC modes are not allowed for a block, the signaling of the indication of inter and IBC modes is skipped, and the remaining allowed modes, such as intra mode or palette mode, may still be signaled.
In some embodiments, if inter and IBC modes are not allowed for a block, the prediction mode may not be signaled. Alternatively, in addition, the prediction mode may be implicitly derived as an intra mode.
In some embodiments, if inter mode is not allowed for a block, the signaling of the indication of inter mode is skipped, and the remaining allowed modes, e.g., intra mode or IBC mode, may still be signaled. Alternatively, the remaining allowed modes may still be signaled, e.g. intra mode or IBC mode or palette mode.
In some embodiments, if inter mode is not allowed for a block but IBC mode and intra mode are allowed, an IBC flag may be signaled to indicate whether the block is encoded in IBC mode. Alternatively, in addition, the prediction mode may not be signaled.
Example 13: The signaling of the triangle mode may depend on whether the width and/or height of the block satisfy certain conditions (e.g., the conditions described in Example 5).
In some embodiments, the condition is that the luma block size is one of some particular sizes. For example, the particular dimensions may include 4x16 or/and 16x 4.
In some embodiments, when the above condition is true, the triangle mode may be disabled, and a flag indicating whether the current block is encoded in the triangle mode may not be signaled and may be deduced as false.
Example 14: The signaling of the inter prediction direction may depend on whether the width and/or height of the block satisfy certain conditions (e.g., the conditions described in Example 5).
In some embodiments, the condition is that the luma block size is one of some particular sizes. For example, the particular dimensions may include 8x4 or/and 4x8 or/and 4x16 or/and 16x 4.
In some embodiments, when the above condition is true, the block may be only uni-directionally predicted, and a flag indicating whether the current block is bi-directionally predicted may not be signaled and may be deduced as false.
Example 15: The signaling of the SMVD (symmetric MVD) flag may depend on whether the width and/or height of the block satisfy certain conditions (e.g., the conditions described in Example 5).
In some embodiments, the condition is that the luma block size is one of some particular sizes. In some embodiments, the condition is defined as whether a block size has no more than 32 samples. In some embodiments, the condition is defined as whether the block size is 4 × 8 or 8 × 4. In some embodiments, the condition is defined as whether the block size is 4 × 4, 4 × 8, or 8 × 4. In some embodiments, the particular dimensions may include 8x4 or/and 4x8 or/and 4x16 or/and 16x 4.
In some embodiments, when certain conditions are true, an indication of the use of SMVD (such as an SMVD flag) may not be signaled and may be inferred as false. For example, a block may be set to be uni-directionally predicted.
In some embodiments, an indication of the use of SMVD (such as an SMVD flag) may still be signaled when certain conditions are true, however, only list 0 or list 1 motion information may be used in the motion compensation process.
Example 16: Motion vectors (such as those derived in the regular Merge mode, ATMVP mode, MMVD Merge mode, MMVD skip mode, etc.) or block vectors used for IBC may be modified depending on whether the width and/or height of the block satisfy certain conditions.
In some embodiments, the condition is that the luma block size is one of some particular sizes. For example, the particular dimensions may include 8x4 or/and 4x8 or/and 4x16 or/and 16x 4.
In some embodiments, when the above condition is true, if the derived motion information is bi-predicted (e.g., inherited from a neighboring block, possibly with some offset), the motion vector or block vector of the block may be changed to a uni-directional motion vector. Such a process is called the conversion process, and the final uni-directional motion vector is called a "converted uni-directional" motion vector. In some embodiments, the motion information of reference picture list X (e.g., X being 0 or 1) may be kept and the motion information of list Y (Y being 1-X) may be discarded. In some embodiments, the motion information of reference picture list X (e.g., X being 0 or 1) and the motion information of list Y (Y being 1-X) may be jointly utilized to derive a new motion candidate for list X. In one example, the motion vector of the new motion candidate may be the average of the motion vectors of the two reference picture lists. As another example, the motion information of list Y may first be scaled to list X; the motion vector of the new motion candidate may then be the average of the motion vectors of the two reference picture lists. In some embodiments, the motion vector in prediction direction X may not be used (e.g., the motion vector in prediction direction X is changed to (0, 0) and the reference index in prediction direction X is changed to -1), and the prediction direction may be changed to 1-X, with X being 0 or 1. In some embodiments, the converted uni-directional motion vector may be used to update the HMVP lookup table. In some embodiments, the derived bi-directional motion information, e.g., the bi-directional MVs before conversion to uni-directional MVs, may be used to update the HMVP lookup table. In some embodiments, the converted uni-directional motion vector may be stored and used for motion prediction, TMVP, deblocking, etc. of subsequent coded blocks. In some embodiments, the derived bi-directional motion information, e.g., the bi-directional MVs before conversion to uni-directional MVs, may be stored and used for motion prediction, TMVP, deblocking, etc. of subsequent coded blocks. In some embodiments, the converted uni-directional motion vector may be used for motion refinement. In some embodiments, the derived bi-directional motion information may be used for motion refinement and/or sample refinement, e.g., with optical-flow-based methods. In some embodiments, the prediction blocks generated from the derived bi-directional motion information may first be refined, after which only one prediction block may be utilized to derive the final prediction and/or reconstruction block of the block.
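A sketch of the conversion process, with a candidate modeled as a dict holding per-list MVs and reference indices. keep_list selects which list survives, average enables the variant that scales list Y onto list X and averages, and scale_mv is a stand-in for the usual POC-based scaling; all names are illustrative.

def convert_to_uni(cand, keep_list=0, average=False, scale_mv=None):
    x, y = keep_list, 1 - keep_list
    if average:
        # Scale list Y onto list X, then average the two MVs.
        scaled = scale_mv(cand['mv'][y], src_list=y, dst_list=x)
        cand['mv'][x] = ((cand['mv'][x][0] + scaled[0]) // 2,
                         (cand['mv'][x][1] + scaled[1]) // 2)
    # Discard list Y motion and mark the candidate as uni-directional.
    cand['mv'][y] = (0, 0)
    cand['ref_idx'][y] = -1
    cand['pred_dir'] = x
    return cand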
In some embodiments, when certain conditions are true, the (bi-directionally predicted) motion vector may be converted to a uni-directional motion vector before being used as a base Merge candidate in MMVD.
In some embodiments, when certain conditions are true (e.g., the size of the block satisfies the conditions as specified in example 5 above), the (bi-directionally predicted) motion vector may be converted to a uni-directional motion vector before being inserted into the Merge list.
In some embodiments, the converted uni-directional motion vector may only come from reference list 0. In some embodiments, when the current slice/slice group/picture is bi-predicted, the converted uni-directional motion vector may be from reference list 0 or list 1. In some embodiments, when the current slice/slice group/picture is bi-predicted, the transformed uni-directional motion vectors from reference list 0 and list 1 may be interleaved in the Merge list or/and the MMVD base Merge candidate list.
In some embodiments, how the motion information is converted into a uni-directional motion vector may depend on the reference pictures. In some embodiments, if all reference pictures of one video data unit (such as a slice/slice group) are past pictures in display order, list 1 motion information may be utilized. In some embodiments, if at least one of the reference pictures of one video data unit (such as a slice/slice group) is a past picture and at least one is a future picture in display order, list 0 motion information may be utilized. In some embodiments, how the motion information is converted into a uni-directional motion vector may depend on the low-delay check flag.
In some embodiments, the conversion process may be invoked right before the motion compensation process. In some embodiments, the conversion process may be invoked right after the motion candidate list (e.g., Merge list) construction process. In some embodiments, the conversion process may be invoked before invoking the add-MVD process of the MMVD process; that is, the add-MVD process follows the design of uni-prediction instead of bi-prediction. In some embodiments, the conversion process may be invoked before invoking the sample refinement process of the PROF process; that is, the sample refinement follows the design of uni-prediction instead of bi-prediction. In some embodiments, the conversion process may be invoked before the BIO (also known as BDOF) process is invoked; that is, in some cases, BIO may be disabled because the motion has been converted to uni-prediction. In some embodiments, the conversion process may be invoked before the DMVR process is invoked; that is, in some cases, DMVR may be disabled because the motion has been converted to uni-prediction.
Example 17: In some embodiments, how the motion candidate list is generated may depend on the block size, e.g., as described in Example 5 above.
In some embodiments, for certain block sizes, all motion candidates derived from spatial and/or temporal blocks and/or HMVP and/or other kinds of motion candidates may be restricted to being uni-directionally predicted.
In some embodiments, for some block sizes, if one motion candidate derived from a spatial domain block and/or a temporal domain block and/or an HMVP and/or other kind of motion candidate is bi-predictive, it may first be converted to uni-predictive before being added to the candidate list.
Example 18: Whether sharing of the Merge list is allowed may depend on the coding mode.
In some embodiments, the Merge list may not be allowed to be shared for blocks encoded in the normal Merge mode, and may be allowed to be shared for blocks encoded in the IBC mode.
In some embodiments, when one block divided from a parent sharing node is encoded in the normal Merge mode, the update of the HMVP table may be disabled after encoding/decoding the block.
Example 19: In the examples disclosed above, the block size/width/height of a luma block may also be changed to the block size/width/height of a chroma block, such as Cb, Cr, or G/B/R.
GBi mode row buffer reduction
Example 20: Whether the GBi weight index (including its CABAC context selection) can be inherited or predicted from neighboring blocks may depend on the position of the current block.
In some embodiments, the GBi weight index cannot be inherited or predicted from a neighboring block that is not in the same coding tree unit (CTU, also known as largest coding unit, LCU) as the current block.
In some embodiments, the GBi weight index cannot be inherited or predicted from a neighboring block that is not in the same CTU line or CTU row as the current block.
In some embodiments, the GBi weight index cannot be inherited or predicted from a neighboring block that is not in the same MxN region as the current block. For example, M = N = 64. In this case, a slice/picture is divided into multiple non-overlapping MxN regions.
In some embodiments, the GBi weight index cannot be inherited or predicted from a neighboring block that is not in the same MxN region line or MxN region row as the current block. For example, M = N = 64. CTU lines/rows and region lines/rows are depicted in fig. 10.
In some embodiments, assuming the top-left corner (or any other position) of the current block is (x, y) and the top-left corner (or any other position) of the neighboring block is (x', y'), the GBi weight index cannot be inherited or predicted from the neighboring block if one of the following conditions is satisfied:
(1) x / M != x' / M. For example, M = 128 or 64.
(2) y / N != y' / N. For example, N = 128 or 64.
(3) (x / M != x' / M) && (y / N != y' / N). For example, M = N = 128 or M = N = 64.
(4) (x / M != x' / M) || (y / N != y' / N). For example, M = N = 128 or M = N = 64.
(5) (x >> M) != (x' >> M). For example, M = 7 or 6.
(6) (y >> N) != (y' >> N). For example, N = 7 or 6.
(7) ((x >> M) != (x' >> M)) && ((y >> N) != (y' >> N)). For example, M = N = 7 or M = N = 6.
(8) ((x >> M) != (x' >> M)) || ((y >> N) != (y' >> N)). For example, M = N = 7 or M = N = 6.
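Conditions (5)-(8) are cheap shift-and-compare checks. A sketch of the CTU-row variant (condition (6) with N = 7, i.e., 128-sample CTUs); the constant and function names are illustrative.

CTU_LOG2 = 7  # 128x128 CTUs; use 6 for 64x64 CTUs

def gbi_index_inheritable(y_cur, y_nb):
    # Inherit/predict only when the neighbor is in the same CTU row.
    return (y_cur >> CTU_LOG2) == (y_nb >> CTU_LOG2)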
In some embodiments, a flag is signaled in the PPS or slice header or slice group header or slice to indicate whether GBi can be applied in the picture/slice group/slice. In some embodiments, whether and how GBi is used (such as how many candidate weights and values of weights) may be derived for a picture/slice. In some embodiments, the derivation may depend on information such as QP, temporal layer, POC distance, etc.
Fig. 10 shows an example of CTU (region) rows. Shaded CTUs (regions) are in one CTU (region) row, and unshaded CTUs (regions) are in another CTU (region) row.
Simplification of inter-intra prediction (IIP)
Example 21: The coding of the intra prediction mode in an IIP-coded block is independent of the intra prediction mode coding of IIP-coded neighboring blocks.
In some embodiments, only the intra-prediction mode of the intra-coded block may be used in the encoding of the intra-prediction mode of the IIP coded block, such as during the MPM list construction process.
In some embodiments, intra prediction modes in an IIP encoded block are encoded without mode prediction from any neighboring blocks.
Example 22: When both the intra prediction mode of an IIP-coded block and the intra prediction mode of an intra-coded block are used to code the intra prediction mode of a new IIP-coded block, the intra prediction mode of the IIP-coded block may have a lower priority than the intra prediction mode of the intra-coded block.
In some embodiments, intra prediction modes for IIP encoded blocks and intra-coded neighboring blocks are utilized when deriving MPMs for the IIP encoded blocks. However, the intra prediction mode from the intra-coded neighboring block may be inserted into the MPM list before the intra prediction mode from the IIP-coded neighboring block.
In some embodiments, the intra prediction mode from the intra-coded neighboring block may be inserted into the MPM list after the intra prediction mode from the IIP-coded neighboring block.
Example 23: The intra prediction mode in an IIP-coded block may also be used to predict the intra prediction mode of an intra-coded block.
In some embodiments, the intra prediction mode in the IIP coding block may be used to derive the MPM of the normal intra coding block. In some embodiments, when both the intra-prediction mode of the IIP encoded block and the intra-prediction mode of the intra-encoded block are used to derive the MPM of the normal intra-encoded block, the intra-prediction mode of the IIP encoded block may be lower in priority than the intra-prediction mode of the intra-encoded block.
In some embodiments, the intra-prediction mode in the IIP encoded block may also be used to predict the intra-prediction mode of a normal intra-coded block or an IIP encoded block only when one or more of the following conditions are met:
1. both blocks are in the same row of CTUs.
2. Both blocks are in the same CTU.
3. The two blocks are in the same M×N region, such as M = N = 64.
4. The two blocks are in the same M×N region row, such as M = N = 64.
Example 24: In some embodiments, the MPM construction process for IIP-coded blocks should be the same as the MPM construction process for normal intra-coded blocks.
In some embodiments, six MPMs are used for an inter-coded block using inter-frame intra prediction.
In some embodiments, only part of the MPMs are used for the IIP-coded block. In some embodiments, the first MPM is always used; alternatively, in addition, the MPM flag and MPM index need not be signaled. In some embodiments, the first four MPMs may be utilized; alternatively, in addition, the MPM flag need not be signaled, but the MPM index needs to be signaled.
In some embodiments, each block may select one mode from the MPM list according to the intra prediction modes included in the MPM list, such as selecting the mode with the smallest index for a given mode (e.g., planar).
In some embodiments, each block may select a subset of modes from the MPM list and signal the mode index in the subset.
In some embodiments, the context used to encode the intra MPM mode is reused to encode the intra mode in the IIP encoding block. In some embodiments, different contexts used to encode intra MPM modes are used to encode intra modes in the IIP encoding block.
Example 25: In some embodiments, for angular intra prediction modes other than horizontal and vertical, equal weights are used for the intra prediction block and the inter prediction block generated for the IIP-coded block.
Example 26: In some embodiments, a zero weight may be applied at certain positions in the IIP encoding process.
In some embodiments, a zero weight may be applied to the intra prediction block used in the IIP encoding process.
In some embodiments, a zero weight may be applied to the inter-prediction block used in the IIP encoding process.
Example 27: In some embodiments, the intra prediction mode of an IIP-coded block can only be selected as one of the MPMs, regardless of the size of the current block.
In some embodiments, the MPM flag is not signaled and is inferred to be 1, regardless of the size of the current block.
Example 28: For IIP-coded blocks, the linear model (LM) mode, in which chroma is predicted from luma, is used instead of the derived mode (DM) to intra predict the chroma components.
In some embodiments, both DM and LM may be enabled.
In some embodiments, multiple intra prediction modes may be allowed for the chroma component.
In some embodiments, whether multiple modes are allowed for the chroma components may depend on the color format. In one example, for the 4:4:4 color format, the allowed chroma intra prediction modes may be the same as those allowed for the luma component.
Example 29: inter-frame intra prediction may not be allowed in one or more specific cases:
A. w == T1 && h == T1; for example, T1 = 4.
B. w > T1 && h > T1; for example, T1 = 64.
C. (w == T1 && h == T2) || (w == T2 && h == T1); for example, T1 = 4 and T2 = 16.
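For illustration, these restrictions amount to a small predicate; the sketch below uses the example thresholds given above (T1 = 4 in case A, T1 = 64 in case B, T1 = 4 and T2 = 16 in case C), and the function name is illustrative:

def iip_allowed(w, h):
    # Case A: 4x4 blocks are excluded (T1 = 4).
    if w == 4 and h == 4:
        return False
    # Case B: blocks larger than 64 in both dimensions are excluded (T1 = 64).
    if w > 64 and h > 64:
        return False
    # Case C: 4x16 and 16x4 blocks are excluded (T1 = 4, T2 = 16).
    if (w == 4 and h == 16) or (w == 16 and h == 4):
        return False
    return True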
Example 30: For blocks using bi-directional prediction, inter-frame intra prediction may not be allowed.
In some embodiments, if bi-directional prediction is used for the selected Merge candidate of the IIP coding block, it will be converted to a uni-directional prediction Merge candidate. In some embodiments, only the prediction from reference list 0 is retained, and the Merge candidate is considered as a unidirectional prediction referencing reference list 0. In some embodiments, only the prediction from reference list 1 is retained, and the Merge candidate is considered as a unidirectional prediction referencing reference list 1.
In some embodiments, a restriction is added that the selected Merge candidate should be a uni-directional prediction Merge candidate. Alternatively, the signaled Merge index of the IIP coding block indicates the index of the uni-directionally predicted Merge candidate (i.e., the bi-directionally predicted Merge candidate is not counted).
In some embodiments, the Merge candidate list construction process used in the triangular prediction mode may be used to derive the motion candidate list for the IIP-coded block.
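For illustration, the conversion to a uni-directional prediction Merge candidate described in this example (keeping only the reference list 0 part) can be sketched as follows; the MergeCand structure is hypothetical and stands in for whatever candidate representation an implementation uses:

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MergeCand:
    mv_l0: Optional[Tuple[int, int]]  # motion vector referencing list 0, or None
    mv_l1: Optional[Tuple[int, int]]  # motion vector referencing list 1, or None
    ref_idx_l0: int = -1
    ref_idx_l1: int = -1

def to_uni_prediction(cand):
    # If the candidate is bi-predictive, drop the list 1 part so it becomes
    # a uni-prediction candidate referencing reference list 0 only.
    if cand.mv_l0 is not None and cand.mv_l1 is not None:
        return MergeCand(mv_l0=cand.mv_l0, mv_l1=None,
                         ref_idx_l0=cand.ref_idx_l0, ref_idx_l1=-1)
    return cand  # already uni-predictive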
Example 31: When inter-frame intra prediction is applied, some coding tools may not be allowed.
In some embodiments, bi-directional optical flow (BIO) is not applied to bi-directional prediction.
In some embodiments, Overlapped Block Motion Compensation (OBMC) is not applied.
In some embodiments, the decoder-side motion vector derivation/refinement process is not allowed.
Example 32: The intra prediction process for inter-frame intra prediction may be different from the intra prediction process for normal intra-coded blocks.
In some embodiments, neighboring samples may be filtered in different ways. In some embodiments, the neighboring samples are not filtered prior to performing intra prediction for inter-frame intra prediction.
In some embodiments, the intra prediction for inter intra prediction is not subjected to a location-dependent intra prediction sample filtering process. In some embodiments, multiple rows of intra prediction are not allowed in inter intra prediction. In some embodiments, wide-angle intra prediction is not allowed in inter intra prediction.
Example 33: Assume the intra and inter prediction values in mixed intra-inter prediction are P_intra and P_inter, and the weighting factors are w_intra and w_inter, respectively. The prediction value at position (x, y) is calculated as (P_intra(x, y) * w_intra(x, y) + P_inter(x, y) * w_inter(x, y) + offset(x, y)) >> N, where w_intra(x, y) + w_inter(x, y) = 2^N and offset(x, y) = 2^(N-1). In one example, N = 3.
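A per-sample sketch of the combination above with N = 3, so the weights sum to 2^3 = 8 and offset = 2^(3-1) = 4. A single weight pair is assumed here for illustration; in general the weights vary with position (x, y):

N = 3
OFFSET = 1 << (N - 1)  # 2^(N-1) = 4

def iip_blend(p_intra, p_inter, w_intra):
    # w_intra + w_inter must equal 2^N = 8.
    w_inter = (1 << N) - w_intra
    return (p_intra * w_intra + p_inter * w_inter + OFFSET) >> N

# Equal weights (4, 4) average the two predictions with rounding:
assert iip_blend(100, 120, 4) == 110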
Example 34: In some embodiments, the MPM flags signaled in normal intra-coded blocks and in IIP-coded blocks should share the same arithmetic coding context.
Example 35: In some embodiments, no MPMs are needed to code the intra prediction mode in IIP-coded blocks. (Assume the width and height of the block are W and H.)
In some embodiments, the four modes { planar, DC, vertical, horizontal } are binarized to 00, 01, 10, and 11 (any mapping rule such as 00-planar, 01-DC, 10-vertical, 11-horizontal may be used).
In some embodiments, the four modes { planar, DC, vertical, horizontal } are binarized to 0, 10, 110, and 111 (any mapping rule such as 0-planar, 10-DC, 110-vertical, 111-horizontal may be used).
In some embodiments, the four modes { planar, DC, vertical, horizontal } are binarized to 1, 01, 001, and 000 (any mapping rule such as 1-planar, 01-DC, 001-vertical, 000-horizontal may be used).
In some embodiments, when W > N × H (N is an integer such as 2), only three modes {planar, DC, vertical} may be used. The three modes are binarized to 1, 01, 11 (any mapping rule such as 1-planar, 01-DC, 11-vertical may be used).
In some embodiments, when W > N × H (N is an integer such as 2), only three modes {planar, DC, vertical} may be used. The three modes are binarized to 0, 10, 00 (any mapping rule such as 0-planar, 10-DC, 00-vertical may be used).
In some embodiments, when H > N × W (N is an integer such as 2), only three modes {planar, DC, horizontal} may be used. The three modes are binarized to 1, 01, 11 (any mapping rule such as 1-planar, 01-DC, 11-horizontal may be used).
In some embodiments, when H > N × W (N is an integer such as 2), only three modes {planar, DC, horizontal} may be used. The three modes are binarized to 0, 10, 00 (any mapping rule such as 0-planar, 10-DC, 00-horizontal may be used).
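For illustration, the binarizations above reduce to small lookup tables. The sketch below uses the example mappings from the text; the tables, names, and the N = 2 default are illustrative only:

# Example mappings copied from the text; any other mapping rule may be used.
FOUR_MODES = {"planar": "00", "DC": "01", "vertical": "10", "horizontal": "11"}
WIDE_MODES = {"planar": "1", "DC": "01", "vertical": "11"}    # used when W > N*H
TALL_MODES = {"planar": "1", "DC": "01", "horizontal": "11"}  # used when H > N*W

def binarize_iip_intra_mode(mode, w, h, n=2):
    if w > n * h:
        return WIDE_MODES[mode]  # horizontal mode dropped for wide blocks
    if h > n * w:
        return TALL_MODES[mode]  # vertical mode dropped for tall blocks
    return FOUR_MODES[mode]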
Example 36: In some embodiments, only the DC and planar modes are used in IIP-coded blocks. In some embodiments, a flag is signaled to indicate whether DC or planar is used.
Example 37: In some embodiments, IIP is performed differently for different color components.
In some embodiments, inter-frame intra prediction is not performed on chroma components (such as Cb and Cr).
In some embodiments, in an IIP coding block, the intra prediction mode for the chroma component is different from the intra prediction mode for the luma component. In some embodiments, the DC mode is always used for chrominance. In some embodiments, the planar mode is always used for chrominance. In some embodiments, the LM mode is always used for chroma.
In some embodiments, how IIP is performed on the different color components may depend on the color format (such as 4:2:0 or 4:4:4).
In some embodiments, how IIP is performed on the different color components may depend on the block size. For example, when the width or height of the current block is equal to or less than 4, inter-frame intra prediction is not performed on the chroma components such as Cb and Cr.
MV precision problem
In the following discussion, the precision for the MV stored for spatial motion prediction is denoted as P1, while the precision for the MV stored for temporal motion prediction is denoted as P2.
Example 38: P1 and P2 may be the same, or they may be different.
In some embodiments, P1 is 1/16 luma pixel and P2 is 1/4 luma pixel. In some embodiments, P1 is 1/16 luma pixel and P2 is 1/8 luma pixel. In some embodiments, P1 is 1/8 luma pixel and P2 is 1/4 luma pixel. In some embodiments, P1 is 1/8 luma pixel and P2 is 1/8 luma pixel. In some embodiments, P2 is 1/16 luma pixel and P1 is 1/4 luma pixel. In some embodiments, P2 is 1/16 luma pixel and P1 is 1/8 luma pixel. In some embodiments, P2 is 1/8 luma pixel and P1 is 1/4 luma pixel.
Example 39: P1 and P2 may not be fixed. In some embodiments, P1/P2 may be different for different standard profiles/levels/hierarchies. In some embodiments, P1/P2 may be different for pictures in different temporal layers. In some embodiments, P1/P2 may be different for pictures having different widths/heights. In some embodiments, P1/P2 may be signaled from the encoder to the decoder in the VPS/SPS/PPS/slice header/slice group header/slice/CTU/CU.
Example 40: for MV (MVx, MVy), the precision of MVx and MVy may be different, denoted as Px and Py.
In some embodiments, Px/Py may be different for different standard profiles/levels/hierarchies. In some embodiments, Px/Py may be different for pictures in different temporal layers. In some embodiments, Px may be different for pictures having different widths. In some embodiments, Py may be different for pictures with different heights. In some embodiments, Px/Py may be signaled from the encoder to the decoder in VPS/SPS/PPS/slice header/slice group header/slice/CTU/CU.
Example 41: Before an MV (MVx, MVy) is put into storage for temporal motion prediction, it should be changed to the correct precision.
In some embodiments, if P1 >= P2, then MVx = Shift(MVx, P1 - P2) and MVy = Shift(MVy, P1 - P2). In some embodiments, if P1 >= P2, then MVx = SignShift(MVx, P1 - P2) and MVy = SignShift(MVy, P1 - P2). In some embodiments, if P1 < P2, then MVx = MVx << (P2 - P1) and MVy = MVy << (P2 - P1).
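A sketch of this precision change, treating P1 and P2 as log2 denominators (e.g., P1 = 4 for 1/16 luma pixel, P2 = 2 for 1/4). Shift and SignShift are not defined in this excerpt, so the usual rounding right-shift definitions are assumed:

def shift(x, s):
    # Rounding right shift (assumed definition).
    return x if s == 0 else (x + (1 << (s - 1))) >> s

def sign_shift(x, s):
    # Sign-aware rounding right shift (assumed definition).
    if s == 0:
        return x
    off = 1 << (s - 1)
    return (x + off) >> s if x >= 0 else -((-x + off) >> s)

def to_storage_precision(mvx, mvy, p1, p2):
    # Convert an MV from precision 1/2^p1 to precision 1/2^p2.
    if p1 >= p2:
        return shift(mvx, p1 - p2), shift(mvy, p1 - p2)
    return mvx << (p2 - p1), mvy << (p2 - p1)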
Example 42: Suppose the MV (MVx, MVy) has precisions Px and Py, and that MVx or MVy is stored as an integer having N bits. The range of the MV (MVx, MVy) is MinX <= MVx <= MaxX and MinY <= MVy <= MaxY.
In some embodiments, MinX may be equal to MinY, or may not be equal to MinY. In some embodiments, MaxX may be equal to MaxY, or may not be equal to MaxY. In some embodiments, { MinX, MaxX } may depend on Px. In some embodiments, { MinY, MaxY } may depend on Py. In some embodiments, { MinX, MaxX, MinY, MaxY } may depend on N. In some embodiments, { MinX, MaxX, MinY, MaxY } may be different for MVs stored for spatial and temporal motion prediction. In some embodiments, { MinX, MaxX, MinY, MaxY } may be different for different standard profiles/levels/hierarchies. In some embodiments, { MinX, MaxX, MinY, MaxY } may be different for pictures in different temporal layers. In some embodiments, { MinX, MaxX, MinY, MaxY } may be different for pictures with different widths/heights. In some embodiments, { MinX, MaxX, MinY, MaxY } may be signaled from the encoder to the decoder in VPS/SPS/PPS/slice header/slice group header/slice/CTU/CU. In some embodiments, { MinX, MaxX } may be different for pictures with different widths. In some embodiments, { MinY, MaxY } may be different for pictures with different heights. In some embodiments, MVx is clipped to [ MinX, MaxX ] before being placed into the storage of spatial motion prediction. In some embodiments, MVx is clipped to [ MinX, MaxX ] before being placed in the storage of temporal motion prediction. In some embodiments, MVy is clipped to [ MinY, MaxY ] before it is placed into the storage of spatial motion prediction. In some embodiments, MVy is clipped to [ MinY, MaxY ] before being placed in the storage of temporal motion prediction.
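For illustration, the clipping described above is a plain clamp of each MV component to its storage range; the 18-bit signed defaults below match the range used in the embodiments of section 5 and are an assumption here, and the function name is illustrative:

def clip_mv_component(v, v_min=-(1 << 17), v_max=(1 << 17) - 1):
    # Clamp v to [v_min, v_max] before it is put into MV storage.
    return max(v_min, min(v_max, v))

# MVx would be clipped to [MinX, MaxX] and MVy to [MinY, MaxY], e.g.:
# mvx = clip_mv_component(mvx, min_x, max_x)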
Affine Merge mode row buffer reduction
Example 43: The affine model (derived CPMVs or affine parameters) inherited by an affine Merge candidate from a neighboring block is always a 6-parameter affine model.
In some embodiments, if the neighboring block is encoded with a 4-parameter affine model, the affine model is still inherited as a 6-parameter affine model.
In some embodiments, whether the 4-parameter affine model from a neighboring block is inherited as a 6-parameter affine model or a 4-parameter affine model may depend on the location of the current block. In some embodiments, the 4-parameter affine model from the neighboring block is inherited as a 6-parameter affine model if the neighboring block is not in the same coding tree unit (CTU, also known as a largest coding unit, LCU) as the current block. In some embodiments, the 4-parameter affine model from the neighboring block is inherited as a 6-parameter affine model if the neighboring block is not in the same CTU line or CTU row as the current block. In some embodiments, the 4-parameter affine model from the neighboring block is inherited as a 6-parameter affine model if the neighboring block is not in the same M×N region as the current block. For example, M = N = 64. In this case, the slice/picture is divided into multiple non-overlapping M×N regions. In some embodiments, the 4-parameter affine model from the neighboring block is inherited as a 6-parameter affine model if the neighboring block is not in the same M×N region line or M×N region row as the current block. For example, M = N = 64. CTU lines/rows and region lines/rows are depicted in Fig. 10.
In some embodiments, assuming that the top-left corner (or any other position) of the current block is (x, y) and the top-left corner (or any other position) of the neighboring block is (x', y'), the 4-parameter affine model of the neighboring block is inherited as a 6-parameter affine model if one or more of the following conditions are satisfied (a control-point derivation sketch follows the list):
(a) x/M != x'/M. For example, M = 128 or M = 64.
(b) y/N != y'/N. For example, N = 128 or N = 64.
(c) (x/M != x'/M) && (y/N != y'/N). For example, M = N = 128 or M = N = 64.
(d) (x/M != x'/M) || (y/N != y'/N). For example, M = N = 128 or M = N = 64.
(e) (x >> M) != (x' >> M). For example, M = 7 or M = 6.
(f) (y >> N) != (y' >> N). For example, N = 7 or N = 6.
(g) ((x >> M) != (x' >> M)) && ((y >> N) != (y' >> N)). For example, M = N = 7 or M = N = 6.
(h) ((x >> M) != (x' >> M)) || ((y >> N) != (y' >> N)). For example, M = N = 7 or M = N = 6.
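Inheriting a 4-parameter neighbor as a 6-parameter model requires a third control-point motion vector. The sketch below shows one common derivation of the bottom-left CPMV from the top-left and top-right CPMVs of a w x h neighboring block; this formula is an assumption for illustration, not quoted from this document, and integer rounding is simplified:

def derive_third_cpmv(mv0, mv1, w, h):
    # mv0, mv1: (x, y) CPMVs at the top-left and top-right corners.
    # The vertical basis vector is the horizontal one rotated by 90 degrees
    # and scaled by h/w, which yields the bottom-left CPMV mv2.
    mv2x = mv0[0] - (mv1[1] - mv0[1]) * h // w
    mv2y = mv0[1] + (mv1[0] - mv0[0]) * h // w
    return (mv2x, mv2y)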
5. Examples of the embodiments
The following description shows an example of how the disclosed techniques may be implemented within the syntax structure of the current VVC standard. Newly added content is shown in bold and deleted content is shown in italics.
5.1 Embodiment #1 (4x4 inter prediction disabled, and bi-prediction disabled for 4x8, 8x4, 4x16, and 16x4 blocks)
7.3.6.6 coding unit syntax
7.4.7.6 coding Unit semantics
pred_mode_flag equal to 0 specifies that the current coding unit is coded in inter prediction mode. pred_mode_flag equal to 1 specifies that the current coding unit is coded in intra prediction mode. For x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1, the variable CuPredMode[x][y] is derived as follows:
-if pred _ MODE _ flag is equal to 0, set CuPredMode [ x ] [ y ] equal to MODE _ INTER.
Else (pred _ MODE _ flag equal to 1), set CuPredMode [ x ] [ y ] equal to MODE _ INTRA.
When pred _ mode _ flag is not present, it is inferred to be equal to 1 when decoding I slice groups or coding units with cbWidth equal to 4 and cbHeight equal to 4; and when decoding P or B slice groups, it is inferred to be equal to 0.
pred _ mode _ IBC _ flag equal to 1 specifies that the current coding unit is coded in IBC prediction mode. pred _ mode _ IBC _ flag equal to 0 specifies that the current coding unit is not coded in IBC prediction mode.
When pred _ mode _ ibc _ flag is not present, it is inferred to be equal to 1 when decoding I slice groups or coding units coded with cbWidth equal to 4 and cbHeight equal to 4 and coded in skip mode; and when decoding P or B slice groups, it is inferred to be equal to 0.
When pred_mode_ibc_flag is equal to 1, the variable CuPredMode[x][y] is set equal to MODE_IBC for x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1.
inter_pred_idc[x0][y0] specifies whether list0, list1, or bi-prediction is used for the current coding unit according to Tables 7-9. The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
Tables 7-9-associated with inter prediction mode names
When inter _ PRED _ idc x0 y0 is not present, it is inferred to be equal to PRED _ L0.
8.5.2.1 overview
The inputs to this process are:
-a luma position of an upper left sample of the current luma coding block relative to an upper left luma sample of the current picture (xCb, yCb),
a variable cbWidth specifying the width of the current coding block in the luma samples,
a variable cbHeight specifying the height of the current coding block in the luma samples.
The outputs of this process are:
luminance motion vectors mvL0[0] [0] and mvL1[0] [0] with an accuracy of 1/16 fractional samples,
reference indices refIdxL0 and refIdxL1,
the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0],
– the bi-prediction weight index gbiIdx.
Let variable LX be RefPicList [ X ] of the current picture, where X is 0 or 1.
For the derivation of the variables mvL0[0] [0] and mvL1[0] [0], refIdxL0 and refIdxL1, and predFlagL0[0] [0] and predFlagL1[0] [0], the following applies:
-if Merge _ flag [ xCb ] [ yCb ] is equal to 1, then the luma position (xCb, yCb), the variables cbWidth and cbHeight are used as inputs to invoke the process of deriving the luma motion vector for the Merge mode specified in section 8.5.2.2, the outputs are luma motion vector mvL0[0] [0], mvL1[0] [0], reference indices refIdxL0, refIdxL1, the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0] and the bi-directional prediction weight index gbiIdx.
Otherwise, the following applies:
for predFlagLX [0] [0], mvLX [0] [0] and refIdxLX in PRED _ LX, in the variable PRED _ LX and in the case of replacing X by 0 or 1 in the syntax elements ref _ idx _ LX and MvdLX, the following sequence steps apply:
1. the variables refIdxLX and predFlagLX [0] [0] are derived as follows:
if inter _ PRED _ idc [ xCb ] [ yCb ] is equal to PRED _ LX or PRED _ BI,
refIdxLX=ref_idx_lX[xCb][yCb] (8-266)
predFlagLX[0][0]=1 (8-267)
-otherwise, the variables refIdxLX and predFlagLX [0] [0] are specified as:
refIdxLX=-1 (8-268)
predFlagLX[0][0]=0 (8-269)
2. derivation of the variable mvdLX is as follows:
mvdLX[0]=MvdLX[xCb][yCb][0] (8-270)
mvdLX[1]=MvdLX[xCb][yCb][1] (8-271)
3. when predFlagLX [0] [0] is equal to 1, the luma coding block position (xCb, yCb), the coding block width cbWidth, the coding block height cbHeight, and the variable refIdxLX are taken as inputs to invoke the derivation process of luma motion vector prediction in section 8.5.2.8, and output as mvpLX.
4. When predFlagLX [0] [0] is equal to 1, the luma motion vector mvLX [0] [0] is derived as follows:
uLX[0] = (mvpLX[0] + mvdLX[0] + 2^18) % 2^18 (8-272)
mvLX[0][0][0] = (uLX[0] >= 2^17) ? (uLX[0] - 2^18) : uLX[0] (8-273)
uLX[1] = (mvpLX[1] + mvdLX[1] + 2^18) % 2^18 (8-274)
mvLX[0][0][1] = (uLX[1] >= 2^17) ? (uLX[1] - 2^18) : uLX[1] (8-275)
NOTE 1 - The resulting values of mvLX[0][0][0] and mvLX[0][0][1] as specified above will always be in the range of -2^17 to 2^17 - 1, inclusive.
– The bi-prediction weight index gbiIdx is set equal to gbi_idx[xCb][yCb].
Setting refIdxL1 equal to-1, predFlagL1 equal to 0, and gbiIdx equal to 0 when all of the following conditions are true:
predFlagL0[0] [0] equals 1.
predFlagL1[0] [0] equals 1.
––(cbWidth+cbHeight==8)||(cbWidth+cbHeight==12)||(cbWidth+cbHeight==20)
– cbWidth is equal to 4 and cbHeight is equal to 4.
The update process of the history-based motion vector predictor list specified in section 8.5.2.16 is invoked with the luma motion vectors mvL0[0][0] and mvL1[0][0], the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[0][0] and predFlagL1[0][0], and the bi-prediction weight index gbiIdx.
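For illustration, equations (8-272) to (8-275) above wrap the sum of the motion vector predictor and the motion vector difference into an 18-bit two's-complement range. A minimal sketch of that wrap, with illustrative names:

def wrap_mv_18bit(mvp, mvd):
    # Reduce modulo 2^18, then map back into the signed range [-2^17, 2^17 - 1].
    u = (mvp + mvd + (1 << 18)) % (1 << 18)
    return u - (1 << 18) if u >= (1 << 17) else u

# Example: a predictor at the top of the range wraps around to -2^17.
assert wrap_mv_18bit((1 << 17) - 1, 1) == -(1 << 17)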
9.5.3.8 binarization procedure of inter _ pred _ idc
The input to this process is a binarization request for the syntax element inter _ pred _ idc, the current luma coding block width cbWidth and the current luma coding block height cbHeight.
The output of this process is binarization of the syntax elements.
Binarization of the syntax element inter _ pred _ idc is specified in tables 9-9.
TABLE 9-9 binarization of inter _ pred _ idc
9.5.4.2.1 overview
Table 9-10-ctxInc assignment to syntax elements with context coded binary bits
5.2 Embodiment #2 (4x4 inter prediction disabled)
7.3.6.6 coding unit syntax
7.4.7.6 coding Unit semantics
pred_mode_flag equal to 0 specifies that the current coding unit is coded in inter prediction mode. pred_mode_flag equal to 1 specifies that the current coding unit is coded in intra prediction mode. For x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1, the variable CuPredMode[x][y] is derived as follows:
-if pred _ MODE _ flag is equal to 0, set CuPredMode [ x ] [ y ] equal to MODE _ INTER.
Else (pred _ MODE _ flag equal to 1), set CuPredMode [ x ] [ y ] equal to MODE _ INTRA.
When pred _ mode _ flag is not present, it is inferred to be equal to 1 when decoding I slice groups or coding units with cbWidth equal to 4 and cbHeight equal to 4; and when decoding P or B slice groups, it is inferred to be equal to 0.
pred _ mode _ IBC _ flag equal to 1 specifies that the current coding unit is coded in IBC prediction mode. pred _ mode _ IBC _ flag equal to 0 specifies that the current coding unit is not coded in IBC prediction mode.
When pred _ mode _ ibc _ flag is not present, it is inferred to be equal to 1 when decoding I slice groups or coding units coded with cbWidth equal to 4 and cbHeight equal to 4 and coded in skip mode; and when decoding P or B slice groups, it is inferred to be equal to 0.
When pred_mode_ibc_flag is equal to 1, the variable CuPredMode[x][y] is set equal to MODE_IBC for x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1.
5.3 Embodiment #3 (bi-prediction disabled for 4x8, 8x4, 4x16, and 16x4 blocks)
7.4.7.6 coding Unit semantics
inter_pred_idc[x0][y0] specifies whether list0, list1, or bi-prediction is used for the current coding unit according to Tables 7-9. The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
Tables 7-9-associated with inter prediction mode names
When inter _ PRED _ idc x0 y0 is not present, it is inferred to be equal to PRED _ L0.
8.5.2.1 overview
The inputs to this process are:
-a luma position of an upper left sample of the current luma coding block relative to an upper left luma sample of the current picture (xCb, yCb),
a variable cbWidth specifying the width of the current coding block in the luma samples,
A variable cbHeight specifying the height of the current coding block in the luma samples.
The outputs of this process are:
luminance motion vectors mvL0[0] [0] and mvL1[0] [0] with an accuracy of 1/16 fractional samples,
reference indices refIdxL0 and refIdxL1,
the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0],
– the bi-prediction weight index gbiIdx.
Let variable LX be RefPicList [ X ] of the current picture, where X is 0 or 1.
For the derivation of the variables mvL0[0] [0] and mvL1[0] [0], refIdxL0 and refIdxL1, and predFlagL0[0] [0] and predFlagL1[0] [0], the following applies:
-if Merge _ flag [ xCb ] [ yCb ] is equal to 1, then the luma position (xCb, yCb), the variables cbWidth and cbHeight are used as inputs to invoke the process of deriving the luma motion vector for the Merge mode specified in section 8.5.2.2, the outputs are luma motion vector mvL0[0] [0], mvL1[0] [0], reference indices refIdxL0, refIdxL1, the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0] and the bi-directional prediction weight index gbiIdx.
Otherwise, the following applies:
for predFlagLX [0] [0], mvLX [0] [0] and refIdxLX in PRED _ LX, in the variable PRED _ LX and in the case of replacing X by 0 or 1 in the syntax elements ref _ idx _ LX and MvdLX, the following sequence steps apply:
5. The variables refIdxLX and predFlagLX [0] [0] are derived as follows:
if inter _ PRED _ idc [ xCb ] [ yCb ] is equal to PRED _ LX or PRED _ BI,
refIdxLX=ref_idx_lX[xCb][yCb] (8-266)
predFlagLX[0][0]=1 (8-267)
-otherwise, the variables refIdxLX and predFlagLX [0] [0] are specified as:
refIdxLX=-1 (8-268)
predFlagLX[0][0]=0 (8-269)
6. derivation of the variable mvdLX is as follows:
mvdLX[0]=MvdLX[xCb][yCb][0] (8-270)
mvdLX[1]=MvdLX[xCb][yCb][1] (8-271)
7. when predFlagLX [0] [0] is equal to 1, the luma coding block position (xCb, yCb), the coding block width cbWidth, the coding block height cbHeight, and the variable refIdxLX are taken as inputs to invoke the derivation process of luma motion vector prediction in section 8.5.2.8, and output as mvpLX.
8. When predFlagLX [0] [0] is equal to 1, the luma motion vector mvLX [0] [0] is derived as follows:
uLX[0] = (mvpLX[0] + mvdLX[0] + 2^18) % 2^18 (8-272)
mvLX[0][0][0] = (uLX[0] >= 2^17) ? (uLX[0] - 2^18) : uLX[0] (8-273)
uLX[1] = (mvpLX[1] + mvdLX[1] + 2^18) % 2^18 (8-274)
mvLX[0][0][1] = (uLX[1] >= 2^17) ? (uLX[1] - 2^18) : uLX[1] (8-275)
NOTE 1 - The resulting values of mvLX[0][0][0] and mvLX[0][0][1] as specified above will always be in the range of -2^17 to 2^17 - 1, inclusive.
– The bi-prediction weight index gbiIdx is set equal to gbi_idx[xCb][yCb].
Setting refIdxL1 equal to-1, predFlagL1 equal to 0, and gbiIdx equal to 0 when all of the following conditions are true:
predFlagL0[0] [0] equals 1.
predFlagL1[0] [0] equals 1.
––(cbWidth+cbHeight==8)||(cbWidth+cbHeight==12)||(cbWidth+cbHeight==20)
– cbWidth is equal to 4 and cbHeight is equal to 4.
The update process of the history-based motion vector predictor list specified in section 8.5.2.16 is invoked with the luma motion vectors mvL0[0][0] and mvL1[0][0], the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[0][0] and predFlagL1[0][0], and the bi-prediction weight index gbiIdx.
9.5.3.8 binarization procedure of inter _ pred _ idc
The input to this process is a binarization request for the syntax element inter _ pred _ idc, the current luma coding block width cbWidth and the current luma coding block height cbHeight.
The output of this process is binarization of the syntax elements.
Binarization of the syntax element inter _ pred _ idc is specified in tables 9-9.
TABLE 9-9 binarization of inter _ pred _ idc
9.5.4.2.1 overview
Table 9-10-ctxInc assignment to syntax elements with context coded binary bits
5.4 Embodiment #4 (4x4 inter prediction disabled, and bi-prediction disabled for 4x8 and 8x4 blocks)
7.3.6.6 coding unit syntax
7.4.7.6 coding Unit semantics
pred_mode_flag equal to 0 specifies that the current coding unit is coded in inter prediction mode. pred_mode_flag equal to 1 specifies that the current coding unit is coded in intra prediction mode. For x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1, the variable CuPredMode[x][y] is derived as follows:
-if pred _ MODE _ flag is equal to 0, set CuPredMode [ x ] [ y ] equal to MODE _ INTER.
Else (pred _ MODE _ flag equal to 1), set CuPredMode [ x ] [ y ] equal to MODE _ INTRA.
When pred _ mode _ flag is not present, it is inferred to be equal to 1 when decoding I slice groups or coding units with cbWidth equal to 4 and cbHeight equal to 4; and when decoding P or B slice groups, it is inferred to be equal to 0.
pred _ mode _ IBC _ flag equal to 1 specifies that the current coding unit is coded in IBC prediction mode. pred _ mode _ IBC _ flag equal to 0 specifies that the current coding unit is not coded in IBC prediction mode.
When pred _ mode _ ibc _ flag is not present, it is inferred to be equal to 1 when decoding I slice groups or coding units coded with cbWidth equal to 4 and cbHeight equal to 4 and coded in skip mode; and when decoding P or B slice groups, it is inferred to be equal to 0.
When pred_mode_ibc_flag is equal to 1, the variable CuPredMode[x][y] is set equal to MODE_IBC for x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1.
inter_pred_idc[x0][y0] specifies whether list0, list1, or bi-prediction is used for the current coding unit according to Tables 7-9. The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
Tables 7-9-associated with inter prediction mode names
When inter _ PRED _ idc x0 y0 is not present, it is inferred to be equal to PRED _ L0.
8.5.2.1 overview
The inputs to this process are:
-a luma position of an upper left sample of the current luma coding block relative to an upper left luma sample of the current picture (xCb, yCb),
A variable cbWidth specifying the width of the current coding block in the luma samples,
a variable cbHeight specifying the height of the current coding block in the luma samples.
The outputs of this process are:
luminance motion vectors mvL0[0] [0] and mvL1[0] [0] with an accuracy of 1/16 fractional samples,
reference indices refIdxL0 and refIdxL1,
the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0],
– the bi-prediction weight index gbiIdx.
Let variable LX be RefPicList [ X ] of the current picture, where X is 0 or 1.
For the derivation of the variables mvL0[0] [0] and mvL1[0] [0], refIdxL0 and refIdxL1, and predFlagL0[0] [0] and predFlagL1[0] [0], the following applies:
-if Merge _ flag [ xCb ] [ yCb ] is equal to 1, then the luma position (xCb, yCb), the variables cbWidth and cbHeight are used as inputs to invoke the process of deriving the luma motion vector for the Merge mode specified in section 8.5.2.2, the outputs are luma motion vector mvL0[0] [0], mvL1[0] [0], reference indices refIdxL0, refIdxL1, the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0] and the bi-directional prediction weight index gbiIdx.
Otherwise, the following applies:
for predFlagLX [0] [0], mvLX [0] [0] and refIdxLX in PRED _ LX, in the variable PRED _ LX and in the case of replacing X by 0 or 1 in the syntax elements ref _ idx _ LX and MvdLX, the following sequence steps apply:
1. The variables refIdxLX and predFlagLX [0] [0] are derived as follows:
if inter _ PRED _ idc [ xCb ] [ yCb ] is equal to PRED _ LX or PRED _ BI,
refIdxLX=ref_idx_lX[xCb][yCb] (8-266)
predFlagLX[0][0]=1 (8-267)
-otherwise, the variables refIdxLX and predFlagLX [0] [0] are specified as:
refIdxLX=-1 (8-268)
predFlagLX[0][0]=0 (8-269)
2. derivation of the variable mvdLX is as follows:
mvdLX[0]=MvdLX[xCb][yCb][0] (8-270)
mvdLX[1]=MvdLX[xCb][yCb][1] (8-271)
3. when predFlagLX [0] [0] is equal to 1, the luma coding block position (xCb, yCb), the coding block width cbWidth, the coding block height cbHeight, and the variable refIdxLX are taken as inputs to invoke the derivation process of luma motion vector prediction in section 8.5.2.8, and output as mvpLX.
4. When predFlagLX [0] [0] is equal to 1, the luma motion vector mvLX [0] [0] is derived as follows:
uLX[0] = (mvpLX[0] + mvdLX[0] + 2^18) % 2^18 (8-272)
mvLX[0][0][0] = (uLX[0] >= 2^17) ? (uLX[0] - 2^18) : uLX[0] (8-273)
uLX[1] = (mvpLX[1] + mvdLX[1] + 2^18) % 2^18 (8-274)
mvLX[0][0][1] = (uLX[1] >= 2^17) ? (uLX[1] - 2^18) : uLX[1] (8-275)
NOTE 1 - The resulting values of mvLX[0][0][0] and mvLX[0][0][1] as specified above will always be in the range of -2^17 to 2^17 - 1, inclusive.
– The bi-prediction weight index gbiIdx is set equal to gbi_idx[xCb][yCb].
Setting refIdxL1 equal to-1, predFlagL1 equal to 0, and gbiIdx equal to 0 when all of the following conditions are true:
predFlagL0[0] [0] equals 1.
predFlagL1[0] [0] equals 1.
– (cbWidth + cbHeight == 8) || (cbWidth + cbHeight == 12)
– cbWidth is equal to 4 and cbHeight is equal to 4.
The update process of the history-based motion vector predictor list specified in section 8.5.2.16 is invoked with the luma motion vectors mvL0[0][0] and mvL1[0][0], the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[0][0] and predFlagL1[0][0], and the bi-prediction weight index gbiIdx.
9.5.3.8 binarization procedure of inter _ pred _ idc
The input to this process is a binarization request for the syntax element inter _ pred _ idc, the current luma coding block width cbWidth and the current luma coding block height cbHeight.
The output of this process is binarization of the syntax elements.
Binarization of the syntax element inter _ pred _ idc is specified in tables 9-9.
TABLE 9-9 binarization of inter _ pred _ idc
9.5.4.2.1 overview
Table 9-10-ctxInc assignment to syntax elements with context coded binary bits
5.5 Embodiment #5 (4x4 inter prediction disabled, bi-prediction disabled for 4x8 and 8x4 blocks, and the shared Merge list disabled for the regular Merge mode)
7.3.6.6 coding unit syntax
7.4.7.6 coding Unit semantics
pred_mode_flag equal to 0 specifies that the current coding unit is coded in inter prediction mode. pred_mode_flag equal to 1 specifies that the current coding unit is coded in intra prediction mode. For x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1, the variable CuPredMode[x][y] is derived as follows:
-if pred _ MODE _ flag is equal to 0, set CuPredMode [ x ] [ y ] equal to MODE _ INTER.
Else (pred _ MODE _ flag equal to 1), set CuPredMode [ x ] [ y ] equal to MODE _ INTRA.
When pred _ mode _ flag is not present, it is inferred to be equal to 1 when decoding I slice groups or coding units with cbWidth equal to 4 and cbHeight equal to 4; and when decoding P or B slice groups, it is inferred to be equal to 0.
pred _ mode _ IBC _ flag equal to 1 specifies that the current coding unit is coded in IBC prediction mode. pred _ mode _ IBC _ flag equal to 0 specifies that the current coding unit is not coded in IBC prediction mode.
When pred _ mode _ ibc _ flag is not present, it is inferred to be equal to 1 when decoding I slice groups or coding units coded with cbWidth equal to 4 and cbHeight equal to 4 and coded in skip mode; and when decoding P or B slice groups, it is inferred to be equal to 0.
When pred_mode_ibc_flag is equal to 1, the variable CuPredMode[x][y] is set equal to MODE_IBC for x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1.
inter_pred_idc[x0][y0] specifies whether list0, list1, or bi-prediction is used for the current coding unit according to Tables 7-9. The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
Tables 7-9-associated with inter prediction mode names
When inter _ PRED _ idc x0 y0 is not present, it is inferred to be equal to PRED _ L0.
8.5.2.1 overview
The inputs to this process are:
-a luma position of an upper left sample of the current luma coding block relative to an upper left luma sample of the current picture (xCb, yCb),
a variable cbWidth specifying the width of the current coding block in the luma samples,
a variable cbHeight specifying the height of the current coding block in the luma samples.
The outputs of this process are:
luminance motion vectors mvL0[0] [0] and mvL1[0] [0] with an accuracy of 1/16 fractional samples,
reference indices refIdxL0 and refIdxL1,
the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0],
– the bi-prediction weight index gbiIdx.
Let variable LX be RefPicList [ X ] of the current picture, where X is 0 or 1.
For the derivation of the variables mvL0[0] [0] and mvL1[0] [0], refIdxL0 and refIdxL1, and predFlagL0[0] [0] and predFlagL1[0] [0], the following applies:
-if Merge _ flag [ xCb ] [ yCb ] is equal to 1, then the luma position (xCb, yCb), the variables cbWidth and cbHeight are used as inputs to invoke the process of deriving the luma motion vector for the Merge mode specified in section 8.5.2.2, the outputs are luma motion vector mvL0[0] [0], mvL1[0] [0], reference indices refIdxL0, refIdxL1, the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0] and the bi-directional prediction weight index gbiIdx.
Otherwise, the following applies:
for predFlagLX [0] [0], mvLX [0] [0] and refIdxLX in PRED _ LX, in the variable PRED _ LX and in the case of replacing X by 0 or 1 in the syntax elements ref _ idx _ LX and MvdLX, the following sequence steps apply:
5. the variables refIdxLX and predFlagLX [0] [0] are derived as follows:
if inter _ PRED _ idc [ xCb ] [ yCb ] is equal to PRED _ LX or PRED _ BI,
refIdxLX=ref_idx_lX[xCb][yCb] (8-266)
predFlagLX[0][0]=1 (8-267)
-otherwise, the variables refIdxLX and predFlagLX [0] [0] are specified as:
refIdxLX=-1 (8-268)
predFlagLX[0][0]=0 (8-269)
6. derivation of the variable mvdLX is as follows:
mvdLX[0]=MvdLX[xCb][yCb][0] (8-270)
mvdLX[1]=MvdLX[xCb][yCb][1] (8-271)
7. when predFlagLX [0] [0] is equal to 1, the luma coding block position (xCb, yCb), the coding block width cbWidth, the coding block height cbHeight, and the variable refIdxLX are taken as inputs to invoke the derivation process of luma motion vector prediction in section 8.5.2.8, and output as mvpLX.
8. When predFlagLX [0] [0] is equal to 1, the luma motion vector mvLX [0] [0] is derived as follows:
uLX[0] = (mvpLX[0] + mvdLX[0] + 2^18) % 2^18 (8-272)
mvLX[0][0][0] = (uLX[0] >= 2^17) ? (uLX[0] - 2^18) : uLX[0] (8-273)
uLX[1] = (mvpLX[1] + mvdLX[1] + 2^18) % 2^18 (8-274)
mvLX[0][0][1] = (uLX[1] >= 2^17) ? (uLX[1] - 2^18) : uLX[1] (8-275)
NOTE 1 - The resulting values of mvLX[0][0][0] and mvLX[0][0][1] as specified above will always be in the range of -2^17 to 2^17 - 1, inclusive.
– The bi-prediction weight index gbiIdx is set equal to gbi_idx[xCb][yCb].
Setting refIdxL1 equal to-1, predFlagL1 equal to 0, and gbiIdx equal to 0 when all of the following conditions are true:
predFlagL0[0] [0] equals 1.
predFlagL1[0] [0] equals 1.
– (cbWidth + cbHeight == 8) || (cbWidth + cbHeight == 12)
– cbWidth is equal to 4 and cbHeight is equal to 4.
The update process of the history-based motion vector predictor list specified in section 8.5.2.16 is invoked with the luma motion vectors mvL0[0][0] and mvL1[0][0], the reference indices refIdxL0 and refIdxL1, the prediction list utilization flags predFlagL0[0][0] and predFlagL1[0][0], and the bi-prediction weight index gbiIdx.
8.5.2.2 Luma motion vector derivation process for Merge mode
This process is invoked only when merge_flag[xCb][yPb] is equal to 1, where (xCb, yCb) specifies the top-left sample of the current luma coding block relative to the top-left luma sample of the current picture.
The inputs to this process are:
-a luma position of an upper left sample of the current luma coding block relative to an upper left luma sample of the current picture (xCb, yCb),
a variable cbWidth specifying the width of the current coding block in the luma samples,
a variable cbHeight specifying the height of the current coding block in the luma samples.
The outputs of this process are:
luminance motion vectors mvL0[0] [0] and mvL1[0] [0] with an accuracy of 1/16 fractional samples,
Reference indices refIdxL0 and refIdxL1,
the prediction list utilizes flags predFlagL0[0] [0] and predFlagL1[0] [0],
– the bi-prediction weight index gbiIdx.
The bi-prediction weight index gbiIdx is set equal to 0.
The variables xSmr, ySmr, smrWidth, smrHeight and smrNumHmvpCand are derived as follows:
xSmr=IsInSmr[xCb][yCb]?SmrX[xCb][yCb]:xCb (8-276)
ySmr=IsInSmr[xCb][yCb]?SmrY[xCb][yCb]:yCb (8-277)
smrWidth=IsInSmr[xCb][yCb]?SmrW[xCb][yCb]:cbWidth (8-278)
smrHeight=IsInSmr[xCb][yCb]?SmrH[xCb][yCb]:cbHeight (8-279)
smrNumHmvpCand=IsInSmr[xCb][yCb]?NumHmvpSmrCand:NumHmvpCand (8-280)
8.5.2.6 derivation of history-based Merge candidates
The inputs to this process are:
a Merge candidate list mergeCandList,
-a variable isinSmr specifying whether the current coding unit is within the shared Merge candidate region,
the number of Merge candidates available in the list numMercMergeCand.
The outputs of this process are:
-a modified Merge candidate list mergeCandList,
-the number of modified Merge candidates in the list numMercMergeCand.
The variables isPrunedA1 and isPrunedB1 are both set equal to FALSE.
The array smrHmvpCandList and the variable smrNumHmvpCand are derived as follows:
smrHmvpCandList = isInSmr ? HmvpSmrCandList : HmvpCandList (8-353)
smrNumHmvpCand = isInSmr ? NumHmvpSmrCand : NumHmvpCand (8-354)
For each candidate in smrHmvpCandList[hMvpIdx] with index hMvpIdx = 1..smrNumHmvpCand, the following ordered steps are repeated until numCurrMergeCand is equal to (MaxNumMergeCand - 1):
1. the variable sameMotion is derived as follows:
– For any Merge candidate N, where N is A1 or B1, sameMotion and isPrunedN are set equal to TRUE if all of the following conditions are TRUE:
-hMvpIdx is less than or equal to 2.
– The candidate smrHmvpCandList[smrNumHmvpCand - hMvpIdx] is equal to the Merge candidate N.
– isPrunedN is equal to FALSE.
-otherwise, set sameMotion equal to FALSE.
2. When sameMotion is equal to FALSE, the candidate smrHmvpCandList[smrNumHmvpCand - hMvpIdx] is added to the Merge candidate list as follows:
mergeCandList[numCurrMergeCand++] = smrHmvpCandList[smrNumHmvpCand - hMvpIdx] (8-355)
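For illustration, the insertion loop above can be sketched as follows; the A1/B1 pruning and the isPrunedN bookkeeping are simplified, and all names are illustrative rather than normative:

def add_hmvp_candidates(merge_list, hmvp_table, max_num_merge, a1=None, b1=None):
    # Walk the HMVP table from the most recently added entry backwards.
    for h_idx in range(1, len(hmvp_table) + 1):
        if len(merge_list) >= max_num_merge - 1:
            break
        cand = hmvp_table[len(hmvp_table) - h_idx]
        # Only the first two table entries are pruned against A1/B1.
        if h_idx <= 2 and cand in (a1, b1):
            continue
        merge_list.append(cand)
    return merge_list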
9.5.3.8 binarization procedure of inter _ pred _ idc
The input to this process is a binarization request for the syntax element inter _ pred _ idc, the current luma coding block width cbWidth and the current luma coding block height cbHeight.
The output of this process is binarization of the syntax elements.
Binarization of the syntax element inter _ pred _ idc is specified in tables 9-9.
TABLE 9-9 binarization of inter _ pred _ idc
9.5.4.2.1 overview
Table 9-10-ctxInc assignment to syntax elements with context coded binary bits
Fig. 11 is a block diagram of a video processing apparatus 1100. The apparatus 1100 may be used to implement one or more of the methods described herein. The apparatus 1100 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and/or the like. The apparatus 1100 may include one or more processors 1102, one or more memories 1104, and video processing hardware 1106. The processor(s) 1102 may be configured to implement one or more of the methods described herein. Although some embodiments may operate without memory, the memory (or memories) 1104 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 1106 may be used to implement, in hardware circuitry, some of the techniques described in this document.
Fig. 12 is a flow diagram of an example method 1200 for video processing. The method 1200 includes determining (1202) a size constraint between a representative motion vector of the affine-encoded current video block and a motion vector of a sub-block of the current video block, and performing (1204) a conversion between a bitstream representation and pixel values of the current video block or the sub-block by using the size constraint.
In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during the conversion from a pixel representation of the video to a corresponding bitstream representation, and vice versa. As defined by the syntax, the bitstream representation of the current video block may, for example, correspond to bits collocated or interspersed in different locations within the bitstream. For example, a macroblock may be encoded according to the transform and encoded error residual values and also using bits in the header and other fields in the bitstream.
It should be appreciated that the disclosed techniques may be used to implement embodiments in which the implementation complexity of video processing is reduced by reduced memory requirements or line buffer size requirements. Some of the presently disclosed techniques may be described using the following clause-based description.
1. A video processing method, comprising:
determining a size limit between a representative motion vector of an affine-encoded current video block and a motion vector of a sub-block of the current video block; and
the conversion between the bit stream representation and the pixel values of the current video block or sub-block is performed by using size constraints.
2. The method of clause 1, wherein performing the conversion comprises generating a bitstream representation from the pixel values.
3. The method of clause 1, wherein performing the conversion comprises generating pixel values from the bitstream representation.
4. The method according to any of clauses 1 to 3, wherein the size limitation comprises constraining the values of the motion vector (MVx, MVy) of the sub-block according to: MVx >= MV'x - DH0, MVx <= MV'x + DH1, MVy >= MV'y - DV0, and MVy <= MV'y + DV1, where MV' = (MV'x, MV'y);
wherein MV' represents the representative motion vector; and wherein DH0, DH1, DV0, and DV1 represent positive numbers.
5. The method of clause 4, wherein the size limit comprises at least one of:
DH0 equals DH1, or DV0 equals DV1.
DH0 equals DV0, or DH1 equals DV1.
DH0 and DH1 are different, or DV0 and DV1 are different.
DH0, DH1, DV0, and DV1 are signaled in the bitstream representation at the video parameter set level, or sequence parameter set level, or picture parameter set level, or slice header level, or coding tree unit level, or coding unit level, or prediction unit level.
DH0, DH1, DV0, and DV1 are a function of the video processing mode.
DH0, DH1, DV0, and DV1 depend on the width and height of the current video block.
DH0, DH1, DV0, and DV1 depend on whether the current video block is encoded using uni-directional prediction or bi-directional prediction.
DH0, DH1, DV0, and DV1 depend on the location of the sub-block.
6. The method of any of clauses 1-5, wherein the representative motion vector corresponds to a control point motion vector of the current video block.
7. The method of any of clauses 1 to 5, wherein the representative motion vector corresponds to a motion vector of a corner sub-block of the current video block.
8. The method according to any of clauses 1 to 7, wherein the precision of the motion vector for the sub-block and the representative motion vector corresponds to the motion vector signaling precision in the bitstream representation.
9. The method according to any one of clauses 1 to 7, wherein the precision of the motion vector for the sub-block and the representative motion vector corresponds to a storage precision for storing the motion vector.
10. A video processing method, comprising:
determining, for an affine-encoded current video block, one or more sub-blocks of the current video block, wherein each sub-block has a size of MxN pixels, where M and N are multiples of 2 or 4;
conforming the motion vectors of the sub-blocks to a size limit; and
performing, conditionally based on a trigger, a conversion between a bitstream representation and pixel values of the current video block by using the size limit.
11. The method of clause 10, wherein performing the conversion comprises generating a bitstream representation from the pixel values.
12. The method of clause 10, wherein performing the conversion comprises generating pixel values from the bitstream representation.
13. The method of any of clauses 10-12, wherein the size constraint limits a maximum difference between integer portions of sub-block motion vectors of the current video block to less than or equal to K pixels, where K is an integer.
14. The method of any of clauses 10-13, wherein the method is applied only if the current video block is encoded using bi-prediction.
15. The method of any of clauses 10-13, wherein the method is applied only if the current video block is encoded using unidirectional prediction.
16. The method of any of clauses 10-13, wherein the value of M, N or K is a function of a uni-prediction or bi-prediction mode of the current video block.
17. The method of any of clauses 10-13, wherein the value of M, N or K is a function of the height or width of the current video block.
18. The method according to any of clauses 10 to 17, wherein the trigger is included in the bitstream representation at a video parameter set level, or a sequence parameter set level, or a picture parameter set level, or a slice header level, or a coding tree unit level, or a coding unit level, or a prediction unit level.
19. The method of clause 18, wherein the trigger signals M, N or the value of K.
20. The method of any of clauses 10-19, wherein the one or more sub-blocks of the current video block are calculated based on a type of affine encoding used for the current video block.
21. The method of clause 20, wherein the sub-blocks of the uni-directional prediction and bi-directional prediction affine prediction modes are calculated using two different methods.
22. The method of clause 21, wherein in the case that the current video block is a bi-directionally predicted affine block, the width or height of the sub-blocks from different reference lists is different.
23. The method of any of clauses 20-22, wherein the one or more sub-blocks correspond to a luma component.
24. The method of any of clauses 10 to 23, wherein a motion vector difference between the motion vector value of the current video block and the motion vector value of one of the one or more sub-blocks is used to determine the width and height of the one or more sub-blocks.
25. The method according to any of clauses 20 to 23, wherein the calculating is based on pixel precision signaled in the bitstream representation.
26. A video processing method, comprising:
determining that a current video block meets a size condition; and
based on the determination, a conversion between a bit stream representation and pixel values of the video block is performed by excluding a bi-predictive coding mode of the current video block.
27. A video processing method, comprising:
determining that a current video block meets a size condition; and
based on the determination, a conversion between a bitstream representation in which the inter prediction mode is signaled according to a size condition and pixel values of the current video block is performed.
28. A video processing method, comprising:
determining that a current video block meets a size condition; and
based on the determination, a conversion between the bitstream representation and the pixel values of the current video block is performed, wherein the generation of the Merge candidate list during the conversion depends on the size condition.
29. A video processing method, comprising:
determining that a sub-coding unit of a current video block satisfies a size condition; and
based on the determination, a conversion between a bitstream representation of the current video block and pixel values is performed, wherein a coding tree partitioning process for generating sub-coding units depends on a size condition.
30. The method of any of clauses 26 to 29, wherein the size condition is one of the following, wherein w is the width and h is the height:
(a) w is equal to T1 and h is equal to T2, or h is equal to T1 and w is equal to T2;
(b) w is equal to T1 and h is not greater than T2, or h is equal to T1 and w is not greater than T2;
(c) w is not greater than T1 and h is not greater than T2, or h is not greater than T1 and w is not greater than T2.
31. The method of clause 30, wherein T1 = 8 and T2 = 8, or T1 = 8 and T2 = 4, or T1 = 4 and T2 = 4, or T1 = 4 and T2 = 16.
32. The method of any of clauses 26 to 29, wherein the converting comprises generating a bitstream representation from pixel values of the current video block or generating pixel values of the current video block from the bitstream representation.
33. A video processing method, comprising:
determining a weight index of a generalized bi-prediction (GBi) process for the current video block based on a location of the current video block; and
the conversion between the current video block and its bitstream representation is performed using the weight index to implement the GBi process.
34. The method of clause 33, wherein converting comprises generating a bit stream representation from pixel values of the current video block or generating pixel values of the current video block from the bit stream representation.
35. The method of clause 33 or clause 34, wherein determining comprises: for a current video block at a first location, inheriting or predicting another weight index of a neighboring block, and for a current video block at a second location, computing the GBi weight index without inheritance from the neighboring block.
36. The method of clause 35, wherein the second location comprises a current video block in a different coding tree unit than the neighboring blocks.
37. The method of clause 35, wherein the second location corresponds to the current video block being in a different coding tree unit line or a different coding tree unit row from the neighboring block.
38. A video processing method, comprising:
determining that a current video block is encoded as an Intra Inter Prediction (IIP) encoded block; and
the conversion between the current video block and its bitstream representation is performed using a simplified rule for determining an intra prediction mode or Most Probable Mode (MPM) for the current video block.
39. The method of clause 38, wherein converting comprises generating a bit stream representation from pixel values of the current video block or generating pixel values of the current video block from the bit stream representation.
40. The method of any of clauses 38-39, wherein the simplification rule specifies that an intra prediction encoding mode of the intra-inter prediction (IIP) encoded current video block is determined independently of an intra prediction encoding mode of an adjacent video block.
41. The method of any of clauses 38 to 39, wherein the intra-prediction encoding mode is represented in the bitstream representation using encoding independent of neighboring blocks.
42. The method of any of clauses 38 to 40, wherein the simplification rule specifies a preference for an encoding mode of an intra-coded block over an encoding mode of an IIP-coded block.
43. The method according to clause 38, wherein the simplification rule specifies that the MPM is determined by inserting intra prediction modes from intra-coded neighboring blocks before inserting intra prediction modes from the IIP-coded neighboring blocks.
44. The method of clause 38, wherein the simplification rule specifies that the MPM is determined using the same construction procedure as for another normal intra-coded block.
45. A video processing method, comprising:
determining that a current video block meets a simplification criterion; and
the conversion between the current video block and the bitstream representation is performed by disabling an inter-intra prediction mode for the conversion or by disabling additional coding tools for the conversion.
46. The method of clause 45, wherein converting comprises generating a bit stream representation from pixel values of the current video block or generating pixel values of the current video block from the bit stream representation.
47. The method of any of clauses 45-46, wherein the simplification criterion comprises a width or height of the current video block being equal to T1, wherein T1 is an integer.
48. The method of any of clauses 45-46, wherein the simplification criterion comprises a width or height of the current video block being greater than T1, wherein T1 is an integer.
49. The method of any of clauses 45-46, wherein the simplification criteria include a width of the current video block being equal to T1 and a height of the current video block being equal to T2.
50. The method of any of clauses 45-46, wherein the simplification criterion specifies that the current video block uses a bi-prediction mode.
51. The method of any of clauses 45-46, wherein the additional encoding tool comprises bi-directional optical flow (BIO) encoding.
52. The method of any of clauses 45-46, wherein the additional coding tool comprises an overlapped block motion compensation mode.
53. A video processing method, comprising:
performing a conversion between the current video block and a bitstream representation of the current video block using a motion vector based encoding process, wherein:
(a) during the conversion process, precision P1 is used to store spatial motion predictors and precision P2 is used to store temporal motion predictors, where P1 and P2 are fractions, or
(b) Precision Px is used to store x motion vectors and precision Py is used to store y motion vectors, where Px and Py are fractions.
54. The method of clause 53, wherein P1, P2, Px, and Py are different numbers.
55. The method of clause 54, wherein:
P1 is a 1/16 luminance pixel and P2 is a 1/4 luminance pixel, or
P1 is a 1/16 luminance pixel and P2 is a 1/8 luminance pixel, or
P1 is a 1/8 luminance pixel and P2 is a 1/4 luminance pixel, or
P1 is a 1/8 luminance pixel and P2 is a 1/8 luminance pixel, or
P2 is a 1/16 luminance pixel and P1 is a 1/4 luminance pixel, or
P2 is a 1/16 luminance pixel and P1 is a 1/8 luminance pixel, or
P2 is a 1/8 luminance pixel, and P1 is a 1/4 luminance pixel.
56. The method of clause 53 or clause 54, wherein P1 and P2 differ for different pictures included in different temporal layers in the bitstream representation.
57. The method of clause 53 or clause 54, wherein the calculated motion vectors are processed by a precision correction process before being stored for temporal motion prediction.
58. The method of clause 53 or clause 54, wherein storing comprises storing the x motion vector and the y motion vector as N-bit integers, and wherein the range of values of the x motion vector is [MinX, MaxX] and the range of values of the y motion vector is [MinY, MaxY], wherein the ranges satisfy one or more of:
a. MinX is equal to MinY;
b. MaxX is equal to MaxY;
c. {MinX, MaxX} depends on Px;
d. {MinY, MaxY} depends on Py;
e. {MinX, MaxX, MinY, MaxY} depends on N;
f. {MinX, MaxX, MinY, MaxY} is different for an MV stored for spatial motion prediction and another MV stored for temporal motion prediction;
g. {MinX, MaxX, MinY, MaxY} is different for pictures in different temporal layers;
h. {MinX, MaxX, MinY, MaxY} is different for pictures with different widths or heights;
i. {MinX, MaxX} is different for pictures with different widths;
j. {MinY, MaxY} is different for pictures with different heights;
k. MVx is clipped to [MinX, MaxX] before being stored for spatial motion prediction;
l. MVx is clipped to [MinX, MaxX] before being stored for temporal motion prediction;
m. MVy is clipped to [MinY, MaxY] before being stored for spatial motion prediction;
n. MVy is clipped to [MinY, MaxY] before being stored for temporal motion prediction.
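By way of illustration only, the clipping of items k-n above may be sketched as follows in C++; the structure, the function name, and the 18-bit example range are assumptions made for this sketch and are not taken from the specification.

    #include <algorithm>
    #include <cstdint>

    struct MvRange { int32_t minX, maxX, minY, maxY; };

    // Hypothetical range for an N = 18 bit signed storage format.
    constexpr MvRange kRange18Bit = { -(1 << 17), (1 << 17) - 1,
                                      -(1 << 17), (1 << 17) - 1 };

    // Clip a motion vector into [MinX, MaxX] x [MinY, MaxY] before it is
    // stored for spatial or temporal motion prediction (items k-n).
    void clipMvForStorage(int32_t& mvX, int32_t& mvY, const MvRange& r) {
      mvX = std::clamp(mvX, r.minX, r.maxX);
      mvY = std::clamp(mvY, r.minY, r.maxY);
    }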
59. A video processing method, comprising: interpolating a small sub-block of size W1xH1 within a large sub-block of size W2xH2 of the current video block by fetching a (W2+N-1-PW) x (H2+N-1-PH) block, pixel-padding the fetched block, performing boundary pixel repetition on the padded block, and obtaining pixel values of the small sub-block, wherein W1, W2, H1, H2, PW, and PH are integers; and performing a conversion between the current video block and a bitstream representation of the current video block using the interpolated pixel values of the small sub-block.
60. The method of clause 59, wherein converting comprises generating the current video block from the bitstream representation or generating the bitstream representation from the current video block.
61. The method of any one of clauses 59-60, wherein W2 = H2 = 8, W1 = H1 = 4, and PW = PH = 0.
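As an informal sketch of the block sizes involved in clause 59 (the helper below is hypothetical): an N-tap interpolation of a W2xH2 sub-block nominally needs (W2+N-1) x (H2+N-1) reference pixels, so fetching only (W2+N-1-PW) x (H2+N-1-PH) pixels and reconstructing the remainder by padding reduces memory bandwidth.

    struct FetchSize { int w, h; };

    // Size of the block actually fetched from the reference picture.
    FetchSize fetchedBlockSize(int w2, int h2, int n, int pw, int ph) {
      return { w2 + n - 1 - pw, h2 + n - 1 - ph };
    }
    // With W2 = H2 = 8, an 8-tap filter (N = 8) and PW = PH = 0 as in
    // clause 61, the fetched block is 15 x 15; with PW = PH = 2 it would
    // shrink to 13 x 13.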
62. A video processing method, comprising:
performing a motion compensation operation during the conversion between a current video block of size WxH and a bitstream representation of the current video block by fetching (W+N-1-PW) x (H+N-1-PH) reference pixels and filling reference pixels outside the fetched reference pixels during the motion compensation operation; and
the result of the motion compensation operation is used to perform a conversion between the current video block and a bitstream representation of the current video block, where W, H, N, PW and PH are integers.
63. The method of clause 62, wherein converting comprises generating the current video block from the bitstream representation or generating the bitstream representation from the current video block.
64. The method of any of clauses 62-63, wherein filling comprises repeating the left or right boundary of the fetched pixels.
65. The method of any of clauses 62-63, wherein filling comprises repeating the upper or lower boundary of the fetched pixels.
66. The method of any of clauses 62-63, wherein filling comprises setting the pixel values to a constant.
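A minimal sketch of the filling variants of clauses 64-66 follows, assuming a row-major pixel buffer and extension only toward the right and bottom; real implementations may also extend toward the left and top, and the function name is invented for illustration.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Expand a fetched block of size fw x fh to (fw + pw) x (fh + ph) either by
    // repeating the nearest boundary pixel (clauses 64-65) or by writing a
    // constant value into the filled area (clause 66).
    std::vector<uint16_t> padBlock(const std::vector<uint16_t>& src, int fw, int fh,
                                   int pw, int ph, bool useConstant, uint16_t c) {
      const int ow = fw + pw, oh = fh + ph;
      std::vector<uint16_t> dst(static_cast<size_t>(ow) * oh);
      for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x)
          dst[y * ow + x] = (useConstant && (x >= fw || y >= fh))
                                ? c
                                : src[std::min(y, fh - 1) * fw + std::min(x, fw - 1)];
      return dst;
    }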
67. The method of clause 38, wherein the rule specifies that the same arithmetic coding context is used during the conversion as used for other intra-coded blocks.
68. The method of clause 38, wherein the converting of the current video block does not include using MPM on the current video block.
69. The method of clause 38, wherein the simplification rule specifies that only the DC and planar modes are used in the bitstream representation of the current video block, the current video block being an IIP encoded block.
70. The method of clause 38, wherein the simplification rule specifies different intra prediction modes for the luma and chroma components.
71. The method of clause 44, wherein a subset of the MPMs is used for the current video block encoded by the IIP.
72. The method of clause 38, wherein the simplification rule indicates that the MPM is selected based on the intra-prediction modes included in the MPM list.
73. The method according to clause 38, wherein the simplification rule indicates that a subset of MPMs is to be selected from the MPM list, and the mode index associated with the subset is signaled.
74. The method of clause 38, wherein the context used to encode the intra MPM mode is used to encode the intra mode of the IIP-encoded current video block.
75. The method of clause 44, wherein equal weights are used for the intra-predicted block and the inter-predicted block generated for the current video block, the current video block being an IIP encoded block.
76. The method of clause 44, wherein a zero weight is used for the location in the IIP encoding process of the current video block.
77. The method of clause 76, wherein the zero weight is applied to the intra-predicted block used in the IIP encoding process.
78. The method according to clause 77, wherein the zero weight is applied to the inter-predicted block used in the IIP encoding process.
79. A video processing method, comprising:
determining, based on a size of the current video block, that bi-prediction or uni-prediction of the current video block is not allowed; and
based on the determination, a conversion between the bitstream representation and the pixel values of the current video block is performed by disabling the bi-directional prediction or the uni-directional prediction mode. For example, the disallowed modes are not used to encode or decode the current video block. The conversion operation may represent video encoding or compression, or video decoding or decompression.
80. The method of clause 79, wherein the current video block is 4x8, and determining comprises determining that bi-prediction is not allowed. Other examples are given in example 5.
81. The method of clause 79, wherein the current video block is 4x8 or 8x4, and determining comprises determining that bi-prediction is not allowed.
82. The method of clause 79, wherein the current video block is 4xN, where N is an integer and N ≤ 16, and the determining comprises determining that bi-prediction is not allowed.
83. The method of any of clauses 26-29 or 79-82, wherein the size of the current block corresponds to the size of the color component or the luma component of the current block.
84. The method of clause 83, wherein disabling bi-prediction or uni-prediction applies to all three components of the current video block.
85. The method according to clause 83, wherein the disabling of bi-directional prediction or uni-directional prediction applies only to the color component whose size is used as the size of the current block.
86. The method of any of clauses 79 to 85, wherein the converting is performed by disabling bi-prediction, and further, when a bi-predictive Merge candidate is used, assigning only one motion vector from one reference list to the current video block.
87. The method of clause 79, wherein the current video block is 4x4, and the determining comprises determining that bi-prediction and uni-prediction are prohibited.
88. The method of clause 87, wherein the current video block is encoded as an intra block.
89. The method of clause 87, wherein the current video block is restricted to use integer-pixel motion vectors.
Additional examples and embodiments of clauses 78-89 are described in example 5.
90. A method of processing video, comprising:
determining a video encoding condition of the current video block based on the size of the current video block; and
based on video coding conditions, a conversion between the current video block and a bitstream representation of the current video block is performed.
91. The method of clause 90, wherein the video coding condition specifies selectively signaling a skip flag or an intra block coding flag in the bitstream representation.
92. The method of clause 90 or 91, wherein the video coding condition specifies a prediction mode that selectively signals the current video block.
93. The method of any of clauses 90-92, wherein the video encoding condition specifies a delta mode encoding that selectively signals the current video block.
94. The method of any of clauses 90-93, wherein the video coding condition specifies selectively signaling an inter prediction direction of the current video block.
95. The method of any of clauses 90-94, wherein the video coding condition specifies selectively modifying a motion vector or a block vector for intra block copying of the current video block.
96. The method of any of clauses 90-95, wherein the video condition depends on a height in pixels of the current video block.
97. The method of any of clauses 90-96, wherein the video condition depends on a width in pixels of the current video block.
98. The method of any of clauses 90-95, wherein the video condition depends on whether the current video block is square.
Additional examples of clauses 90-98 are provided in items 11-16 listed in section 4 of this document.
99. A video encoder apparatus comprising a processor configured to perform the method of one or more of clauses 1-98.
100. A video decoder apparatus comprising a processor configured to perform the method of one or more of clauses 1-98.
101. A computer readable medium having code stored thereon, which when executed by a processor causes the processor to implement the method of any one or more of clauses 1-98.
Fig. 16 is a block diagram illustrating an example video processing system 1600 in which various techniques disclosed herein may be implemented. Various embodiments may include some or all of the components of system 1600. The system 1600 can include an input 1602 for receiving video content. The video content may be received in a raw or uncompressed format (e.g., 8 or 10 bit multi-component pixel values), or may be received in a compressed or encoded format. The input 1602 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces (such as ethernet, Passive Optical Network (PON), etc.) and wireless interfaces (such as Wi-Fi or cellular interfaces).
Examples of a peripheral bus interface or display interface may include Universal Serial Bus (USB) or High Definition Multimedia Interface (HDMI) or Displayport, among others. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and the like. The techniques described herein may be embodied in various electronic devices such as mobile phones, laptops, smart phones, or other devices capable of performing digital data processing and/or video display.
Fig. 17 is a flowchart representation of a video processing method 1700 according to the present disclosure. The method 1700 includes, at operation 1702, for a conversion between a current block of video and a bitstream representation of the video using an affine coding tool, determining that a first motion vector of a sub-block of the current block and a second motion vector, which is a representative motion vector of the current block, comply with a size constraint. The method 1700 also includes, at operation 1704, performing a transformation based on the determination.
In some embodiments, the first motion vector of the sub-block is represented as (MVx, MVy) and the second motion vector is represented as (MV'x, MV'y). The size constraint indicates that MVx ≥ MV'x - DH0, MVx ≤ MV'x + DH1, MVy ≥ MV'y - DV0, and MVy ≤ MV'y + DV1, where DH0, DH1, DV0, and DV1 are positive numbers. In some embodiments, DH0 = DH1. In some embodiments, DH0 ≠ DH1. In some embodiments, DV0 = DV1. In some embodiments, DV0 ≠ DV1. In some embodiments, DH0 = DV0. In some embodiments, DH0 ≠ DV0. In some embodiments, DH1 = DV1. In some embodiments, DH1 ≠ DV1.
In some embodiments, at least one of DH0, DH1, DV0, or DV1 is signaled in the bitstream representation at a video parameter set level, sequence parameter set level, picture parameter set level, slice header, coding tree unit level, coding unit level, or prediction unit level. In some embodiments, DH0, DH1, DV0, and DV1 differ for different profiles, levels, or hierarchies of the conversion. In some embodiments, DH0, DH1, DV0, and DV1 are based on the width or height of the current block. In some embodiments, DH0, DH1, DV0, and DV1 are based on the prediction mode of the current block, which is either a uni-directional prediction mode or a bi-directional prediction mode. In some embodiments, DH0, DH1, DV0, and DV1 are based on the location of the sub-block in the current block.
In some embodiments, the second motion vector comprises a control point motion vector of the current block. In some embodiments, the second motion vector comprises a motion vector of a second sub-block of the current block. In some embodiments, the second sub-block comprises a center sub-block of the current block. In some embodiments, the second sub-block comprises a corner sub-block of the current block. In some embodiments, the second motion vector comprises a motion vector derived for a position inside or outside the current block, the position being encoded using the same affine model as the current block. In some embodiments, the position comprises a center position of the current block.
In some embodiments, the first motion vector is adjusted to satisfy a size constraint. In some embodiments, the bitstream is invalid if the first motion vector does not satisfy the size constraint with respect to the second motion vector. In some embodiments, the first motion vector and the second motion vector are represented according to a motion vector signaling precision in the bitstream representation. In some embodiments, the first motion vector and the second motion vector are represented according to a storage precision for storing the motion vector. In some embodiments, the first motion vector and the second motion vector are represented according to a precision different from a motion vector signaling precision or a storage precision used to store the motion vector.
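The size constraint of method 1700 can be pictured with the following C++ sketch, in which a sub-block motion vector is clipped into the window around the representative motion vector (MV'x, MV'y); the function name is hypothetical, and, as noted above, an implementation could instead treat a violation as an invalid bitstream rather than adjusting the vector.

    #include <algorithm>

    // Keep (mvX, mvY) within the asymmetric window around the representative
    // motion vector (mvPx, mvPy): MV'x - DH0 <= MVx <= MV'x + DH1 and
    // MV'y - DV0 <= MVy <= MV'y + DV1.
    void enforceAffineMvWindow(int& mvX, int& mvY, int mvPx, int mvPy,
                               int dh0, int dh1, int dv0, int dv1) {
      mvX = std::clamp(mvX, mvPx - dh0, mvPx + dh1);
      mvY = std::clamp(mvY, mvPy - dv0, mvPy + dv1);
    }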
Fig. 18 is a flowchart representation of a video processing method 1800 according to the present disclosure. The method 1800 includes, at operation 1802, determining an affine model including six parameters for a conversion between a current block of video and a bitstream representation of the video. The affine model inherits from affine coding information of neighboring blocks of the current block. The method 1800 includes, at operation 1804, performing a transformation based on the affine model.
In some embodiments, the neighboring block is encoded using a second affine model having six parameters, and the affine model is identical to the second affine model. In some embodiments, the neighboring block is encoded using a third affine model having four parameters. In some embodiments, the affine model is determined based on the location of the current block. In some embodiments, the affine model is determined according to the third affine model in a case where the neighboring block is not in the same Coding Tree Unit (CTU) as the current block. In some embodiments, the affine model is determined according to the third affine model in a case where the neighboring block is not in the same CTU line or the same CTU row as the current block.
In some embodiments, a slice, a tile, or a picture is divided into multiple non-overlapping regions. In some embodiments, the affine model is determined according to the third affine model in a case where the neighboring block is not in the same region as the current block. In some embodiments, the affine model is determined according to the third affine model in a case where the neighboring block is not in the same region line or the same region row as the current block. In some embodiments, each region has a size of 64 x 64. In some embodiments, the upper left corner of the current block is represented as (x, y) and the upper left corner of the neighboring block is represented as (x', y'), and the affine model is determined according to the third affine model in a case where a condition on x, y, x', and y' is satisfied. In some embodiments, the condition indicates that x/M ≠ x'/M, M being a positive integer. In some embodiments, M is 128 or 64. In some embodiments, the condition indicates that y/N ≠ y'/N, N being a positive integer. In some embodiments, N is 128 or 64. In some embodiments, the condition indicates that x/M ≠ x'/M and y/N ≠ y'/N, M and N being positive integers. In some embodiments, M = N = 128 or M = N = 64. In some embodiments, the condition indicates that x >> M ≠ x' >> M, M being a positive integer. In some embodiments, M is 6 or 7. In some embodiments, the condition indicates that y >> N ≠ y' >> N, N being a positive integer. In some embodiments, N is 6 or 7. In some embodiments, the condition indicates that x >> M ≠ x' >> M and y >> N ≠ y' >> N, M and N being positive integers. In some embodiments, M = N = 6 or M = N = 7.
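A sketch of the positional conditions above, assuming the shift-based variant in which M and N act as log2 region dimensions (for example M = N = 7 for 128-sample regions); when the condition holds, the inherited model falls back to the four-parameter (third) affine model. The function name is invented for illustration.

    // Returns true when the neighboring block at (xn, yn) lies in a different
    // region than the current block at (x, y), i.e., x >> m != xn >> m or
    // y >> n != yn >> n.
    bool useFourParameterFallback(int x, int y, int xn, int yn, int m, int n) {
      return (x >> m) != (xn >> m) || (y >> n) != (yn >> n);
    }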
Fig. 19 is a flowchart representation of a video processing method 1900 according to the present disclosure. The method 1900 includes, at operation 1902, determining, for a transition between a block of video and a bitstream representation of the video, whether a bi-predictive coding technique is applicable to the block, based on a block size having a width W and a height H, W and H being positive integers. The method 1900 includes, at operation 1904, performing a conversion in accordance with the determination.
In some embodiments, the bi-directional predictive coding technique is not applicable when W = T1 and H = T2, T1 and T2 being positive integers. In some embodiments, the bi-directional predictive coding technique is not applicable when W = T2 and H = T1, T1 and T2 being positive integers. In some embodiments, the bi-directional predictive coding technique is not applicable when W ≤ T1 and H ≤ T2, T1 and T2 being positive integers. In some embodiments, the bi-directional predictive coding technique is not applicable when W ≤ T2 and H ≤ T1, T1 and T2 being positive integers. In some embodiments, T1 = 4 and T2 = 16. In some embodiments, the bi-directional predictive coding technique is not applicable when W ≤ T1 and H ≤ T2, T1 and T2 being positive integers. In some embodiments, T1 = T2 = 8. In some embodiments, T1 = 8 and T2 = 4. In some embodiments, T1 = T2 = 4. In some embodiments, T1 = 4 and T2 = 8.
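As one concrete reading of the embodiments above (an assumption, since several threshold combinations are enumerated), the following sketch disables bi-prediction for 4xN and Nx4 blocks with N ≤ 16, which covers the 4x8 and 8x4 cases mentioned below.

    // One possible size gate for bi-prediction (T1 = 4, T2 = 16 variant).
    bool biPredictionAllowed(int w, int h) {
      if ((w == 4 && h <= 16) || (h == 4 && w <= 16))
        return false;  // 4xN / Nx4 with N <= 16: bi-prediction disabled
      return true;     // other sizes: bi-prediction permitted
    }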
In some embodiments, where the bi-predictive coding technique is applicable, an indicator indicating information about the bi-predictive coding technique is signaled in the bitstream. In some embodiments, where the bi-directional predictive coding technique is not applicable to the block, an indicator indicating information about the bi-directional predictive coding technique of the block is excluded from the bitstream. In some embodiments, the bi-predictive coding technique is not applicable where the block size is one of 4x8 or 8x4. In some embodiments, the bi-predictive coding technique is not applicable where the size of the block is 4xN or Nx4, N being a positive integer and N ≤ 16. In some embodiments, the size of the block corresponds to a first color component of the block, and whether the bi-predictive coding technique is applicable is determined for the first color component and the remaining color components of the block. In some embodiments, the size of the block corresponds to a first color component of the block, and whether the bi-predictive coding technique is applicable is determined only for the first color component. In some embodiments, the first color component comprises a luminance component.
In some embodiments, the method further comprises, in the event that the bi-directional predictive coding technique is not applicable to the current block, assigning a single motion vector from the first reference list or the second reference list upon determining that the selected Merge candidate is encoded using the bi-directional predictive coding technique. In some embodiments, the method further comprises, in the event that the bi-predictive coding technique is not applicable to the current block, determining that the triangle prediction mode is not applicable to the block. In some embodiments, whether the bi-directional predictive coding technique is applicable is associated with a prediction direction, which is further associated with a uni-directional predictive coding technique, and wherein the prediction direction is signaled in the bitstream based on the size of the block. In some embodiments, information about the uni-directional predictive coding technique is signaled in the bitstream if: (1) WxH < 64, or (2) WxH = 64 and W is not equal to H. In some embodiments, information about the uni-directional predictive coding technique is signaled in the bitstream if: (1) WxH > 64, or (2) WxH = 64 and W is equal to H.
In some embodiments, the restriction indicates that neither the bi-directional nor the uni-directional predictive coding technique is applicable to a block where the size of the block is 4x4. In some embodiments, the restriction applies to the case where the block is affine coded. In some embodiments, the restriction applies to the case where the block is not affine coded. In some embodiments, the restriction applies to the case where the block is intra-coded. In some embodiments, the restriction does not apply to the case where the motion vectors of the block have integer precision.
In some embodiments, signaling to generate blocks based on partitioning of a parent block is skipped in the bitstream, the parent block being of size (1) 8x8 for a quad-tree partition, (2) 8x4 or 4x8 for a binary tree partition, or (3) 4x16 or 16x4 for a ternary tree partition. In some embodiments, an indicator indicating that the motion vector has integer precision is set to 1 in the bitstream. In some embodiments, the motion vector of the block is rounded to integer precision.
In some embodiments, a bi-predictive coding technique is applied to the block. The size of the reference block is (W+N-1-PW) x (H+N-1-PH), and the boundary pixels of the reference block are repeated to generate a second block of size (W+N-1) x (H+N-1) for the interpolation operation, where N denotes the number of interpolation filter taps and N, PW, and PH are integers. In some embodiments, PH is 0, and at least the pixels of the left or right boundary are repeated to generate the second block. In some embodiments, PW is 0, and at least the pixels of the upper or lower boundary are repeated to generate the second block. In some embodiments, PW > 0 and PH > 0, and the second block is generated by repeating at least pixels of the upper or lower boundary after repeating at least pixels of the left or right boundary. In some embodiments, PW > 0 and PH > 0, and the second block is generated by repeating at least pixels of the left or right boundary after repeating at least pixels of the upper or lower boundary. In some embodiments, the pixels of the left boundary are repeated M1 times, and the pixels of the right boundary are repeated (PW - M1) times. In some embodiments, the pixels of the upper boundary are repeated M2 times, and the pixels of the lower boundary are repeated (PH - M2) times. In some embodiments, for the conversion, how the boundary pixels of the reference block are repeated applies to some or all of the reference blocks. In some embodiments, PW and PH are different for different components of the block.
In some embodiments, the Merge candidate list construction process is performed based on the size of the block. In some embodiments, the Merge candidate is considered as a uni-directional prediction candidate that references the first reference list in a uni-directional prediction coding technique, in the following cases: (1) the Merge candidate is encoded using a bi-directional prediction encoding technique, and (2) bi-directional prediction does not apply to the block depending on the size of the block. In some embodiments, the first reference list comprises reference list 0 or reference list 1 of a unidirectional predictive coding technique. In some embodiments, a Merge candidate is considered unavailable when: (1) the Merge candidate is encoded using a bi-directional prediction encoding technique, and (2) bi-directional prediction does not apply to the block depending on the size of the block. In some embodiments, during the Merge candidate list construction process, unavailable Merge candidates are deleted from the Merge candidate list. In some embodiments, the Merge candidate list construction process of the triangle prediction mode is invoked in case of a block to which bi-prediction is not applicable, depending on the size of the block.
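The Merge-candidate handling above may be sketched as follows; the candidate structure and field names are invented for illustration. A bi-predictive candidate is either converted into a uni-predictive candidate that keeps one reference list, or marked unavailable, depending on the embodiment.

    struct MergeCand {
      bool available = true;
      bool usesList[2] = { false, false };  // reference list 0 / list 1
      int  mv[2][2] = {};                   // (MVx, MVy) per reference list
      int  refIdx[2] = { -1, -1 };
    };

    // Adapt a candidate when bi-prediction is not allowed for the block size.
    void adaptCandidate(MergeCand& c, bool biAllowed, bool dropInstead) {
      if (biAllowed || !(c.usesList[0] && c.usesList[1]))
        return;                               // nothing to do
      if (dropInstead) {
        c.available = false;                  // treat candidate as unavailable
        return;
      }
      c.usesList[1] = false;                  // keep only reference list 0
      c.refIdx[1] = -1;
    }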
Fig. 20 is a flowchart representation of a video processing method 2000 according to the present disclosure. The method 2000 includes, at operation 2002, determining, for a transition between a block of video and a bitstream representation of the video, whether a coding tree partitioning process is applicable to the block based on a size of a sub-block, the sub-block being a sub-coding unit of the block according to the coding tree partitioning process. The sub-blocks have a width W and a height H, W and H being positive integers. The method 2000 further includes, at operation 2004, performing a conversion according to the determination.
In some embodiments, the coding tree partitioning process is not applicable when W = T1 and H = T2, T1 and T2 being positive integers. In some embodiments, the coding tree partitioning process is not applicable when W = T2 and H = T1, T1 and T2 being positive integers. In some embodiments, the coding tree partitioning process is not applicable when W ≤ T1 and H ≤ T2, T1 and T2 being positive integers. In some embodiments, the coding tree partitioning process is not applicable when W ≤ T2 and H ≤ T1, T1 and T2 being positive integers. In some embodiments, T1 = 4 and T2 = 16. In some embodiments, the coding tree partitioning process is not applicable when W ≤ T1 and H ≤ T2, T1 and T2 being positive integers. In some embodiments, T1 = T2 = 8. In some embodiments, T1 = 8 and T2 = 4. In some embodiments, T1 = T2 = 4. In some embodiments, T1 = 4. In some embodiments, T2 = 4. In some embodiments, in a case where the coding tree partitioning process is not applicable to the current block, signaling of the coding tree partitioning process is omitted from the bitstream.
Fig. 21 is a flowchart representation of a video processing method 2100 according to the present disclosure. The method 2100 includes, at operation 2102, determining, for a transition between a current block of video and a bitstream representation of the video, whether to derive an index for bi-prediction with coding unit level weight (BCW) coding mode based on a rule regarding a location of the current block. In the BCW encoding mode, a weight set including a plurality of weights is used to generate a bidirectional predictor of a current block. The method 2100 further includes performing a transformation based on the determination at operation 2104.
In some embodiments, the bi-directional predictor of the current block is generated as a non-average weighted sum of the predictions from the two motion vectors, with at least one weight in the set of weights applied. In some embodiments, the rule specifies that, in a case where the current block and the neighboring block are located in different coding tree units or maximum coding units, the index is not derived from the neighboring block. In some embodiments, the rule specifies that, in a case where the current block and the neighboring block are located in different lines or rows of coding tree units, the index is not derived from the neighboring block. In some embodiments, the rule specifies that, in a case where the current block and the neighboring block are located in different non-overlapping regions of a slice, a tile, or a picture of the video, the index is not derived from the neighboring block. In some embodiments, the rule specifies that, in a case where the current block and the neighboring block are located in different lines of non-overlapping regions of a slice, a tile, or a picture of the video, the index is not derived from the neighboring block. In some embodiments, each region has a size of 64 x 64.
In some embodiments, a corner of the current block is represented as (x, y), and a corner of the neighboring block is represented as (x', y'). The rule specifies that the index is not derived from the neighboring block if (x, y) and (x', y') satisfy a condition. In some embodiments, the condition indicates that x/M ≠ x'/M, M being a positive integer. In some embodiments, M is 128 or 64. In some embodiments, the condition indicates that y/N ≠ y'/N, N being a positive integer. In some embodiments, N is 128 or 64. In some embodiments, the condition indicates that x/M ≠ x'/M and y/N ≠ y'/N, M and N being positive integers. In some embodiments, M = N = 128 or M = N = 64. In some embodiments, the condition indicates that x >> M ≠ x' >> M, M being a positive integer. In some embodiments, M is 6 or 7. In some embodiments, the condition indicates that y >> N ≠ y' >> N, N being a positive integer. In some embodiments, N is 6 or 7. In some embodiments, the condition indicates that x >> M ≠ x' >> M and y >> N ≠ y' >> N, M and N being positive integers. In some embodiments, M = N = 6 or M = N = 7.
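A compact sketch of the BCW inheritance rule of method 2100, assuming the shift-based variant of the condition above with log2 CTU dimensions (for example 7 for a 128 x 128 CTU); the function name is hypothetical.

    // The weight index is inherited from the neighbor at (xn, yn) only when it
    // lies in the same coding tree unit as the current block at (x, y).
    bool canInheritBcwIndex(int x, int y, int xn, int yn,
                            int log2CtuW, int log2CtuH) {
      return (x >> log2CtuW) == (xn >> log2CtuW) &&
             (y >> log2CtuH) == (yn >> log2CtuH);
    }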
In some embodiments, whether the BCW coding mode is applicable to a picture, slice group, or slice is signaled in a picture parameter set, slice header, slice group header, or slice, respectively, in the bitstream. In some embodiments, whether the BCW coding mode is applicable to a picture, slice group, or slice is derived based on information associated with the picture, slice group, or slice. In some embodiments, the information includes at least a Quantization Parameter (QP), temporal layer, or picture order count distance.
Fig. 22 is a flowchart representation of a video processing method 2200 in accordance with the present disclosure. The method 2200 includes, at operation 2202, determining, for a transition between a current block of video encoded using a Combined Inter and Intra Prediction (CIIP) encoding technique and a bitstream representation of the video, an intra prediction mode of the current block independent of intra prediction modes of neighboring blocks. CIIP encoding techniques use the inter and intra prediction values to derive a final prediction value for the current block. The method 2200 also includes performing a transformation based on the determination at operation 2204.
In some embodiments, the intra prediction mode of the current block is determined without reference to the intra prediction modes of any neighboring blocks. In some embodiments, adjacent blocks are encoded using CIIP coding techniques. In some embodiments, the intra prediction of the current block is determined based on an intra prediction mode of a second neighboring block encoded using an intra prediction encoding technique. In some embodiments, whether to determine the intra prediction mode of the current block according to the second intra prediction mode is based on whether a condition specifying a relationship between the current block as the first block and a second neighboring block as the second block is satisfied. In some embodiments, the determination is part of a Most Probable Mode (MPM) construction process for deriving a current block of a list of MPM modes.
Fig. 23 is a flowchart representation of a video processing method 2300 according to the present disclosure. The method 2300 includes, at operation 2302, for a transition between a current block of video encoded using a Combined Inter and Intra Prediction (CIIP) encoding technique and a bitstream representation of the video, determining an intra prediction mode for the current block according to a first intra prediction mode for a first neighboring block and a second intra prediction mode for a second neighboring block. The first neighboring block is encoded using an intra prediction encoding technique and the second neighboring block is encoded using a CIIP encoding technique. The first intra prediction mode is given a different priority than the second intra prediction mode. CIIP encoding techniques use the inter and intra prediction values to derive a final prediction value for the current block. The method 2300 further includes performing a conversion based on the determination at operation 2304.
In some embodiments, the determination is part of a Most Probable Mode (MPM) construction process for deriving a current block of a list of MPM modes. In some embodiments, the first intra-prediction mode precedes the second intra-prediction mode in the MPM candidate list. In some embodiments, the first intra-prediction mode is located after the second intra-prediction mode in the MPM candidate list. In some embodiments, the encoding of the intra-prediction mode bypasses a Most Probable Mode (MPM) construction process for the current block. In some embodiments, the method further comprises determining an intra prediction mode of a subsequent block from the intra prediction mode of the current block, wherein the subsequent block is encoded using an intra prediction encoding technique and the current block is encoded using a CIIP encoding technique. In some embodiments, the determination is part of a Most Probable Mode (MPM) construction process for subsequent blocks. In some embodiments, during MPM construction of a subsequent block, the intra-prediction mode of the current block is given a lower priority than the intra-prediction mode of another neighboring block encoded using an intra-prediction encoding technique. In some embodiments, whether to determine the intra prediction mode of the subsequent block according to the intra prediction mode of the current block is based on whether a condition specifying a relationship between the subsequent block as the first block and the current block as the second block is satisfied. In some embodiments, the conditions include at least one of: (1) the first block and the second block are located in a same row of a Coding Tree Unit (CTU), (2) the first block and the second block are located in a same CTU, (3) the first block and the second block are in a same region, or (4) the first block and the second block are in a same row of a region. In some embodiments, the width of the region is the same as the height of the region. In some embodiments, the region has a size of 64 x 64.
In some embodiments, only a subset of the list of Most Probable Modes (MPMs) of the normal intra-coding technique is used for the current block. In some embodiments, the subset includes a single MPM mode in the list of MPM modes of the normal intra-coding technique. In some embodiments, the single MPM mode is the first MPM mode in the list. In some embodiments, the index indicating a single MPM mode is omitted in the bitstream. In some embodiments, the subset includes the first four MPM modes in the MPM mode list. In some embodiments, an index indicating the MPM mode in the subset is signaled in the bitstream. In some embodiments, the coding context used to encode the intra-coded block is reused to encode the current block. In some embodiments, the first MPM flag of the intra-coded block and the second MPM flag of the current block share the same coding context in the bitstream. In some embodiments, the intra prediction mode of the current block is selected from the MPM mode list regardless of the size of the current block. In some embodiments, the MPM construction process is enabled by default, and wherein a flag indicating the MPM construction process is omitted in the bitstream. In some embodiments, the current block does not require an MPM list construction process.
In some embodiments, luma prediction chroma mode is used to process the chroma components of the current block. In some embodiments, the derived mode is used to process the chroma components of the current block. In some embodiments, multiple intra prediction modes are used to process the chroma components of the current block. In some embodiments, multiple intra prediction modes are used based on the color format of the chroma components. In some embodiments, where the color format is 4:4:4, the plurality of intra prediction modes are the same as the intra prediction mode of the luminance component of the current block. In some embodiments, each of four intra prediction modes, including the planar mode, the DC mode, the vertical mode, and the horizontal mode, is encoded using one or more bits. In some embodiments, the four intra prediction modes are encoded using 00, 01, 10, and 11. In some embodiments, the four intra prediction modes are encoded using 0, 10, 110, and 111. In some embodiments, the four intra prediction modes are encoded using 1, 01, 001, and 000. In some embodiments, only a subset of the four intra prediction modes is available in a case where the width W and height H of the current block satisfy a condition. In some embodiments, where W > NxH, N being an integer, the subset includes the planar mode, the DC mode, and the vertical mode. In some embodiments, the planar mode, the DC mode, and the vertical mode are encoded using 1, 01, and 00. In some embodiments, the planar mode, the DC mode, and the vertical mode are encoded using 0, 10, and 11. In some embodiments, where H > NxW, N being an integer, the subset includes the planar mode, the DC mode, and the horizontal mode. In some embodiments, the planar mode, the DC mode, and the horizontal mode are encoded using 1, 01, and 00. In some embodiments, the planar mode, the DC mode, and the horizontal mode are encoded using 0, 10, and 11. In some embodiments, N = 2. In some embodiments, only the DC mode and the planar mode are used for the current block. In some embodiments, an indicator indicating the DC mode or the planar mode is signaled in the bitstream.
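One of the binarizations listed above (the 0, 10, 110, 111 variant) can be sketched as follows; the enum and function are illustrative only, and whether the bins are bypass-coded or context-coded is not assumed here.

    enum class CiipIntraMode { kPlanar, kDc, kVertical, kHorizontal };

    // Truncated-unary style codewords for the four CIIP intra prediction modes.
    const char* binarizeCiipIntraMode(CiipIntraMode m) {
      switch (m) {
        case CiipIntraMode::kPlanar:     return "0";
        case CiipIntraMode::kDc:         return "10";
        case CiipIntraMode::kVertical:   return "110";
        case CiipIntraMode::kHorizontal: return "111";
      }
      return "";  // unreachable
    }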
Fig. 24 is a flowchart representation of a video processing method 2400 according to the present disclosure. The method 2400 includes, at operation 2402, determining, for a transition between a current block of video and a bitstream representation of the video, whether a Combined Inter and Intra Prediction (CIIP) process is applicable to a color component of the current block based on a size of the current block. CIIP encoding techniques use the inter and intra prediction values to derive a final prediction value for the current block. The method 2400 further includes performing a conversion based on the determination at operation 2404.
In some embodiments, the color components include the chroma components, and wherein, in a case where the width of the current block is less than 4, the CIIP process is not performed on the chroma components. In some embodiments, the color components include the chroma components, and wherein the CIIP process is not performed on the chroma components if the height of the current block is less than 4. In some embodiments, the intra prediction mode of the chroma component of the current block is different from the intra prediction mode of the luma component of the current block. In some embodiments, the chroma component uses one of the DC mode, the planar mode, or the luma-predicted chroma mode. In some embodiments, the intra prediction mode for the chroma component is determined based on the color format of the chroma component. In some embodiments, the color format includes 4:2:0 or 4:4:4.
Fig. 25 is a flowchart representation of a video processing method 2500 according to the present disclosure. The method 2500 includes, at operation 2502, determining, for a transition between a current block of video and a bitstream representation of the video, whether to apply a Combined Inter and Intra Prediction (CIIP) encoding technique to the current block based on characteristics of the current block. CIIP encoding techniques use the inter and intra prediction values to derive a final prediction value for the current block. The method 2500 also includes, at operation 2504, performing a conversion based on the determination.
In some embodiments, the characteristic includes the size of the current block having a width W and a height H, W and H being integers, and the CIIP encoding technique is disabled for the current block if the size of the block satisfies a condition. In some embodiments, the condition indicates that W is equal to T1, T1 being an integer. In some embodiments, the condition indicates that H is equal to T1, T1 being an integer. In some embodiments, T1 = 4. In some embodiments, T1 = 2. In some embodiments, the condition indicates that W is greater than T1 or H is greater than T1, T1 being an integer. In some embodiments, T1 is 64 or 128. In some embodiments, the condition indicates that W is equal to T1 and H is equal to T2, T1 and T2 being integers. In some embodiments, the condition indicates that W is equal to T2 and H is equal to T1, T1 and T2 being integers. In some embodiments, T1 = 4 and T2 = 16.
In some embodiments, the characteristic comprises an encoding technique applied to the current block, and the CIIP encoding technique is disabled for the current block if the encoding technique satisfies a condition. In some embodiments, the condition indicates that the encoding technique is a bi-directional predictive encoding technique. In some embodiments, a bi-directionally predicted Merge candidate is converted into a uni-directionally predicted Merge candidate to allow the CIIP encoding technique to be applied to the current block. In some embodiments, the converted Merge candidate is associated with reference list 0 of the uni-directional predictive coding technique. In some embodiments, the converted Merge candidate is associated with reference list 1 of the uni-directional predictive coding technique. In some embodiments, only uni-directionally predicted Merge candidates of the block are selected for the conversion. In some embodiments, the bi-directionally predicted Merge candidates are discarded when determining the Merge index indicating a Merge candidate in the bitstream representation. In some embodiments, the CIIP encoding technique is applied to the current block according to the determination. In some embodiments, the Merge candidate list construction process for the triangle prediction mode is used to derive the motion candidate list for the current block.
Fig. 26 is a flowchart representation of a video processing method 2600 in accordance with the present disclosure. The method 2600 includes, at operation 2602, determining, for a transition between a current block of video and a bitstream representation of the video, whether to disable encoding tools for the current block based on whether the current block is encoded using a Combined Inter and Intra Prediction (CIIP) encoding technique. The CIIP encoding technique uses the inter and intra prediction values to derive a final prediction value for the current block. The encoding tools include at least one of bi-directional optical flow (BDOF), Overlapped Block Motion Compensation (OBMC), or a decoder-side motion vector refinement process (DMVR). The method 2600 also includes, at operation 2604, performing a conversion based on the determination.
In some embodiments, the intra-prediction process for the current block is different from the intra-prediction process for the second block encoded using intra-prediction encoding techniques. In some embodiments, filtering of neighboring samples is skipped during intra prediction of the current block. In some embodiments, during intra prediction of the current block, the location-dependent intra prediction sample filtering process is disabled. In some embodiments, during intra prediction of the current block, the multiple row intra prediction process is disabled. In some embodiments, in the intra prediction process of the current block, the wide-angle intra prediction process is disabled.
Fig. 27 is a flowchart representation of a method 2700 of video processing according to the present disclosure. The method 2700 includes, at operation 2702, determining, for a transition between a block of the video and a bitstream representation of the video, a first precision P1 of motion vectors for spatial motion prediction and a second precision P2 of motion vectors for temporal motion prediction. P1 and/or P2 are fractions, and P1 and P2 are different from each other. The method 2700 further includes, at operation 2704, performing the conversion based on the determination.
In some embodiments, the first precision is 1/16 luminance pixel and the second precision is 1/4 luminance pixel. In some embodiments, the first precision is 1/16 luminance pixel and the second precision is 1/8 luminance pixel. In some embodiments, the first precision is 1/8 luminance pixel and the second precision is 1/4 luminance pixel. In some embodiments, at least one of the first or second precisions is lower than 1/16 luminance pixel.
In some embodiments, at least one of the first or second accuracies is variable. In some embodiments, the first or second precision varies according to a profile, level, or hierarchy of the video. In some embodiments, the first precision or the second precision varies according to a temporal layer of a picture in the video. In some embodiments, the first precision or the second precision varies according to the size of the pictures in the video.
In some embodiments, at least one of the first or second precision is signaled in a video parameter set, a sequence parameter set, a picture parameter set, a slice header, a slice group header, a slice, a coding tree unit, or a coding unit in the bitstream representation. In some embodiments, the motion vector is represented as (MVx, MVy) and the precision of the motion vector is represented as (Px, Py), and wherein Px is associated with MVx and Py is associated with MVy. In some embodiments, Px and Py vary according to the profile, level, or hierarchy of the video. In some embodiments, Px and Py vary according to the temporal layer of the pictures in the video. In some embodiments, Px and Py vary depending on the width of the pictures in the video. In some embodiments, Px and Py are signaled in a video parameter set, a sequence parameter set, a picture parameter set, a slice header, a slice group header, a slice, a coding tree unit, or a coding unit in the bitstream representation. In some embodiments, the decoded motion vector is denoted as (MVx, MVy), and the motion vector is adjusted according to the second precision before being stored as a temporal motion prediction motion vector. In some embodiments, the temporal motion prediction motion vector is adjusted to (Shift(MVx, P1 - P2), Shift(MVy, P1 - P2)), where P1 and P2 are integers, P1 ≥ P2, and Shift represents a right-shift operation on an unsigned number. In some embodiments, the temporal motion prediction motion vector is adjusted to (SignShift(MVx, P1 - P2), SignShift(MVy, P1 - P2)), where P1 and P2 are integers, P1 ≥ P2, and SignShift represents a right-shift operation on a signed number. In some embodiments, the temporal motion prediction motion vector is adjusted to (MVx << (P1 - P2), MVy << (P1 - P2)), where P1 and P2 are integers, P1 ≥ P2, and << represents a left-shift operation on a signed or unsigned number.
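The precision adjustment above can be sketched as follows; the exact rounding offsets of Shift and SignShift are assumptions, since the text only states that Shift operates on unsigned numbers and SignShift on signed numbers.

    #include <cstdint>

    // Right shift with rounding, as assumed for unsigned values.
    int32_t Shift(int32_t x, int s) {
      return (x + (s > 0 ? (1 << (s - 1)) : 0)) >> s;
    }

    // Sign-aware right shift with rounding, as assumed for signed values.
    int32_t SignShift(int32_t x, int s) {
      const int32_t off = (s > 0) ? (1 << (s - 1)) : 0;
      return (x >= 0) ? ((x + off) >> s) : -((-x + off) >> s);
    }

    // Example: storing a 1/16-pel spatial MV (P1) as a 1/4-pel temporal MV (P2)
    // uses a shift of P1 - P2 = 2 bits:
    //   tmvpX = SignShift(mvX, 2);  tmvpY = SignShift(mvY, 2);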
Fig. 28 is a flowchart representation of a method 2800 of video processing according to the present disclosure. The method 2800 includes, at operation 2802, determining a motion vector (MVx, MVy) having a precision (Px, Py) for a transition between a block of video and a bitstream representation of the video. Px is associated with MVx and Py is associated with MVy. MVx and MVy are represented using N bits, with MinX ≤ MVx ≤ MaxX and MinY ≤ MVy ≤ MaxY, where MinX, MaxX, MinY, and MaxY are real numbers. The method 2800 further includes, at operation 2804, performing the conversion based on the determination.
In some embodiments, MinX = MinY. In some embodiments, MinX ≠ MinY. In some embodiments, MaxX = MaxY. In some embodiments, MaxX ≠ MaxY.
In some embodiments, at least one of MinX or MaxX is based on Px. In some embodiments, the motion vector has a precision noted as (Px, Py), and wherein at least one of MinY or MaxY is based on Py. In some embodiments, at least one of MinX, MaxX, MinY, or MaxY is based on N. In some embodiments, at least one of MinX, MaxX, MinY or MaxY of a spatial motion prediction motion vector is different from MinX, MaxX, MinY or MaxY of a corresponding temporal motion prediction motion vector. In some embodiments, at least one of MinX, MaxX, MinY or MaxY is a function of the profile, level or hierarchy of the video. In some embodiments, at least one of MinX, MaxX, MinY, or MaxY is a function of the temporal layer of the pictures in the video. In some embodiments, at least one of MinX, MaxX, MinY, or MaxY is a function of the size of the pictures in the video. In some embodiments, at least one of MinX, MaxX, MinY, or MaxY is signaled in a video parameter set, sequence parameter set, picture parameter set, slice header, slice group header, slice, coding tree unit, or coding unit in the bitstream representation. In some embodiments, MVx is clipped to [ MinX, MaxX ] before being used for spatial or temporal motion prediction. In some embodiments, MVy is clipped to [ MinY, MaxY ] before being used for spatial or temporal motion prediction.
Fig. 29 is a flowchart representation of a video processing method 2900 according to the present disclosure. The method 2900 includes, at operation 2902, determining, for a transition between a current block of video and a bitstream representation of the video, whether a shared Merge list is applicable to the current block according to an encoding mode of the current block. The method 2900 includes, at operation 2904, performing a conversion based on the determination.
In some embodiments, where the current block is encoded using conventional Merge mode, sharing the Merge list is not applicable. In some embodiments, the shared Merge list applies in case of encoding the current block using an Intra Block Copy (IBC) mode. In some embodiments, the method further comprises: maintaining a motion candidate table based on past conversions of the video and bitstream representations prior to performing the conversion; and disabling updating of the motion candidate table in a case where the current block is a subblock of a parent block of the applicable shared Merge list and the current block is encoded using a normal Merge mode after the conversion is performed.
Fig. 30 is a flowchart representation of a video processing method 3000 according to the present disclosure. The method 3000 includes, at operation 3002, for a transition between a current block of video of size W × H and a bitstream representation of the video, determining a second block of size (W + N-1) x (H + N-1) for motion compensation during the transition. The second block is determined based on a reference block of size (W + N-1-PW) x (H + N-1-PH). N denotes the filter size and W, H, N, PW and PH are non-negative integers. PW and PH are not both equal to 0. The method 3000 further includes, at operation 3004, performing a conversion based on the determination.
In some embodiments, pixels in the second block that are outside the reference block are determined by repeating one or more boundaries of the reference block. In some embodiments, PH is 0, and at least a left or right boundary of the reference block is repeated to generate the second block. In some embodiments, PW is 0, and at least an upper boundary or a lower boundary of the reference block is repeated to generate the second block. In some embodiments, PW >0 and PH >0, and the second block is generated by repeating at least the left or right boundary of the reference block and then repeating at least the upper or lower boundary of the reference block. In some embodiments, PW >0 and PH >0, and the second block is generated by repeating at least an upper or lower boundary of the reference block and then repeating at least a left or right boundary of the reference block.
In some embodiments, the left boundary of the reference block is repeated M1 times and the right boundary of the reference block is repeated (PW-M1) times, M1 being a positive integer. In some embodiments, the upper boundary of the reference block is repeated M2 times and the lower boundary of the reference block is repeated (PH-M2) times, M2 being a positive integer. In some embodiments, at least one of PW or PH is different for different color components of the current block, the color components comprising at least a luma component or one or more chroma components. In some embodiments, at least one of PW or PH varies depending on the size or shape of the current block. In some embodiments, at least one of the PW or PH varies according to an encoding characteristic of the current block, the encoding characteristic including unidirectional predictive encoding or bidirectional predictive encoding.
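The asymmetric boundary repetition above may be sketched as follows, with the left column repeated M1 times, the right column (PW - M1) times, the top row M2 times, and the bottom row (PH - M2) times; the buffer layout and function name are assumptions.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Expand a (rw x rh) reference block to (rw + pw) x (rh + ph) by boundary
    // repetition with the split points m1 (horizontal) and m2 (vertical).
    std::vector<uint16_t> repeatBoundaries(const std::vector<uint16_t>& ref,
                                           int rw, int rh, int pw, int ph,
                                           int m1, int m2) {
      const int ow = rw + pw, oh = rh + ph;
      std::vector<uint16_t> out(static_cast<size_t>(ow) * oh);
      for (int y = 0; y < oh; ++y) {
        const int sy = std::clamp(y - m2, 0, rh - 1);    // top m2, bottom ph - m2
        for (int x = 0; x < ow; ++x) {
          const int sx = std::clamp(x - m1, 0, rw - 1);  // left m1, right pw - m1
          out[y * ow + x] = ref[sy * rw + sx];
        }
      }
      return out;
    }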
In some embodiments, pixels in the second block that are located outside the reference block are set to a single value. In some embodiments, the single value is 1 << (BD-1), where BD is the bit depth of the pixel samples in the reference block. In some embodiments, BD is 8 or 10. In some embodiments, the single value is derived based on pixel samples of the reference block. In some embodiments, the single value is signaled in a video parameter set, a sequence parameter set, a picture parameter set, a slice header, a slice group header, a slice, a coding tree unit row, a coding tree unit line, a coding unit, or a prediction unit. In some embodiments, in case the current block is affine encoded, padding of pixels in the second block located outside the reference block is disabled.
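The single-value alternative reduces to one shift; a sketch, assuming only that BD is the sample bit depth:

```python
# A one-line alternative to boundary repetition: fill samples outside the
# reference block with the mid-grey value for bit depth BD.
def default_padding_value(bit_depth: int) -> int:
    return 1 << (bit_depth - 1)

assert default_padding_value(8) == 128   # BD = 8
assert default_padding_value(10) == 512  # BD = 10
```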
Fig. 31 is a flowchart representation of a video processing method 3100 according to the present disclosure. The method 3100 includes, at operation 3102, for a conversion between a current block of video of size W×H and a bitstream representation of the video, determining a second block of size (W+N-1)×(H+N-1) for motion compensation during the conversion, where W and H are non-negative integers and N is a non-negative integer based on the filter size. During the conversion, a refined motion vector is determined based on a multi-point search according to a motion vector refinement operation on an original motion vector, and pixels along the boundary of the reference block are determined by repeating one or more non-boundary pixels. The method 3100 further includes, at operation 3104, performing the conversion based on the determination.
In some embodiments, processing the current block includes filtering the current block in a motion vector refinement operation. In some embodiments, whether the reference block is suitable for processing of the current block is determined based on the size of the current block. In some embodiments, interpolating the current block comprises interpolating a plurality of sub-blocks of the current block based on the second block. The size of each sub-block is W1×H1, W1 and H1 being non-negative integers. In some embodiments, W1 = H1 = 4, W = H = 8, and PW = PH = 0. In some embodiments, the second block is determined based entirely on an integer portion of a motion vector of at least one of the plurality of sub-blocks. In some embodiments, in a case where a maximum difference between integer parts of motion vectors of all of the plurality of sub-blocks is equal to or less than 1 pixel, the reference block is determined based on the integer part of the motion vector of the upper-left sub-block of the current block, and each of a right boundary and a lower boundary of the reference block is repeated once to obtain the second block. In some embodiments, in a case where a maximum difference between integer parts of motion vectors of all of the plurality of sub-blocks is equal to or less than 1 pixel, the reference block is determined based on the integer part of the motion vector of a lower-right sub-block of the current block, and each of a left boundary and an upper boundary of the reference block is repeated once to obtain the second block. In some embodiments, the second block is determined based entirely on the modified motion vector of one of the plurality of sub-blocks.
In some embodiments, in case the maximum difference between integer parts of the motion vectors of all the plurality of sub-blocks is equal to or less than two pixels, the motion vector of the upper left sub-block of the current block is modified by adding an integer pixel distance to each component to obtain a modified motion vector. A reference block is determined based on the modified motion vector, and each of a left boundary, a right boundary, an upper boundary, and a lower boundary of the reference block is repeated once to obtain a second block.
In some embodiments, in the case where the maximum difference between the integer parts of the motion vectors of all of the plurality of sub-blocks is equal to or less than two pixels, the motion vector of the lower-right sub-block of the current block is modified by subtracting an integer-pixel distance from each component to obtain a modified motion vector. A reference block is determined based on the modified motion vector, and each of a left boundary, a right boundary, an upper boundary, and a lower boundary of the reference block is repeated once to obtain a second block.
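A sketch of the sub-block selection logic in the preceding three paragraphs, assuming motion vectors stored in 1/16-pel units; all function names and the returned strategy encoding are hypothetical.

```python
# Illustrative sketch; 1/16-pel motion vector storage and all names are
# assumptions made for exposition.
def integer_part(mv):
    return (mv[0] >> 4, mv[1] >> 4)  # assumes 1/16-pel units

def max_integer_spread(sub_block_mvs):
    ints = [integer_part(mv) for mv in sub_block_mvs]
    xs = [p[0] for p in ints]
    ys = [p[1] for p in ints]
    return max(max(xs) - min(xs), max(ys) - min(ys))

def fetch_strategy(sub_block_mvs):
    spread = max_integer_spread(sub_block_mvs)
    if spread <= 1:
        # One reference block from the top-left sub-block MV; repeat the
        # right and lower boundaries once.
        return ("top_left_mv", (0, 0), ("right", "lower"))
    if spread <= 2:
        # Shift the top-left sub-block MV by one integer pel per component
        # and repeat all four boundaries once.
        return ("top_left_mv", (1, 1), ("left", "right", "upper", "lower"))
    # Otherwise fall back to per-sub-block fetching.
    return ("per_subblock", None, ())

mvs = [(16, 16), (24, 20), (17, 30)]  # three sub-block MVs in 1/16-pel
print(max_integer_spread(mvs), fetch_strategy(mvs))
```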
Fig. 32 is a flowchart representation of a video processing method 3200 according to the present disclosure. The method 3200 includes, at operation 3202, for a conversion between a block of the video that is encoded using a Combined Inter and Intra Prediction (CIIP) encoding technique and a bitstream representation of the video, determining a prediction value at a position in the block based on a weighted sum of an inter prediction value and an intra prediction value at the position. The weighted sum is based on adding an offset to an initial sum obtained from the inter prediction value and the intra prediction value, and the offset is added before the right shift operation that is performed to determine the weighted sum. The method 3200 further includes, at operation 3204, performing the conversion based on the determination.
In some embodiments, the position in the block is denoted as (x, y), the inter prediction value at position (x, y) is denoted as Pinter(x, y), the intra prediction value at position (x, y) is denoted as Pintra(x, y), the inter prediction weight at position (x, y) is denoted as w_inter(x, y), and the intra prediction weight at position (x, y) is denoted as w_intra(x, y). The prediction value at position (x, y) is determined as (Pintra(x, y) × w_intra(x, y) + Pinter(x, y) × w_inter(x, y) + offset(x, y)) >> N, where w_intra(x, y) + w_inter(x, y) = 2^N, offset(x, y) = 2^(N-1), and N is a positive integer. In some embodiments, N = 2.
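A minimal sketch of this weighted sum with N = 2; adding the offset 2^(N-1) before the right shift makes the shift a rounding rather than a truncating division. Function and parameter names are illustrative.

```python
# Minimal sketch of the CIIP weighting with N = 2; equal weights are used
# in the example for simplicity, and the names are illustrative.
def ciip_pred(p_intra: int, p_inter: int, w_intra: int, w_inter: int,
              n: int = 2) -> int:
    assert w_intra + w_inter == 1 << n   # weights sum to 2^N
    offset = 1 << (n - 1)                # 2^(N-1), added before the shift
    return (p_intra * w_intra + p_inter * w_inter + offset) >> n

# (100*2 + 102*2 + 2) >> 2 = 406 >> 2 = 101 (rounded, not truncated)
assert ciip_pred(p_intra=100, p_inter=102, w_intra=2, w_inter=2) == 101
```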
In some embodiments, equal weights are used for the inter prediction value and the intra prediction value at a position to determine the weighted sum. In some embodiments, a zero weight is used to determine the weighted sum, depending on the position in the block. In some embodiments, the zero weight is applied to the inter prediction value. In some embodiments, the zero weight is applied to the intra prediction value.
Fig. 33 is a flowchart representation of a video processing method 3300 according to the present disclosure. The method 3300 includes, at operation 3302, for a conversion between a current block of the video and a bitstream representation of the video, determining a manner of representing coding information of the current block in the bitstream representation based in part on whether a condition associated with the size of the current block is satisfied. The method 3300 further includes, at operation 3304, performing the conversion based on the determination.
In some embodiments, the coding information is signaled in the bitstream representation in case the condition is met. In some embodiments, the coding information is omitted from the bitstream representation in case the condition is met. In some embodiments, at least one of an Intra Block Copy (IBC) coding tool or an inter prediction coding tool is disabled for the current block if the condition is met. In some embodiments, the coding information is omitted from the bitstream representation in case the IBC coding tool and the inter prediction coding tool are disabled. In some embodiments, the coding information is signaled in the bitstream representation in case the IBC coding tool is enabled and the inter prediction coding tool is disabled.
In some embodiments, the coding information includes operating modes associated with the IBC coding tool and/or the inter prediction coding tool, the operating modes including at least a normal mode or a skip mode. In some embodiments, the coding information comprises an indicator indicating the use of a prediction mode associated with a coding tool. In some embodiments, the coding tool is not used in case the coding information is omitted from the bitstream representation. In some embodiments, the coding tool is used in case the coding information is omitted from the bitstream representation. In some embodiments, the coding tool includes an Intra Block Copy (IBC) coding tool or an inter prediction coding tool.
In some embodiments, in case the indicator is omitted from the bitstream representation, one or more indicators indicating one or more other coding tools are signaled in the bitstream representation. In some embodiments, the one or more other coding tools include at least an intra coding tool or a palette coding tool. In some embodiments, in case the inter prediction coding tool is disabled and the IBC coding tool is enabled, the indicator distinguishes between inter and intra modes, and the one or more indicators include a first indicator indicating the IBC mode and a second indicator indicating the palette mode, both signaled in the bitstream representation.
In some embodiments, the coding information includes a third indicator for a skip mode, the skip mode being an inter skip mode or an IBC skip mode. In some embodiments, the third indicator for the skip mode is signaled in case the inter prediction coding tool is disabled and the IBC coding tool is enabled. In some embodiments, in case the inter prediction coding tool is disabled and the IBC coding tool is enabled, the prediction mode is determined to be the IBC mode even though the indicator indicating the use of the IBC mode is omitted from the bitstream representation, and the skip mode is applied to the current block.
In some embodiments, the coding information comprises a triangle mode. In some embodiments, the coding information comprises an inter prediction direction. In some embodiments, in case the current block is encoded using a uni-directional prediction coding tool, the coding information is omitted from the bitstream representation. In some embodiments, the coding information includes an indicator indicating the use of a Symmetric Motion Vector Difference (SMVD) method. In some embodiments, in case the condition is met, the current block is set to uni-directional prediction even though an indicator in the bitstream representation indicates the use of the SMVD method. In some embodiments, in case the indicator indicating the use of the SMVD method is omitted from the bitstream representation, only list 0 or list 1 associated with the uni-directional prediction coding tool is used in the motion compensation process.
In some embodiments, the condition indicates that the width of the current block is T1 and the height of the current block is T2, and T1 and T2 are positive integers. In some embodiments, the condition indicates that the width of the current block is T2 and the height of the current block is T1, and T1 and T2 are positive integers. In some embodiments, the condition indicates that the width of the current block is T1, and the height of the current block is less than or equal to T2, T1 and T2 being positive integers. In some embodiments, the condition indicates that the width of the current block is T2, and the height of the current block is less than or equal to T1, T1 and T2 being positive integers. In some embodiments, the condition indicates that the width of the current block is less than or equal to T1 and the height of the current block is less than or equal to T2, and T1 and T2 are positive integers. In some embodiments, the condition indicates that the width of the current block is greater than or equal to T1 and the height of the current block is greater than or equal to T2, and T1 and T2 are positive integers. In some embodiments, the condition indicates that the width of the current block is greater than or equal to T1 or the height of the current block is greater than or equal to T2, and T1 and T2 are positive integers. In some embodiments, T1 = 4 and T2 = 16. In some embodiments, T1 = T2 = 8. In some embodiments, T1 = 8 and T2 = 4. In some embodiments, T1 = T2 = 4. In some embodiments, T1 = T2 = 32. In some embodiments, T1 = T2 = 64. In some embodiments, T1 = T2 = 128.
In some embodiments, the condition indicates that the size of the current block is 4×8, 8×4, 4×16, or 16×4. In some embodiments, the condition indicates that the size of the current block is 4×8 or 8×4. In some embodiments, the condition indicates that the size of the current block is 4×N or N×4, N is a positive integer and N ≤ 16. In some embodiments, the condition indicates that the size of the current block is 8×N or N×8, N is a positive integer and N ≤ 16. In some embodiments, the condition indicates that a color component of the current block includes less than or equal to N samples. In some embodiments, N = 16. In some embodiments, N = 32.
In some embodiments, the condition indicates that a color component of the current block has a width equal to its height. In some embodiments, the condition indicates that the size of the color component of the current block is 4×8, 8×4, 4×4, 4×16, or 16×4. In some embodiments, the color component comprises a luma component or one or more chroma components. In some embodiments, the inter prediction tool is disabled if the condition indicates that the width of the current block is 4 and the height is 4. In some embodiments, the IBC prediction tool is disabled if the condition indicates that the width of the current block is 4 and the height is 4.
In some embodiments, list 0 associated with the uni-directional predictive coding tool is used in the motion compensation process in the following cases: (1) the width of the current block is 4 and the height is 8, or (2) the width of the current block is 8 and the height is 4. The bi-directional predictive coding tool is disabled during motion compensation.
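The size conditions of this method can be summarized in two small predicates; a sketch assuming the 4×4, 4×8 and 8×4 thresholds given above, with illustrative function names.

```python
# Illustrative predicates for the size conditions above; the names and the
# specific thresholds (4x4, 4x8, 8x4) follow the examples in the text.
def inter_allowed(width: int, height: int) -> bool:
    # Inter prediction is disabled for 4x4 blocks.
    return not (width == 4 and height == 4)

def force_uni_prediction_list0(width: int, height: int) -> bool:
    # For 4x8 and 8x4 blocks, bi-prediction is disabled and only list 0
    # is used in motion compensation.
    return (width, height) in ((4, 8), (8, 4))

assert not inter_allowed(4, 4)
assert force_uni_prediction_list0(4, 8) and force_uni_prediction_list0(8, 4)
```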
Fig. 34 is a flowchart representation of a video processing method 3400 according to the present disclosure. The method 3400 includes, at operation 3402, determining a modified set of motion vectors for a conversion between a current block of video and a bitstream representation of the video. The method 3400 further includes, at operation 3404, performing the conversion based on the modified set of motion vectors. The modified set of motion vectors is a modified version of a set of motion vectors associated with the current block, the modification being performed because the current block satisfies a condition.
In some embodiments, the set of motion vectors is derived using one of a regular Merge coding technique, a subblock-based temporal motion vector prediction (sbTMVP) coding technique, or a motion vector difference Merge (MMVD) coding technique. In some embodiments, the set of motion vectors includes block vectors used in an Intra Block Copy (IBC) coding technique. In some embodiments, the condition specifies that the size of the luma component of the current block is the same as a predefined size. In some embodiments, the predefined size includes at least one of 8×4 or 4×8.
In some embodiments, modifying the set of motion vectors comprises converting the set of motion vectors to uni-directional motion vectors in case the motion information of the current block is determined to be bi-directional and comprises a first motion vector referring to a first reference picture in a first reference picture list for a first prediction direction and a second motion vector referring to a second reference picture in a second reference picture list for a second prediction direction. In some embodiments, the motion information of the current block is derived based on motion information of neighboring blocks. In some embodiments, the method includes discarding the information of one of the first prediction direction or the second prediction direction. In some embodiments, discarding the information comprises modifying the motion vector in the discarded prediction direction to (0, 0). In some embodiments, discarding the information comprises modifying the reference index in the discarded prediction direction to -1. In some embodiments, discarding the information comprises modifying the discarded prediction direction to the other prediction direction.
In some embodiments, the method further comprises deriving a new motion candidate based on the information of the first prediction direction and the information of the second prediction direction. In some embodiments, deriving the new motion candidate comprises determining a motion vector of the new motion candidate based on an average of a first motion vector in the first prediction direction and a second motion vector in the second prediction direction. In some embodiments, deriving the new motion candidate comprises determining a scaled motion vector by scaling the first motion vector in the first prediction direction in accordance with the information of the second prediction direction; and determining a motion vector of the new motion candidate based on an average of the scaled motion vector and a second motion vector in a second prediction direction.
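A sketch of the bi-to-uni conversion and the averaged-candidate derivation described in the preceding two paragraphs; the `MotionCandidate` structure is an assumption, and reference-distance scaling is omitted for brevity.

```python
# Illustrative sketch; the MotionCandidate structure is an assumption,
# and reference-distance scaling is omitted for brevity.
from dataclasses import dataclass, replace

@dataclass
class MotionCandidate:
    mv_l0: tuple   # (x, y) motion vector, list 0
    ref_l0: int    # reference index, list 0 (-1 if unused)
    mv_l1: tuple   # (x, y) motion vector, list 1
    ref_l1: int    # reference index, list 1 (-1 if unused)

def to_uni_keep_list0(c: MotionCandidate) -> MotionCandidate:
    # Discard list 1: motion vector -> (0, 0), reference index -> -1.
    return replace(c, mv_l1=(0, 0), ref_l1=-1)

def derive_averaged_candidate(c: MotionCandidate) -> MotionCandidate:
    # New uni-directional candidate whose MV is the rounded average of the
    # two directions (one MV would first be scaled if the two reference
    # pictures are at different distances, as described above).
    avg = ((c.mv_l0[0] + c.mv_l1[0] + 1) >> 1,
           (c.mv_l0[1] + c.mv_l1[1] + 1) >> 1)
    return MotionCandidate(avg, c.ref_l0, (0, 0), -1)

bi = MotionCandidate((4, -8), 0, (6, -2), 1)
print(to_uni_keep_list0(bi))
print(derive_averaged_candidate(bi))
```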
In some embodiments, the first reference picture list is list 0 and the second reference picture list is list 1. In some embodiments, the first reference picture list is list 1 and the second reference picture list is list 0. In some embodiments, the method includes updating, using the converted motion vector, a table of motion candidates determined based on past conversions. In some embodiments, the table comprises a History-based Motion Vector Prediction (HMVP) table.
In some embodiments, the method comprises updating the table of motion candidates determined based on past conversions using the information of the first prediction direction and the second prediction direction prior to the modification of the set of motion vectors. In some embodiments, the converted motion vector is used for a motion compensation operation of a subsequent block. In some embodiments, the converted motion vector is used to predict a motion vector of a subsequent block.
In some embodiments, the information of the first prediction direction and the second prediction direction prior to the modification of the set of motion vectors is used for a motion compensation operation of a subsequent block. In some embodiments, the information prior to the modification of the set of motion vectors is used to predict motion vectors of subsequent blocks. In some embodiments, the converted motion vector is used for a motion refinement operation of the current block. In some embodiments, the information of the first prediction direction and the second prediction direction prior to the modification of the set of motion vectors is used for a refinement operation of the current block. In some embodiments, the refinement operation comprises an optical flow coding operation that includes at least a bi-directional optical flow (BDOF) operation or a Prediction Refinement with Optical Flow (PROF) operation.
In some embodiments, the method includes generating inter prediction blocks based on the information of the first prediction direction and of the second prediction direction, refining the inter prediction blocks, and determining a final prediction block based on one of the refined inter prediction blocks.
Fig. 35 is a flowchart representation of a method 3500 for video processing according to the present disclosure. The method 3500 includes, at operation 3502, for a conversion between a current block of video and a bitstream representation of the video, determining a uni-directional motion vector from a bi-directional motion vector in case a condition of block size is satisfied. The uni-directional motion vector is subsequently used as a Merge candidate for the conversion. The method 3500 further includes, at operation 3504, performing the conversion based on the determination.
In some embodiments, the converted uni-directional motion vector is used as a base Merge candidate during motion vector difference Merge (MMVD) encoding. In some embodiments, the converted uni-directional motion vector is subsequently inserted into a Merge list. In some embodiments, the uni-directional motion vector is converted based on one prediction direction of the bi-directional motion vector. In some embodiments, the uni-directional motion vector is converted based on only one prediction direction, and the only one prediction direction is associated with reference picture list 0. In some embodiments, in case the unit of video data in which the current block is located is bi-predictive, the only one prediction direction of a first candidate in a candidate list is associated with reference picture list 0 and the only one prediction direction of a second candidate in the candidate list is associated with reference picture list 1. The candidate list comprises a Merge candidate list or an MMVD base candidate list.
In some embodiments, the unit of video data comprises a current slice, group of pictures, or picture of the video. In some embodiments, the unidirectional motion vector is determined based on the reference picture list 1 in a case where all reference pictures of the video data unit are past pictures according to a display order. In some embodiments, the unidirectional motion vector is determined based on the reference picture list 0 in a case where at least a first reference picture of reference pictures of the video data unit is a past picture in display order and at least a second reference picture of the reference pictures of the video data unit is a future picture according to display order. In some embodiments, unidirectional motion vectors determined based on different reference picture lists are placed in the Merge list in an interleaved manner.
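A sketch of the interleaved placement described above for bi-predictive video data units: candidates at even positions keep their list-0 direction and candidates at odd positions keep their list-1 direction. The candidate encoding and names are assumptions.

```python
# Illustrative sketch; the candidate encoding (nested tuples) and the
# function names are assumptions, not part of the disclosure.
def convert_to_uni(cand, direction):
    # cand = ((mv_l0, ref_l0), (mv_l1, ref_l1)); keep one direction only.
    mv, ref = cand[direction]
    return {"list": direction, "mv": mv, "ref": ref}

def interleaved_base_candidates(bi_candidates, keep):
    # Alternate the kept prediction direction: list 0 for even positions,
    # list 1 for odd positions, as described above for bi-predictive units.
    return [convert_to_uni(cand, i % 2)
            for i, cand in enumerate(bi_candidates[:keep])]

# Example: two bi-predictive candidates yield an L0-based candidate
# followed by an L1-based candidate.
cands = [(((1, 2), 0), ((3, 4), 0)), (((5, 6), 1), ((7, 8), 1))]
print(interleaved_base_candidates(cands, keep=2))
```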
In some embodiments, the unidirectional motion vector is determined based on a low-latency check indicator. In some embodiments, the unidirectional motion vector is determined prior to a motion compensation process during the transition. In some embodiments, the unidirectional motion vector is determined after a motion candidate list construction process during the conversion. In some embodiments, the unidirectional motion vector is determined before adding the motion vector difference in a motion vector difference Merge (MMVD) encoding process.
In some embodiments, the condition of the block size indicates that the width of the current block is T1, and the height of the current block is T2, and T1 and T2 are positive integers. In some embodiments, the condition of the block size indicates that the width of the current block is T2, and the height of the current block is T1, and T1 and T2 are positive integers. In some embodiments, the condition of the block size indicates that the width of the current block is T1, and the height of the current block is less than or equal to T2, and T1 and T2 are positive integers. In some embodiments, the condition of the block size indicates that the width of the current block is T2, and the height of the current block is less than or equal to T1, and T1 and T2 are positive integers.
Fig. 36 is a flowchart representation of a method 3600 for video processing according to the present disclosure. The method 3600 includes, at operation 3602, for a conversion between a current block of the video and a bitstream representation of the video, determining, based on a size of the current block, that a motion candidate for the current block is restricted to a uni-directional prediction candidate. The method 3600 further includes, at operation 3604, performing the conversion based on the determination.
In some embodiments, all motion candidates for the current block are restricted to uni-directional prediction candidates if the size of the current block satisfies a condition. In some embodiments, a bi-directional motion candidate is converted into the uni-directional prediction candidate if the size of the current block satisfies a condition. In some embodiments, the size of the current block indicates that the width of the current block is T1 and the height of the current block is T2, T1 and T2 being positive integers. In some embodiments, the size of the current block indicates that the width of the current block is T2 and the height of the current block is T1, T1 and T2 being positive integers. In some embodiments, the size of the current block indicates that the width of the current block is T1 and the height of the current block is less than or equal to T2, T1 and T2 being positive integers. In some embodiments, the size of the current block indicates that the width of the current block is T2 and the height of the current block is less than or equal to T1, T1 and T2 being positive integers. In some embodiments, T1 is equal to 4 and T2 is equal to 8. In some embodiments, T1 is equal to 4 and T2 is equal to 4. In some embodiments, T1 is equal to 8 and T2 is equal to 8.
Fig. 37 is a flowchart representation of a method 3700 for video processing according to the present disclosure. The method 3700 includes, at operation 3702, for a conversion between a current block of video and a bitstream representation of the video, determining a uni-directional motion vector from a bi-directional motion vector in case a condition of block size is satisfied. A decoder-side motion vector refinement (DMVR) process is subsequently disabled based on the determination to use the uni-directional motion vector during the conversion. The method 3700 further includes, at operation 3704, performing the conversion based on the determination.
In some embodiments, the unidirectional motion vector is determined prior to a sample refinement process. In some embodiments, the DMVR process derives refined motion vectors based on a predicted sample cost calculation. In some embodiments, the predicted sample cost calculation computes a Sum of Absolute Differences (SAD) between predicted samples.
In some embodiments, the unidirectional motion vector is determined prior to a bidirectional optical flow (BDOF) process. In some embodiments, the BDOF process is subsequently disabled during the conversion based on the unidirectional motion vector. In some embodiments, the BDOF process derives refined predicted samples based on predicted sample gradient calculations. In some embodiments, the unidirectional motion vector is determined prior to a Prediction Refinement with Optical Flow (PROF) process.
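A sketch of the gating implied by this method and the DMVR variant above: both DMVR and BDOF operate on two prediction directions, so converting to a uni-directional motion vector disables them. The predicate names and extra-condition flags are illustrative assumptions.

```python
# Illustrative predicates; the names and the extra-condition flags are
# assumptions for exposition.
def dmvr_enabled(is_bi_prediction: bool, other_dmvr_conditions: bool) -> bool:
    # DMVR compares the two prediction directions, so it needs bi-prediction.
    return is_bi_prediction and other_dmvr_conditions

def bdof_enabled(is_bi_prediction: bool, other_bdof_conditions: bool) -> bool:
    # BDOF likewise refines samples from two prediction directions.
    return is_bi_prediction and other_bdof_conditions

# After the bi-to-uni conversion for a small block, both are skipped:
assert not dmvr_enabled(is_bi_prediction=False, other_dmvr_conditions=True)
assert not bdof_enabled(is_bi_prediction=False, other_bdof_conditions=True)
```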
In some embodiments, performing the conversion comprises generating the bitstream representation based on a current block of the video. In some embodiments, performing the conversion includes generating a current block of the video from the bitstream representation.
The disclosure and other solutions, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine readable storage device, a machine readable storage substrate, a memory device, a composition of matter effecting a machine readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" includes all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or claims, but rather as descriptions of features of particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various functions described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claim combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Also, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples have been described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.
Claims (40)
1. A video processing method, comprising:
for a conversion between a current block of video and a bitstream representation of the video, determining a uni-directional motion vector from a bi-directional motion vector if a condition of block size is satisfied, wherein a decoder-side motion vector refinement (DMVR) process is subsequently disabled based on the determination to use the uni-directional motion vector during the conversion; and
performing the conversion based on the determination.
2. The method of claim 1, wherein the unidirectional motion vector is determined prior to a sample refinement process.
3. The method of claim 1, wherein the DMVR process derives refined motion vectors based on a predicted sample cost calculation.
4. The method of claim 3, wherein the predicted sample cost calculation calculates a Sum of Absolute Differences (SAD) between predicted samples.
5. The method of any one or more of claims 1-4, wherein the unidirectional motion vectors are determined prior to a bidirectional optical flow (BDOF) process.
6. The method of claim 5, wherein the BDOF process is subsequently disabled during the transition based on the unidirectional motion vector.
7. The method of claim 6, wherein the BDOF process derives refined predicted samples based on a predicted sample gradient calculation.
8. The method of any one or more of claims 1-7, wherein the unidirectional motion vectors are determined prior to a predictive refinement with optical flow (PROF) process.
9. A video processing method, comprising:
for a conversion between a current block of video and a bitstream representation of the video, determining a unidirectional motion vector from a bidirectional motion vector if a condition of block size is satisfied, wherein the unidirectional motion vector is subsequently used as a Merge candidate for the conversion; and
performing the conversion based on the determination.
10. The method of claim 9, wherein the converted uni-directional motion vector is used as a base Merge candidate in a motion vector difference Merge (MMVD) encoding process.
11. The method of claim 9, wherein the converted unidirectional motion vector is subsequently inserted into a Merge list.
12. The method of any one or more of claims 9-11, wherein the unidirectional motion vector is converted based on one prediction direction of the bidirectional motion vector.
13. The method of claim 12, wherein the unidirectional motion vector is converted based on only one prediction direction, and wherein the only one prediction direction is associated with reference picture list 0.
14. The method of claim 12, wherein, in a case that the unit of video data in which the current block is located is bi-predicted, the only one prediction direction of a first candidate in a candidate list is associated with reference picture list 0 and the only one prediction direction of a second candidate in the candidate list is associated with reference picture list 1, wherein the candidate list comprises a Merge candidate list or an MMVD base candidate list.
15. The method of claim 14, wherein the unit of video data comprises a current slice, a group of pictures, or a picture of the video.
16. The method of claim 14 or 15, wherein the unidirectional motion vector is determined based on the reference picture list 1 in case all reference pictures of the video data unit are past pictures according to a display order.
17. The method of claim 14 or 15, wherein the unidirectional motion vector is determined based on the reference picture list 0 if at least a first reference picture of reference pictures of the video data unit is a past picture and at least a second reference picture of the reference pictures of the video data unit is a future picture according to a display order.
18. The method of any one or more of claims 14 to 17, wherein unidirectional motion vectors determined based on different reference picture lists are placed in the Merge list in an interleaved manner.
19. The method of any one or more of claims 9-18, wherein the unidirectional motion vector is determined based on a low latency check indicator.
20. A method as claimed in any one or more of claims 9 to 19, wherein the unidirectional motion vector is determined prior to a motion compensation process during the transition.
21. The method of any one or more of claims 9-19, wherein the unidirectional motion vector is determined after a motion candidate list construction process during the transition.
22. The method of any one or more of claims 9 to 19, wherein the unidirectional motion vector is determined prior to adding a motion vector difference during motion vector difference Merge (MMVD) encoding.
23. The method of any one or more of claims 9 to 22, wherein the condition of the block size indicates that the width of the current block is T1 and the height of the current block is T2, with T1 and T2 being positive integers.
24. The method of any one or more of claims 9 to 22, wherein the condition of the block size indicates that the width of the current block is T2 and the height of the current block is T1, with T1 and T2 being positive integers.
25. The method of any one or more of claims 9 to 22, wherein the condition of the block size indicates that the width of the current block is T1, and the height of the current block is less than or equal to T2, T1 and T2 being positive integers.
26. The method of any one or more of claims 9 to 22, wherein the condition of the block size indicates that the width of the current block is T2, and the height of the current block is less than or equal to T1, T1 and T2 being positive integers.
27. A video processing method, comprising:
determining, for a conversion between a current block of video and a bitstream representation of the video, that a motion candidate for the current block is restricted to a uni-directional prediction candidate based on a size of the current block; and
performing the conversion based on the determination.
28. The method of claim 27, wherein all motion candidates of the current block are restricted to uni-directional prediction candidates if the size of the current block satisfies a condition.
29. The method of claim 27, wherein a bi-directional motion candidate is converted into the uni-directional prediction candidate if the size of the current block satisfies a condition.
30. The method of any one or more of claims 27 to 29, wherein the size of the current block indicates that the current block has a width of T1 and a height of T2, T1 and T2 being positive integers.
31. The method of any one or more of claims 27 to 29, wherein the size of the current block indicates that the current block has a width of T2 and a height of T1, T1 and T2 being positive integers.
32. The method of any one or more of claims 27 to 29, wherein the size of the current block indicates that the current block has a width of T1 and a height of less than or equal to T2, T1 and T2 being positive integers.
33. The method of any one or more of claims 27 to 29, wherein the size of the current block indicates that the current block has a width of T2 and a height of less than or equal to T1, T1 and T2 being positive integers.
34. The method of any of claims 23-26 and 30-33, wherein T1 is equal to 4 and T2 is equal to 8.
35. The method of any of claims 23-26 and 30-33, wherein T1 is equal to 4 and T2 is equal to 4.
36. The method of any of claims 23-26 and 30-33, wherein T1 is equal to 8 and T2 is equal to 8.
37. The method of any one or more of claims 1-36, wherein performing the conversion comprises generating the bitstream representation based on a current block of the video.
38. The method of any one or more of claims 1-36, wherein performing the conversion comprises generating a current block of the video from the bitstream representation.
39. A video processing apparatus comprising a processor configured to perform the method of one or more of claims 1 to 38.
40. A computer-readable medium having code stored thereon, which when executed by a processor causes the processor to implement the method of any one or more of claims 1 to 38.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019079397 | 2019-03-24 | ||
CNPCT/CN2019/079397 | 2019-03-24 | ||
PCT/CN2020/080824 WO2020192643A1 (en) | 2019-03-24 | 2020-03-24 | Derivation of converted uni-prediction candidate |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113632477A true CN113632477A (en) | 2021-11-09 |
Family
ID=72609720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202080023944.4A Pending CN113632477A (en) | 2019-03-24 | 2020-03-24 | Derivation of transformed uni-directional prediction candidates |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113632477A (en) |
WO (1) | WO2020192643A1 (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR19980030414A (en) * | 1996-10-29 | 1998-07-25 | 김광호 | Moving picture decoding apparatus based on compulsory one-way motion compensation |
CN101711481A (en) * | 2006-10-18 | 2010-05-19 | 汤姆森特许公司 | Method and apparatus for video coding using prediction data refinement |
US20130202037A1 (en) * | 2012-02-08 | 2013-08-08 | Xianglin Wang | Restriction of prediction units in b slices to uni-directional inter prediction |
CN104094605A (en) * | 2012-02-08 | 2014-10-08 | 高通股份有限公司 | Restriction of prediction units in b slices to uni-directional inter prediction |
CN107113424A (en) * | 2014-11-18 | 2017-08-29 | 联发科技股份有限公司 | Bidirectional predictive video coding method based on the motion vector from single directional prediction and merging candidate |
Non-Patent Citations (1)
Title |
---|
XIU, Xiaoyu et al.: "CE9.1.3: Complexity reduction on decoder-side motion vector refinement (DMVR)", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting: Ljubljana, SI, 10-18 July 2018, Document JVET-K0342, page 1 |
Also Published As
Publication number | Publication date |
---|---|
WO2020192643A1 (en) | 2020-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111448797B (en) | Reference size for inter-prediction interpolation | |
CN111436228B (en) | Simplification of combined inter-intra prediction | |
CN113545065B (en) | Use of converted uni-directional prediction candidates | |
CN113545038B (en) | Size dependent inter-frame coding | |
CN113632477A (en) | Derivation of transformed uni-directional prediction candidates | |
CN118870018A (en) | Bandwidth control method for inter-frame prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||