CN110876064A - Partially interleaved prediction - Google Patents

Partially interleaved prediction

Info

Publication number
CN110876064A
Authority
CN
China
Prior art keywords
block
sub
prediction
blocks
current block
Prior art date
Legal status
Granted
Application number
CN201910828124.1A
Other languages
Chinese (zh)
Other versions
CN110876064B (en)
Inventor
张凯
张莉
刘鸿彬
王悦
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd, ByteDance Inc filed Critical Beijing ByteDance Network Technology Co Ltd
Publication of CN110876064A
Application granted
Publication of CN110876064B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N 19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N 19/176: the coding unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/513: Processing of motion vectors
    • H04N 19/52: Processing of motion vectors by predictive encoding
    • H04N 19/537: Motion estimation other than block-based
    • H04N 19/56: Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H04N 19/577: Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N 19/583: Motion compensation with overlapping blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Methods, systems, and apparatus related to sub-block based motion prediction in video coding are described. In one representative aspect, a method of video processing includes determining, during a conversion between a current block and an encoded representation of the current block, a prediction block for the current block. The prediction block includes a first portion and a second portion. The second portion corresponds to a weighted combination of a first inter prediction block, in which the current block is subdivided into sub-blocks using a first pattern, and a second inter prediction block, in which the current block is subdivided into sub-blocks using a second pattern. The method also includes generating the current block from the first portion and the second portion.

Description

Partially interleaved prediction
Cross Reference to Related Applications
In accordance with the applicable patent law and/or rules pursuant to the Paris Convention, the present application timely claims the priority to and benefits of International Patent Application No. PCT/CN2018/103770, filed on September 3, 2018, International Patent Application No. PCT/CN2018/104984, filed on September 11, 2018, and International Patent Application No. PCT/CN2019/070058, filed on January 2, 2019. The entire contents of International Patent Application Nos. PCT/CN2018/103770, PCT/CN2018/104984, and PCT/CN2019/070058 are incorporated by reference as part of the disclosure of this patent document for all purposes.
Technical Field
This patent document relates to video encoding and decoding techniques, apparatuses, and systems.
Background
Motion Compensation (MC) is a technique in video processing for predicting frames in video given previous and/or future frames by considering the motion of the camera and/or objects in the video. Motion compensation may be used for encoding of video data for video compression.
Disclosure of Invention
This document discloses methods, systems, and apparatus relating to sub-block based motion prediction in video motion compensation.
In one representative aspect, a method of video processing is disclosed. The method includes determining, during a conversion between a current block and an encoded representation of the current block, a prediction block for the current block. The prediction block includes a first portion and a second portion. The second portion corresponds to a weighted combination of a first inter prediction block, in which the current block is subdivided into sub-blocks using a first pattern, and a second inter prediction block, in which the current block is subdivided into sub-blocks using a second pattern. The method also includes generating the current block from the first portion and the second portion.
In another representative aspect, a method of video processing is disclosed. The method includes generating a prediction block for a current block. The prediction block includes a first portion and a second portion. The second portion corresponds to a weighted combination of a first inter prediction block, in which the current block is subdivided into sub-blocks using a first pattern, and a second inter prediction block, in which the current block is subdivided into sub-blocks using a second pattern. The method also includes converting the prediction block into an encoded representation in a bitstream.
In another representative aspect, a method for improving bandwidth usage and prediction accuracy of a block-based motion prediction video system is disclosed. The method includes selecting a set of pixels from a video frame to form a block, subdividing the block into a first set of sub-blocks according to a first pattern, generating a first intermediate prediction block based on the first set of sub-blocks, subdividing the block into a second set of sub-blocks according to a second pattern, generating a second intermediate prediction block based on the second set of sub-blocks, and determining a prediction block based on the first intermediate prediction block and the second intermediate prediction block. At least one sub-block in the second set has a different size than the sub-blocks in the first set.
In another representative aspect, a method for improving block-based motion prediction in a video system is disclosed. The method includes selecting a set of pixels from a video frame to form a block, subdividing the block into a plurality of sub-blocks based on a size of the block or information from another block spatially or temporally adjacent to the block, and generating a motion vector prediction by applying an encoding algorithm to the plurality of sub-blocks. At least one of the plurality of sub-blocks has a different size than the other sub-blocks.
In another representative aspect, an apparatus is disclosed that includes a processor and a non-transitory memory having instructions thereon. The instructions, when executed by a processor, cause the processor to select a set of pixels from a video frame to form a block, subdivide the block into a first set of sub-blocks according to a first pattern, generate a first intermediate prediction block based on the first set of sub-blocks, subdivide the block into a second set of sub-blocks according to a second pattern, wherein at least one sub-block in the second set has a different size than the sub-blocks in the first set, generate a second intermediate prediction block based on the second set of sub-blocks, and determine the prediction block based on the first intermediate prediction block and the second intermediate prediction block.
In yet another representative aspect, a method of video processing includes deriving one or more motion vectors for a first set of sub-blocks of a current video block, wherein each of the first set of sub-blocks has a first subdivision pattern, and reconstructing the current video block based on the one or more motion vectors.
In yet another representative aspect, the various techniques described herein may be implemented as a computer program product stored on a non-transitory computer readable medium. The computer program product comprises program code for performing the methods described herein.
In yet another representative aspect, a video decoder device may implement a method as described herein.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and the drawings, and from the claims.
Drawings
Fig. 1 is a diagram illustrating an example of sub-block based prediction.
Fig. 2 shows an example of an affine motion field of a block described by two control point motion vectors.
Fig. 3 shows an example of an affine motion vector field for each sub-block of a block.
Fig. 4 shows an example of motion vector prediction of a block 400 in AF_INTER mode.
Fig. 5A shows an example of the selection order of candidate blocks of a current Coding Unit (CU).
Fig. 5B shows another example of a candidate block of the current CU in the AF_MERGE mode.
Fig. 6 shows an example of an Alternative Temporal Motion Vector Prediction (ATMVP) motion prediction process for a CU.
Fig. 7 shows an example of one CU with four sub-blocks and neighboring blocks.
FIG. 8 illustrates an exemplary optical flow trace in a bi-directional optical flow (BIO) method.
Fig. 9A shows an example of access locations outside of a block.
FIG. 9B shows a padding area that may be used to avoid additional memory accesses and computations.
Fig. 10 illustrates an example of bilateral matching used in a Frame Rate Up Conversion (FRUC) method.
Fig. 11 illustrates an example of template matching used in the FRUC method.
Fig. 12 shows an example of unilateral Motion Estimation (ME) in the FRUC method.
Fig. 13 illustrates an example of interleaved prediction with two subdivision patterns in accordance with the disclosed technique.
Fig. 14A illustrates an exemplary subdivision pattern of a block into 4×4 sub-blocks in accordance with the disclosed technique.
Fig. 14B illustrates an exemplary subdivision pattern of a block into 8×8 sub-blocks in accordance with the disclosed techniques.
Fig. 14C illustrates an exemplary subdivision pattern of a block into 4×8 sub-blocks in accordance with the disclosed technique.
Fig. 14D illustrates an exemplary subdivision pattern of a block into 8×4 sub-blocks in accordance with the disclosed technique.
Fig. 14E illustrates an example subdivision pattern for subdividing a block into non-uniform sub-blocks in accordance with the disclosed technique.
Fig. 14F illustrates another example subdivision pattern for subdividing a block into non-uniform sub-blocks in accordance with the disclosed techniques.
Fig. 14G illustrates yet another example subdivision pattern for subdividing a block into non-uniform sub-blocks in accordance with the disclosed techniques.
Fig. 15A to 15D show exemplary embodiments of partial interleaving prediction.
Fig. 16A to 16C show exemplary embodiments of deriving MVs of one subdivision pattern from another subdivision pattern.
Fig. 17A to 17C illustrate exemplary embodiments of selecting a subdivision pattern based on the dimensions of a current video block.
Fig. 18A and 18B illustrate exemplary embodiments of deriving MVs of sub-blocks in one component within a subdivision pattern from MVs of another component of sub-blocks within another subdivision pattern.
Fig. 19 is an exemplary flow diagram of a method for improving bandwidth usage and prediction accuracy of a block-based motion prediction video system.
Fig. 20 is another exemplary flow diagram of a method for improving bandwidth usage and prediction accuracy for a block-based motion prediction video system.
FIG. 21 is a block diagram of a video processing device that may be used to implement embodiments of the disclosed technology.
Fig. 22 is an exemplary flow diagram of a method for video processing in accordance with the present technology.
Fig. 23 is an exemplary flow diagram of a method for video processing in accordance with the present technology.
Fig. 24 is a block diagram of an exemplary video processing system in which the disclosed techniques may be implemented.
Detailed Description
Global motion compensation is one of many variations of motion compensation techniques and can be used to predict the motion of a camera. However, moving objects within a frame are not adequately represented by various implementations of global motion compensation. Local motion estimation (such as block motion compensation) that subdivides a frame into blocks of pixels for motion prediction may be used to account for objects that are moving within the frame.
Sub-block based prediction developed based on block motion compensation was first introduced into the video coding standard by High Efficiency Video Coding (HEVC) Annex I (3D-HEVC). Fig. 1 is a diagram illustrating an example of sub-block based prediction. In the case of sub-block based prediction, a block 100, such as a Coding Unit (CU) or a Prediction Unit (PU), is subdivided into several non-overlapping sub-blocks 101. Different sub-blocks may be allocated different motion information such as reference indices or Motion Vectors (MVs). Motion compensation is then performed separately for each sub-block.
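As a rough illustration of this idea, the following sketch (not taken from the patent; the motion_compensate callable, the motion_info layout, and the uniform sub-block grid are hypothetical) splits a block into non-overlapping sub-blocks, looks up per-sub-block motion information, and performs motion compensation separately for each sub-block.

```python
# Illustrative sketch of sub-block based prediction. The motion_compensate
# callable and the motion_info layout are hypothetical placeholders.
def subblock_prediction(block_w, block_h, sub_w, sub_h, motion_info, ref, motion_compensate):
    """motion_info[(x, y)] holds the motion vector assigned to the sub-block at (x, y)."""
    pred = [[0] * block_w for _ in range(block_h)]
    for y in range(0, block_h, sub_h):
        for x in range(0, block_w, sub_w):
            mv = motion_info[(x, y)]  # each sub-block carries its own motion information
            sub_pred = motion_compensate(ref, mv, x, y, sub_w, sub_h)
            for j in range(sub_h):
                for i in range(sub_w):
                    pred[y + j][x + i] = sub_pred[j][i]
    return pred
```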
In order to explore future video coding techniques beyond HEVC, the Joint Video Exploration Team (JVET) was founded jointly by the Video Coding Experts Group (VCEG) and the Moving Picture Experts Group (MPEG) in 2015. Many methods have been adopted by JVET and added to the reference software named the Joint Exploration Model (JEM). In JEM, sub-block based prediction is adopted in several coding techniques, such as affine prediction, alternative temporal motion vector prediction (ATMVP), spatial-temporal motion vector prediction (STMVP), bi-directional optical flow (BIO), and frame rate up-conversion (FRUC), which are discussed in detail below.
Affine prediction
In HEVC, only a translational motion model is applied to Motion Compensated Prediction (MCP). However, the camera and objects may have many kinds of motion, such as zoom in/out, rotation, perspective motion, and/or other irregular motion. JEM, on the other hand, applies a simplified affine transform motion compensated prediction. Fig. 2 shows an example of the affine motion field of a block 200 described by two control point motion vectors V0 and V1. The motion vector field (MVF) of block 200 can be described by the following equation:

vx = (v1x − v0x)/w · x − (v1y − v0y)/w · y + v0x
vy = (v1y − v0y)/w · x + (v1x − v0x)/w · y + v0y      Equation (1)
as shown in fig. 2, (v)0x,v0y) Is the movement of the control point in the upper left cornerVector, and (v)1x,v1y) Is the motion vector of the upper right hand corner control point. To simplify motion compensated prediction, sub-block based affine transform prediction may be applied. The subblock size M × N is derived as follows:
Figure BDA0002189776630000052
here, MvPre is the motion vector score precision (e.g., 1/16 in JEM). (v)2x,v2y) Is the motion vector of the lower left control point calculated according to equation (1). If desired, M and N can be adjusted downward to be divisors of w and h, respectively.
Fig. 3 shows an example of affine MVF for each sub-block of block 300. To derive the motion vector for each M × N sub-block, the motion vector for the center sample of each sub-block may be calculated according to equation (1) and rounded to motion vector fractional precision (e.g., 1/16 in JEM). A motion compensated interpolation filter may then be applied to generate a prediction for each sub-block using the derived motion vectors. After MCP, the high precision motion vector of each sub-block is rounded and saved to the same precision as the normal motion vector.
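As an illustration of the derivation described above, the sketch below applies the affine model of Equation (1) at the center sample of each M × N sub-block and rounds the result to the fractional MV precision. It is only a sketch under the stated assumptions, not an excerpt from any reference software.

```python
# Illustrative sketch: derive one motion vector per M x N sub-block from the
# two control point motion vectors (v0, v1) using Equation (1).
def affine_subblock_mvs(v0, v1, block_w, block_h, M, N, mv_precision=16):
    """v0 = (v0x, v0y) is the top-left control point MV; v1 = (v1x, v1y) is the top-right one."""
    v0x, v0y = v0
    v1x, v1y = v1
    mvs = {}
    for y in range(0, block_h, N):
        for x in range(0, block_w, M):
            cx, cy = x + M / 2.0, y + N / 2.0  # center sample of the sub-block
            vx = (v1x - v0x) / block_w * cx - (v1y - v0y) / block_w * cy + v0x
            vy = (v1y - v0y) / block_w * cx + (v1x - v0x) / block_w * cy + v0y
            # round to the fractional MV precision (e.g., 1/16 sample in JEM)
            mvs[(x, y)] = (round(vx * mv_precision) / mv_precision,
                           round(vy * mv_precision) / mv_precision)
    return mvs
```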
In JEM, there are two affine motion modes: AF_INTER mode and AF_MERGE mode. For CUs with both width and height larger than 8, the AF_INTER mode can be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF_INTER mode is used. In AF_INTER mode, a candidate list of motion vector pairs {(v0, v1) | v0 = {vA, vB, vC}, v1 = {vD, vE}} is constructed using the neighboring blocks. Fig. 4 shows an example of Motion Vector Prediction (MVP) of a block 400 in AF_INTER mode. As shown in Fig. 4, v0 is selected from the motion vectors of sub-block A, B, or C. The motion vectors from the neighboring blocks can be scaled according to the reference list. The motion vectors can also be scaled according to the relationship among the reference Picture Order Count (POC) of the neighboring block, the reference POC of the current CU, and the POC of the current CU. The approach to selecting v1 from the neighboring sub-blocks D and E is similar. If the number of candidates in the list is smaller than 2, the list is padded with motion vector pairs formed by duplicating each of the AMVP candidates. When the candidate list is larger than 2, the candidates can first be sorted according to the neighboring motion vectors (e.g., based on the similarity of the two motion vectors in a pair of candidates). In some implementations, the first two candidates are kept. In some embodiments, a rate-distortion (RD) cost check is used to determine which motion vector pair candidate is selected as the Control Point Motion Vector Predictor (CPMVP) of the current CU. An index indicating the position of the CPMVP in the candidate list can be signaled in the bitstream. After the CPMVP of the current affine CU is determined, affine motion estimation is applied and the Control Point Motion Vectors (CPMVs) are found. The difference between the CPMV and the CPMVP is then signaled in the bitstream.
When a CU is coded in AF_MERGE mode, it gets the first block coded in affine mode from the valid neighboring reconstructed blocks. Fig. 5A shows an example of the selection order of candidate blocks for the current CU 500. As shown in Fig. 5A, the selection order can be from left (501), above (502), above-right (503), below-left (504) to above-left (505) of the current CU 500. Fig. 5B shows another example of candidate blocks for the current CU 500 in AF_MERGE mode. If the neighboring lower-left block 501 is coded in affine mode, as shown in Fig. 5B, the motion vectors v2, v3, and v4 of the top-left, top-right, and bottom-left corners of the CU containing sub-block 501 are derived. The motion vector v0 of the top-left corner of the current CU 500 is calculated based on v2, v3, and v4. The motion vector v1 of the top-right of the current CU is calculated accordingly. After the CPMVs v0 and v1 of the current CU are derived, the MVF of the current CU can be generated according to the affine motion model in Equation (1). In order to identify whether the current CU is coded with AF_MERGE mode, an affine flag can be signaled in the bitstream when there is at least one neighboring block coded in affine mode.
Alternative temporal motion vector prediction (ATMVP)
In the ATMVP method, the Temporal Motion Vector Prediction (TMVP) method is modified by retrieving multiple sets of motion information (including motion vectors and reference indices) from a block smaller than the current CU.
Fig. 6 shows an example of the ATMVP motion prediction process for CU 600. The ATMVP method predicts the motion vectors of the sub-CUs 601 within CU 600 in two steps. The first step is to identify the corresponding block 651 in a reference picture 650 with a temporal vector. The reference picture 650 is also referred to as the motion source picture. The second step is to split the current CU 600 into sub-CUs 601 and obtain the motion vector as well as the reference index of each sub-CU from the block corresponding to each sub-CU.
In the first step, the reference picture 650 and the corresponding block are determined from the motion information of the spatial neighboring blocks of the current CU 600. To avoid a repetitive scanning process of the neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU 600 is used. The first available motion vector and its associated reference index are set to be the temporal vector and the index of the motion source picture. In this way, the corresponding block can be identified more accurately than with TMVP, where the corresponding block (sometimes called a collocated block) is always at the lower-right or center position relative to the current CU.
In the second step, a corresponding block of the sub-CU is identified by the temporal vector in the motion source picture 650, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (e.g., the smallest motion grid covering the center sample) is used to derive the motion information of the sub-CU. After the motion information of the corresponding N×N block is identified, it is converted into motion vectors and reference indices of the current sub-CU in the same way as TMVP in HEVC, in which motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (e.g., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses the motion vector MVx (e.g., the motion vector corresponding to reference picture list X) to predict the motion vector MVy for each sub-CU (e.g., with X equal to 0 or 1 and Y equal to 1−X).
Spatial-temporal motion vector prediction (STMVP)
In the STMVP method, the motion vectors of the sub-CUs are derived recursively following a raster scan order. Fig. 7 shows an example of one CU with four sub-blocks and neighboring blocks. Consider an 8 × 8 CU 700, which includes four 4 × 4 sub-CUs, a (701), B (702), C (703), and D (704). The neighboring 4 × 4 blocks in the current frame are labeled a (711), b (712), c (713), and d (714).
The motion derivation of sub-CU A begins by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A 701 (block c 713). If this block c (713) is not available or is intra coded, the other N×N blocks above sub-CU A (701) are checked (from left to right, starting at block c 713). The second neighbor is the block to the left of sub-CU A 701 (block b 712). If block b (712) is not available or is intra coded, the other blocks to the left of sub-CU A 701 are checked (from top to bottom, starting at block b 712). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the temporal motion vector prediction (TMVP) of sub-block A 701 is derived by following the same procedure as the TMVP specified in HEVC. The motion information of the collocated block at block D 704 is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
Bidirectional optical flow (BIO)
The bi-directional optical flow (BIO) approach is a sample-wise motion refinement over block-wise motion compensation for bi-directional prediction. In some implementations, sample-level motion refinement does not use signaling.
Let I(k) be the luma value from reference k (k = 0, 1) after block motion compensation, and let ∂I(k)/∂x and ∂I(k)/∂y be the horizontal and vertical components of the I(k) gradient, respectively. Assuming the optical flow is valid, the motion vector field (vx, vy) is given by

∂I(k)/∂t + vx · ∂I(k)/∂x + vy · ∂I(k)/∂y = 0      Equation (3)

Combining this optical flow equation with Hermite interpolation of the motion trajectory of each sample yields a unique third-order polynomial that matches both the function values I(k) and the derivatives ∂I(k)/∂x, ∂I(k)/∂y at the ends. The value of this polynomial at t = 0 is the BIO prediction:

predBIO = 1/2 · ( I(0) + I(1) + vx/2 · (τ1 ∂I(1)/∂x − τ0 ∂I(0)/∂x) + vy/2 · (τ1 ∂I(1)/∂y − τ0 ∂I(0)/∂y) )      Equation (4)
Fig. 8 illustrates an example optical flow trajectory in the bi-directional optical flow (BIO) method. Here, τ0 and τ1 denote the distances to the reference frames. The distances τ0 and τ1 are calculated based on the POC of Ref0 and Ref1: τ0 = POC(current) − POC(Ref0), τ1 = POC(Ref1) − POC(current). If both predictions come from the same time direction (either both from the past or both from the future), then the signs are different (e.g., τ0 · τ1 < 0). In this case, BIO is applied only if the predictions are not from the same time instant (e.g., τ0 ≠ τ1), both reference regions have non-zero motion (e.g., MVx0, MVy0, MVx1, MVy1 ≠ 0), and the block motion vectors are proportional to the temporal distances (e.g., MVx0/MVx1 = MVy0/MVy1 = −τ0/τ1).
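The applicability conditions above can be summarized as a short check. The sketch below is an illustration only (argument names are hypothetical); it computes τ0 and τ1 from the POC values and tests the extra conditions that apply when both references lie on the same temporal side.

```python
# Illustrative sketch: check whether BIO can be applied, following the
# POC-distance and motion-vector conditions described above.
def bio_applicable(poc_cur, poc_ref0, poc_ref1, mv0, mv1):
    """mv0 and mv1 are the (MVx, MVy) motion vectors toward Ref0 and Ref1."""
    tau0 = poc_cur - poc_ref0   # distance to Ref0
    tau1 = poc_ref1 - poc_cur   # distance to Ref1
    if tau0 * tau1 < 0:         # both references on the same temporal side
        if tau0 == tau1:        # same time instant: BIO is not applied
            return False
        if 0 in (mv0[0], mv0[1], mv1[0], mv1[1]):  # zero motion: BIO is not applied
            return False
        # the motion vectors must be proportional to the temporal distances
        return mv0[0] * tau1 == -mv1[0] * tau0 and mv0[1] * tau1 == -mv1[1] * tau0
    return True
```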
The motion vector field (vx, vy) is determined by minimizing the difference Δ between the values at points A and B. Figs. 9A-9B illustrate an example of the intersection of the motion trajectory with the reference frame planes. The model uses only the first linear term of the local Taylor expansion for Δ:

Δ = I(0) − I(1) + vx · (τ1 ∂I(1)/∂x + τ0 ∂I(0)/∂x) + vy · (τ1 ∂I(1)/∂y + τ0 ∂I(0)/∂y)      Equation (5)

All values in the above equation depend on the sample position, denoted (i′, j′). Assuming the motion is consistent in the local surrounding area, Δ can be minimized inside a (2M+1) × (2M+1) square window Ω centered on the currently predicted point (i, j), where M equals 2:

(vx, vy) = argmin over (vx, vy) of Σ over [i′, j′] ∈ Ω of Δ²[i′, j′]      Equation (6)
for this optimization problem, JEM uses a simplified approach, first minimizing in the vertical direction and then in the horizontal direction. This results in the following:
Figure BDA0002189776630000093
Figure BDA0002189776630000094
wherein,
Figure BDA0002189776630000095
Figure BDA0002189776630000096
Figure BDA0002189776630000097
to avoid division by zero or a small value, the regularization parameters r and m may be introduced into equations (7) and (8).
r = 500 · 4^(d−8)      Equation (10)
m = 700 · 4^(d−8)      Equation (11)
Here, d is the bit depth of the video sample.
To keep the memory access for BIO the same as for conventional bi-predictive motion compensation, all prediction and gradient values I(k), ∂I(k)/∂x, ∂I(k)/∂y are computed only for positions inside the current block. Fig. 9A shows an example of access positions outside of a block 900. As shown in Fig. 9A, in Equation (9), a (2M+1) × (2M+1) square window Ω centered on a currently predicted point on the boundary of the predicted block needs to access positions outside the block. In JEM, the values of I(k), ∂I(k)/∂x, ∂I(k)/∂y outside of the block are set to be equal to the nearest available value inside the block. For example, this can be implemented as the padding area 901, as shown in Fig. 9B.
With BIO, the motion field can be refined for each sample. To reduce computational complexity, a block-based design of BIO is used in JEM. The motion refinement can be calculated based on 4×4 blocks. In block-based BIO, the values sn in Equation (9) are aggregated over all samples in a 4×4 block, and the aggregated values are then used to derive the BIO motion vector offset for the 4×4 block. More specifically, the following formula can be used for block-based BIO derivation:

s_{n,bk} = Σ over (x, y) ∈ bk of sn(x, y), for n = 1, 2, 3, 5, 6

Here, bk denotes the set of samples belonging to the k-th 4×4 block of the predicted block. The values sn in Equations (7) and (8) are replaced by ((s_{n,bk}) >> 4) to derive the associated motion vector offset.
In some cases, the MV refinement of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a threshold. The threshold is determined based on whether all the reference pictures of the current picture come from one direction. If all the reference pictures of the current picture are from one direction, the value of the threshold is set to 12 × 2^(14−d); otherwise, it is set to 12 × 2^(13−d).
Gradients for BIO can be calculated at the same time as the motion compensated interpolation, using operations consistent with the HEVC motion compensation process (e.g., a 2D separable Finite Impulse Response (FIR) filter). In some embodiments, the input to the 2D separable FIR is the same reference frame samples as for the motion compensation process, together with the fractional position (fracX, fracY) according to the fractional part of the block motion vector. For the horizontal gradient ∂I/∂x, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with de-scaling shift d−8; the gradient filter BIOfilterG is then applied in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18−d. For the vertical gradient ∂I/∂y, the gradient filter BIOfilterG is first applied vertically corresponding to the fractional position fracY with de-scaling shift d−8; the signal displacement is then performed in the horizontal direction using BIOfilterS corresponding to the fractional position fracX with de-scaling shift 18−d. The lengths of the interpolation filter for gradient calculation, BIOfilterG, and for signal displacement, BIOfilterS, can be shorter (e.g., 6-tap) in order to maintain reasonable complexity. Table 1 shows example filters that can be used for gradient calculation for different fractional positions of the block motion vector in BIO. Table 2 shows example interpolation filters that can be used for prediction signal generation in BIO.
Table 1: exemplary Filter for gradient computation in BIO
Fractional pixel position    Gradient interpolation filter (BIOfilterG)
0 {8,-39,-3,46,-17,5}
1/16 {8,-32,-13,50,-18,5}
1/8 {7,-27,-20,54,-19,5}
3/16 {6,-21,-29,57,-18,5}
1/4 {4,-17,-36,60,-15,4}
5/16 {3,-9,-44,61,-15,4}
3/8 {1,-4,-48,61,-13,3}
7/16 {0,1,-54,60,-9,2}
1/2 {-1,4,-57,57,-4,1}
Table 2: exemplary interpolation Filter for prediction Signal Generation in BIO
Fractional pixel position    Interpolation filter for prediction signal (BIOfilterS)
0 {0,0,64,0,0,0}
1/16 {1,-3,64,4,-2,0}
1/8 {1,-6,62,9,-3,1}
3/16 {2,-8,60,14,-5,1}
1/4 {2,-9,57,19,-7,2}
5/16 {3,-10,53,24,-8,2}
3/8 {3,-11,50,29,-9,2}
7/16 {3,-11,44,35,-10,3}
1/2 {3,-10,35,44,-11,3}
In JEM, when the two predictions are from different reference pictures, the BIO may be applied to all bi-predicted blocks. When Local Illumination Compensation (LIC) is enabled for a CU, the BIO may be disabled.
In some embodiments, overlapped block motion compensation (OBMC) is applied to a block after the normal MC process. To reduce computational complexity, BIO may not be applied during the OBMC process. That is, BIO is applied in the MC process of a block when its own MV is used, but is not applied in the MC process when the MV of a neighboring block is used during the OBMC process.
Frame Rate Up Conversion (FRUC)
When the Merge flag of a CU is true, the FRUC flag may be signaled to the CU. When the FRUC flag is false, the Merge index may be signaled and the normal Merge mode is used. When the FRUC flag is true, an additional FRUC mode flag may be signaled to indicate which method (e.g., bilateral matching or template matching) is to be used to derive motion information for the block.
At the encoder side, the decision on whether to use FRUC Merge mode for a CU is based on RD cost selection, as is done for normal Merge candidates. For example, the multiple matching modes (e.g., bilateral matching and template matching) are checked for the CU by using RD cost selection. The one leading to the minimal cost is further compared to the other CU modes. If a FRUC matching mode is the most efficient one, the FRUC flag is set to true for the CU and the related matching mode is used.
Typically, the motion derivation process in FRUC Merge mode has two steps: CU-level motion search is performed first, followed by sub-CU-level motion refinement. At the CU level, an initial motion vector is derived for the entire CU based on bilateral matching or template matching. First, a list of MV candidates is generated and the candidate pointing to the smallest matching cost is selected as the starting point for further CU-level refinement. A local search based on bilateral matching or template matching is then performed around the starting point. The MV that results in the smallest matching cost is taken as the MV of the entire CU. Subsequently, the motion information is further refined at the sub-CU level using the derived CU motion vector as a starting point.
For example, the following derivation process is performed for W × H CU motion information derivation. At the first stage, the MV for the whole W × H CU is derived. At the second stage, the CU is further subdivided into M × M sub-CUs. The value of M is calculated as shown below, where D is a predefined subdivision depth that is set to 3 by default in JEM. Then the MV for each sub-CU is derived.

M = max{ 4, min{ W, H } / 2^D }
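Assuming the formula above, the sub-CU size used at the second stage can be computed as in this small sketch (illustrative only; D defaults to 3 as in JEM):

```python
# Illustrative sketch: sub-CU size for the second stage of FRUC motion derivation.
def fruc_sub_cu_size(W, H, D=3):
    return max(4, min(W, H) // (1 << D))
```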
Fig. 10 shows an example of bilateral matching used in a Frame Rate Up Conversion (FRUC) method. Bilateral matching is used to derive motion information of a current CU by finding a closest match between two blocks along a motion trajectory of the current CU (1000) in two different reference pictures (1010,1011). Under the assumption of a continuous motion trajectory, the motion vectors MV0(1001) and MV1(1002) pointing to the two reference blocks are proportional to the temporal distance between the current picture and the two reference pictures, e.g., TD0(1003) and TD1 (1004). In some embodiments, the bilateral matching becomes bidirectional MV based mirroring when the current picture 1000 is temporally between two reference pictures (1010,1011) and the temporal distances from the current picture to the two reference pictures are the same.
Fig. 11 illustrates an example of template matching used in the FRUC method. Template matching may be used to derive motion information for the current CU 1100 by finding the closest match between a template in the current picture (e.g., an upper and/or left neighboring block of the current CU) and a block in the reference picture 1110 (e.g., the same size as the template). In addition to the FRUC Merge mode described above, template matching may also be applied to AMVP mode. In both JEM and HEVC, AMVP has two candidates. Using a template matching method, new candidates can be derived. If the newly derived candidate by template matching is different from the first existing AMVP candidate, it is inserted into the very beginning of the AMVP candidate list and then the list size is set to 2 (e.g., by removing the second existing AMVP candidate). When applied to AMVP mode, only CU level search is applied.
The MV candidate set at the CU level can include the following: (1) the original AMVP candidates if the current CU is in AMVP mode, (2) all Merge candidates, (3) several MVs in the interpolated MV field (described later), and (4) the top and left neighboring motion vectors.
When bilateral matching is used, each valid MV of a Merge candidate can be used as an input to generate an MV pair under the assumption of bilateral matching. For example, one valid MV of a Merge candidate is (MVa, refa) in reference list A. Then the reference picture refb of its paired bilateral MV is found in the other reference list B, so that refa and refb are temporally on different sides of the current picture. If such a refb is not available in reference list B, refb is determined as a reference that is different from refa, and whose temporal distance to the current picture is the minimal one in list B. After refb is determined, MVb is derived by scaling MVa based on the temporal distances between the current picture and refa and refb.
In some implementations, four MVs from the interpolated MV field may also be added to the CU level candidate list. More specifically, interpolation MVs at positions (0,0), (W/2,0), (0, H/2), and (W/2, H/2) of the current CU are added. When FRUC is applied to AMVP mode, the original AMVP candidate is also added to the CU-level MV candidate set. In some implementations, at the CU level, 15 MVs for AMVP CUs and 13 MVs for Merge CUs may be added to the candidate list.
The MV candidate set at the sub-CU level includes (1) MVs determined from the CU level search, (2) top, left, top-left, and top-right neighboring MVs, (3) scaled versions of collocated MVs from the reference picture, (4) one or more ATMVP candidates (e.g., up to four), and (5) one or more STMVP candidates (e.g., up to four). The scaled MVs from the reference pictures are derived as follows. Reference pictures in both lists are traversed. The MVs at collocated positions of the sub-CUs in the reference picture are scaled to the reference of the starting CU-level MV. The ATMVP and STMVP candidates may be the first four. At the sub-CU level, one or more MVs (e.g., up to 17) are added to the candidate list.
Generation of interpolated MV fields
Before encoding a frame, an interpolated motion field is generated for the whole picture based on unilateral ME. The motion field can then be used later as CU-level or sub-CU-level MV candidates.
In some embodiments, the motion field for each reference picture in the two reference lists is traversed at the 4x4 block level. Fig. 12 shows an example of unilateral Motion Estimation (ME)1200 in the FRUC approach. For each 4x4 block, if the motion associated with the block passes through a 4x4 block in the current picture (as shown in fig. 12) and the block has not been assigned any interpolated motion, the motion of the reference block is scaled to the current picture according to temporal distances TD0 and TD1 (in the same way as MV scaling of TMVP in HEVC) and the scaled motion is assigned to the block in the current frame. If no scaled MV are assigned to a 4x4 block, the motion of the block is marked as unavailable in the interpolated motion field.
Interpolation and matching costs
When the motion vector points to a fractional sample position, motion compensated interpolation is required. To reduce complexity, bilinear interpolation may be used for both bilateral matching and template matching instead of conventional 8-tap (tap) HEVC interpolation.
The matching cost is calculated a bit differently at different steps. When selecting the candidate from the candidate set at the CU level, the matching cost can be the Sum of Absolute Differences (SAD) of bilateral matching or template matching. After the starting MV is determined, the matching cost C of bilateral matching at the sub-CU level search is calculated as follows:

C = SAD + w · ( |MVx − MVsx| + |MVy − MVsy| )

Here, w is a weighting factor. In some embodiments, w can be empirically set to 4. MV and MVs indicate the current MV and the starting MV, respectively. SAD may still be used as the matching cost of template matching at the sub-CU level search.
In FRUC mode, the MV is derived by using the luma samples only. The derived motion will be used for both luma and chroma for MC inter prediction. After the MV is decided, the final MC is performed using an 8-tap interpolation filter for luma and a 4-tap interpolation filter for chroma.
MV refinement is a pattern-based MV search with the criterion of bilateral matching cost or template matching cost. In JEM, two search patterns are supported: an unrestricted center-biased diamond search (UCBDS) and an adaptive cross search, for MV refinement at the CU level and the sub-CU level, respectively. For both CU-level and sub-CU-level MV refinement, the MV is directly searched at quarter luma sample MV accuracy, followed by one-eighth luma sample MV refinement. The search range of MV refinement for the CU and sub-CU steps is set equal to 8 luma samples.
In the bilateral matching Merge mode, bi-prediction is applied because the motion information of a CU is derived based on the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures. In template matching Merge mode, the encoder may select among unidirectional prediction from list0, unidirectional prediction from list1, or bi-directional prediction for a CU. The selection may be based on the template matching cost, as follows:
If costBi <= factor × min(cost0, cost1),
then bi-prediction is used;
otherwise, if cost0 <= cost1,
then uni-prediction from list0 is used;
otherwise,
uni-prediction from list1 is used.
Here, cost0 is the SAD of the list0 template matching, cost1 is the SAD of the list1 template matching, and costBi is the SAD of the bi-prediction template matching. For example, when the value of factor is equal to 1.25, it means that the selection process is biased toward bi-prediction. The inter prediction direction selection can be applied to the CU-level template matching process.
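The selection rule above translates directly into a small helper. The sketch below assumes the three SAD costs have already been computed and uses factor = 1.25 as in the example.

```python
# Illustrative sketch of FRUC inter prediction direction selection
# based on template matching costs.
def select_prediction_direction(cost0, cost1, cost_bi, factor=1.25):
    if cost_bi <= factor * min(cost0, cost1):
        return "bi-prediction"
    if cost0 <= cost1:
        return "uni-prediction from list0"
    return "uni-prediction from list1"
```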
The sub-block-based prediction techniques discussed above may be used to obtain more accurate motion information for each sub-block when the size of the sub-block is small. However, smaller sub-blocks result in higher bandwidth requirements in motion compensation. On the other hand, the motion information derived for smaller sub-blocks may be inaccurate, especially when there is some noise in the block. Therefore, having a fixed sub-block size within a block may be sub-optimal.
This document describes techniques that may be used in various embodiments to address the bandwidth and precision issues introduced by fixed sub-block sizes using non-uniform and/or variable sub-block sizes. These techniques (also called interleaved prediction) use a different way of subdividing blocks so that motion information can be acquired more robustly without increasing bandwidth consumption.
Using the interleaved prediction techniques, a block is subdivided into sub-blocks with one or more subdivision patterns. A subdivision pattern indicates the way a block is subdivided into sub-blocks, including the size of the sub-blocks and the position of the sub-blocks. For each subdivision pattern, a corresponding prediction block can be generated by deriving motion information for each sub-block based on that subdivision pattern. Thus, in some embodiments, multiple prediction blocks can be generated with multiple subdivision patterns even for one prediction direction. In some embodiments, for each prediction direction, only one subdivision pattern may be applied.
Fig. 13 illustrates an example of interleaved prediction with two subdivision patterns in accordance with the disclosed techniques. The current block 1300 can be subdivided with multiple patterns. For example, as shown in Fig. 13, the current block is subdivided with both pattern 0 (1301) and pattern 1 (1302). Two prediction blocks, P0 (1303) and P1 (1304), are generated. A final prediction block P (1305) of the current block 1300 can then be generated by computing a weighted sum of P0 (1303) and P1 (1304).
More generally, given X subdivision patterns, X prediction blocks of the current block, denoted P0, P1, ..., P(X−1), can be generated by sub-block based prediction with the X subdivision patterns. The final prediction of the current block, denoted P, can be generated as

P(x, y) = ( Σ for i = 0 to X−1 of wi(x, y) × Pi(x, y) ) >> N      Equation (15)

Here, (x, y) is the coordinate of a pixel in the block, and wi(x, y) is the weight value of Pi. By way of example and not limitation, the weights can be expressed as

w0(x, y) + w1(x, y) + ... + w(X−1)(x, y) = 1 << N      Equation (16)

where N is a non-negative value. Alternatively, the bit shift operation in Equation (16) can also be expressed as

w0(x, y) + w1(x, y) + ... + w(X−1)(x, y) = 2^N      Equation (17)

Having the sum of the weights be a power of 2 allows the weighted sum P to be computed more efficiently with a bit shift operation instead of a floating-point division.
The subdivision patterns can have different shapes, sizes, or positions of the sub-blocks. In some embodiments, a subdivision pattern may include irregular sub-block sizes. Figs. 14A-14G show several examples of subdivision patterns for a 16 × 16 block. In Fig. 14A, a block is subdivided into 4 × 4 sub-blocks in accordance with the disclosed techniques. This pattern is also used in JEM. Fig. 14B shows an example of subdividing a block into 8 × 8 sub-blocks in accordance with the disclosed techniques. Fig. 14C shows an example of subdividing a block into 8 × 4 sub-blocks in accordance with the disclosed techniques. Fig. 14D shows an example of subdividing a block into 4 × 8 sub-blocks in accordance with the disclosed techniques. In Fig. 14E, a portion of the block is subdivided into 4 × 4 sub-blocks in accordance with the disclosed techniques. The pixels at the block boundaries are subdivided into smaller sub-blocks with sizes such as 2 × 4, 4 × 2, or 2 × 2. Some sub-blocks may be merged to form larger sub-blocks. Fig. 14F shows an example of adjacent sub-blocks, such as 4 × 4 sub-blocks and 2 × 4 sub-blocks, that are merged to form larger sub-blocks with sizes such as 6 × 4, 4 × 6, or 6 × 6. In Fig. 14G, a portion of the block is subdivided into 8 × 8 sub-blocks, and the pixels at the block boundaries are subdivided into smaller sub-blocks with sizes such as 8 × 4, 4 × 8, or 4 × 4.
The shape and size of the sub-blocks in sub-block based prediction can be determined based on the shape and/or size of the coding block and/or coded block information. For example, in some embodiments, when the current block has a size of M × N, the sub-blocks have a size of 4 × N (or 8 × N, etc.); that is, the sub-blocks have the same height as the current block. In some embodiments, when the current block has a size of M × N, the sub-blocks have a size of M × 4 (or M × 8, etc.); that is, the sub-blocks have the same width as the current block. In some embodiments, when the current block has a size of M × N (where M > N), the sub-blocks have a size of A × B, where A > B (e.g., 8 × 4). Alternatively, the sub-blocks can have a size of B × A (e.g., 4 × 8).
In some embodiments, the current block has a size of M × N. The sub-blocks have a size of A × B when M × N <= T (or Min(M, N) <= T, or Max(M, N) <= T, etc.), and have a size of C × D when M × N > T (or Min(M, N) > T, or Max(M, N) > T, etc.), where A <= C and B <= D. For example, if M × N <= 256, the sub-blocks can have a size of 4 × 4. In some implementations, the sub-blocks have a size of 8 × 8.
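One way to read the size rule above is as a simple threshold test on the block dimensions. The helper below is an illustrative sketch using the example values from the text (T = 256, 4 × 4 versus 8 × 8 sub-blocks):

```python
# Illustrative sketch: choose the sub-block size from the current block size
# using the M*N <= T rule described above (threshold and sizes are example values).
def choose_subblock_size(M, N, T=256, small=(4, 4), large=(8, 8)):
    return small if M * N <= T else large
```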
In some embodiments, whether to apply interleaved prediction may be determined based on the inter prediction direction. For example, in some embodiments, interleaved prediction may be applied to bi-directional prediction, but not to uni-directional prediction. As another example, when applying multiple hypotheses, when there is more than one reference block, the interleaved prediction may be applied to one prediction direction.
In some embodiments, how to apply interleaved prediction can also be determined based on the inter prediction direction. In some embodiments, a bi-predicted block with sub-block based prediction is subdivided into sub-blocks with two different subdivision patterns for the two different reference lists. For example, the bi-predicted block is subdivided into 4 × 8 sub-blocks, as shown in Fig. 14D, when predicted from reference list 0 (L0). The same block is subdivided into 8 × 4 sub-blocks, as shown in Fig. 14C, when predicted from reference list 1 (L1). The final prediction P is calculated as

P(x, y) = ( w0(x, y) × P0(x, y) + w1(x, y) × P1(x, y) ) >> N

Here, P0 and P1 are the predictions from L0 and L1, respectively, and w0 and w1 are the weight values for L0 and L1, respectively. As shown in Equation (16), the weight values can be determined as w0(x, y) + w1(x, y) = 1 << N (where N is a non-negative integer value). Because fewer sub-blocks are used for prediction in each direction (e.g., 4 × 8 sub-blocks as opposed to 8 × 8 sub-blocks), the computation requires less bandwidth than existing sub-block based methods. By using larger sub-blocks, the prediction results are also less susceptible to noise interference.
In some embodiments, a uni-directionally predicted block with sub-block based prediction is subdivided into sub-blocks with two or more different subdivision patterns for the same reference list. For example, the prediction P^L for list L (L = 0 or 1) is calculated as

P^L(x, y) = ( Σ for i = 0 to X_L−1 of w_i^L(x, y) × P_i^L(x, y) ) >> N

Here, X_L is the number of subdivision patterns for list L, P_i^L(x, y) is the prediction generated with the i-th subdivision pattern, and w_i^L(x, y) is the weight value of P_i^L(x, y). For example, when X_L is 2, two subdivision patterns are applied to list L. In the first subdivision pattern, the block is subdivided into 4 × 8 sub-blocks, as shown in Fig. 14D. In the second subdivision pattern, the block is subdivided into 8 × 4 sub-blocks, as shown in Fig. 14C.
In one embodiment, a bi-predicted block with sub-block based prediction is regarded as a combination of two uni-directionally predicted blocks, one from L0 and one from L1. The prediction from each list can be derived as described in the examples above. The final prediction P can be calculated as

P(x, y) = ( a × P0(x, y) + b × P1(x, y) ) >> 1

Here, the parameters a and b are two additional weights applied to the two intermediate prediction blocks. In this specific example, both a and b can be set to 1. Similarly to the example above, because fewer sub-blocks are used for prediction in each direction (e.g., 4 × 8 sub-blocks as opposed to 8 × 8 sub-blocks), the bandwidth usage is better than or on par with existing sub-block based methods. At the same time, the prediction results can be improved by using larger sub-blocks.
In some embodiments, a single non-uniform subdivision pattern can be used in each uni-directionally predicted block. For example, for each list L (e.g., L0 or L1), the block is subdivided into a different pattern (e.g., as shown in Fig. 14E or Fig. 14F). Using a smaller number of sub-blocks reduces the bandwidth requirement. The non-uniformity of the sub-blocks also increases the robustness of the prediction results.
In some embodiments, for a multi-hypothesis coding block, for each prediction direction (or reference picture list), there may be more than one prediction block generated by different subdivision patterns. Multiple prediction blocks may be used to generate a final prediction with additional weights applied. For example, the additional weight may be set to 1/M, where M is the total number of generated prediction blocks.
In some embodiments, the encoder may determine whether and how to apply interleaved prediction. The encoder may then send information corresponding to the determination to the decoder at a sequence level, picture level, view level, slice level, Coding Tree Unit (CTU) (also referred to as Largest Coding Unit (LCU)) level, CU level, PU level, TU level, tile level, slice group level, or region level (a region may include multiple CUs/PUs/TUs/LCUs). The information may be signaled in a Sequence Parameter Set (SPS), a View Parameter Set (VPS), a Picture Parameter Set (PPS), a Slice Header (SH), a picture header, a sequence header, or in the first block of a slice, slice group, CTU/LCU, CU, PU, TU, or region.
In some implementations, the interleaved prediction is applicable to existing sub-block methods, such as affine prediction, ATMVP, STMVP, FRUC, or BIO. In this case, no additional signaling cost is required. In some implementations, the new sub-block Merge candidates generated by the interleaved prediction may be inserted into a Merge list, e.g., interleaved prediction + ATMVP, interleaved prediction + STMVP, interleaved prediction + FRUC, etc. In some implementations, a flag may be signaled to indicate whether to use interleaved prediction. In one example, if the current block is affine inter coded, a flag is signaled to indicate whether to use interleaved prediction. In some implementations, if the current block is affine Merge encoded and unidirectional prediction is applied, a flag may be signaled to indicate whether to use interleaved prediction. In some implementations, if the current block is affine Merge encoded, a flag may be signaled to indicate whether to use interleaved prediction. In some implementations, interleaved prediction may always be used if the current block is affine Merge encoded and unidirectional prediction is applied. In some implementations, if the current block is affine Merge encoded, then interleaved prediction may always be used.
In some implementations, the flag indicating whether to use interleaved prediction may be inherited without signaling. Some examples include:
(i) In one example, if the current block is affine Merge encoded, inheritance may be used.
(ii) In one example, the flags may be inherited from flags of neighboring blocks that inherit the affine model.
(iii) In one example, the flag is inherited from a predetermined neighboring block, such as a left or upper neighboring block.
(iv) In one example, the flag may be inherited from a first encountered affine coded neighboring block.
(v) In one example, if no neighboring blocks are affine coded, the flag may be inferred to be zero.
(vi) In one example, the flag may be inherited only when unidirectional prediction is applied to the current block.
(vii) In one example, the flag may be inherited only if the current block and its neighboring block to be inherited are in the same CTU.
(viii) In one example, the flag may be inherited only if the current block and its neighboring block to be inherited are in the same CTU row.
(ix) In one example, when the affine model is derived from the temporal neighboring blocks, the flag may not be inherited from the flags of the neighboring blocks.
(x) In one example, the flags may not be inherited from flags of neighboring blocks that are not located in the same LCU or LCU row or video data processing unit (such as 64 x 64 or 128 x 128).
(xi) In one example, how the flag is signaled and/or derived may depend on the block dimension and/or coding information of the current block.
In some implementations, if the reference picture is a current picture, then interleaved prediction is not applied. For example, if the reference picture is a current picture, a flag indicating whether to use interleaved prediction is not signaled.
In some embodiments, the subdivision pattern to be used by the current block may be derived based on information from spatial and/or temporal neighboring blocks. For example, rather than relying on the encoder to signal the relevant information, both the encoder and decoder may employ a set of predetermined rules to obtain the subdivision pattern based on temporal adjacency (e.g., the subdivision pattern of the same block previously used) or spatial adjacency (e.g., the subdivision pattern used by neighboring blocks).
In some embodiments, the weight values w may be fixed. For example, all subdivision patterns may be weighted equally: wi(x, y) = 1. In some embodiments, the weight values may be determined based on the positions and the subdivision patterns used. For example, wi(x, y) may be different for different (x, y). In some embodiments, the weight values may also depend on the sub-block prediction based coding technique (e.g., affine or ATMVP) and/or other coding information (e.g., skip or non-skip mode, and/or MV information).
In some embodiments, the encoder may determine the weight values and send the values to the decoder at a sequence level, picture level, slice level, CTU/LCU level, CU level, PU level, or region level (a region may include multiple CUs/PUs/TUs/LCUs). The weight values may be signaled in a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a Slice Header (SH), or the first block of a CTU/LCU, CU, PU, or region. In some embodiments, the weight values may be derived from the weight values of spatially and/or temporally neighboring blocks.
Note that the interleaved prediction techniques disclosed herein may be applied to one, some, or all of the encoding techniques in sub-block based prediction. For example, the interleaved prediction technique may be applied to affine prediction, while other coding techniques based on prediction of sub-blocks (e.g., ATMVP, STMVP, FRUC, or BIO) do not use interleaved prediction. As another example, affine, ATMVP, and STMVP all apply the interleaved prediction techniques disclosed herein.
Exemplary embodiments with partial interleaving
In some embodiments, partial interleaved prediction may be implemented as follows.
In some embodiments, interleaved prediction is applied to a portion of the current block. The prediction samples at some locations are computed as a weighted sum of two or more sub-block based predictions. The prediction samples at other locations are not used for weighted sum. For example, these prediction samples are copied from the subblock-based prediction with a particular subdivision pattern.
In some embodiments, the current block is predicted by sub-block based predictions P0 and P1 having subdivision pattern D0 and subdivision pattern D1, respectively. The final prediction is calculated as P = w0 × P0 + w1 × P1. At some positions, w0 ≠ 0 and w1 ≠ 0. But at some other positions, w0 = 1 and w1 = 0, i.e., no interleaved prediction is applied at those positions.
In some embodiments, interleaved prediction is not applied on the four corner sub-blocks, as shown in fig. 15A.
In some embodiments, interleaved prediction is not applied to the left-most column and right-most column of sub-blocks, as shown in fig. 15B.
In some embodiments, interleaved prediction is not applied to the subblocks of the top-most row and the bottom-most row, as shown in fig. 15C.
In some embodiments, interleaved prediction is not applied to the sub-blocks of the top-most row, bottom-most row, left-most column, and right-most column, as shown in fig. 15D.
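A small sketch of such partial interleaving follows; the equal-weight averaging in the interior and the 4-sample exclusion border are illustrative assumptions, with the excluded samples simply copied from the pattern-D0 prediction as described above.

```python
import numpy as np

def interleave_mask(width, height, sub_w=4, sub_h=4, mode="exclude_columns"):
    """Boolean map: True where the weighted (interleaved) combination is used,
    False where the prediction is copied from subdivision pattern D0
    (i.e. w0 = 1 and w1 = 0 at that position)."""
    mask = np.ones((height, width), dtype=bool)
    if mode == "exclude_columns":    # fig. 15B style: left-most / right-most columns
        mask[:, :sub_w] = False
        mask[:, width - sub_w:] = False
    elif mode == "exclude_rows":     # fig. 15C style: top-most / bottom-most rows
        mask[:sub_h, :] = False
        mask[height - sub_h:, :] = False
    return mask

def partial_interleaved_prediction(p0, p1, mask):
    # Interior samples: equal-weight combination; excluded samples: copy of P0.
    return np.where(mask, (p0 + p1 + 1) >> 1, p0)

# Example: 16x16 block with W >= H, so the left-most and right-most sub-block
# columns are excluded from interleaving.
p0 = np.random.randint(0, 256, (16, 16), dtype=np.int64)
p1 = np.random.randint(0, 256, (16, 16), dtype=np.int64)
p = partial_interleaved_prediction(p0, p1, interleave_mask(16, 16))
```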
In some embodiments, whether and how partial interleaved prediction is applied may depend on the size/shape of the current block.
For example, in some embodiments, if the size of the current block satisfies a certain condition, the interleaving prediction is applied to the entire block; otherwise, interleaved prediction is applied to a portion (or portions) of the block. Conditions include, but are not limited to: (assuming that the width and height of the current block are W and H, respectively, and T, T1, T2 are integer values):
w > -T1 and H > -T2;
w < ═ T1 and H < ═ T2;
w > -T1 or H > -T2;
w < ═ T1 or H < ═ T2;
W+H>=T
W+H<=T
W×H>=T
W×H<=T
In some embodiments, partial interleaved prediction is applied to a portion of the current block that is smaller than the current block. For example, in some embodiments, the portion of the block excludes certain sub-blocks, as follows. In some embodiments, if W ≥ H, interleaved prediction is not applied to the leftmost column and rightmost column of sub-blocks as shown in fig. 15B; otherwise, interleaved prediction is not applied to the sub-blocks of the topmost row and the bottommost row as shown in fig. 15C.
For example, in some embodiments, if W > H, then interleaved prediction is not applied to the leftmost column and rightmost column of sub-blocks as shown in fig. 15B; otherwise, the interleaved prediction is not applied to the subblocks of the topmost row and the bottommost row as shown in fig. 15C.
In some embodiments, whether and how interleaved prediction is applied may be different for different regions of a block. For example, assume that the current block is predicted by sub-block based predictions P0 and P1 having subdivision pattern D0 and subdivision pattern D1, respectively. The final prediction is calculated as P(x, y) = w0 × P0(x, y) + w1 × P1(x, y). If the position (x, y) belongs to a sub-block of dimensions S0 × H0 under subdivision pattern D0 and to a sub-block of dimensions S1 × H1 under subdivision pattern D1, then w0 is set to 1 and w1 is set to 0 (i.e., no interleaved prediction is applied at that position) if one or more of the following conditions are met:

S1 < T1;

H1 < T2;

S1 < T1 and H1 < T2; or

S1 < T1 or H1 < T2,

where T1 and T2 are integers. For example, T1 = T2 = 4.
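The per-position rule above can be read as a small predicate; the rule names and the T1 = T2 = 4 defaults in this sketch are illustrative assumptions only.

```python
def use_interleaving_at(s1, h1, t1=4, t2=4, rule="and"):
    """Decide whether the weighted combination is applied at a position whose
    covering sub-block under subdivision pattern D1 has width s1 and height h1.
    Returns False (i.e. w0 = 1 and w1 = 0) when the chosen smallness condition holds."""
    if rule == "width":
        small = s1 < t1
    elif rule == "height":
        small = h1 < t2
    elif rule == "and":
        small = s1 < t1 and h1 < t2
    else:  # rule == "or"
        small = s1 < t1 or h1 < t2
    return not small

# With T1 = T2 = 4, a 2x8 sub-block under pattern D1 keeps interleaving
# under the "and" rule but disables it under the "or" rule.
print(use_interleaving_at(2, 8, rule="and"))  # True
print(use_interleaving_at(2, 8, rule="or"))   # False
```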
Examples of techniques integrated with encoder embodiments
In some embodiments, interleaved prediction is not applied in the Motion Estimation (ME) process.
For example, interleaved prediction is not applied in the ME process for 6-parameter affine prediction.
For example, if the size of the current block satisfies a certain condition such as one of the following, interleaved prediction is not applied in the ME process. Here, it is assumed that the width and height of the current block are W and H, respectively, and T, T1, T2 are integer values:

W >= T1 and H >= T2;

W <= T1 and H <= T2;

W >= T1 or H >= T2;

W <= T1 or H <= T2;

W + H >= T;

W + H <= T;

W × H >= T; or

W × H <= T.
for example, if the current block is partitioned from the parent block, and the parent block does not select an affine mode at the encoder, the interleaved prediction is omitted in the ME process.
Alternatively, if the current block is partitioned from the parent block, and the parent block does not select the affine mode at the encoder, the affine mode is not checked at the encoder.
Exemplary embodiments of MV derivation
In the following examples, SatShift(x, n) is defined as

SatShift(x, n) = (x + shift0) >> n, if x >= 0;
SatShift(x, n) = -((-x + shift1) >> n), if x < 0.

Shift(x, n) is defined as Shift(x, n) = (x + shift0) >> n. In one example, shift0 and/or shift1 is set to (1 << n) >> 1 or (1 << (n - 1)). In another example, shift0 and/or shift1 is set to 0.
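A minimal Python sketch of these two helpers follows; the rounding offsets default to (1 << n) >> 1, which is only one of the options listed above.

```python
def shift(x, n, offset=None):
    """Shift(x, n) = (x + shift0) >> n; shift0 defaults here to the rounding
    offset (1 << n) >> 1, but may also be set to 0."""
    if offset is None:
        offset = (1 << n) >> 1
    return (x + offset) >> n

def sat_shift(x, n, shift0=None, shift1=None):
    """SatShift(x, n) as defined above: a rounding right shift that treats
    negative values symmetrically."""
    if shift0 is None:
        shift0 = (1 << n) >> 1
    if shift1 is None:
        shift1 = (1 << n) >> 1
    if x >= 0:
        return (x + shift0) >> n
    return -((-x + shift1) >> n)

print(sat_shift(5, 1))    # (5 + 1) >> 1 = 3
print(sat_shift(-5, 1))   # -((5 + 1) >> 1) = -3
print(shift(-5, 1))       # (-5 + 1) >> 1 = -2 (arithmetic shift rounds toward -infinity)
```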
In some embodiments, the MV of each sub-block in one subdivision pattern may be derived directly from an affine model, such as with equation (1), or it may be derived from the MV of the sub-block within another subdivision pattern.
(a) In one example, the MVs of sub-block B having subdivision pattern 0 may be derived from MVs of some or all of the sub-blocks within subdivision pattern 1 that overlap sub-block B.
(b) FIGS. 16A-16C show some examples. In fig. 16A, the MV of a particular sub-block within subdivision pattern 1, MV1(x, y), is to be derived. Fig. 16B shows subdivision pattern 0 (solid lines) and subdivision pattern 1 (dashed lines) in a block, indicating that there are four sub-blocks within subdivision pattern 0 that overlap the particular sub-block within subdivision pattern 1. Fig. 16C shows the four MVs of those four sub-blocks within subdivision pattern 0: MV0(x-2,y-2), MV0(x+2,y-2), MV0(x-2,y+2) and MV0(x+2,y+2). MV1(x, y) will then be derived from MV0(x-2,y-2), MV0(x+2,y-2), MV0(x-2,y+2) and MV0(x+2,y+2).
(c) Suppose that the MV' of one sub-block within subdivision pattern 1 is derived from MV0, MV1, MV2, …, MVk of k + 1 sub-blocks within subdivision pattern 0. MV' can be derived as:
(i) MV' = MVn, where n is any number in 0 … k.

(ii) MV' = f(MV0, MV1, MV2, …, MVk), where f is a linear function.

(iii) MV' = f(MV0, MV1, MV2, …, MVk), where f is a non-linear function.

(iv) MV' = Average(MV0, MV1, MV2, …, MVk), where Average is the averaging operation.

(v) MV' = Median(MV0, MV1, MV2, …, MVk), where Median is the operation to get the median value.

(vi) MV' = Max(MV0, MV1, MV2, …, MVk), where Max is the operation to get the maximum value.

(vii) MV' = Min(MV0, MV1, MV2, …, MVk), where Min is the operation to get the minimum value.

(viii) MV' = MaxAbs(MV0, MV1, MV2, …, MVk), where MaxAbs is the operation to get the value with the largest absolute value.

(ix) MV' = MinAbs(MV0, MV1, MV2, …, MVk), where MinAbs is the operation to get the value with the smallest absolute value.
(x) Taking FIG. 16A as an example, MV1(x, y) can be derived as:

1. MV1(x,y) = SatShift(MV0(x-2,y-2) + MV0(x+2,y-2) + MV0(x-2,y+2) + MV0(x+2,y+2), 2);
2. MV1(x,y) = Shift(MV0(x-2,y-2) + MV0(x+2,y-2) + MV0(x-2,y+2) + MV0(x+2,y+2), 2);
3. MV1(x,y) = SatShift(MV0(x-2,y-2) + MV0(x+2,y-2), 1);
4. MV1(x,y) = Shift(MV0(x-2,y-2) + MV0(x+2,y-2), 1);
5. MV1(x,y) = SatShift(MV0(x-2,y+2) + MV0(x+2,y+2), 1);
6. MV1(x,y) = Shift(MV0(x-2,y+2) + MV0(x+2,y+2), 1);
7. MV1(x,y) = SatShift(MV0(x-2,y-2) + MV0(x+2,y+2), 1);
8. MV1(x,y) = Shift(MV0(x-2,y-2) + MV0(x+2,y+2), 1);
9. MV1(x,y) = SatShift(MV0(x-2,y-2) + MV0(x-2,y+2), 1);
10. MV1(x,y) = Shift(MV0(x-2,y-2) + MV0(x-2,y+2), 1);
11. MV1(x,y) = SatShift(MV0(x+2,y-2) + MV0(x+2,y+2), 1);
12. MV1(x,y) = Shift(MV0(x+2,y-2) + MV0(x+2,y+2), 1);
13. MV1(x,y) = SatShift(MV0(x+2,y-2) + MV0(x-2,y+2), 1);
14. MV1(x,y) = Shift(MV0(x+2,y-2) + MV0(x-2,y+2), 1);
15. MV1(x,y) = MV0(x-2,y-2);
16. MV1(x,y) = MV0(x+2,y-2);
17. MV1(x,y) = MV0(x-2,y+2); or
18. MV1(x,y) = MV0(x+2,y+2).
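For illustration, the sketch below implements two of the eighteen variants above (variants 1 and 3); representing each MV as an integer (mvx, mvy) tuple is an assumption made for the example.

```python
def sat_shift(x, n):
    # Rounding SatShift as defined earlier in this section.
    off = (1 << n) >> 1
    return (x + off) >> n if x >= 0 else -((-x + off) >> n)

def derive_mv_pattern1(mv_tl, mv_tr, mv_bl, mv_br, variant=1):
    """Derive MV1(x, y) from the MVs of the four overlapping pattern-0
    sub-blocks of fig. 16C."""
    if variant == 1:   # SatShift average of all four MVs
        sx = mv_tl[0] + mv_tr[0] + mv_bl[0] + mv_br[0]
        sy = mv_tl[1] + mv_tr[1] + mv_bl[1] + mv_br[1]
        return (sat_shift(sx, 2), sat_shift(sy, 2))
    if variant == 3:   # SatShift average of the two top MVs
        return (sat_shift(mv_tl[0] + mv_tr[0], 1),
                sat_shift(mv_tl[1] + mv_tr[1], 1))
    raise ValueError("only variants 1 and 3 are sketched here")

# MV0(x-2,y-2), MV0(x+2,y-2), MV0(x-2,y+2), MV0(x+2,y+2):
print(derive_mv_pattern1((4, -2), (6, -2), (4, 2), (6, 2)))             # (5, 0)
print(derive_mv_pattern1((4, -2), (6, -2), (4, 2), (6, 2), variant=3))  # (5, -2)
```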
In some embodiments, how the subdivision pattern is selected may depend on the width and height of the current block.
(a) For example, if the width > T1 and the height > T2 (e.g., T1 = T2 = 4), two subdivision patterns are selected. Fig. 17A shows an example of the two subdivision patterns.

(b) For example, if the height <= T2 (e.g., T2 = 4), two other subdivision patterns are selected. Fig. 17B shows an example of the two subdivision patterns.

(c) For example, if the width <= T1 (e.g., T1 = 4), two further subdivision patterns are selected. Fig. 17C shows an example of the two subdivision patterns.
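The width/height dependent choice can be sketched as below; the returned labels merely stand in for the pattern pairs of figs. 17A-17C, and the ordering of the two threshold checks is an assumption.

```python
def select_pattern_pair(width, height, t1=4, t2=4):
    """Pick the pair of subdivision patterns from the block dimensions; the
    strings are placeholders for the pattern pairs of figs. 17A, 17B and 17C."""
    if width > t1 and height > t2:
        return "patterns_of_fig_17A"
    if height <= t2:
        return "patterns_of_fig_17B"
    return "patterns_of_fig_17C"   # width <= t1

print(select_pattern_pair(16, 16))  # patterns_of_fig_17A
print(select_pattern_pair(16, 4))   # patterns_of_fig_17B
print(select_pattern_pair(4, 16))   # patterns_of_fig_17C
```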
In some embodiments, the MV of each sub-block within one subdivision pattern of one color component C1 may be derived from the MV of the sub-block within another subdivision pattern of another color component C0.
(a) For example, C1 refers to a color component encoded/decoded after another color component, such as Cb or Cr or U or V or R or B.
(b) For example, C0 refers to a color component, such as Y or G, that is encoded/decoded before another color component.
(c) In one example, how to derive the MVs of sub-blocks within one subdivision pattern of one color component from the MVs of sub-blocks within another subdivision pattern of another color component may depend on the color format, such as 4:2:0, 4:2:2, or 4:4:4.
(d) In one example, after scaling down or scaling up the coordinates according to the color format, the MVs of a sub-block B in color component C1 having subdivision pattern C1Pt (t = 0 or 1) may be derived from the MVs of some or all of the sub-blocks of color component C0 within subdivision pattern C0Pr (r = 0 or 1) that overlap sub-block B.
(i) In one example, C0Pr is always equal to C0P0.
(e) Fig. 18A and 18B show two examples. The color format is 4:2:0. The MVs of the sub-blocks in the Cb component are derived from the MVs of the sub-blocks in the Y component.

(i) On the left side of fig. 18A, the MV of a specific Cb sub-block B within subdivision pattern 0, MVCb0(x', y'), is to be derived. The right side of fig. 18A shows the four Y sub-blocks within subdivision pattern 0 that overlap Cb sub-block B after 2:1 down-scaling. Assuming that x = 2x' and y = 2y', the four MVs of these four Y sub-blocks within subdivision pattern 0, namely MV0(x-2,y-2), MV0(x+2,y-2), MV0(x-2,y+2) and MV0(x+2,y+2), are used to derive MVCb0(x', y').

(ii) On the left side of fig. 18B, the MV of a specific Cb sub-block B within subdivision pattern 1, MVCb1(x', y'), is to be derived. The right side of fig. 18B shows the four Y sub-blocks within subdivision pattern 0 that overlap Cb sub-block B after 2:1 down-scaling. Assuming that x = 2x' and y = 2y', the four MVs of these four Y sub-blocks within subdivision pattern 0, namely MV0(x-2,y-2), MV0(x+2,y-2), MV0(x-2,y+2) and MV0(x+2,y+2), are used to derive MVCb1(x', y').
(f) It is assumed that the MV' of one sub-block of color component C1 is derived from MV0, MV1, MV2, …, MVk of k + 1 sub-blocks of color component C0. MV' can be derived as:
(i) MV' = MVn, where n is any number in 0 … k.

(ii) MV' = f(MV0, MV1, MV2, …, MVk), where f is a linear function.

(iii) MV' = f(MV0, MV1, MV2, …, MVk), where f is a non-linear function.

(iv) MV' = Average(MV0, MV1, MV2, …, MVk), where Average is the averaging operation.

(v) MV' = Median(MV0, MV1, MV2, …, MVk), where Median is the operation to get the median value.

(vi) MV' = Max(MV0, MV1, MV2, …, MVk), where Max is the operation to get the maximum value.

(vii) MV' = Min(MV0, MV1, MV2, …, MVk), where Min is the operation to get the minimum value.

(viii) MV' = MaxAbs(MV0, MV1, MV2, …, MVk), where MaxAbs is the operation to get the value with the largest absolute value.

(ix) MV' = MinAbs(MV0, MV1, MV2, …, MVk), where MinAbs is the operation to get the value with the smallest absolute value.
(x) Taking fig. 18A and 18B as examples, MVCbt(x', y'), where t is 0 or 1, can be derived as:

1. MVCbt(x',y') = SatShift(MV0(x-2,y-2) + MV0(x+2,y-2) + MV0(x-2,y+2) + MV0(x+2,y+2), 2);
2. MVCbt(x',y') = Shift(MV0(x-2,y-2) + MV0(x+2,y-2) + MV0(x-2,y+2) + MV0(x+2,y+2), 2);
3. MVCbt(x',y') = SatShift(MV0(x-2,y-2) + MV0(x+2,y-2), 1);
4. MVCbt(x',y') = Shift(MV0(x-2,y-2) + MV0(x+2,y-2), 1);
5. MVCbt(x',y') = SatShift(MV0(x-2,y+2) + MV0(x+2,y+2), 1);
6. MVCbt(x',y') = Shift(MV0(x-2,y+2) + MV0(x+2,y+2), 1);
7. MVCbt(x',y') = SatShift(MV0(x-2,y-2) + MV0(x+2,y+2), 1);
8. MVCbt(x',y') = Shift(MV0(x-2,y-2) + MV0(x+2,y+2), 1);
9. MVCbt(x',y') = SatShift(MV0(x-2,y-2) + MV0(x-2,y+2), 1);
10. MVCbt(x',y') = Shift(MV0(x-2,y-2) + MV0(x-2,y+2), 1);
11. MVCbt(x',y') = SatShift(MV0(x+2,y-2) + MV0(x+2,y+2), 1);
12. MVCbt(x',y') = Shift(MV0(x+2,y-2) + MV0(x+2,y+2), 1);
13. MVCbt(x',y') = SatShift(MV0(x+2,y-2) + MV0(x-2,y+2), 1);
14. MVCbt(x',y') = Shift(MV0(x+2,y-2) + MV0(x-2,y+2), 1);
15. MVCbt(x',y') = MV0(x-2,y-2);
16. MVCbt(x',y') = MV0(x+2,y-2);
17. MVCbt(x',y') = MV0(x-2,y+2); or
18. MVCbt(x',y') = MV0(x+2,y+2).
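A small sketch of the chroma derivation for a 4:2:0 block follows, implementing only variant 1 above; the toy luma MV field and the tuple MV representation are assumptions made for the example.

```python
def sat_shift(x, n):
    # Rounding SatShift as defined earlier in this section.
    off = (1 << n) >> 1
    return (x + off) >> n if x >= 0 else -((-x + off) >> n)

def derive_cb_mv(x_cb, y_cb, luma_mv):
    """Derive MVCbt(x', y') of a Cb sub-block from the four pattern-0 Y
    sub-block MVs covering the up-scaled luma position (variant 1 above).

    luma_mv : callable (x, y) -> (mvx, mvy) giving MV0 at luma coordinates."""
    x, y = 2 * x_cb, 2 * y_cb                       # 4:2:0 coordinate up-scaling
    corners = [(x - 2, y - 2), (x + 2, y - 2), (x - 2, y + 2), (x + 2, y + 2)]
    sx = sum(luma_mv(cx, cy)[0] for cx, cy in corners)
    sy = sum(luma_mv(cx, cy)[1] for cx, cy in corners)
    return (sat_shift(sx, 2), sat_shift(sy, 2))

# Toy luma MV field in which every position carries MV0 = (8, -4).
print(derive_cb_mv(4, 4, lambda cx, cy: (8, -4)))   # (8, -4)
```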
exemplary embodiments of interleaved prediction for bi-directional prediction
In some embodiments, when interleaved prediction is applied to bi-directional prediction, the following methods may be applied to limit the increase of the internal bit depth caused by the different weights:
(a) For list X (X = 0 or 1), PX(x,y) = Shift(W0(x,y) * PX0(x,y) + W1(x,y) * PX1(x,y), SW), where PX(x,y) is the prediction of list X, PX0(x,y) and PX1(x,y) are the predictions of list X with subdivision pattern 0 and subdivision pattern 1, respectively, W0 and W1 are integers representing the interleaved prediction weight values, and SW represents the precision of the weight values.

(b) The final prediction value is derived as P(x,y) = Shift(Wb0(x,y) * P0(x,y) + Wb1(x,y) * P1(x,y), SWB), where Wb0 and Wb1 are integers used in weighted bi-prediction, and SWB is the precision. When there is no weighted bi-prediction, Wb0 = Wb1 = SWB = 1.

(c) In some embodiments, PX0(x,y) and PX1(x,y) may be kept at the precision of the interpolation filtering. For example, they may be unsigned integers having 16 bits. The final prediction value is derived as P(x,y) = Shift(Wb0(x,y) * P0(x,y) + Wb1(x,y) * P1(x,y), SWB + PB), where PB is the additional precision from the interpolation filtering, e.g., PB = 6. In this case, W0(x,y) * PX0(x,y) or W1(x,y) * PX1(x,y) may exceed 16 bits. It is proposed that PX0(x,y) and PX1(x,y) be right-shifted first to a lower precision to avoid exceeding 16 bits.

(i) For example, for list X (X = 0 or 1), PX(x,y) = Shift(W0(x,y) * PLX0(x,y) + W1(x,y) * PLX1(x,y), SW), where PLX0(x,y) = Shift(PX0(x,y), M) and PLX1(x,y) = Shift(PX1(x,y), M). The final prediction is then derived as P(x,y) = Shift(Wb0(x,y) * P0(x,y) + Wb1(x,y) * P1(x,y), SWB + PB - M). For example, M is set to 2 or 3 (see the precision sketch after this list).
(d) The above method may also be applied to other bi-prediction methods with different weighting factors for the two reference prediction blocks, such as Generalized Bi-prediction (GBi, where the weights may be, e.g., 3/8 and 5/8) and weighted prediction (where the weights may be large values).
(e) The above method may also be applied to other multi-hypothesis uni-directional prediction or bi-directional prediction methods with different weight factors for different reference prediction blocks.
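The following sketch walks through the precision handling of items (a) through (c) with toy numbers; PB = 6, M = 2, SW = SWB = 1 and unit weights are assumptions made for the example, and the rounding variant of Shift is used throughout.

```python
def shift(x, n):
    return (x + ((1 << n) >> 1)) >> n    # rounding variant of Shift(x, n)

def interleave_list(p_x0, p_x1, w0, w1, sw, m):
    """Combine the two pattern predictions of one list; the inputs are first
    right-shifted by m bits so the weighted sum stays within 16 bits."""
    return shift(w0 * shift(p_x0, m) + w1 * shift(p_x1, m), sw)

def final_biprediction(p_l0, p_l1, wb0, wb1, swb, pb, m):
    """Final weighted bi-prediction; the shift compensates the weight precision
    SWB, the interpolation precision PB, and the earlier pre-shift M."""
    return shift(wb0 * p_l0 + wb1 * p_l1, swb + pb - m)

# Toy samples at interpolation precision PB = 6 (sample values scaled by 64).
p_l0 = interleave_list(200 * 64, 202 * 64, 1, 1, sw=1, m=2)   # ~201 at 4-bit scale
p_l1 = interleave_list(196 * 64, 198 * 64, 1, 1, sw=1, m=2)   # ~197 at 4-bit scale
print(final_biprediction(p_l0, p_l1, 1, 1, swb=1, pb=6, m=2))  # 199
```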
The above described embodiments and examples may be implemented in the context of the methods 1900 and 2000 described next.
Fig. 19 is an exemplary flow diagram of a method 1900 for improving motion prediction in a video system in accordance with the disclosed technology. The method 1900 includes, at 1902, selecting a set of pixels from a video frame to form a block. The method 1900 includes, at 1904, subdividing the block into a first set of sub-blocks according to a first pattern. The method 1900 includes, at 1906, generating a first intermediate prediction block based on the first set of sub-blocks. The method 1900 includes, at 1908, subdividing the block into a second set of sub-blocks according to a second pattern. At least one sub-block in the second set has a different size than the sub-blocks in the first set. The method 1900 includes, at 1910, generating a second intermediate prediction block based on the second set of sub-blocks. The method 1900 also includes, at 1912, determining a prediction block based on the first intermediate prediction block and the second intermediate prediction block.
In some embodiments, the first intermediate prediction block or the second intermediate prediction block is generated using at least one of (1) an affine prediction method, (2) an alternative temporal motion vector prediction method, (3) a spatio-temporal motion vector prediction method, (4) a bi-directional optical flow method, or (5) a frame rate up-conversion method.
In some embodiments, the sub-blocks in the first set or the second set have a rectangular shape. In some embodiments, the sub-blocks in the first set of sub-blocks have a non-uniform shape. In some embodiments, the sub-blocks in the second set of sub-blocks have a non-uniform shape.
In some embodiments, the method includes determining the first pattern or the second pattern based on a size of the block. In some embodiments, the method includes determining the first pattern or the second pattern based on information from a second block that is temporally or spatially adjacent to the block.
In some embodiments, the subdivision of the block into the first set of sub-blocks is performed for motion prediction of the block in a first direction. In some embodiments, the block is subdivided into a second set of sub-blocks for motion prediction of the block in a second direction.
In some embodiments, the subdividing of the block into a first set of sub-blocks and the subdividing of the block into a second set of sub-blocks is performed for motion prediction of the block in a first direction. In some embodiments, the method further comprises performing motion prediction of the block in the second direction by: the method may include subdividing a block into a third set of sub-blocks according to a third pattern, generating a third intermediate prediction block based on the third set of sub-blocks, subdividing the block into a fourth set of sub-blocks according to a fourth pattern, wherein at least one sub-block in the fourth set has a different size than the sub-blocks in the third set, generating a fourth intermediate prediction block based on the fourth set of sub-blocks, determining a second prediction block based on the third intermediate prediction block and the fourth intermediate prediction block, and determining the third prediction block based on the prediction block and the second prediction block.
In some embodiments, the method includes transmitting information of the first pattern and the second pattern for subdividing the block to an encoding device in the block-based motion prediction video system. In some embodiments, transmitting the information of the first pattern and the second pattern is performed at one of: (1) a sequence level, (2) a picture level, (3) a view level, (4) a slice level, (5) a coding tree unit level, (6) a largest coding unit level, (7) a coding unit level, (8) a prediction unit level, (10) a tree unit level, or (11) a region level.
In some embodiments, determining the prediction result includes applying a first set of weights to the first intermediate prediction block to obtain a first weighted prediction block, applying a second set of weights to the second intermediate prediction block to obtain a second weighted prediction block, and calculating a weighted sum of the first weighted prediction block and the second weighted prediction block to obtain the prediction block.
In some embodiments, the first set of weights or the second set of weights comprises a fixed weight value. In some embodiments, the first set of weights or the second set of weights is determined based on information from another block that is temporally or spatially adjacent to the block. In some embodiments, the first set of weights or the second set of weights are determined using a coding algorithm used to generate the first prediction block or the second prediction block. In some implementations, at least one value in the first set of weights is different from another value in the first set of weights. In some implementations, at least one value in the second set of weights is different from another value in the second set of weights. In some implementations, the sum of the weights is equal to a power of 2.
In some embodiments, the method includes transmitting the weights to an encoding device in the block-based motion prediction video system. In some embodiments, transmitting the weights is performed at one of: (1) a sequence level, (2) a picture level, (3) a view level, (4) a slice level, (5) a coding tree unit level, (6) a largest coding unit level, (7) a coding unit level, (8) a prediction unit level, (10) a tree unit level, or (11) a region level.
Fig. 20 is an example flow diagram of a method 2000 for improving block-based motion prediction in a video system in accordance with the disclosed technology. The method 2000 includes, at 2002, selecting a set of pixels from a video frame to form a block. The method 2000 includes, at 2004, subdividing the block into a plurality of sub-blocks based on a size of the block or information from another block that is spatially or temporally adjacent to the block. At least one of the plurality of sub-blocks has a different size than the other sub-blocks. The method 2000 further includes, at 2006, generating a motion vector prediction by applying an encoding algorithm to the plurality of sub-blocks. In some embodiments, the encoding algorithm includes at least one of (1) an affine prediction method, (2) an alternative temporal motion vector prediction method, (3) a spatio-temporal motion vector prediction method, (4) a bi-directional optical flow method, or (5) a frame rate up-conversion method.
In methods 1900 and 2000, partial interleaving may be implemented. Using this scheme, samples in a first subset of prediction samples are calculated as a weighted combination of the first inter prediction block and samples in a second subset of prediction samples are copied from the sub-block based prediction, wherein the first and second subsets are based on a subdivision pattern. The first subset and the second subset together may constitute an entire prediction block, e.g., a block currently being processed. As shown in fig. 15A-15D, in various examples, the second subset excluded from interleaving may consist of (a) corner sub-blocks or (b) top-most and bottom-most rows of sub-blocks or (c) left-most or right-most columns of sub-blocks. The size of the block currently being processed may be used as a condition for deciding whether to exclude certain sub-blocks from the interleaved prediction.
As further described in this document, the encoding process may avoid checking the affine pattern of blocks subdivided from the parent block, where the parent block itself is encoded with a pattern other than the affine pattern.
In some embodiments, a video decoder apparatus may implement a method of video decoding, wherein improved block-based motion prediction as described herein is used for video decoding. The method may include forming a block of video using a set of pixels from a video frame. The block may be subdivided into a first set of sub-blocks according to a first pattern. The first intermediate prediction block may correspond to a first set of sub-blocks. The block may contain a second set of sub-blocks according to a second pattern. At least one sub-block in the second set has a different size than the sub-blocks in the first set. The method may also determine a prediction block based on the first intermediate prediction block and a second intermediate prediction block generated from the second set of sub-blocks. Other features of the method may be similar to the method 1900 described above.
In some embodiments, a decoder-side method of video decoding may use block-based motion prediction for improving video quality by using blocks of a video frame for prediction, where a block corresponds to a set of pixels. A block may be subdivided into a plurality of sub-blocks based on the size of the block or information from another block that is spatially or temporally adjacent to the block, wherein at least one sub-block of the plurality of sub-blocks has a different size than the other sub-blocks. The decoder may use a motion vector prediction generated by applying a coding algorithm to the plurality of sub-blocks. Other features of the method are described with respect to fig. 20 and the corresponding description.
Yet another method of video processing includes deriving one or more motion vectors for a first set of sub-blocks of a current video block, wherein each of the first set of sub-blocks has a first subdivision pattern, and reconstructing the current video block based on the one or more motion vectors.
In some embodiments, deriving the one or more motion vectors is based on an affine model.
In some embodiments, deriving the one or more motion vectors is based on motion vectors of one or more of a second set of sub-blocks, each of the second set of sub-blocks having a second subdivision pattern different from the first subdivision pattern, and one or more of the second set of sub-blocks overlapping at least one of the first set of sub-blocks. For example, the one or more motion vectors of the first set of sub-blocks comprise MV1, the motion vectors of one or more of the second set of sub-blocks comprise MV01, MV02, MV03, … and MV0K, and K is a positive integer. In an example, MV1 = f(MV01, MV02, MV03, …, MV0K). In another example, f(·) is a linear function. In yet another example, f(·) is a non-linear function. In yet another example, MV1 = average(MV01, MV02, MV03, …, MV0K), and average(·) is an averaging operation. In yet another example, MV1 = median(MV01, MV02, MV03, …, MV0K), and median(·) is an operation of computing a median value. In yet another example, MV1 = min(MV01, MV02, MV03, …, MV0K), and min(·) is an operation of selecting the minimum value from a plurality of input values. In yet another example, MV1 = MaxAbs(MV01, MV02, MV03, …, MV0K), and MaxAbs(·) is an operation of selecting the maximum absolute value from a plurality of input values.
In some embodiments, the first set of sub-blocks corresponds to a first color component, the deriving of the one or more motion vectors is based on motion vectors of one or more of a second set of sub-blocks, each of the second set of sub-blocks having a second subdivision pattern different from the first subdivision pattern, and the second set of sub-blocks corresponds to a second color component different from the first color component. In an example, the first color component is encoded or decoded after a third color component, and wherein the third color component is one of Cr, Cb, U, V, R, or B. In another example, the second color component is encoded or decoded before the third color component, and wherein the third color component is Y or G. In yet another example, deriving the one or more motion vectors is further based on a color format of at least one of the second set of sub-blocks. In yet another example, the color format is 4:2:0, 4:2:2, or 4:4:4.
in some embodiments, the first subdivision pattern is based on the height or width of the current video block.
Fig. 21 is a block diagram of the video processing apparatus 2100. The device 2100 may be used to implement one or more of the methods described herein. The device 2100 may be implemented as a smartphone, tablet computer, internet of things (IoT) receiver, and so on. The device 2100 may include one or more processors 2102, one or more memories 2104, and video processing hardware 2106. The processor(s) 2102 may be configured to implement one or more methods described in this document (including, but not limited to, methods 1900 and 2000). Memory (es) 2104 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 2106 may be used to implement some of the techniques described in this document in hardware circuits.
In some embodiments, the video encoding method may be implemented using a device implemented on a hardware platform as described with respect to fig. 21.
FIG. 22 is an example flow diagram of a method for video processing in accordance with the present technology. The method 2200 includes, at operation 2202, determining a prediction block for the current block during a transition between the current block and the encoded representation of the current block. The prediction block includes a first portion and a second portion. The second portion corresponds to a weighted combination of the first inter-prediction block in which the current block is subdivided into sub-blocks using the first pattern and the second inter-prediction block in which the current block is subdivided into sub-blocks using the second pattern. The method 2200 includes, at operation 2204, generating a current block from the first partition and the second partition.
Fig. 23 is an exemplary flow diagram of a method for video processing in accordance with the present technology. The method 2300 includes, at operation 2302, generating a prediction block for a current block, wherein the prediction block includes a first portion and a second portion. The second portion corresponds to a weighted combination of a first inter-prediction block in which the current block is subdivided into sub-blocks using a first pattern and a second inter-prediction block in which the current block is subdivided into sub-blocks using a second pattern. The method 2300 includes, at operation 2304, converting the prediction block into an encoded representation in a bitstream.
In some embodiments, the first portion includes a corner sub-block of the current block. In some embodiments, the first portion includes a rightmost column or a leftmost column of sub-blocks. In some embodiments, the first portion includes subblocks of a topmost row or a bottommost row. In some embodiments, the first portion includes the sub-blocks of the top-most row, bottom-most row, left-most column, and right-most column of sub-blocks.
In some embodiments, the first portion is determined based on the size of the current block not satisfying a particular condition. In some embodiments, the width and height of the current block are W and H, respectively, and T, T1, T2 are integer values, and the condition comprises one of: w > -T1 and H > -T2; w < ═ T1 and H < ═ T2; w > -T1 or H > -T2; w < ═ T1 or H < ═ T2; w + H > -T; w + H < ═ T; w × H > -T; or W × H < ═ T. In some embodiments, for a current block having a width greater than or equal to a height, the first portion includes sub-blocks of a leftmost column and a rightmost column of the current block. In some embodiments, for a current block having a height greater than or equal to a width, the first portion includes sub-blocks of a topmost row and a bottommost row of the current block.
In some embodiments, the location in the first portion corresponds to a sub-block of the current block subdivided using the second pattern. The sub-block has a width of S1 and a height of H1, and the size of the sub-block satisfies one of: S1 < T1; H1 < T2; S1 < T1 and H1 < T2; or S1 < T1 or H1 < T2, where T1 and T2 are integers. In some embodiments, T1 = T2 = 4.
Fig. 24 is a block diagram illustrating an exemplary video processing system 2400 in which various techniques disclosed herein may be implemented. Various implementations may include some or all of the components of system 2400. System 2400 can include an input 2402 for receiving video content. The video content may be received in a raw or uncompressed format, e.g., 8 or 10 bit multi-component pixel values, or may be in a compressed or encoded format. Input 2402 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as ethernet, Passive Optical Networks (PONs), etc., and wireless interfaces such as Wi-Fi or cellular interfaces.
System 2400 can include a coding component 2404 that can implement the various coding or encoding methods described in this document. The encoding component 2404 may reduce the average bit rate of the video from the input 2402 to the output of the encoding component 2404 to produce an encoded representation of the video. Thus, the encoding technique is sometimes referred to as a video compression or video transcoding technique. The output of the encoding component 2404 may be stored or transmitted via a connected communication, as represented by the component 2406. A stored or communicated bitstream (or encoded) representation of the video received at the input 2402 can be used by the component 2408 to generate pixel values or displayable video that is sent to a display interface 2410. The process of generating user-viewable video from a bitstream representation is sometimes referred to as video decompression. Furthermore, while certain video processing operations are referred to as "encoding" operations or tools, it should be understood that the encoding tools or operations are used at the encoder and the corresponding decoding tools or operations that reverse the encoding results will be performed by the decoder.
Examples of a peripheral bus interface or a display interface may include a Universal Serial Bus (USB), a High-Definition Multimedia Interface (HDMI), a DisplayPort, and so on. Examples of storage interfaces include SATA (Serial Advanced Technology Attachment), PCI, IDE interfaces, and the like. The techniques described in this document may be embodied in various electronic devices, such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and/or video display.
From the foregoing it will be appreciated that specific embodiments of the disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the disclosed technology is not to be restricted except in the spirit of the appended claims.
The disclosed and other embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory apparatus, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language file), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not require such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described, and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (15)

1. A method of video processing, comprising:
determining, during a transition between a current block and an encoded representation of the current block, a prediction block for the current block, wherein the prediction block includes a first portion and a second portion, the second portion corresponding to a weighted combination of a first inter-prediction block and a second inter-prediction block, subdividing the current block into sub-blocks using a first pattern in the first inter-prediction block and subdividing the current block into sub-blocks using a second pattern in the second inter-prediction block; and
generating the current block from the first portion and the second portion.
2. A method of video processing, comprising:
generating a prediction block for a current block, wherein the prediction block comprises a first portion and a second portion, the second portion corresponding to a weighted combination of a first inter-prediction block and a second inter-prediction block, the current block being subdivided into sub-blocks in the first inter-prediction block using a first pattern and the current block being subdivided into sub-blocks in the second inter-prediction block using a second pattern; and
converting the prediction block into an encoded representation in a bitstream.
3. The method of claim 1 or 2, wherein the first portion corresponds to a portion of the current block that is subdivided into sub-blocks using the first pattern.
4. The method of any of claims 1-3, wherein the first portion comprises a corner sub-block of the current block.
5. The method of any of claims 1-4, wherein the first portion comprises a rightmost column or a leftmost column of sub-blocks.
6. The method of any of claims 1-5, wherein the first portion comprises a sub-block of a topmost row or a bottommost row.
7. The method of any of claims 1-6, wherein the first portion includes sub-blocks of a top-most row, a bottom-most row, a left-most column, and a right-most column.
8. The method of any of claims 1-7, wherein the first portion is determined based on a size of the current block not satisfying a particular condition.
9. The method of claim 8, wherein the width and height of the current block are W and H, respectively, and T, T1, T2 are integer values, and wherein the condition comprises one of:

i. W >= T1 and H >= T2;

ii. W <= T1 and H <= T2;

iii. W >= T1 or H >= T2;

iv. W <= T1 or H <= T2;

v. W + H >= T;

vi. W + H <= T;

vii. W × H >= T; or

viii. W × H <= T.
10. The method of any of claims 1-9, wherein for the current block having a width greater than or equal to a height, the first portion includes sub-blocks of a leftmost column and a rightmost column of the current block.
11. The method of any of claims 1-10, wherein for the current block having a height greater than or equal to a width, the first portion includes sub-blocks of a topmost and bottommost row of the current block.
12. The method of any of claims 1-11, wherein a location in the first portion corresponds to a sub-block of the current block subdivided using the second pattern, the sub-block having a width of S1 and a height of H1, and a size of the sub-block satisfies one of:

S1 < T1;

H1 < T2;

S1 < T1 and H1 < T2; or

S1 < T1 or H1 < T2,

wherein T1 and T2 are integers.

13. The method of claim 12, wherein T1 = T2 = 4.
14. A video processing device comprising a processor configured to implement the method of one or more of claims 1 to 13.
15. A non-transitory computer readable medium comprising computer program code stored thereon for performing the method recited in one or more of claims 1 to 13.
CN201910828124.1A 2018-09-03 2019-09-03 Partially interleaved prediction Active CN110876064B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN2018103770 2018-09-03
CNPCT/CN2018/103770 2018-09-03
CNPCT/CN2018/104984 2018-09-11
CN2018104984 2018-09-11
CNPCT/CN2019/070058 2019-01-02
CN2019070058 2019-01-02

Publications (2)

Publication Number Publication Date
CN110876064A true CN110876064A (en) 2020-03-10
CN110876064B CN110876064B (en) 2023-01-20

Family

ID=68281771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910828124.1A Active CN110876064B (en) 2018-09-03 2019-09-03 Partially interleaved prediction

Country Status (2)

Country Link
CN (1) CN110876064B (en)
WO (1) WO2020049446A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11025951B2 (en) * 2019-01-13 2021-06-01 Tencent America LLC Method and apparatus for video coding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101252686A (en) * 2008-03-20 2008-08-27 上海交通大学 Undamaged encoding and decoding method and system based on interweave forecast
CN101491107A (en) * 2006-07-07 2009-07-22 艾利森电话股份有限公司 Video data management
US20100118943A1 (en) * 2007-01-09 2010-05-13 Kabushiki Kaisha Toshiba Method and apparatus for encoding and decoding image
CN101766030A (en) * 2007-07-31 2010-06-30 三星电子株式会社 Use video coding and the coding/decoding method and the equipment of weight estimation
CN108109629A (en) * 2016-11-18 2018-06-01 南京大学 A kind of more description voice decoding methods and system based on linear predictive residual classification quantitative

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101491107A (en) * 2006-07-07 2009-07-22 艾利森电话股份有限公司 Video data management
US20100118943A1 (en) * 2007-01-09 2010-05-13 Kabushiki Kaisha Toshiba Method and apparatus for encoding and decoding image
CN101766030A (en) * 2007-07-31 2010-06-30 三星电子株式会社 Use video coding and the coding/decoding method and the equipment of weight estimation
CN101252686A (en) * 2008-03-20 2008-08-27 上海交通大学 Undamaged encoding and decoding method and system based on interweave forecast
CN108109629A (en) * 2016-11-18 2018-06-01 南京大学 A kind of more description voice decoding methods and system based on linear predictive residual classification quantitative

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG,KAI 等: "CE4-related: Interweaved Prediction for Affine Motion Compensation", 《JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 11TH MEETING: LJUBLJANA, SI, 10–18 JULY 2018》, 18 July 2018 (2018-07-18), pages 1 - 2 *
ZHANG,KAI 等: "CE4-related: Simplified Affine Prediction", 《JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 11TH MEETING: LJUBLJANA, SI, 10–18 JULY 2018》, 18 July 2018 (2018-07-18) *

Also Published As

Publication number Publication date
WO2020049446A1 (en) 2020-03-12
TW202025726A (en) 2020-07-01
CN110876064B (en) 2023-01-20

Similar Documents

Publication Publication Date Title
CN110582000B (en) Improved pattern matched motion vector derivation
CN112913249B (en) Simplified coding and decoding of generalized bi-directional prediction index
CN110677675B (en) Method, device and storage medium for efficient affine Merge motion vector derivation
CN110620923B (en) Generalized motion vector difference resolution
CN110557640B (en) Weighted interleaved prediction
CN112956197A (en) Restriction of decoder-side motion vector derivation based on coding information
CN110740321B (en) Motion prediction based on updated motion vectors
CN110677674B (en) Method, apparatus and non-transitory computer-readable medium for video processing
CN113454999A (en) Motion vector derivation between partition modes
CN110662076A (en) Boundary enhancement of sub-blocks
CN111010570B (en) Affine motion information based size restriction
CN110809164A (en) MV precision in BIO
CN110876063B (en) Fast coding method for interleaving prediction
CN110876064B (en) Partially interleaved prediction
CN110557639B (en) Application of interleaved prediction
TWI850252B (en) Partial interweaved prediction
CN113261281B (en) Use of interleaving predictions
CN113348669B (en) Interaction between interleaving prediction and other codec tools

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant